Random Forest - fastai Tricks

In most cases, a random forest gives better results than a single decision tree.
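A quick way to check this on your own data is to compare validation scores. The sketch below assumes scikit-learn and a synthetic dataset (the target formula and noise level are arbitrary choices). It fits one fully grown `DecisionTreeRegressor` and one `RandomForestRegressor` on the same split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic tabular data (an assumption of this sketch): 5 columns, 2 informative, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 5))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 2, size=1000)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

# One fully grown tree versus an average of 100 trees.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_r2 = r2_score(y_valid, tree.predict(X_valid))
forest_r2 = r2_score(y_valid, forest.predict(X_valid))
print(f"single tree R^2: {tree_r2:.3f}, random forest R^2: {forest_r2:.3f}")
```

On noisy data a single fully grown tree memorises the noise, while averaging many trees cancels much of it out.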
Random forests only care about the order of the values in a column, not their scale, so no scaling or complex preprocessing is needed. This makes them a good way to quickly build a first model, as explained here. Additionally, they are hard to mess up and give good insights. They are especially useful for tabular data.
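Because a split only asks whether a value is `<=` a threshold, any strictly increasing transformation of a column (here `np.log`) leaves the partitions a tree learns unchanged. A minimal sketch with scikit-learn, assuming synthetic integer-valued features (which avoid floating-point ties) and a fixed `random_state`, so both forests draw the same random rows and columns. `same_structure` is a hypothetical helper written only for this check.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Integer-valued features keep the comparison free of floating-point ties (assumption of this sketch).
rng = np.random.default_rng(0)
X = rng.integers(1, 1000, size=(300, 4)).astype(float)
y = 0.01 * X[:, 0] + np.sin(X[:, 1] / 100) + rng.normal(0, 0.1, size=300)

params = dict(n_estimators=20, min_samples_leaf=5, random_state=0)
rf_raw = RandomForestRegressor(**params).fit(X, y)
rf_log = RandomForestRegressor(**params).fit(np.log(X), y)  # monotonic transform of every column

def same_structure(forest_a, forest_b):
    """True if every pair of trees splits on the same features and puts the same rows in each node."""
    return all(
        np.array_equal(a.tree_.feature, b.tree_.feature)
        and np.array_equal(a.tree_.n_node_samples, b.tree_.n_node_samples)
        for a, b in zip(forest_a.estimators_, forest_b.estimators_)
    )

print(same_structure(rf_raw, rf_log))
```

The split thresholds themselves differ, since each is a midpoint in its own space, so an unseen value lying between two training values may occasionally be routed differently.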
One particularly nice feature of random forests is that they can tell us which independent variables were the most important in the model, as explained here. Low-importance features can then be removed, see here.
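A sketch of reading and using feature importances through scikit-learn's `feature_importances_` attribute. The helper `rf_feat_importance` is modelled on the one used in the fastai course; the column names, the synthetic data and the 0.005 cutoff are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def rf_feat_importance(m, df):
    """Feature importances of a fitted forest as a DataFrame, most important first."""
    return pd.DataFrame({"cols": df.columns, "imp": m.feature_importances_}).sort_values(
        "imp", ascending=False
    )

# Hypothetical data: only "a" and "b" influence the target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(500, 5)), columns=["a", "b", "noise1", "noise2", "noise3"])
y = 3 * df["a"] + 2 * df["b"] + rng.normal(0, 0.1, size=500)

m = RandomForestRegressor(n_estimators=100, min_samples_leaf=3, random_state=0).fit(df, y)
fi = rf_feat_importance(m, df)
print(fi)

# Keep only columns above an importance cutoff (0.005 is an arbitrary value for this sketch).
to_keep = fi[fi.imp > 0.005].cols
df_keep = df[to_keep]
```

After dropping columns, it is worth refitting on `df_keep` and checking that validation accuracy has not dropped.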
Random forests in the context of bagging: each decision tree is trained on a random subset of the rows, and each split considers only a random subset of the columns.
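A from-scratch sketch of the idea, assuming scikit-learn's `DecisionTreeRegressor` for the individual trees: rows are bootstrapped (sampled with replacement) for each tree, and `max_features` restricts the columns tried at each split. `fit_bagged_trees` and `predict_bagged` are hypothetical names; `RandomForestRegressor` does the same thing internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=50, max_features=0.5, seed=0):
    """Fit n_trees trees, each on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    n_rows = len(X)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_rows, size=n_rows)  # sample rows with replacement
        tree = DecisionTreeRegressor(
            max_features=max_features,  # fraction of columns tried at each split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[rows], y[rows]))
    return trees

def predict_bagged(trees, X):
    """Average the predictions of all trees."""
    return np.mean([t.predict(X) for t in trees], axis=0)

# Usage on synthetic data: train on the first 500 rows, predict the last 100.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(600, 5))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.5, size=600)
trees = fit_bagged_trees(X[:500], y[:500])
preds = predict_bagged(trees, X[500:])
```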
Add more trees and check whether the accuracy improves.
- Rule of thumb: there is rarely a need for more than 100 trees
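One way to see how accuracy changes with the number of trees, without refitting, is to collect each tree's predictions from `m.estimators_` and average only the first k of them (the approach taken in the fastai course). A sketch assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 5))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 2, size=1000)
X_train, y_train, X_valid, y_valid = X[:800], y[:800], X[800:], y[800:]

m = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Predictions of every individual tree: shape (n_trees, n_valid_rows).
preds = np.stack([t.predict(X_valid) for t in m.estimators_])

# Validation RMSE when averaging only the first k trees, for k = 1..100.
rmse_by_n_trees = [
    np.sqrt(mean_squared_error(y_valid, preds[:k].mean(axis=0))) for k in range(1, len(preds) + 1)
]
for k in (1, 10, 50, 100):
    print(k, round(rmse_by_n_trees[k - 1], 3))
```

Plotting `rmse_by_n_trees` typically shows a steep drop over the first few dozen trees and a flat tail afterwards.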
Random forests allow deep model interpretation: they tell you how confident you can be in a prediction, which columns matter most, how predictions vary with a column, how to explain a particular prediction, …
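For the confidence question, one approach is to measure how much the individual trees disagree: the standard deviation of their predictions for a row. A sketch assuming scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(800, 3))
y = 2 * X[:, 0] + rng.normal(0, 1, size=800)
m = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0).fit(X, y)

X_new = rng.uniform(0, 10, size=(5, 3))
tree_preds = np.stack([t.predict(X_new) for t in m.estimators_])  # (n_trees, n_rows)
pred_mean = tree_preds.mean(axis=0)  # the forest's prediction
pred_std = tree_preds.std(axis=0)    # spread across trees: larger means less confident
for mu, sd in zip(pred_mean, pred_std):
    print(f"prediction {mu:.2f} +/- {sd:.2f}")
```

Rows with a comparatively large standard deviation are the ones whose prediction deserves more caution.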
Partial dependence: what is the relationship between a column and the predicted target, all other things being equal?
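A hand-rolled sketch of one-column partial dependence: set the column to each value of a grid for every row, predict, and average. The `year`/`size` columns, the data and the grid are hypothetical. scikit-learn also ships this as `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_1d(model, df, column, grid):
    """Average prediction when `column` is set to each grid value for every row, other columns unchanged."""
    averages = []
    for value in grid:
        df_mod = df.copy()
        df_mod[column] = value
        averages.append(model.predict(df_mod).mean())
    return np.array(averages)

rng = np.random.default_rng(0)
df = pd.DataFrame({"year": rng.integers(1990, 2012, size=600), "size": rng.uniform(50, 200, size=600)})
y = 0.5 * (df["year"] - 1990) + 0.02 * df["size"] + rng.normal(0, 1, size=600)

m = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0).fit(df, y)
grid = np.arange(1990, 2012, 2)
pd_year = partial_dependence_1d(m, df, "year", grid)
print(dict(zip(grid.tolist(), pd_year.round(2).tolist())))
```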
Explaining individual predictions with Tree Interpreter.
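Tree Interpreter (the `treeinterpreter` package) splits a prediction into a bias term plus one contribution per column. The sketch below reimplements that decomposition by hand for a scikit-learn regressor by walking each tree's decision path through the `tree_` arrays. It illustrates the idea and is not the package's own code; `explain_tree` and `explain_forest` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def explain_tree(tree, x):
    """Split one tree's prediction for row x into a bias (root mean) plus per-feature contributions."""
    t = tree.tree_
    x = np.asarray(x, dtype=np.float32)  # scikit-learn compares float32 inputs against thresholds
    node = 0
    bias = t.value[0, 0, 0]
    contributions = np.zeros(len(x))
    while t.children_left[node] != -1:  # -1 marks a leaf
        feature = t.feature[node]
        if x[feature] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        # Credit the change in the node's mean target to the feature used for this split.
        contributions[feature] += t.value[child, 0, 0] - t.value[node, 0, 0]
        node = child
    return bias, contributions

def explain_forest(forest, x):
    """Average the per-tree decompositions over all trees of the forest."""
    parts = [explain_tree(est, x) for est in forest.estimators_]
    bias = np.mean([b for b, _ in parts])
    contributions = np.mean([c for _, c in parts], axis=0)
    return bias, contributions

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=500)
m = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
bias, contributions = explain_forest(m, X[0])
print(bias, contributions)
```

Because each step adds the change in node mean, the contributions telescope: `bias + contributions.sum()` equals the forest's prediction for that row.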
Can you overfit a random forest? Trees that are too deep can make a random forest overfit. More trees cannot make the forest overfit, so make sure that you have enough trees.
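A sketch of the depth effect, assuming scikit-learn, where `min_samples_leaf` (like `max_depth`) limits how deep the trees can grow. It compares the gap between training and validation R^2 for fully grown trees and for trees whose leaves must hold at least 25 rows; the data and the value 25 are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 5))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 3, size=1000)
X_train, y_train, X_valid, y_valid = X[:800], y[:800], X[800:], y[800:]

def train_valid_gap(min_samples_leaf):
    """Train R^2 minus validation R^2; a large gap signals overfitting."""
    m = RandomForestRegressor(n_estimators=100, min_samples_leaf=min_samples_leaf, random_state=0)
    m.fit(X_train, y_train)
    return r2_score(y_train, m.predict(X_train)) - r2_score(y_valid, m.predict(X_valid))

gap_deep = train_valid_gap(min_samples_leaf=1)      # fully grown trees
gap_shallow = train_valid_gap(min_samples_leaf=25)  # every leaf must contain >= 25 rows
print(f"gap with deep trees: {gap_deep:.3f}, with shallower trees: {gap_shallow:.3f}")
```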
Does adding more predictors (trees) to a random forest increase the risk of overfitting?
- No! That is the beauty of this algorithm. It is extremely robust to overfitting as trees are added: training more models on subsets of the training set and taking their mean does not lead to overfitting but generally improves performance (beyond a certain point the gains become very small or cease altogether).
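The limitations of random forests → out-of-domain data! A prediction is an average of training-set targets, so a random forest cannot extrapolate beyond the range of values seen in training.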
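A toy illustration, assuming scikit-learn and a perfectly linear target: every training x is below 10, so no split threshold lies above 10, and any larger input falls into the same leaves.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Perfectly linear toy data; the training x values only cover [0, 10).
x_train = np.linspace(0, 10, 200, endpoint=False).reshape(-1, 1)
y_train = 2 * x_train.ravel()
m = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)

x_new = np.array([[5.0], [20.0], [40.0]])
preds = m.predict(x_new)
print(preds)  # x=20 and x=40 get the same prediction, capped by max(y_train)
```

A linear model would predict 40 and 80 for the last two rows; the forest cannot go beyond the largest target it has seen.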
- In the context of a random forest, how can you find out-of-domain data?
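One answer, used in the fastai course, is to turn the question into a classification task: label each row by whether it comes from the training or the validation set, fit a random forest to predict that label, and look at its feature importances. Columns that make the two sets easy to tell apart are the ones that are out of domain. A sketch with scikit-learn; the columns and the drift in `date` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical data: "date" drifts between training and validation, the other columns do not.
train = pd.DataFrame({"date": rng.uniform(0, 100, 500), "size": rng.uniform(0, 1, 500), "age": rng.uniform(0, 1, 500)})
valid = pd.DataFrame({"date": rng.uniform(100, 120, 200), "size": rng.uniform(0, 1, 200), "age": rng.uniform(0, 1, 200)})

# Label each row by the set it came from, then try to predict that label.
df_dom = pd.concat([train, valid], ignore_index=True)
is_valid = np.array([0] * len(train) + [1] * len(valid))

m = RandomForestClassifier(n_estimators=50, min_samples_leaf=5, random_state=0).fit(df_dom, is_valid)
fi = pd.DataFrame({"cols": df_dom.columns, "imp": m.feature_importances_}).sort_values("imp", ascending=False)
print(fi)  # columns at the top differ most between the two sets
```

Candidates at the top of this list can then be dropped one at a time while checking that validation accuracy on the original task does not suffer.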