- In most cases, a Random Forest gives better results than a single Decision Tree.
- Random Forests only care about the order of the values within a column (each split is a threshold), so no scaling or complex preprocessing is needed. This makes them a good choice for quickly building a first model, as explained here. Additionally, they are hard to mess up and give good insights. They are especially useful for tabular data.
- One particularly nice feature of Random Forests is that they can tell us which independent variables were the most important in the model, as explained here. Low-importance features can then be removed, see here.
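- Random Forest in the context of bagging: each Decision Tree is trained on a random (bootstrap) sample of the rows AND only considers a random subset of the columns (in the standard algorithm, a new subset is drawn at each split). The forest's prediction is the average of the trees' predictions.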
- Add more trees and check whether accuracy improves.
- Rule of thumb: no more than 100 trees.
- Deep model interpretation is possible. A Random Forest can tell you how confident you can be in a prediction, which columns matter most, how predictions vary with a column, how to explain a particular prediction, …
- Partial dependence: what is the relationship between a column and the model's prediction once the effect of the other columns is averaged out?
- Explainability with Tree Interpreter: a single prediction is split into a bias term plus one contribution per column.
- Can you overfit a Random Forest? Trees that are too deep can make it overfit. More trees cannot make the forest overfit, so make sure that you have enough trees.
- Does adding more predictors (trees) to a Random Forest increase the risk of overfitting?
- No! That is the beauty of this algorithm. It is extremely robust to overfitting: training more models on subsets of the training set and taking their mean does not lead to overfitting but generally delivers improved performance (beyond a certain point, the gains become very small or cease altogether).
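- The main limitation of Random Forests → out-of-domain data! For regression, every prediction is an average of training targets, so a Random Forest cannot extrapolate beyond the range of values it saw during training.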
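The note that a Random Forest usually beats a single Decision Tree can be checked with a quick experiment. This is a minimal sketch assuming scikit-learn and a synthetic regression dataset from `make_regression`; the dataset size, noise level and `n_estimators=100` are illustrative choices, not values taken from these notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic tabular data (an assumption; any regression dataset would do).
X, y = make_regression(n_samples=1000, n_features=10, noise=20.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# .score returns R^2 on the given data (higher is better).
tree_r2 = tree.score(X_valid, y_valid)
forest_r2 = forest.score(X_valid, y_valid)
print(f"single tree R^2: {tree_r2:.3f}, random forest R^2: {forest_r2:.3f}")
```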
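To illustrate that Random Forests only care about the order of values, the sketch below (assuming scikit-learn and synthetic data) fits the same forest on raw and on rescaled features and compares the predictions. The scale factor 1024 is a hypothetical choice: a power of two keeps the rescaling exact in floating point, so the trees see exactly the same ordering and should find the same splits.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# Rescale every column by a positive power of two: the order of the values
# within each column is unchanged, and the multiplication is exact.
scale = 1024.0
X_scaled = X * scale

params = dict(n_estimators=50, random_state=0)
pred_raw = RandomForestRegressor(**params).fit(X, y).predict(X)
pred_scaled = RandomForestRegressor(**params).fit(X_scaled, y).predict(X_scaled)

# Same orderings -> same splits -> same predictions (no scaling step needed).
print(np.allclose(pred_raw, pred_scaled))
```

A nonlinear monotonic transform (such as a log) also preserves the orderings of the training data, but split thresholds are midpoints between neighbouring values, so new points lying between two training values may be routed differently; an exact linear rescaling avoids that subtlety.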
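The feature importances mentioned in the notes are exposed in scikit-learn as `feature_importances_` (impurity-based). This sketch assumes a hypothetical dataset in which only columns 0 to 2 influence the target, and uses an arbitrary 5% cut-off to drop low-importance columns before refitting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: only columns 0-2 drive the target; columns 3-5 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 4 * X[:, 0] + 3 * X[:, 1] + 2 * X[:, 2] + rng.normal(scale=0.5, size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances (they sum to 1), listed from largest to smallest.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
for col in ranking:
    print(f"column {col}: {importances[col]:.3f}")

# Hypothetical cut-off: keep columns above 5% importance and refit on them only.
keep = np.flatnonzero(importances > 0.05)
reduced = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:, keep], y)
```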
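Bagging with random rows and columns can be written out by hand to make the mechanism explicit. A minimal sketch, assuming scikit-learn's `DecisionTreeRegressor` as the base learner: `fit_bagged_trees` and `predict_bagged` are hypothetical helper names, rows are bootstrapped per tree with NumPy, and the column subset is drawn at each split through `max_features="sqrt"`. In practice `RandomForestRegressor` does all of this for you.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=50, max_features="sqrt", seed=0):
    """Each tree sees a bootstrap sample of the rows and, at every split,
    only a random subset of the columns (controlled by max_features)."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    trees = []
    for i in range(n_trees):
        rows = rng.integers(0, n_rows, size=n_rows)  # sample rows with replacement
        tree = DecisionTreeRegressor(max_features=max_features, random_state=i)
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def predict_bagged(trees, X):
    # The ensemble prediction is the mean of the individual tree predictions.
    return np.mean([tree.predict(X) for tree in trees], axis=0)


X, y = make_regression(n_samples=400, n_features=9, noise=10.0, random_state=0)
trees = fit_bagged_trees(X, y)
preds = predict_bagged(trees, X[:5])
print(preds.shape)
```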
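Adding trees and checking whether accuracy still improves can be done without refitting from scratch. This sketch assumes scikit-learn's `warm_start=True` (keep the existing trees, add new ones) and uses the out-of-bag R² as the accuracy measure; the tree counts are arbitrary.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=20.0, random_state=0)

# warm_start=True keeps the already fitted trees and only grows the new ones.
forest = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)
oob_by_size = {}
for n_trees in [25, 50, 100, 200]:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    # OOB R^2: each row is scored only by trees that did not see it during training.
    oob_by_size[n_trees] = forest.oob_score_
    print(f"{n_trees:>4} trees: OOB R^2 = {forest.oob_score_:.3f}")
```

Once the OOB score stops improving noticeably between steps, more trees mostly cost training time.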
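One way a Random Forest tells you how confident you can be is the disagreement between its trees. A minimal sketch, assuming scikit-learn and synthetic data: collect each tree's prediction from `estimators_`, then use the mean as the prediction and the standard deviation as a rough confidence signal (a heuristic, not a calibrated interval).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

rows = X[:3]
# One row of predictions per tree: shape (n_trees, n_rows).
per_tree = np.stack([tree.predict(rows) for tree in forest.estimators_])
mean = per_tree.mean(axis=0)   # matches forest.predict(rows)
spread = per_tree.std(axis=0)  # how much the trees disagree on each row
for m, s in zip(mean, spread):
    print(f"prediction {m:8.2f} +/- {s:6.2f}")
```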
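Partial dependence can be computed by hand: force one column to a fixed value for every row, predict, average, and repeat over a grid of values. scikit-learn also ships this in `sklearn.inspection`; the sketch below spells out the idea with a hypothetical helper `partial_dependence_1d`, synthetic data and a 10-point grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, column, grid):
    """Average prediction when `column` is set to each grid value for every row."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, column] = value  # other columns keep their observed values
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


X, y = make_regression(n_samples=500, n_features=4, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=10)
pd_curve = partial_dependence_1d(forest, X, column=0, grid=grid)
for g, p in zip(grid, pd_curve):
    print(f"x0 = {g:6.2f} -> average prediction {p:8.2f}")
```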
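The Tree Interpreter idea can be sketched without the `treeinterpreter` package: walk each tree's decision path, attribute every change in node value to the column used for that split, and average over the trees. This is a from-scratch sketch assuming scikit-learn's low-level tree attributes (`tree_.value`, `tree_.feature`, `decision_path`); `explain_prediction` is a hypothetical name. By construction, the bias plus the contributions adds up to the forest's prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def explain_prediction(forest, row):
    """Split one regression prediction into bias + per-column contributions."""
    row = np.asarray(row, dtype=np.float64).reshape(1, -1)
    bias = 0.0
    contributions = np.zeros(row.shape[1])
    for tree in forest.estimators_:
        values = tree.tree_.value[:, 0, 0]      # mean training target stored at each node
        split_feature = tree.tree_.feature      # column used to split at each node
        path = tree.decision_path(row).indices  # node ids from the root down to the leaf
        bias += values[path[0]]                 # root value: the tree's starting guess
        # Credit each change in node value to the column that made that split.
        for parent, child in zip(path[:-1], path[1:]):
            contributions[split_feature[parent]] += values[child] - values[parent]
    n_trees = len(forest.estimators_)
    return bias / n_trees, contributions / n_trees


X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

bias, contributions = explain_prediction(forest, X[0])
print(f"bias {bias:.2f} + contributions {np.round(contributions, 2)}")
print(f"sum {bias + contributions.sum():.2f} vs predict {forest.predict(X[:1])[0]:.2f}")
```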
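The point that tree depth, not the number of trees, drives overfitting can be probed by comparing training and validation scores while limiting how deep the trees grow. A sketch assuming scikit-learn and synthetic data; the `min_samples_leaf` values 1, 5 and 20 are illustrative (larger leaves mean shallower trees), and the size of the train/validation gap is the signal to watch.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=30.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

results = {}
for leaf in [1, 5, 20]:
    # min_samples_leaf=1 grows fully deep trees; larger values stop splitting earlier.
    forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=leaf, random_state=0)
    forest.fit(X_train, y_train)
    train_r2 = forest.score(X_train, y_train)
    valid_r2 = forest.score(X_valid, y_valid)
    results[leaf] = (train_r2, valid_r2)
    print(f"min_samples_leaf={leaf:>2}: train R^2 {train_r2:.3f}, valid R^2 {valid_r2:.3f}")
```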
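The out-of-domain limitation is easy to reproduce with a one-dimensional toy problem. This sketch assumes scikit-learn and a hypothetical linear target `y = 2x` observed only for `x` in [0, 10]; predictions far outside that range flatten out at roughly the largest training target instead of following the line.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: x only covers [0, 10] and the true relation is y = 2x.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=(200, 1))
y_train = 2 * x_train[:, 0]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(x_train, y_train)

# Inside the training range the fit is good; outside it the forest cannot extrapolate.
x_new = np.array([[5.0], [20.0], [100.0]])
preds = forest.predict(x_new)
print(preds)
```

Both out-of-range inputs land in the same right-most leaf of every tree, so they receive exactly the same prediction, however far they are from the training data.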