• In most cases a Random Forest gives better results than a single Decision Tree
  • Random Forests only care about the order of values, so no scaling or complex preprocessing is needed. That makes them a good choice for quickly building a first model, as explained here. They are also hard to mess up and give good insights. Especially useful for tabular data.
  • One particularly nice feature of random forests is that they can tell us which independent variables were the most important in the model, as explained here. Features with low importance can be removed, see here.
  • Explain Random Forest in the context of bagging: each Decision Tree is trained on a random subset of both rows AND columns.
  • Add more trees and check whether accuracy improves.
    • Rule of thumb: no more than 100 trees
  • Deep model interpretation is possible. An RF can tell you how confident a prediction is, which columns matter most, how predictions vary, how a particular prediction can be explained,…
  • Partial Dependence: What is the relationship between a column and the target variable?
  • Explainability with Tree Interpreter.
  • Can you overfit an RF? Trees that are too deep can make an RF overfit. More trees cannot make the forest overfit, so make sure that you have enough trees.
  • Does adding more predictors (trees) to Random Forest increase the risk of overfitting?
    • No! That is the beauty of this algorithm. It is extremely robust to overfitting as trees are added: training more models on subsets of the training set and taking their mean does not lead to overfitting but generally improves performance (beyond a certain point the gains become very small or cease altogether).
  • The limitations of Random Forest! → Out-of-domain data!
    • In the context of a Random Forest, how do you find out-of-domain data?
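
Several of the notes above (row and column sampling, feature importance, prediction confidence, and the out-of-domain limitation) can be illustrated with a minimal scikit-learn sketch. The synthetic data, the parameter values, and the variable names are assumptions chosen for illustration only. Note that scikit-learn samples columns at each split rather than once per tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic tabular data (an assumption for illustration): the target depends
# strongly on column 0, weakly on column 1, and not at all on column 2.
X = rng.uniform(0, 10, size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=500)

# bootstrap=True (the default) gives each tree a random sample of rows;
# max_features < 1.0 restricts each split to a random subset of columns.
forest = RandomForestRegressor(
    n_estimators=100, max_features=0.5, oob_score=True, random_state=0
)
forest.fit(X, y)

# Feature importance: one value per column, summing to 1.
importances = forest.feature_importances_

# Out-of-bag R^2: a generalization estimate from the rows each tree did not see.
oob_r2 = forest.oob_score_

# Confidence: the spread of the individual trees' predictions for one row.
row = X[:1]
per_tree = np.array([tree.predict(row)[0] for tree in forest.estimators_])
pred_mean, pred_std = per_tree.mean(), per_tree.std()

# Out-of-domain limitation: column 0 was only seen in [0, 10] during training.
# A forest averages training targets, so it cannot predict beyond their range.
far_row = np.array([[100.0, 5.0, 5.0]])
far_pred = forest.predict(far_row)[0]

print("importances:", importances)
print("OOB R^2:", oob_r2)
print("prediction mean/std across trees:", pred_mean, pred_std)
print("out-of-domain prediction:", far_pred, "max training target:", y.max())
```

The out-of-domain prediction stays at or below the largest training target even though the underlying relationship would give a much larger value at column 0 = 100. Re-fitting with a different `n_estimators` and comparing `oob_score_` is one way to check whether adding trees still helps.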