Tabular data with fastai

‣ → good for tabular data

Kaggle examples and artificial data

Kaggle Tabular Data Playground Series

Synthetic Data

Tracking your workflow

As a rule, you also have to take care to maintain reproducibility and to save all the models (from every fold), the list of parameters used, all the fold predictions, all the out-of-fold predictions, and all predictions from models trained on all the data. You can use a simple .txt file or Excel, but more sophisticated ways exist.
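The bookkeeping described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a recommended tool: the function name `run_tracked_cv`, the file names, and the trivial "predict the mean" model are all hypothetical stand-ins, and a real model object would usually be serialized with pickle or a similar tool instead of JSON.

```python
import json
import random
from pathlib import Path


def fit_mean(y):
    """Hypothetical stand-in model: remembers only the mean of the targets."""
    return {"mean": sum(y) / len(y)}


def predict_mean(model, n_rows):
    """Predict the stored mean for every row."""
    return [model["mean"]] * n_rows


def run_tracked_cv(X, y, X_test, params, out_dir, n_folds=5):
    """Run K-fold CV and save every artifact needed to reproduce the run."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # A fixed seed makes the fold assignment reproducible.
    rng = random.Random(params["seed"])
    indices = list(range(len(X)))
    rng.shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]

    oof = [None] * len(X)    # out-of-fold predictions, one per training row
    fold_test_preds = []     # test predictions from each fold model

    for k, valid_idx in enumerate(folds):
        valid_set = set(valid_idx)
        train_idx = [i for i in indices if i not in valid_set]
        model = fit_mean([y[i] for i in train_idx])

        for i, pred in zip(valid_idx, predict_mean(model, len(valid_idx))):
            oof[i] = pred
        fold_test_preds.append(predict_mean(model, len(X_test)))

        # Save the model from every fold.
        (out / f"model_fold{k}.json").write_text(json.dumps(model))

    # Model trained on all the data, plus its predictions.
    full_model = fit_mean(y)
    (out / "model_full.json").write_text(json.dumps(full_model))

    record = {
        "params": params,
        "folds": folds,
        "oof_predictions": oof,
        "fold_test_predictions": fold_test_preds,
        "full_test_predictions": predict_mean(full_model, len(X_test)),
    }
    (out / "run.json").write_text(json.dumps(record, indent=2))
    return record
```

Calling `run_tracked_cv(X, y, X_test, {"seed": 42}, "runs/exp01")` twice with the same seed should produce identical fold splits and predictions, which is the property the note asks for.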

Golden Rules Tabular Data

The most effective models for tabular data are still:

  1. LightGBM
  2. XGBoost
  3. Random Forest

If you are extremely worried about overfitting → use Random Forest

If you have a GPU → maybe use XGBoost

Otherwise, use LightGBM
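
The three rules above can be written as a small decision function. This is just a sketch of the heuristic: the function name `choose_model` and its arguments are hypothetical, and it assumes the rules are checked in the order they are listed, so the overfitting concern wins over GPU availability. The notes only say "maybe" for XGBoost, so treat the result as a starting point.

```python
def choose_model(very_worried_about_overfitting: bool, has_gpu: bool) -> str:
    """Return a suggested starting model following the rules of thumb above."""
    # Rule 1: extreme concern about overfitting -> Random Forest.
    if very_worried_about_overfitting:
        return "Random Forest"
    # Rule 2: a GPU is available -> XGBoost is worth considering.
    if has_gpu:
        return "XGBoost"
    # Rule 3: default choice.
    return "LightGBM"
```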