How to start a new ML project?
<aside> 💡 First step when you build your first model for a new problem
→ make sure your model can overfit (e.g. drive the training loss to ~0 on a small subset of the data)!
</aside>
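A minimal sketch of the overfitting sanity check, assuming a toy setup: a tiny linear model trained with plain gradient descent stands in for your real model, and `overfit_check` is a hypothetical helper, not part of any library. The idea is to train on a handful of samples with at least as many parameters as samples; if the training loss does not get close to zero, something in the data pipeline, loss, or update step is probably broken.

```python
def overfit_check(xs, ys, steps=5000, lr=0.1, tol=1e-6):
    """Train a tiny linear model (MSE loss, gradient descent) on a few samples
    and report whether the training loss gets (close to) zero."""
    n, n_features = len(xs), len(xs[0])
    w, b = [0.0] * n_features, 0.0
    loss = float("inf")
    for _ in range(steps):
        # Forward pass: predictions and mean squared error.
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tol:
            break
        # Backward pass: gradients of the MSE w.r.t. weights and bias.
        grad_w = [2 * sum(e * x[j] for e, x in zip(errors, xs)) / n for j in range(n_features)]
        grad_b = 2 * sum(errors) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return loss, loss < tol


# 3 samples, 3 parameters (2 weights + bias): the model should memorise them.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
ys = [1.0, 3.0, -2.0]
loss, ok = overfit_check(xs, ys)
print(ok, loss)
```

With a real model the same pattern applies: take one small batch, train on it repeatedly, and check that the training loss goes to ~0 before scaling up.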
Some models, like decision trees, don't need much preprocessing: scaling, creating dummy variables, and handling outliers are not necessary, as explained in this fastai tutorial.
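Why scaling doesn't matter for trees: a split only depends on the *order* of a feature's values, so any strictly increasing transform (like standardisation with a positive scale) yields the same partition of the samples. The sketch below is a hand-rolled single-feature Gini split (`best_split` is a hypothetical helper, not fastai's or scikit-learn's implementation) that illustrates this on made-up data.

```python
def best_split(values, labels):
    """Return (threshold, left_indices) for the single-feature split that
    minimises weighted Gini impurity. Labels are assumed to be 0/1."""
    def gini(group):
        if not group:
            return 0.0
        p = sum(group) / len(group)
        return 2 * p * (1 - p)  # equals 1 - p^2 - (1-p)^2 for two classes

    order = sorted(range(len(values)), key=lambda i: values[i])
    best = (float("inf"), None, None)
    for k in range(1, len(order)):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[0]:
            best = (score, (lo + hi) / 2, frozenset(order[:k]))
    return best[1], best[2]


values = [1.0, 5.0, 2.0, 8.0, 3.0]
labels = [0, 1, 0, 1, 0]
print(best_split(values, labels))                            # raw feature
print(best_split([v * 1000 + 7 for v in values], labels))    # rescaled feature
```

Only the threshold changes with the rescaling; the set of samples sent to the left child stays the same.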
<aside> 💡 Don’t overthink data preprocessing!
</aside>
Fast way to drastically improve your data quality:
Many people assume that high-loss samples are the only bad apples, but many data anomalies (missing data, encoding errors, etc.) can produce near-zero loss as well.
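A minimal sketch of reviewing *both* ends of the loss distribution, assuming a binary classifier that outputs probabilities; `per_sample_log_loss` and `samples_to_review` are hypothetical helpers and the numbers are made up. The point is to look at the lowest-loss samples too, not just the highest.

```python
import math


def per_sample_log_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy for each sample individually (no averaging)."""
    losses = []
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return losses


def samples_to_review(losses, k=2):
    """Indices of the k highest-loss and the k lowest-loss samples."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return order[-k:][::-1], order[:k]


probs = [0.9, 0.2, 0.99999, 0.6, 0.01]
labels = [1, 1, 1, 0, 0]
highest, lowest = samples_to_review(per_sample_log_loss(probs, labels))
print("highest loss:", highest, "lowest loss:", lowest)
```

Suspiciously confident predictions (like sample 2 here) are worth a manual look: they may be legitimate, or they may come from an anomaly such as a placeholder value or a feature that leaks the label.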
First, I need to understand the dataset!
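A minimal first-look sketch using only the standard library, assuming the data is a CSV; `profile_csv` is a hypothetical helper and the sample data is made up. In practice pandas (`DataFrame.isna()`, `DataFrame.nunique()`) gives the same overview, but the idea is the same: per column, count missing cells and distinct values before modelling anything.

```python
import csv
import io


def profile_csv(text):
    """Per column: number of rows, missing (empty) cells, distinct non-missing values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for col in reader.fieldnames or []:
        values = [row[col] for row in rows]
        # Short rows are filled with None by DictReader; treat None and "" as missing.
        present = [v for v in values if v is not None and v.strip() != ""]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


data = "age,city\n34,Berlin\n,Berlin\n29,\n34,Munich\n"
for column, stats in profile_csv(data).items():
    print(column, stats)
```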