How to understand your data?

Handling missing data is a crucial step in data preprocessing: robust analyses and accurate model training both depend on it. Missing data can be annoying to deal with. Some algorithms can't handle it, so your first instinct might be to impute the values. It turns out, though, that sometimes you can 'ignore' these missing values; it all depends on the machine learning algorithm you want to use. Vincent explains why tree-based algorithms can handle missing data and why using them is often better than imputing the wrong values → calmcode video


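<aside> 💡 Remember, the key is to select a method that aligns with your data characteristics and analysis objectives. And don’t forget to split the data before imputing your values, so that the imputation statistics come from the training set only and nothing leaks from the test set.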

</aside>
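A minimal sketch of "split first, then impute", assuming scikit-learn's `train_test_split` and `SimpleImputer`. The data and the choice of the median strategy are illustrative assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): one feature with two missing entries.
X = np.array([[1.0], [2.0], [np.nan], [4.0], [5.0], [np.nan], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

# 1. Split before any statistics are computed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Learn the fill value from the training rows only.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)

# 3. Reuse that same fill value on the test rows (transform, not fit_transform).
X_test_imp = imputer.transform(X_test)
```

Calling `fit_transform` on the full dataset before splitting would let test-set values influence the fill value, which tends to make evaluation scores look better than they should.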

Article - main feature engineering techniques


Here are some common methods for imputing missing values:
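As a sketch, assuming the methods in question are the simple statistical ones, the snippet below compares four widely used strategies available in scikit-learn's `SimpleImputer`: mean, median, most frequent value, and a constant. The column values are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy column (hypothetical) with one missing value at row index 3.
X = np.array([[1.0], [2.0], [2.0], [np.nan], [10.0]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(strategy=strategy)
    # Record the value that replaced the NaN in row 3.
    results[strategy] = imputer.fit_transform(X)[3, 0]

# A constant fill needs an explicit fill_value.
constant_imputer = SimpleImputer(strategy="constant", fill_value=0.0)
results["constant"] = constant_imputer.fit_transform(X)[3, 0]

for name, value in results.items():
    print(f"{name}: {value}")
```

The outlier (10.0) pulls the mean well above the median here, which illustrates why the right strategy depends on the distribution of the column.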
