Pandas Pipeline for data preprocessing steps
How do you understand your data?
Here are some common preprocessing steps prior to feeding the data to a machine learning model:
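Split your data into training and test sets before anything else, if possible! Otherwise statistics computed on the test set (medians, percentiles, category lists) leak into the preprocessing.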
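A minimal sketch of splitting first, in plain pandas. It assumes a DataFrame with a categorical target column (hypothetically named `target`) and makes the split stratified by drawing the same fraction of rows from every class with `DataFrameGroupBy.sample` (available since pandas 1.1). The function name, column names and split ratio are illustrative assumptions.

```python
import pandas as pd

def stratified_split(df: pd.DataFrame, target: str, test_frac: float = 0.2, seed: int = 42):
    """Hypothetical helper: split df into (train, test), keeping the class mix of `target`."""
    # Draw the same fraction of rows from every class -> stratified test set
    test = df.groupby(target).sample(frac=test_frac, random_state=seed)
    # Every row not drawn into the test set becomes training data
    train = df.drop(index=test.index)
    return train, test

# Toy data (made up): 8 rows of class "a", 2 rows of class "b"
df = pd.DataFrame({"x": range(10), "target": ["a"] * 8 + ["b"] * 2})
train, test = stratified_split(df, "target", test_frac=0.5)
print(test["target"].value_counts().to_dict())  # 4 rows of "a", 1 row of "b"
```

Every later step (imputation values, clipping limits, bin edges, scaling statistics) should then be computed on `train` only and reused on `test`.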
- Replace / remove missing data (imputation)
- Remove / rescale outliers (scaling variables)
- Encode categorical features
- Discretization of continuous features
- Normalize numeric features
- Variable transformation
- Stratify partitions based on the target variable
- Resample / rebalance partitions
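Several of the steps above can be chained with pandas' `DataFrame.pipe`. The sketch below is one possible arrangement, not a prescribed recipe: the column names (`age`, `income`, `city`), the 1st/99th percentile clipping, the three equal-width age bins and the log transform are all hypothetical choices. Every statistic is learned from the training data in `fit` and then reused unchanged on the test data.

```python
import numpy as np
import pandas as pd

NUM_COLS = ["age", "income"]  # hypothetical numeric columns
CAT_COLS = ["city"]           # hypothetical categorical column

def impute(df, stats):
    # Missing numbers -> training median; missing categories -> "unknown"
    fill = {**stats["median"], **{c: "unknown" for c in CAT_COLS}}
    return df.fillna(fill)

def clip_outliers(df, stats):
    # Rescale outliers by clipping each column to its training 1st/99th percentile
    return df.assign(**{c: df[c].clip(stats["low"][c], stats["high"][c]) for c in NUM_COLS})

def transform_and_bin(df, stats):
    return df.assign(
        # Variable transformation: log1p compresses a right-skewed, non-negative income
        income=np.log1p(df["income"]),
        # Discretization: equal-width age bins (integer codes) with training-set edges
        age_bin=pd.cut(df["age"], bins=stats["age_edges"], labels=False, include_lowest=True),
    )

def standardize(df, stats):
    # Normalize numeric features to zero mean / unit variance using training statistics
    return df.assign(**{c: (df[c] - stats["mean"][c]) / stats["std"][c] for c in NUM_COLS})

def encode(df, columns=None):
    # One-hot encode categoricals; reindex so test data gets exactly the training columns
    out = pd.get_dummies(df, columns=CAT_COLS, dtype=int)
    return out if columns is None else out.reindex(columns=columns, fill_value=0)

def fit(train):
    """Learn every statistic from the training split only, step by step."""
    stats = {"median": train[NUM_COLS].median().to_dict()}
    step = impute(train, stats)
    stats["low"], stats["high"] = step[NUM_COLS].quantile(0.01), step[NUM_COLS].quantile(0.99)
    step = clip_outliers(step, stats)
    stats["age_edges"] = np.linspace(stats["low"]["age"], stats["high"]["age"], num=4)  # 3 bins
    step = transform_and_bin(step, stats)
    stats["mean"], stats["std"] = step[NUM_COLS].mean(), step[NUM_COLS].std()
    stats["columns"] = encode(standardize(step, stats)).columns
    return stats

def preprocess(df, stats):
    # Chain the steps with DataFrame.pipe; the same stats are applied to train and test
    return (
        df.pipe(impute, stats)
        .pipe(clip_outliers, stats)
        .pipe(transform_and_bin, stats)
        .pipe(standardize, stats)
        .pipe(encode, stats["columns"])
    )

# Toy data (made up for illustration)
train = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 41, 90],
    "income": [30_000, 52_000, 61_000, np.nan, 45_000, 1_000_000],
    "city": ["Berlin", "Paris", None, "Berlin", "Paris", "Rome"],
})
test = pd.DataFrame({"age": [30, 120], "income": [40_000, 5], "city": ["Oslo", "Rome"]})

stats = fit(train)
X_train, X_test = preprocess(train, stats), preprocess(test, stats)
print(X_test)
```

Reindexing in `encode` means a category seen only in the test data (here `"Oslo"`) becomes all zeros instead of adding a new column. Stratification and rebalancing act on whole partitions, so they belong before or around this per-column pipeline rather than inside it.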
More in this tweet:
https://twitter.com/rasbt/status/1592969233713201152