Pandas Pipeline for data preprocessing steps

How to understand your data?

Here are some common preprocessing steps prior to feeding the data to a machine learning model:

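Split your dataset into training and test sets before anything else, if possible! That way every preprocessing statistic (medians, means, category lists) is learned from the training data only, and nothing leaks from the test set.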
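A minimal pandas-only sketch of splitting first, assuming a DataFrame with a label column (here a hypothetical `target`). It samples the test rows separately within each class using `groupby(...).sample(...)`, so the split is also stratified on the target (step 8 in the list). The helper name `train_test_split_df` and the toy data are hypothetical.

```python
import pandas as pd

def train_test_split_df(df: pd.DataFrame, target: str, test_frac: float = 0.2,
                        seed: int = 0) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: stratified train/test split using pandas only."""
    # Sample test_frac of the rows within each target class,
    # so class proportions are preserved in the test set
    test = df.groupby(target).sample(frac=test_frac, random_state=seed)
    # Every row not drawn for the test set goes to the training set
    train = df.drop(index=test.index)
    return train, test

# Hypothetical toy data: 8 rows of class 0, 2 rows of class 1
df = pd.DataFrame({"x": range(10), "target": [0] * 8 + [1] * 2})
train, test = train_test_split_df(df, "target", test_frac=0.5)
print(len(train), len(test))  # → 5 5
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with its `stratify` argument does the same job.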

  1. Replace / remove missing data (imputation)
  2. Remove / re-scale variables (scaling)
  3. Detect and handle outliers
  4. Encode categorical features
  5. Discretize continuous features
  6. Normalize numeric features
  7. Transform variables
  8. Stratify partitions based on the target variable
  9. Resample / rebalance partitions
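Several of the steps above can be chained into a small pandas pipeline with `DataFrame.pipe`. The sketch below covers imputation, outlier handling, scaling/normalization and categorical encoding (steps 1 to 4 and 6) and leaves out discretization, transforms and resampling. It assumes specific choices that are only one option each: median imputation, clipping at the 1st/99th percentiles, z-score standardization, and one-hot encoding. The helper names (`impute`, `clip_outliers`, `standardize`, `one_hot`, `fit`, `transform`) and the toy data are hypothetical. All statistics are learned on the training split and then reused on the test split.

```python
import numpy as np
import pandas as pd

def impute(df, medians):
    # Step 1 (assumed strategy): fill missing numeric values with training medians
    return df.fillna(medians)

def clip_outliers(df, low, high):
    # Step 3 (assumed strategy): cap numeric values at training quantiles
    out = df.copy()
    cols = list(low.index)
    out[cols] = out[cols].clip(lower=low, upper=high, axis=1)
    return out

def standardize(df, means, stds):
    # Steps 2 and 6 (assumed strategy): zero mean, unit variance per numeric column
    out = df.copy()
    cols = list(means.index)
    out[cols] = (out[cols] - means) / stds
    return out

def one_hot(df, categories):
    # Step 4: one-hot encode with the training categories only, so train and
    # test get identical columns (unseen values become all-zero rows)
    out = df.copy()
    for col, cats in categories.items():
        out[col] = pd.Categorical(out[col], categories=cats)
    return pd.get_dummies(out, columns=list(categories), dtype=int)

def fit(train, num_cols, cat_cols):
    # Learn every statistic from the training split only
    medians = train[num_cols].median()
    low = train[num_cols].quantile(0.01)
    high = train[num_cols].quantile(0.99)
    cleaned = train.pipe(impute, medians).pipe(clip_outliers, low, high)
    return {
        "medians": medians,
        "low": low,
        "high": high,
        "means": cleaned[num_cols].mean(),
        # Replace a zero std (constant column) with 1 to avoid division by zero
        "stds": cleaned[num_cols].std(ddof=0).replace(0, 1.0),
        "categories": {c: sorted(train[c].dropna().unique()) for c in cat_cols},
    }

def transform(df, params):
    # Apply the learned steps in a fixed order
    return (
        df.pipe(impute, params["medians"])
        .pipe(clip_outliers, params["low"], params["high"])
        .pipe(standardize, params["means"], params["stds"])
        .pipe(one_hot, params["categories"])
    )

# Hypothetical toy data
train = pd.DataFrame({"age": [20.0, 30.0, np.nan, 40.0],
                      "color": ["red", "blue", "red", "blue"]})
test = pd.DataFrame({"age": [np.nan, 100.0], "color": ["green", "red"]})

params = fit(train, ["age"], ["color"])
out_train = transform(train, params)
out_test = transform(test, params)
print(out_test)
```

Keeping `fit` separate from `transform` is what makes "split first" work in practice: the test set is transformed with parameters it never influenced. scikit-learn's `Pipeline` and `ColumnTransformer` follow the same fit/transform pattern if you prefer a library solution.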

More in this tweet:

https://twitter.com/rasbt/status/1592969233713201152