XGBoost and imbalanced datasets

Application

Handling Imbalanced Classes

Handling Imbalanced Data with Logistic Regression

Libraries

General Strategies

What can we do when working with imbalanced datasets?

👉 undersample the majority class(es)

👉 oversample the minority class(es)

👉 create synthetic examples
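
The three strategies above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the toy data, the 90/10 split, and all variable names are made up here, and the "synthetic examples" step is a naive interpolation between random pairs of minority rows. That is a simplified relative of SMOTE, which interpolates toward nearest neighbours instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (made up for illustration): 90 majority rows (label 0), 10 minority rows (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

maj_idx = np.flatnonzero(y == 0)
min_idx = np.flatnonzero(y == 1)

# 1) Undersample the majority class: keep only as many majority rows as there are minority rows.
keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
under_idx = np.concatenate([keep_maj, min_idx])
X_under, y_under = X[under_idx], y[under_idx]

# 2) Oversample the minority class: draw minority rows with replacement until the classes match.
extra_min = rng.choice(min_idx, size=len(maj_idx), replace=True)
over_idx = np.concatenate([maj_idx, extra_min])
X_over, y_over = X[over_idx], y[over_idx]

# 3) Create synthetic examples: interpolate between random pairs of minority rows
#    (naive stand-in for SMOTE; an assumption of this sketch, not a library API).
n_new = len(maj_idx) - len(min_idx)
a = X[rng.choice(min_idx, size=n_new)]
b = X[rng.choice(min_idx, size=n_new)]
t = rng.random((n_new, 1))
X_new = a + t * (b - a)
X_syn = np.vstack([X, X_new])
y_syn = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])

# Class counts after each strategy.
print(np.bincount(y_under), np.bincount(y_over), np.bincount(y_syn))
```

In practice, any of these would be applied to the training split only, so that the evaluation data keeps the original class distribution.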

Advice from Santiago (tweet):

<aside> 💡 Avoid any method that artificially changes the distribution of your data. Don't "fix" your imbalanced data.

</aside>