Feature Encoding Videos

Feature Encoding Articles

Category Encoders

When you work with scikit-learn, you will often need to handle categorical data, and how you encode it really matters. Calmcode shows how to make even better use of categorical features to get even better predictions (dirty cat).

Why is discretization useful?

Several regression and classification models, such as decision trees and Naive Bayes, can perform better with discrete values. Decision trees make decisions based on discrete partitions of the attributes. During training, a decision tree evaluates the values of each feature to determine the best cut-point, so the more distinct values a feature has, the longer training takes. Discretizing continuous features can therefore speed up the training process.
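To make the training-time argument concrete, here is a minimal sketch of equal-width binning using only the Python standard library. Equal-width binning is just one possible discretization strategy, chosen here as an assumption; the helper names `equal_width_edges` and `discretize` and the sample `ages` values are hypothetical and only for illustration.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Return the n_bins - 1 interior cut-points of equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical continuous feature.
ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 78]

edges = equal_width_edges(ages, n_bins=3)
binned = discretize(ages, edges)

# A split search over the raw feature has roughly one candidate per
# distinct value; after binning it has at most one per bin.
print(len(set(ages)), "distinct raw values ->", len(set(binned)), "distinct bins")
```

In a scikit-learn pipeline, `sklearn.preprocessing.KBinsDiscretizer` plays a similar role. Whether binning actually helps accuracy depends on the data and the model, so it is worth comparing against the raw feature.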

Feature Encoding Methods in Machine Learning

Binary Encoding
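As a rough illustration of what binary encoding usually refers to, the sketch below assigns each category an integer index and writes that index in base 2, one bit per output column. The function name `binary_encode`, the 0-based indexing, the alphabetical category order, and the sample city names are all assumptions made for this example; library implementations may number categories differently.

```python
def binary_encode(values):
    """Encode each category as the bits of its integer index."""
    categories = sorted(set(values))
    # Number of bits needed to represent the largest index (at least 1).
    n_bits = max(1, (len(categories) - 1).bit_length())
    index = {c: i for i, c in enumerate(categories)}
    rows = [
        [(index[v] >> b) & 1 for b in reversed(range(n_bits))]
        for v in values
    ]
    return n_bits, rows


# Hypothetical categorical feature with five distinct values.
cities = ["Berlin", "Lima", "Oslo", "Paris", "Tokyo", "Lima"]
n_bits, city_bits = binary_encode(cities)

for city, bits in zip(cities, city_bits):
    print(city, bits)
```

Five categories fit into three columns here, and the column count grows only logarithmically with the number of categories.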

One-Hot Encoding
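A minimal sketch of one-hot encoding in plain Python: each distinct category gets its own column, and every row has a single 1 in the column of its category. The function name `one_hot_encode`, the alphabetical column order, and the sample `colors` values are assumptions for illustration only.

```python
def one_hot_encode(values):
    """Return the column order and one indicator row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
color_columns, color_rows = one_hot_encode(colors)

print(color_columns)
for row in color_rows:
    print(row)
```

The output has one column per distinct category, so the width of the encoded data grows linearly with cardinality. In scikit-learn, `sklearn.preprocessing.OneHotEncoder` provides this transformation.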

Label Encoding
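A minimal sketch of label encoding in plain Python: each distinct category is replaced by a single integer. The function name `label_encode`, the alphabetical assignment of integers, and the sample `sizes` values are assumptions for illustration only.

```python
def label_encode(values):
    """Map each category to an integer and return the mapping and codes."""
    categories = sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return mapping, [mapping[v] for v in values]


# Hypothetical categorical feature.
sizes = ["small", "large", "medium", "small"]
size_mapping, size_codes = label_encode(sizes)

print(size_mapping)
print(size_codes)
```

Note that the alphabetical mapping puts "large" below "medium" below "small", which does not match the natural order of the sizes. A model that treats the codes as numbers will see an ordering that may be meaningless, so an explicit mapping is often safer when the categories do have a real order.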

Embeddings

Choosing the Right Encoding Method

  1. Consider the nature of your categorical variable:
  2. Consider the cardinality (number of unique values):
  3. Consider your model type: