<aside> 💡 Code examples in Python Feature Engineering Cookbook
</aside>
In equal-frequency binning, we sort the data values of a continuous variable into bins that contain the same number of observations. The quantiles are used to determine the bin edges. The resulting intervals may not have equal width, and that’s OK.
For example, if we want to sort the variable income into 5 intervals of equal frequency, we would use the 20th, 40th, 60th, and 80th percentiles as the inner bin limits, with the minimum and maximum values (the 0th and 100th percentiles) closing the first and last bins.
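The income example above can be sketched with pandas. This is a minimal illustration, not the cookbook's own code: the `income` values are synthetic (drawn from a log-normal distribution purely to get a skewed variable), and the variable names are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed "income" variable (synthetic data, for illustration only).
rng = np.random.default_rng(seed=0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="income")

# Bin limits: minimum, 20th, 40th, 60th and 80th percentiles, and maximum.
edges = income.quantile([0, 0.2, 0.4, 0.6, 0.8, 1.0]).to_numpy()

# pd.qcut derives its limits from the same quantiles and assigns each value to a bin.
income_binned, qcut_edges = pd.qcut(income, q=5, labels=False, retbins=True)

# Number of observations per bin, and the width of each interval.
counts = income_binned.value_counts().sort_index()
widths = np.diff(qcut_edges)

print(counts)
print(widths)
```

Each of the 5 bins receives the same number of observations, while the interval widths differ because the variable is skewed.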
The beauty of equal-frequency binning is that it spreads the values of skewed variables evenly across the bins, so every bin holds roughly the same share of the observations.
Equal-frequency binning has some advantages and disadvantages, which I describe below:
As with equal-width discretization, in equal-frequency discretization the number of bins is chosen arbitrarily by the user, and it might need to be optimized to extract the maximum value from a variable.
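Because the number of bins is a user choice, one simple way to explore it is to discretize the same variable with several candidate values and compare the results. The sketch below assumes synthetic data and a hypothetical helper, `discretize_equal_frequency`, and it only inspects how many observations land in each bin. In practice, the candidates would more likely be compared by the performance of a downstream model.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed variable (synthetic data, for illustration only).
rng = np.random.default_rng(seed=42)
income = pd.Series(rng.exponential(scale=30_000, size=1200), name="income")


def discretize_equal_frequency(variable, n_bins):
    """Return the bin index (0 to n_bins - 1) of each value, using quantile-based limits."""
    return pd.qcut(variable, q=n_bins, labels=False)


# Try a few candidate numbers of bins and record the observations per bin.
obs_per_bin = {}
for n_bins in (3, 4, 6):
    binned = discretize_equal_frequency(income, n_bins)
    obs_per_bin[n_bins] = binned.value_counts().sort_index().tolist()

for n_bins, counts in obs_per_bin.items():
    print(n_bins, counts)
```

Whatever the number of bins, the observations are shared evenly among them. What changes is the granularity of the intervals, which is why the value may need tuning.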