How does L2 regularization (weight decay) work? → fastai explanation
Apply L2 regularization and explain it → fastai video
Ridge Regression fundamentals → calmcode video

<aside> 💡 Ridge and Lasso regularization are frequently combined (the combination is known as Elastic Net) to get the best of both worlds.

</aside>

Ridge in Simple Words:

Ridge is like a teacher who discourages students from showing off too much. When a feature tries to become too influential in predicting outcomes, Ridge pulls it back by penalizing large coefficients (it adds the sum of squared coefficients to the loss), so no single feature dominates the model. Unlike Lasso, Ridge doesn't eliminate features: it shrinks their coefficients toward zero but does not set them exactly to zero. All features stay in the model, each with reduced impact. This works particularly well when many features are somewhat useful and work together, especially when features are related to each other (correlated).
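To make the "pulling back" concrete, here is a minimal sketch of ridge regression on two strongly correlated features. It assumes the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy with no intercept; the function name `ridge_fit_2d` and the toy data are made up for illustration and do not come from the linked videos.

```python
def ridge_fit_2d(X, y, lam):
    """Closed-form ridge for exactly two features, no intercept.

    Solves (X^T X + lam * I) w = X^T y with a hand-written 2x2 inverse.
    lam = 0 gives ordinary least squares.
    """
    a = sum(x1 * x1 for x1, _ in X) + lam          # (X^T X)[0][0] + lam
    b = sum(x1 * x2 for x1, x2 in X)               # (X^T X)[0][1]
    d = sum(x2 * x2 for _, x2 in X) + lam          # (X^T X)[1][1] + lam
    r1 = sum(x1 * t for (x1, _), t in zip(X, y))   # (X^T y)[0]
    r2 = sum(x2 * t for (_, x2), t in zip(X, y))   # (X^T y)[1]
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


# Toy data (hypothetical): the second feature is almost a copy of the first.
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
y = [2.0, 4.1, 6.1, 7.9]

w_ols = ridge_fit_2d(X, y, lam=0.0)     # no penalty
w_ridge = ridge_fit_2d(X, y, lam=10.0)  # with ridge penalty

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

Without the penalty, one of the two near-duplicate features takes most of the weight. With the penalty, the weights are smaller overall and spread more evenly across both features, and neither is dropped.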

Ridge vs. Lasso

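Ridge tends to work better when most of the variables are useful, because it keeps all of them and only shrinks their coefficients. Lasso tends to work better when a lot of the variables are useless, because it can shrink their coefficients all the way to zero and so remove them from the model. A combination of both can be the best of both worlds. More details on that in StatQuest p.176.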
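The difference can be sketched with the textbook special case of orthonormal features, where each penalized coefficient can be computed directly from the unpenalized one. This is a simplified illustration, not a general solver: it assumes the ridge objective ‖y − Xw‖² + λ‖w‖² and the lasso objective ½‖y − Xw‖² + λ‖w‖₁, and the function names and starting coefficients are hypothetical.

```python
def ridge_shrink(w, lam):
    """Ridge with orthonormal features: scale every coefficient by 1 / (1 + lam)."""
    return w / (1.0 + lam)


def lasso_shrink(w, lam):
    """Lasso with orthonormal features: soft-thresholding.

    Coefficients with |w| <= lam become exactly zero;
    the rest move toward zero by lam.
    """
    if abs(w) <= lam:
        return 0.0
    return w - lam if w > 0 else w + lam


# Hypothetical unpenalized coefficients: two strong, two weak.
ols = [3.0, -0.4, 0.2, 1.5]
lam = 0.5

ridge = [ridge_shrink(w, lam) for w in ols]
lasso = [lasso_shrink(w, lam) for w in ols]

print("Ridge:", ridge)  # every coefficient smaller, none exactly zero
print("Lasso:", lasso)  # weak coefficients set exactly to zero
```

Ridge shrinks all four coefficients proportionally, so the weak ones stay in the model. Lasso removes the two weak ones entirely, which is why it suits data with many useless variables.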

The Problem

How does it work?

General comments