Summary

We start with a very small tree (just a single leaf) and calculate its residuals. Each following tree does not predict the target variable but the residuals left by the trees before it. Each tree's contribution is scaled by the learning rate. With each tree the residuals become smaller (link). Vincent explains it with more code here:

Check out this video to understand ‣!

Compare Gradient Boost with AdaBoost

AdaBoost

See AdaBoost

AdaBoost starts by making a stump (link). It uses an iterative approach to learn from the mistakes of weak classifiers. AdaBoost scales the stumps: the better a stump performs, the larger its weight (its "amount of say") in the final prediction.

AdaBoost continues to make stumps until it has made the number of stumps you asked for, or it has a perfect fit.
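
To make the scaling of stumps concrete, here is a minimal sketch of the usual AdaBoost "amount of say" formula, 0.5 * ln((1 - error) / error). The function name `amount_of_say` and the clipping constant `eps` are assumptions chosen for illustration, not something taken from these notes or from a particular library.

```python
import math


def amount_of_say(total_error, eps=1e-10):
    """Weight of one stump, given its weighted error rate in [0, 1].

    eps (an assumed safeguard) keeps the logarithm finite when a stump
    is perfect (error 0) or always wrong (error 1).
    """
    total_error = min(max(total_error, eps), 1 - eps)
    return 0.5 * math.log((1 - total_error) / total_error)


# Lower error gives a larger say; an error of 0.5 (coin flip) gives none.
for err in (0.1, 0.3, 0.5):
    print(f"error={err:.1f} -> say={amount_of_say(err):.3f}")
```

A stump that is wrong more than half of the time gets a negative say, which means its vote is effectively flipped.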

Gradient Boost

Gradient Boost starts by making a single leaf, instead of a tree or stump (link). This leaf represents an initial guess (e.g. the average) for the weights (see dataset) of all the samples. Then Gradient Boost builds a tree based on the errors of that guess.

Build the first tree to predict weight

We start with building a first tree (which is just a single leaf). We calculate the residuals of this tree.
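
As a small sketch of this first step, the leaf is just the mean of the target, and the residuals are what is left over. The weight values below are made up for illustration and are not necessarily the dataset referenced in these notes.

```python
# Illustrative weights in kg (assumed values, not taken from the notes' dataset).
weights = [88.0, 76.0, 56.0, 73.0, 77.0, 57.0]

# The first "tree" is a single leaf: one constant prediction for every sample.
leaf = sum(weights) / len(weights)

# Residual = observed value minus the current prediction.
residuals = [w - leaf for w in weights]

print(f"leaf prediction: {leaf:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Because the leaf is the mean, the residuals sum to zero; the next tree has to explain how individual samples deviate from that mean.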

The next thing we do is build a tree based on the errors (residuals) from the first tree. The trick is that we try to predict the residuals instead of the original weights (link)! This tree is a bit larger than the one we started with, but it's still very simple, as you can see in the video.
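
The idea of fitting trees to residuals can be sketched in plain Python. This is a simplified illustration, not the exact procedure from the video: it uses a one-split tree (`fit_stump`, a hypothetical helper) on a single feature, and the heights, weights, `learning_rate` and `n_trees` values are all assumed for the example.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature.

    Tries every midpoint between distinct x values and keeps the split with
    the lowest squared error; each side predicts the mean of its ys.
    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


# Illustrative data (assumed values): height in m, weight in kg.
heights = [1.6, 1.6, 1.5, 1.8, 1.5, 1.4]
weights = [88.0, 76.0, 56.0, 73.0, 77.0, 57.0]

learning_rate = 0.1  # assumed value; scales each tree's contribution
n_trees = 20         # assumed value

# Step 0: a single leaf that predicts the average weight.
leaf = sum(weights) / len(weights)
predictions = [leaf] * len(weights)

trees = []
history = []  # sum of squared residuals before each new tree is added
for _ in range(n_trees):
    # The target for the next tree is the residuals, not the weights.
    residuals = [w - p for w, p in zip(weights, predictions)]
    history.append(sum(r * r for r in residuals))
    tree = fit_stump(heights, residuals)
    trees.append(tree)
    # Move each prediction a small step towards the tree's output.
    predictions = [p + learning_rate * tree(h)
                   for p, h in zip(predictions, heights)]

print(f"squared error before first tree: {history[0]:.1f}")
print(f"squared error before last tree:  {history[-1]:.1f}")
```

A prediction for a new sample would be `leaf + learning_rate * sum(tree(h) for tree in trees)`. The small learning rate means each tree only corrects part of the remaining error, which is why many small trees are added one after another.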