<aside> 💡 What we would like is some kind of function that is so flexible that it could be used to solve any given problem, just by varying its weights. Amazingly enough, this function actually exists! It's the neural network, which we already discussed. That is, if you regard a neural network as a mathematical function, it turns out to be extremely flexible, depending on its weights. A mathematical proof called the universal approximation theorem shows that, in theory, a neural network with enough parameters can approximate any continuous function to any level of accuracy. The intuition: any arbitrarily wiggly function can be approximated by a bunch of straight lines joined together, and to get closer to the wiggly function we just have to use shorter lines.
</aside>
Details on the statement above → Claude
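The "shorter lines" intuition from the first callout can be checked numerically. The sketch below is not a neural network and not a proof of the theorem. It only approximates `sin` with straight segments joined at evenly spaced points and measures how the worst-case error shrinks as the segments get shorter. The helper name `piecewise_linear_error`, the choice of `sin`, and the interval are assumptions made for illustration. A network with ReLU activations also computes a piecewise-linear function, which is why this picture is a reasonable mental model.

```python
import numpy as np

def piecewise_linear_error(f, n_segments, lo=0.0, hi=2 * np.pi):
    """Max absolute error when f is replaced by n_segments straight lines on [lo, hi]."""
    # Points where consecutive line segments are joined together.
    knots = np.linspace(lo, hi, n_segments + 1)
    # Dense grid on which the approximation is compared with the true function.
    xs = np.linspace(lo, hi, 10_001)
    # np.interp draws straight lines between (knots, f(knots)).
    approx = np.interp(xs, knots, f(knots))
    return float(np.max(np.abs(f(xs) - approx)))

# More segments means shorter lines, which should mean a smaller error.
errors = {n: piecewise_linear_error(np.sin, n) for n in (4, 16, 64, 256)}
for n, err in errors.items():
    print(f"{n:4d} segments: max error = {err:.6f}")
```

Each fourfold increase in the number of segments should cut the error substantially, which is the "just use shorter lines" argument in miniature.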
<aside> 💡
A larger version of an architecture (more layers and parameters) can generally reach a lower training loss, but it is more prone to overfitting, because it has more parameters to overfit with.
</aside>
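The capacity trade-off in the callout above can be illustrated with a small analogy. The sketch uses polynomial degree as a stand-in for model size, which is an assumption for illustration and not a deep learning model. The noisy `sin(3x)` data, the sample sizes, and the names `target`, `mse`, `train_loss`, and `valid_loss` are likewise made up for the example. Because each higher-degree polynomial contains the lower-degree ones as special cases, the least-squares training loss cannot get worse as the degree grows, while the loss on held-out points often does.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Hypothetical ground-truth function the models try to recover."""
    return np.sin(3 * x)

# Small noisy training set and a separate noisy validation set.
x_train = np.linspace(-1, 1, 12)
y_train = target(x_train) + rng.normal(0, 0.2, x_train.size)
x_valid = np.linspace(-0.95, 0.95, 50)
y_valid = target(x_valid) + rng.normal(0, 0.2, x_valid.size)

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_loss, valid_loss = {}, {}
for degree in (1, 3, 9):  # degree plays the role of "number of parameters"
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    train_loss[degree] = mse(coeffs, x_train, y_train)
    valid_loss[degree] = mse(coeffs, x_valid, y_valid)
    print(f"degree {degree}: train={train_loss[degree]:.4f}  valid={valid_loss[degree]:.4f}")
```

Comparing the two columns shows the pattern to look for: training loss falling with capacity while validation loss stops improving or rises. The exact numbers depend on the random seed and noise level.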