When I interviewed at Amazon, they asked me to explain Gradient Descent.

I’m happy where I am now, but let’s talk about the algorithm and how machine learning actually influenced computational physics,

… and give you some fun facts for your next interview!

Gradient Descent? Who's that?!

Gradient descent is the optimisation algorithm at the core of training neural networks and SVMs.

The big idea is that we can calculate how wrong our neural net’s predictions are, using a loss function.

Then, we can calculate the gradient of that loss and propagate it backwards through the network using the chain rule. After all, deep learning is the most lucrative application of the chain rule in the world.

Now, the problem is that we’d have to calculate the loss on all of our data to get the exact gradient. That’s classic full-batch gradient descent: something we figured out in physics a long time ago.
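
Here’s a minimal sketch of what that looks like, on a toy least-squares problem I made up purely for illustration (plain NumPy, nothing framework-specific): compute the loss over all the data, get the gradient via the chain rule, and step downhill.

```python
# A minimal, illustrative sketch of full-batch gradient descent on a toy
# least-squares model y ~ w*x + b. All data and constants are made up.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)                  # all of our (toy) data
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0
lr = 0.1                                  # step size

for step in range(500):
    pred = w * x + b                      # forward pass
    loss = np.mean((pred - y) ** 2)       # how wrong we are, on ALL samples

    # Chain rule: dL/dw = dL/dpred * dpred/dw, averaged over the whole batch
    dpred = 2.0 * (pred - y) / len(x)
    dw = np.sum(dpred * x)
    db = np.sum(dpred)

    w -= lr * dw                          # step downhill along the gradient
    b -= lr * db

print(f"fitted w={w:.2f}, b={b:.2f}, loss={loss:.4f}")
```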

The next evolution of descent

Imagine training ChatGPT, though, and having to fit half a trillion(!) samples into RAM just to compute a single gradient step…

Instead, we figured out that we can just take small steps, each calculated on a single sample. That makes the gradient estimate incredibly noisy, but suddenly the problem is actually computable!

But we know some outliers might kick us out of our hard-found minimum, so we apply what we call a “learning rate” and only take a fraction of the step the gradient suggests.

That way, we meander to our goal step by step!
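
To make that concrete, here’s a hedged sketch of plain stochastic gradient descent on the same toy problem: one randomly drawn sample per step, scaled by a learning rate. Again, every name and constant here is illustrative, not any particular library’s recipe.

```python
# A minimal sketch of SGD: each update uses ONE randomly drawn sample,
# scaled by a learning rate, so the gradient is noisy but we never need
# the whole dataset at once. Toy data and constants for illustration only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0
lr = 0.05                                 # only take a fraction of the suggested step

for step in range(5000):
    i = rng.integers(len(x))              # pick a single sample
    pred = w * x[i] + b
    d = 2.0 * (pred - y[i])               # noisy, single-sample gradient

    w -= lr * d * x[i]
    b -= lr * d

print(f"SGD result: w={w:.2f}, b={b:.2f}")
```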

Now physics optimisation enjoys SGD

Computational physics actually started adopting stochastic gradient descent! You get initial results much quicker, and you can course-correct them along the way!

We can even use some of the tricks ML uses!

Momentum, for example. If we squint hard enough, using momentum gives a very similar trajectory to the Conjugate Gradient method, which is considered pretty efficient in numerical optimisation. Momentum stabilises our convergence even in tricky optimisation landscapes.
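
As a rough sketch (same toy setup as before, illustrative constants only), classical “heavy-ball” momentum just keeps a running velocity of the gradients and steps along that instead of the raw, noisy gradient:

```python
# A sketch of SGD with classical heavy-ball momentum: a running velocity
# smooths out the noisy single-sample gradients. Illustrative values only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0
vw, vb = 0.0, 0.0
lr, beta = 0.02, 0.9                      # beta = how much past velocity we keep

for step in range(5000):
    i = rng.integers(len(x))
    pred = w * x[i] + b
    d = 2.0 * (pred - y[i])
    dw, db = d * x[i], d

    vw = beta * vw + dw                   # accumulate gradient into the velocity
    vb = beta * vb + db
    w -= lr * vw                          # step along the smoothed direction
    b -= lr * vb

print(f"momentum result: w={w:.2f}, b={b:.2f}")
```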

Personally, I find it very fun to learn about how we optimise our models and how it changed through time.

Nowadays, almost everyone uses adaptive gradient methods like Adam. But they’re still siblings of our good ol’ gradient descent!
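
And to close the loop, here’s roughly what the Adam update looks like on our toy problem: the same noisy single-sample gradient, but each step is rescaled by running estimates of the gradient’s mean and squared magnitude. The hyperparameters below are the commonly quoted defaults; the rest is, as before, just an illustrative sketch.

```python
# A rough sketch of the Adam update rule on the toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=200)

theta = np.zeros(2)                       # theta = [w, b]
m, v = np.zeros(2), np.zeros(2)           # first and second moment estimates
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

for t in range(1, 5001):
    i = rng.integers(len(x))
    pred = theta[0] * x[i] + theta[1]
    d = 2.0 * (pred - y[i])
    g = np.array([d * x[i], d])           # noisy single-sample gradient

    m = beta1 * m + (1 - beta1) * g       # running mean of the gradient
    v = beta2 * v + (1 - beta2) * g**2    # running mean of its square
    m_hat = m / (1 - beta1**t)            # bias correction for the zero init
    v_hat = v / (1 - beta2**t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(f"Adam result: w={theta[0]:.2f}, b={theta[1]:.2f}")
```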