Dynamic Learning Rates-Train Your Neural Networks 10 times Faster

Most Machine Learning algorithms today rely on a process known as Stochastic Gradient Descent, or SGD. SGD is an iterative process based on the idea of minimizing a cost function. Roughly speaking, at each step SGD finds the direction to move your weights that most decreases your cost function, then moves your weights by a certain amount in that direction.

That “certain amount” is known as theĀ learning rate, and it’s a key parameter whenever you’re training an algorithm on a large dataset. Set the learning rate too high and your algorithm simply won’t converge, or you’ll suffer from very high bias. Set the learning rate too low, and your algorithm will take forever to train.

The most commonly suggested approach is to start with a high learning rate that decreases over time. You might have an if statement that drops the learning rate by 20% if your model goes through a few iterations where the cost doesn’t drop.

The problem with a purely decreasing learning rate is that you often only need a small learning rate for a few iterations. So a decreasing learning rate ends up being very inefficient. A better method is to have a learning rate that increases by a small amount after each iteration where the cost goes down, and drops sharply any time the cost goes up.

This can easily make your algorithm train 10 times as quickly. For example, I ran one algorithm with 10,000 iterations and a dynamic learning rate starting at .02. After 1,000 iterations the learning rate was at .06. From 1,000-1,100 the learning rate dropped to about .01. Then the learning rate went back up to .12 by iteration 2,000 and steadily increased until stopping at 10,000 with a final cost of .32. Using a static or decreasing-only learning rate, I had to train to 100,000 iterations to get the cost down to .38!

So next time you’re training a Neural Network or other SGD-based algorithm, consider using a dynamically increasing learning rate. It just might turn an overnight task into something that takes half an hour.

Categories: Machine Learning

Leave a Reply

Your email address will not be published. Required fields are marked *