Part of #684

E.g.

- [x] ~gradient descent with [momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum)~ #1460
- [x] ~adaptive-learning-rate gradient descent, e.g. [AdaGrad](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad)~ #1468

(Note: both links point to the stochastic gradient descent article, but we'd be implementing the simpler, non-stochastic variants.)

It won't be brilliant, but it can be very informative, so this is largely an educational thing (or just fun).
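For reference, a minimal full-batch (non-stochastic) sketch of the two methods in Python. Function names, hyperparameters, and the test problem are illustrative only, not this project's API:

```python
import numpy as np

def gd_momentum(grad, x0, lr=0.1, beta=0.9, steps=100):
    """Full-batch gradient descent with momentum (heavy-ball form)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad(x)  # accumulate an exponentially decaying velocity
        x = x - lr * v          # step along the (negative) velocity
    return x

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Full-batch AdaGrad: per-coordinate adaptive learning rate."""
    x = np.asarray(x0, dtype=float)
    g2 = np.zeros_like(x)       # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        # coordinates with large accumulated gradients get smaller steps
        x = x - lr * g / (np.sqrt(g2) + eps)
    return x

# Toy problem: minimize f(x) = (x1^2 + 10*x2^2) / 2, gradient = (x1, 10*x2)
grad = lambda x: np.array([1.0, 10.0]) * x
print(gd_momentum(grad, [1.0, 1.0]))  # both coordinates shrink towards 0
print(adagrad(grad, [1.0, 1.0]))
```

The ill-conditioned quadratic (curvatures 1 and 10) is exactly the case where plain gradient descent crawls: momentum damps the oscillation along the steep axis, and AdaGrad shrinks that axis's step size automatically.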