Gradient Descent

Contours

Untitled

maximum rate of change is always perpendicular to contours

Momentum-based Gradient Descent

Intuition

If I am repeatedly being asked to move in the same direction then I should probably gain some confidence and start taking bigger steps in that direction.

Just as a ball gains momentum while rolling down a slope.

Update Rule

$$ u_t = \beta u_{t-1} + \Delta w_t \\ u_0 = \Delta w_0, u_{-1} = 0 \\ w_{t+1} = w_t - \eta u_t \\ \text{with, } u_{-1} = 0, w_0 = rand() \text{ and } 0 \le \beta < 1 $$

In addition to the current update, also look at history of updates.

Untitled

Code

Observations