The contents of this post are based on the free Coursera Machine Learning course taught by Andrew Ng.
1. Backpropagation Algorithm
We need the backpropagation algorithm to minimize the cost function.
To be more specific, we have to find parameters Θ that minimize J(Θ). To do so, we will use gradient descent or one of the advanced optimization algorithms.
1.1 Need code to compute
- J(Θ)
- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$ ($= a_j^{(l)} \delta_i^{(l+1)}$, ignoring λ; if λ = 0)
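For concreteness, here is a minimal NumPy sketch of how J(Θ) could be computed for a 3-layer network with sigmoid activations and one-hot labels; the names `nn_cost`, `Theta1`, `Theta2`, and `lam` are mine, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    """Regularized cross-entropy cost J(Theta) for a 3-layer network.
    X: (m, n) inputs, Y: (m, K) one-hot labels (illustrative sketch)."""
    m = X.shape[0]
    # Forward propagation
    a1 = np.hstack([np.ones((m, 1)), X])     # add bias unit a_0 = 1
    a2 = sigmoid(a1 @ Theta1.T)
    a2 = np.hstack([np.ones((m, 1)), a2])    # add bias unit
    h = sigmoid(a2 @ Theta2.T)               # h_Theta(x), shape (m, K)
    # Cross-entropy term of the cost
    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    # L2 regularization term (bias columns j = 0 excluded)
    reg = lam / (2 * m) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    return J + reg
```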
2. Gradient Descent
2.1 Simultaneous update
$\Theta_{ij}^{(l)} := \Theta_{ij}^{(l)} - \alpha \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$ (for all $i, j, l$, updated simultaneously)
To understand the necessity of backpropagation, I recommend recalling the process of gradient descent. Looking at the equation above, we can see that we need to calculate the value of $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$. We can compute the value of $J(\Theta)$ through forward propagation, and we can find the value of $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$ through backpropagation.
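As a small illustration, a single simultaneous update could be sketched like this in NumPy terms; `compute_gradients` is a hypothetical placeholder for the backpropagation procedure described below:

```python
# One iteration of gradient descent over all parameter matrices.
# `thetas` is a list of NumPy arrays [Theta1, Theta2, ...];
# `compute_gradients` (hypothetical) returns matching gradient matrices D^(l).
def gradient_descent_step(thetas, compute_gradients, alpha):
    grads = compute_gradients(thetas)                       # compute all gradients first...
    return [T - alpha * D for T, D in zip(thetas, grads)]   # ...then update simultaneously
```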
① Forward propagation
$a^{(1)} = x$, then for each layer $z^{(l+1)} = \Theta^{(l)} a^{(l)}$ and $a^{(l+1)} = g(z^{(l+1)})$ (adding the bias unit $a_0^{(l+1)} = 1$), up to the output $a^{(L)} = h_\Theta(x)$.
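A minimal sketch of these steps for a single example, assuming a 4-layer network with sigmoid activations (the function and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2, Theta3):
    """Forward propagation through a 4-layer network for one example x."""
    a1 = np.concatenate([[1.0], x])            # input layer plus bias unit
    z2 = Theta1 @ a1
    a2 = np.concatenate([[1.0], sigmoid(z2)])  # hidden layer 2 plus bias unit
    z3 = Theta2 @ a2
    a3 = np.concatenate([[1.0], sigmoid(z3)])  # hidden layer 3 plus bias unit
    z4 = Theta3 @ a3
    a4 = sigmoid(z4)                           # output layer: h_Theta(x)
    return a1, a2, a3, a4
```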
② Backpropagation
$\delta_j^{(l)}$ denotes the "error" of node $j$ in layer $l$. For the output layer, $\delta^{(L)} = a^{(L)} - y$; for the hidden layers, $\delta^{(l)} = (\Theta^{(l)})^T \delta^{(l+1)} .* g'(z^{(l)})$, where $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$.
.* : element-wise multiplication operation
Ex) In the case of L = 4 layers:
$\delta^{(4)} = a^{(4)} - y$
$\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$
$\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} .* g'(z^{(2)})$
And there is no $\delta^{(1)}$ because the first layer is the input layer: the inputs are the observed data, so there are no errors associated with them.
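Continuing the forward propagation sketch above, the deltas for the L = 4 case could be computed like this; one common convention, which I assume here, is to drop the bias unit's component before propagating further back:

```python
def backward_deltas(a2, a3, a4, y, Theta2, Theta3):
    """Errors delta^(l) for a 4-layer network; there is no delta^(1)."""
    d4 = a4 - y
    # g'(z^(l)) = a^(l) .* (1 - a^(l)) is the sigmoid derivative
    d3 = (Theta3.T @ d4) * (a3 * (1 - a3))
    d3 = d3[1:]                    # drop the bias unit's "error" component
    d2 = (Theta2.T @ d3) * (a2 * (1 - a2))
    d2 = d2[1:]
    return d2, d3, d4
```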
③ Use an accumulator
Set $\Delta_{ij}^{(l)} := 0$ for all $l, i, j$, then for each training example accumulate $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$.
Ex) In vectorized form: $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$
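Continuing the sketches above (`forward`, `backward_deltas`, and the Θ matrices assumed defined), the accumulation over the m training examples could look like this:

```python
# Accumulate Delta^(l) := Delta^(l) + delta^(l+1) (a^(l))^T over all examples.
Delta1 = np.zeros_like(Theta1)
Delta2 = np.zeros_like(Theta2)
Delta3 = np.zeros_like(Theta3)
for x, y in zip(X, Y):            # X: (m, n) inputs, Y: (m, K) one-hot labels
    a1, a2, a3, a4 = forward(x, Theta1, Theta2, Theta3)
    d2, d3, d4 = backward_deltas(a2, a3, a4, y, Theta2, Theta3)
    Delta1 += np.outer(d2, a1)    # outer product: delta^(l+1) (a^(l))^T
    Delta2 += np.outer(d3, a2)
    Delta3 += np.outer(d4, a3)
```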
④ Compute the partial derivatives
$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}$ if $j \neq 0$
$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)}$ if $j = 0$
Then $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$.
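A sketch of this final step, applied to the accumulators from the previous snippet (`regularized_gradient` is an illustrative name):

```python
def regularized_gradient(Delta, Theta, m, lam):
    """D^(l): averaged accumulator, regularizing all but the bias column (j = 0)."""
    D = Delta / m
    D[:, 1:] += lam * Theta[:, 1:]   # j != 0 entries get the lambda * Theta term
    return D

m = X.shape[0]                       # number of training examples
D1 = regularized_gradient(Delta1, Theta1, m, lam)
D2 = regularized_gradient(Delta2, Theta2, m, lam)
D3 = regularized_gradient(Delta3, Theta3, m, lam)
```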