Generalizations of backpropagation exist for other artificial neural networks (ANNs), and for functions generally — a class of algorithms referred to generically as “backpropagation”.
In fitting a neural network, backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually.
This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used.
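The efficiency claim can be made concrete with a toy example. The sketch below (minimal and illustrative, not drawn from any particular library; all function names are made up for this example) computes the gradient of a two-weight network's loss once by backpropagation, reusing activations from a single forward pass, and once by the naive alternative of perturbing each weight separately, which costs extra forward passes per weight:

```python
import math

# Toy network: one input, one hidden sigmoid unit, one sigmoid output,
# squared-error loss. All names here are illustrative.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)       # hidden activation
    y_hat = sigmoid(w2 * h)   # output activation
    return h, y_hat

def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backprop_grads(w1, w2, x, y):
    # One forward pass, then one backward pass reusing stored activations.
    h, y_hat = forward(w1, w2, x)
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)  # dL/dz at the output
    g_w2 = delta_out * h
    delta_hid = delta_out * w2 * h * (1 - h)       # dL/dz at the hidden unit
    g_w1 = delta_hid * x
    return g_w1, g_w2

def numeric_grads(w1, w2, x, y, eps=1e-6):
    # Naive direct computation: two extra forward passes per weight.
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

g_bp = backprop_grads(0.5, -0.3, 1.2, 1.0)
g_num = numeric_grads(0.5, -0.3, 1.2, 1.0)
```

Both methods agree to numerical precision, but the naive method's cost grows with the number of weights, while backpropagation needs only one backward pass regardless.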
In forward propagation, each layer applies an activation function to its weighted input; the sigmoid, for example, maps any input z to an output between 0 and 1. For binary classification, the final sigmoid output is commonly interpreted as a probability and thresholded at 0.5 to produce a class label.
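A minimal sketch of that forward pass, assuming a single sigmoid unit used as a binary classifier (the function names and the 0.5 threshold are illustrative, not from any specific library):

```python
import math

def sigmoid(z):
    # Squashes any real z into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Weighted sum of inputs, then sigmoid activation.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)               # probability in (0, 1)
    return 1 if p >= 0.5 else 0  # threshold the *output* at 0.5

label = predict([1.0, -2.0], 0.5, [3.0, 1.0])  # z = 1.5, p ≈ 0.82, label 1
```

Note that the threshold is applied to the sigmoid's output probability, not to z itself; at z = 0 the sigmoid already outputs exactly 0.5.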
In backpropagation, if the network predicts ŷ = 1 when the actual label is y = 0, the prediction is wrong and the loss is nonzero. To reduce the loss, the algorithm computes the slope (gradient) of the loss with respect to each weight and adjusts the weights in the direction that moves ŷ toward y, driving the loss toward 0.
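A hedged sketch of that adjustment for a single sigmoid neuron with binary cross-entropy loss (an assumption made here for simplicity; with this pairing the slope reduces to (ŷ − y)·x, and all names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_step(w, x, y, lr=0.5):
    y_hat = sigmoid(w * x)
    g = (y_hat - y) * x   # slope of cross-entropy loss w.r.t. w
    return w - lr * g     # move the weight against the slope

# Repeated steps pull y_hat toward the target y = 1, shrinking the loss.
w = 0.0
for _ in range(200):
    w = grad_step(w, x=1.0, y=1.0)
```

Each step moves w so that the prediction sigmoid(w·x) approaches the target; as ŷ approaches y the slope shrinks and the updates become smaller.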
Gradient descent is one of the most widely used optimizers in backpropagation-based training: on each iteration (epoch) it updates the weights to reduce the loss function. Common variants include gradient descent with momentum, mini-batch stochastic gradient descent, RMSProp, and Adam.
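To illustrate, here is a minimal sketch of plain gradient descent next to the momentum variant, applied to a toy one-dimensional loss L(w) = (w − 3)²; the learning rate and momentum coefficient are illustrative choices, not prescribed values:

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

# Plain gradient descent: step against the gradient each iteration.
w = 0.0
for _ in range(100):
    w -= 0.1 * grad(w)

# Momentum variant: accumulate a velocity that smooths the updates.
wm, v = 0.0, 0.0
for _ in range(100):
    v = 0.9 * v + grad(wm)
    wm -= 0.1 * v
```

Both runs converge toward the minimum at w = 3; RMSProp and Adam refine the same idea by additionally scaling each step by running estimates of the gradient's magnitude.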
Hyperparameter optimization techniques are used to select an appropriate learning rate (for example, 0.001). Training then alternates forward and backward propagation over many iterations, stopping when the cost function no longer decreases.
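One simple such technique is a grid search over candidate learning rates. The sketch below (illustrative only; the candidate values and the toy loss are assumptions for this example) trains briefly with each candidate and keeps the one with the lowest final loss:

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def final_loss(lr, steps=50):
    # Run a short training loop with this learning rate.
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)
    return (w - 3.0) ** 2

candidates = [1.0, 0.1, 0.01, 0.001]
best = min(candidates, key=final_loss)
```

Here a rate of 1.0 overshoots and oscillates, while 0.001 converges too slowly within the budget, so the sweep selects an intermediate value; in practice the same comparison is made on a validation loss rather than a toy function.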