The Backpropagation algorithm for a math student

Detailed bibliography
Published in: Proceedings of ... International Joint Conference on Neural Networks, pp. 1-9
Main authors: Damadi, Saeed; Moharrer, Golnaz; Cham, Mostafa; Shen, Jinglai
Format: Conference paper
Language: English
Published: IEEE, 18.06.2023
ISSN: 2161-4407
Description
Summary: A Deep Neural Network (DNN) is a composite function of vector-valued functions, and in order to train a DNN, it is necessary to calculate the gradient of the loss function with respect to all parameters. This calculation can be a non-trivial task because the loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters. The Backpropagation (BP) algorithm leverages the composite structure of the DNN to efficiently compute the gradient. As a result, the number of layers in the network does not significantly impact the complexity of the calculation. The objective of this paper is to express the gradient of the loss function in terms of a matrix multiplication using the Jacobian operator. This can be achieved by considering the total derivative of each layer with respect to its parameters and expressing it as a Jacobian matrix. The gradient can then be represented as the matrix product of these Jacobian matrices. This approach is valid because the chain rule can be applied to a composition of vector-valued functions, and the use of Jacobian matrices allows for the incorporation of multiple inputs and outputs. By providing concise mathematical justifications, the results can be made understandable and useful to a broad audience from various disciplines.
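
To make the abstract's central identity concrete, the following LaTeX sketch spells out the chain rule for a composition of vector-valued functions. The notation (layer maps f_i, scalar loss \ell, layer parameters \theta_i) is chosen here for illustration and is not necessarily the paper's own.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% A DNN viewed as a composition of layer maps f_1, ..., f_L followed by a scalar loss \ell.
% (Illustrative notation; each Jacobian is evaluated at the forward-pass activation it receives.)
\[
  \hat{y} = (f_L \circ f_{L-1} \circ \dots \circ f_1)(x),
  \qquad
  \mathcal{L} = \ell(\hat{y}).
\]
% Chain rule for vector-valued functions: the total derivative of the composition
% is the matrix product of the layer Jacobians J_{f_k} = \partial f_k / \partial a_{k-1},
% where a_{k-1} is the input to layer k.
\[
  \nabla_{x}\mathcal{L}^{\top}
  = \nabla_{\hat{y}}\ell^{\top}\, J_{f_L}\, J_{f_{L-1}} \cdots J_{f_1}.
\]
% Gradient with respect to the parameters \theta_i of layer i: replace the Jacobian of f_i
% with respect to its input by the Jacobian with respect to \theta_i.
\[
  \nabla_{\theta_i}\mathcal{L}^{\top}
  = \nabla_{\hat{y}}\ell^{\top}\, J_{f_L} \cdots J_{f_{i+1}}\,
    \frac{\partial f_i}{\partial \theta_i}.
\]
\end{document}

Backpropagation evaluates this product from the loss backwards, so the partial products of Jacobians are reused across layers; this is why, as the abstract notes, adding layers does not significantly increase the cost of the gradient calculation.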
DOI: 10.1109/IJCNN54540.2023.10191596