Recurrent Neural Network (RNN)

RNN is a marriage of the Neural Network (NN) and the Hidden Markov Model (HMM). Back in college I learned only NN and HMM, not RNN, but RNN is quite easy to understand with that background knowledge. I am learning RNN through a Natural Language Processing (NLP) problem.

Firstly, I have modularized my Neural Network code from the previous post so that I only have to provide a Layer class with input, output, error, and gradient definitions.
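A minimal sketch of what such a Layer interface might look like (the class and method names here are illustrative assumptions, not the actual code from the previous post):

```python
import numpy as np

class Layer:
    """Illustrative layer interface: forward pass, error, and gradient."""

    def __init__(self, n_in, n_out):
        # Small random weights; the real initialization may differ.
        self.w = np.random.randn(n_in, n_out) * 0.01

    def forward(self, x):
        """Compute the layer output from the input."""
        raise NotImplementedError

    def error(self, y_hat, y):
        """Compute the loss between prediction and label."""
        raise NotImplementedError

    def gradient(self, x, y_hat, y):
        """Compute the gradient of the loss w.r.t. the weights."""
        raise NotImplementedError
```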

In NLP, it is common to use Softmax normalization and the Cross Entropy (CE) loss function. Softmax basically turns raw values into probabilities, and CE measures the difference between two vectors, in our case the labels and the predicted probabilities. With values turned into probabilities through Softmax, the magnitude of the distance is easier to interpret. An optimal model could be identified using Maximum Likelihood, but Maximum Likelihood involves multiplications and can run into numerical stability problems when there are many values to multiply; hence the Log Likelihood is the better alternative. Since CE penalizes misclassifications more strongly than Squared Error (SE) due to its underlying definition, CE is used instead of SE. Referring to the previous post, CE therefore takes the place of SE, whereas Softmax is the function sitting between the “gate” and the output of the unit.
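As a concrete illustration (a sketch only, not the modularized code itself), Softmax and the averaged Cross Entropy from the equation below can be computed roughly like this:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(y, y_hat, eps=1e-12):
    # E_total = -(1/C) * sum_c y_c * log(y_hat_c); eps guards against log(0).
    return -np.mean(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])                    # one-hot label
y_hat = softmax(np.array([0.5, 2.0, -1.0]))      # predicted probabilities
print(cross_entropy(y, y_hat))
```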

\begin{aligned} E_{total} = -\frac{1}{C}\sum_{c}y_c\log\hat{y}_c \end{aligned}
\begin{aligned} \frac{\partial E_{total}}{\partial w_{h_{11}z_1}} = \frac{\partial E_{total}}{\partial \hat{y}_1}\frac{\partial \hat{y}_1}{\partial \hat{y}_{rel, 1}}\frac{\partial \hat{y}_{rel, 1}}{\partial z_1}\frac{\partial z_1}{\partial w_{h_{11}z_1}} \\ \end{aligned}

where

\begin{aligned} \frac{\partial E_{total}}{\partial \hat{y}_1} = -\frac{1}{C} \frac{y_1}{\hat{y}_1} \end{aligned}
\begin{aligned} \hat{y}_{1} = \frac{e^{\hat{y}_{rel, 1}}}{\sum_{c}e^{\hat{y}_{rel, c}}} \end{aligned}
\begin{aligned} & \frac{\partial \hat{y}_1}{\partial \hat{y}_{rel, 1}} & & = \frac{e^{\hat{y}_{rel, 1}}\sum_{c}e^{\hat{y}_{rel, c}} - e^{\hat{y}_{rel, 1}}e^{\hat{y}_{rel, 1}}}{(\sum_{c}e^{\hat{y}_{rel, c}})^2} \\ &&& = \frac{e^{\hat{y}_{rel, 1}}}{\sum_{c}e^{\hat{y}_{rel, c}}}\left(1 - \frac{e^{\hat{y}_{rel, 1}}}{\sum_{c}e^{\hat{y}_{rel, c}}}\right) \\ &&& = \hat{y}_1(1 - \hat{y}_1) \\ \end{aligned}
\begin{aligned} \hat{y}_{rel, 1} = \tanh{z_1} \end{aligned}
\begin{aligned} & \frac{\partial \hat{y}_{rel, 1}}{\partial z_1} & & = \operatorname{sech}^2{z_1}\\ &&& = 1 - \tanh^2{z_1}\\ &&& = 1 - {\hat{y}_{rel, 1}}^2\\ \end{aligned}
\begin{aligned} z_1 = \sum_{i} w_{h_{1i}z_1}y_{h_{1i}} \end{aligned}
\begin{aligned} \frac{\partial z_1}{\partial w_{h_{11}z_1}} = y_{h_{11}} \end{aligned}
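For completeness, here is a small finite-difference check of the gradient \(\frac{\partial E_{total}}{\partial w_{h_{11}z_1}}\) that the chain rule above targets, for a tanh-then-Softmax unit as described by the equations. The array shapes and variable names are illustrative assumptions, not the modularized code:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def total_error(w, y_h, y):
    z = w.T @ y_h                        # z_c: weighted sum of hidden outputs
    y_rel = np.tanh(z)                   # y_hat_rel = tanh(z)
    y_hat = softmax(y_rel)               # y_hat: class probabilities
    return -np.mean(y * np.log(y_hat))   # E_total = -(1/C) sum_c y_c log(y_hat_c)

rng = np.random.default_rng(0)
y_h = rng.normal(size=4)                 # hidden layer outputs
w = rng.normal(size=(4, 3)) * 0.1        # weights from hidden units to z
y = np.array([1.0, 0.0, 0.0])            # one-hot label

# Central-difference estimate of dE_total/dw_{h11 z1}, i.e. w[0, 0].
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
print((total_error(w_plus, y_h, y) - total_error(w_minus, y_h, y)) / (2 * eps))
```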

To be continued….
