Derivative of Softmax Loss Function

A softmax classifier:
\[ p_j = \frac{\exp(o_j)}{\sum_{k}\exp(o_k)} \]
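A minimal NumPy sketch of this formula, with the usual max-subtraction trick for numerical stability (the helper name `softmax` is just an illustrative choice, not part of the original post):

```python
import numpy as np

def softmax(o):
    # p_j = exp(o_j) / sum_k exp(o_k); subtracting max(o) does not
    # change the result but avoids overflow in exp.
    z = o - np.max(o)
    e = np.exp(z)
    return e / e.sum()

o = np.array([2.0, 1.0, 0.1])   # example logits
p = softmax(o)                  # probabilities summing to 1
```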
It is typically used with a cross-entropy loss function of the form
\[ L = - \sum_{j} y_j \log p_j \]
where \(o\) is the vector of logits. We need the derivative of \(L\) with respect to \(o\). First, we compute the partial derivatives of \(p_j\) with respect to \(o_i\):
\[ \frac{\partial p_j}{\partial o_i} = \begin{cases} p_i (1 - p_i), & i = j \\ - p_i p_j, & i \ne j \end{cases} \]
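The two cases combine into \(\frac{\partial p_j}{\partial o_i} = p_i(\delta_{ij} - p_j)\), i.e. the Jacobian matrix \(\operatorname{diag}(p) - p p^{\top}\). Below is a small finite-difference sanity check, a sketch that reuses the `softmax` helper and example logits `o` from above (the step size `eps` is an arbitrary choice):

```python
def softmax_jacobian(o):
    # J[i, j] = d p_j / d o_i = p_i * (delta_ij - p_j)
    p = softmax(o)
    return np.diag(p) - np.outer(p, p)

def numeric_jacobian(o, eps=1e-6):
    # Central differences: row i holds d p / d o_i.
    J = np.zeros((o.size, o.size))
    for i in range(o.size):
        d = np.zeros_like(o)
        d[i] = eps
        J[i] = (softmax(o + d) - softmax(o - d)) / (2 * eps)
    return J

# Difference should be tiny, on the order of 1e-10.
print(np.max(np.abs(softmax_jacobian(o) - numeric_jacobian(o))))
```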
Hence the derivative of the loss with respect to \(o\) is:

\[ \begin{aligned} \frac{\partial L}{\partial o_i} & = - \sum_k y_k \frac{\partial \log p_k}{\partial o_i} \\ & = - \sum_k y_k \frac{1}{p_k} \frac{\partial p_k}{\partial o_i} \\ & = - y_i (1 - p_i) - \sum_{k \ne i} y_k \frac{1}{p_k} (- p_k p_i) \\ & = - y_i + y_i p_i + \sum_{k \ne i} y_k p_i \\ & = p_i \Big( \sum_k y_k \Big) - y_i \end{aligned} \]
Since \(y\) is a one-hot vector, with exactly one element equal to 1 and all others 0, we have \(\sum_k y_k = 1\); in other words, this is a single-label classification problem. Therefore:
\[ \frac{\partial L}{\partial o_i} = p_i - y_i \]
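This result is easy to verify numerically. The sketch below defines a `cross_entropy` helper (an illustrative name, not from the original post) and compares the analytic gradient \(p - y\) against finite differences, reusing `softmax` and `o` from the earlier sketches:

```python
def cross_entropy(p, y):
    # L = -sum_j y_j * log(p_j), with y a one-hot target vector
    return -np.sum(y * np.log(p))

def analytic_grad(o, y):
    # dL/do = p - y, the result derived above
    return softmax(o) - y

def numeric_grad(o, y, eps=1e-6):
    g = np.zeros_like(o)
    for i in range(o.size):
        d = np.zeros_like(o)
        d[i] = eps
        g[i] = (cross_entropy(softmax(o + d), y)
                - cross_entropy(softmax(o - d), y)) / (2 * eps)
    return g

y = np.array([1.0, 0.0, 0.0])   # one-hot target
# Difference should be tiny, on the order of 1e-10.
print(np.max(np.abs(analytic_grad(o, y) - numeric_grad(o, y))))
```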

