Code:
$$\begin{aligned}
KPI&=(N+S)W \\
PI&=N+S \\
I&=W
\end{aligned}$$

$$\begin{aligned}
loss&=(y_i-Q(s,a;\theta))^2 \\
&=(r+\gamma \max Q(s^{'},a^{'};\theta^{-})-Q(s,a;\theta))^2
\end{aligned}$$
Rendered result (each `&` marks the alignment point, so the `=` signs line up in a column):
$$\begin{aligned}
KPI&=(N+S)W \\
PI&=N+S \\
I&=W
\end{aligned}$$

$$\begin{aligned}
loss&=(y_i-Q(s,a;\theta))^2 \\
&=(r+\gamma \max Q(s',a';\theta^{-})-Q(s,a;\theta))^2
\end{aligned}$$
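The second aligned block is the DQN-style squared TD loss. It has a target $y_i = r + \gamma \max Q(s',a';\theta^{-})$, computed with target parameters $\theta^{-}$, compared against $Q(s,a;\theta)$.

Below is a minimal sketch of that arithmetic for a single transition. It rests on a few assumptions. The two Q functions are stood in for by plain NumPy tables `q_online` and `q_target` rather than networks. The next state $s'$ is assumed to be non-terminal. The function name `dqn_td_loss` is hypothetical.

```python
import numpy as np

def dqn_td_loss(q_online, q_target, s, a, r, s_next, gamma=0.99):
    """Squared TD error (y_i - Q(s, a; theta))^2 for one transition.

    Assumption: tabular stand-ins for the networks.
    q_online[s, a]  plays the role of Q(s, a; theta)
    q_target[s, a]  plays the role of Q(s, a; theta^-)
    Assumes s_next is non-terminal (no done flag handled here).
    """
    # Target: y_i = r + gamma * max_{a'} Q(s', a'; theta^-)
    y = r + gamma * np.max(q_target[s_next])
    # Loss: (y_i - Q(s, a; theta))^2
    return (y - q_online[s, a]) ** 2

q_online = np.array([[1.0, 2.0], [0.5, 0.0]])
q_target = np.array([[1.0, 2.0], [3.0, 4.0]])
# y = 1.0 + 0.5 * max(3.0, 4.0) = 3.0; Q(0, 1) = 2.0; loss = (3.0 - 2.0)^2
print(dqn_td_loss(q_online, q_target, s=0, a=1, r=1.0, s_next=1, gamma=0.5))
```

In this example the target is $1 + 0.5 \times 4 = 3$ and the online estimate is $2$, so the loss is $1$. Using a separate `q_target` table mirrors the $\theta^{-}$ superscript in the formula: the target is computed from a different set of parameters than the estimate being trained.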