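Backpropagation (BP), short for "backward propagation of errors", is a common method for training artificial neural networks, used together with an optimization method such as gradient descent. It computes the gradient of the loss function with respect to every weight in the network. That gradient is passed to the optimization method, which uses it to update the weights so as to minimize the loss function.
Suppose you have a network like this: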

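The first layer is the input layer, with two neurons $i1$, $i2$ and a bias term $b1$. The second layer is the hidden layer, with two neurons $h1$, $h2$ and a bias term $b2$. The third layer is the output layer, with $o1$ and $o2$. The $wi$ labeled on each connection is the weight between layers, and the activation function is the sigmoid function by default.
Now assign them initial values, as shown in the figure below:
where
Input data: $i1=0.05$, $i2=0.10$;
Target outputs: $o1=0.01$, $o2=0.99$;
Initial weights:
$w1=0.15$, $w2=0.20$, $w3=0.25$, $w4=0.30$;
$w5=0.40$, $w6=0.45$, $w7=0.50$, $w8=0.55$;
Goal: given the inputs $i1$, $i2$ (0.05 and 0.10), make the network's outputs as close as possible to the target outputs $o1$, $o2$ (0.01 and 0.99).
Start with the forward pass. Compute the weighted input sum of neuron $h1$: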
net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1
net_{h1} = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
Compute the output $out_{h1}$ of neuron $h1$ (using the sigmoid activation function):
out_{h1} = \frac{1}{1+e^{-net_{h1}}} = 0.5932
Similarly, compute the output $out_{h2}$ of neuron $h2$:
out_{h2} = 0.5968
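The hidden-layer step above can be reproduced in a few lines of Python. This is only a minimal sketch: the variable names are mine, and I am assuming that $w3$ and $w4$ are the weights from $i1$ and $i2$ into $h2$ (the figure is not visible here, but this wiring reproduces the value of $out_{h2}$).

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights and bias as given in the post.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

# Weighted input sums of the two hidden neurons.
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h2 = w3 * i1 + w4 * i2 + b1 * 1  # assumed wiring: w3, w4 feed h2

# Sigmoid activations.
out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)

print(f"net_h1 = {net_h1:.4f}, out_h1 = {out_h1:.4f}")
print(f"net_h2 = {net_h2:.4f}, out_h2 = {out_h2:.4f}")
```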
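Next, compute the output of neuron $o1$ in the same way, taking the hidden-layer outputs as its inputs (the value of $b_2$ is not listed above; $b_2 = 0.60$ reproduces the results below):
net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1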
out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.7514
Similarly, compute the output of neuron $o2$:
out_{o2} = 0.7729
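Putting the whole forward pass into one function might look like the sketch below. The `forward` helper and the dictionary layout are hypothetical, and two things are assumptions on my part: $b_2 = 0.60$ (not listed in the post, but it reproduces the outputs) and the wiring of $w3$, $w4$, $w7$, $w8$ (into $h2$ and $o2$ respectively).

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w, b1, b2):
    """Forward pass of the 2-2-2 network; w maps 'w1'..'w8' to floats."""
    i1, i2 = inputs
    out_h1 = sigmoid(w["w1"] * i1 + w["w2"] * i2 + b1)
    out_h2 = sigmoid(w["w3"] * i1 + w["w4"] * i2 + b1)  # assumed wiring
    out_o1 = sigmoid(w["w5"] * out_h1 + w["w6"] * out_h2 + b2)
    out_o2 = sigmoid(w["w7"] * out_h1 + w["w8"] * out_h2 + b2)  # assumed wiring
    return out_h1, out_h2, out_o1, out_o2

weights = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,
           "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
B1 = 0.35
B2 = 0.60  # assumption: not listed in the post

out_h1, out_h2, out_o1, out_o2 = forward((0.05, 0.10), weights, B1, B2)
print(f"out_o1 = {out_o1:.4f}, out_o2 = {out_o2:.4f}")
```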
Next, we can carry out the backpropagation computation. The total error is the sum of the errors of the two outputs:
E_{total} = E_{o1} + E_{o2}
Compute the errors of $o1$ and $o2$ separately:
E_{o1} = \frac{1}{2} (target_{o1} - out_{o1})^2 = 0.2748
E_{o2} = \frac{1}{2} (target_{o2} - out_{o2})^2 = 0.0235
E_{total} = E_{o1} + E_{o2} = 0.2983
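As a quick check of the error values, here is a small sketch that plugs the rounded forward-pass outputs into the squared-error formula. The helper name `squared_error` is mine, and because the inputs are rounded, the last digit may differ slightly from the figures above.

```python
def squared_error(target, out):
    """Half squared error for a single output neuron."""
    return 0.5 * (target - out) ** 2

target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.7514, 0.7729  # forward-pass outputs, rounded to 4 decimals

E_o1 = squared_error(target_o1, out_o1)
E_o2 = squared_error(target_o2, out_o2)
E_total = E_o1 + E_o2

print(f"E_o1 = {E_o1:.4f}, E_o2 = {E_o2:.4f}, E_total = {E_total:.4f}")
```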
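Take the weight $w5$ as an example. To find out how much $w5$ contributes to the total error, take the partial derivative of the total error with respect to $w5$ using the chain rule:
\frac {\partial (E_{total} )}{\partial (w_{5})} = \frac {\partial (E_{total} )}{\partial (out_{o1})} * \frac {\partial (out_{o1} )}{\partial (net_{o1})} * \frac {\partial (net_{o1} )}{\partial (w_{5})}
The figure below shows more intuitively how the error propagates backward: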

We compute the value of each factor in turn.
Compute $\frac {\partial (E_{total} )}{\partial (out_{o1})}$:
E_{total} = \frac {1}{2}(target_{o1} - out_{o1} )^2 +\frac {1}{2}(target_{o2} - out_{o2} )^2
\frac {\partial (E_{total} )}{\partial (out_{o1})} = - (target_{o1} - out_{o1} ) = 0.7414
Compute $\frac {\partial ( out_{o1} )}{\partial (net_{o1})}$:
out_{o1} = \frac{1}{1+e^{-net_{o1}}}
\frac {\partial ( out_{o1} )}{\partial (net_{o1})} = out_{o1}(1 - out_{o1} ) = 0.1868
Compute $\frac {\partial ( net_{o1} )}{\partial (w_{5})}$:
net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1
\frac {\partial ( net_{o1} )}{\partial (w_{5})} = out_{h1} = 0.5932
Finally, multiply the three factors:
\frac {\partial (E_{total} )}{\partial (w_{5})} = \frac {\partial (E_{total} )}{\partial (out_{o1})} * \frac {\partial (out_{o1} )}{\partial (net_{o1})} * \frac {\partial (net_{o1} )}{\partial (w_{5})} = 0.082
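The three chain-rule factors are easy to verify numerically. The sketch below uses the rounded values from the post (so the result only matches to about three decimals); the variable names are my own.

```python
target_o1 = 0.01
out_o1 = 0.7514  # rounded output of o1
out_h1 = 0.5932  # rounded output of h1

# Factor 1: dE_total / d out_o1 (only the E_o1 term depends on out_o1).
dE_dout = -(target_o1 - out_o1)

# Factor 2: d out_o1 / d net_o1, the derivative of the sigmoid.
dout_dnet = out_o1 * (1.0 - out_o1)

# Factor 3: d net_o1 / d w5, the input that w5 multiplies.
dnet_dw5 = out_h1

dE_dw5 = dE_dout * dout_dnet * dnet_dw5
print(f"{dE_dout:.4f} * {dout_dnet:.4f} * {dnet_dw5:.4f} = {dE_dw5:.4f}")
```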
Looking at the formulas above, we find:
\frac {\partial (E_{total} )}{\partial (w_{5})} = -(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})*out_{h1}
For convenience, let $\delta _{o1}$ denote the error of the output layer:
\delta _{o1} = \frac {\partial (E_{total} )}{\partial (out_{o1})} * \frac {\partial (out_{o1} )}{\partial (net_{o1})}
\delta _{o1} = -(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})
\frac {\partial (E_{total} )}{\partial (w_{5})} = \delta _{o1} *out_{h1}
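One way to convince yourself that $\delta _{o1} * out_{h1}$ really is the gradient is to compare it with a finite-difference estimate. The sketch below does that for $w_5$ only. The helper names are hypothetical, $b_2 = 0.60$ is an assumption (the post does not list it), and the wiring of $w3$, $w4$, $w7$, $w8$ is assumed as well.

```python
import math

T1, T2 = 0.01, 0.99  # target outputs
B2 = 0.60            # assumption: not listed in the post

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w5):
    """Forward pass with every value except w5 fixed to its initial value."""
    out_h1 = sigmoid(0.15 * 0.05 + 0.20 * 0.10 + 0.35)
    out_h2 = sigmoid(0.25 * 0.05 + 0.30 * 0.10 + 0.35)
    out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + B2)
    out_o2 = sigmoid(0.50 * out_h1 + 0.55 * out_h2 + B2)
    return out_h1, out_o1, out_o2

def total_error(w5):
    _, out_o1, out_o2 = forward(w5)
    return 0.5 * (T1 - out_o1) ** 2 + 0.5 * (T2 - out_o2) ** 2

def analytic_grad(w5):
    """Gradient from the delta form: delta_o1 * out_h1."""
    out_h1, out_o1, _ = forward(w5)
    delta_o1 = -(T1 - out_o1) * out_o1 * (1.0 - out_o1)
    return delta_o1 * out_h1

# Central finite difference around the initial value w5 = 0.40.
eps = 1e-6
numeric = (total_error(0.40 + eps) - total_error(0.40 - eps)) / (2 * eps)
analytic = analytic_grad(0.40)
print(f"numeric = {numeric:.6f}, analytic = {analytic:.6f}")
```

The two numbers should agree to many decimal places; a large gap would usually point to a sign or indexing mistake in the derivation.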
Update the value of $w_5$, where $\eta$ is the learning rate (the result below corresponds to $\eta = 0.5$):
w_5^+ = w_5 - \eta * \frac {\partial (E_{total} )}{\partial (w_{5})} = 0.3589
Similarly, update $w_6$, $w_7$, $w_8$:
w_6^+ = 0.4086
w_7^+ = 0.5113
w_8^+ = 0.5614
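The four output-layer updates can be sketched as follows. I am assuming a learning rate of 0.5 (it reproduces the updated $w_5$) and that $w_7$, $w_8$ connect $h1$, $h2$ to $o2$. The inputs are the rounded values from the post, so the results match only to about three or four decimals.

```python
eta = 0.5  # assumed learning rate; it reproduces the updated w5

out_h1, out_h2 = 0.5932, 0.5968  # rounded hidden outputs
out_o1, out_o2 = 0.7514, 0.7729  # rounded network outputs
weights = {"w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}

def output_delta(target, out):
    """delta = dE/d_out * d_out/d_net for a sigmoid output with squared error."""
    return -(target - out) * out * (1.0 - out)

delta_o1 = output_delta(0.01, out_o1)
delta_o2 = output_delta(0.99, out_o2)

# Gradient of each weight = delta of its target neuron * output of its source neuron.
grads = {
    "w5": delta_o1 * out_h1,
    "w6": delta_o1 * out_h2,
    "w7": delta_o2 * out_h1,  # assumed wiring: h1 -> o2
    "w8": delta_o2 * out_h2,  # assumed wiring: h2 -> o2
}

updated = {name: weights[name] - eta * grads[name] for name in weights}
for name, value in updated.items():
    print(f"{name}+ = {value:.4f}")
```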
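We can compute the updates for $w_1$, $w_2$, $w_3$, $w_4$ in much the same way, with one change.
When computing the partial derivative of the total error with respect to $w_5$ above, the path was:
$out_{o1}$ -> $net_{o1}$ -> $w_5$
For the weights into the hidden layer, the path is:
$out_{h1}$ -> $net_{h1}$ -> $w_1$
Because $out_{h1}$ feeds both $o1$ and $o2$, it receives error from both $E_{o1}$ and $E_{o2}$, so both contributions have to be computed.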

Compute $\frac {\partial (E_{total} )}{\partial (out_{h1})}$:
\frac {\partial (E_{total} )}{\partial (out_{h1})} = \frac {\partial (E_{o1} )}{\partial (out_{h1})} + \frac {\partial (E_{o2} )}{\partial (out_{h1})}
First compute $\frac {\partial (E_{o1} )}{\partial (out_{h1})}$:
\frac {\partial (E_{o1} )}{\partial (out_{h1})} = \frac {\partial (E_{o1} )}{\partial (net_{o1})} * \frac {\partial (net_{o1} )}{\partial (out_{h1})}
\frac {\partial (E_{o1} )}{\partial (net_{o1})} = \frac {\partial (E_{o1} )}{\partial (out_{o1})} * \frac {\partial (out_{o1} )}{\partial (net_{o1})} = 0.1385
net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1
\frac {\partial (net_{o1} )}{\partial (out_{h1})} = w_5= 0.40
\frac {\partial (E_{o1} )}{\partial (out_{h1})} = \frac {\partial (E_{o1} )}{\partial (net_{o1})} * \frac {\partial (net_{o1} )}{\partial (out_{h1})} = 0.1385 * 0.40 = 0.055
Similarly, we get:
\frac {\partial (E_{o2} )}{\partial (out_{h1})} = -0.019
Add the two to get the total:
\frac {\partial (E_{total} )}{\partial (out_{h1})} = \frac {\partial (E_{o1} )}{\partial (out_{h1})} + \frac {\partial (E_{o2} )}{\partial (out_{h1})} = 0.036
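The sum over both output neurons can be checked with a short sketch. It uses the rounded outputs from the post and the original (not yet updated) weights $w_5$ and $w_7$. That $w_7$ is the connection from $h1$ to $o2$ is my assumption.

```python
w5, w7 = 0.40, 0.50              # original weights leaving h1 (w7 -> o2 is assumed)
out_o1, out_o2 = 0.7514, 0.7729  # rounded network outputs

def output_delta(target, out):
    """dE_o / d net_o for a sigmoid output with squared error."""
    return -(target - out) * out * (1.0 - out)

# Error reaching out_h1 through o1 and through o2.
dEo1_dout_h1 = output_delta(0.01, out_o1) * w5
dEo2_dout_h1 = output_delta(0.99, out_o2) * w7

dEtotal_dout_h1 = dEo1_dout_h1 + dEo2_dout_h1
print(f"{dEo1_dout_h1:.4f} + ({dEo2_dout_h1:.4f}) = {dEtotal_dout_h1:.4f}")
```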
Next compute $\frac {\partial (out_{h1} )}{\partial (net_{h1})}$:
out_{h1} = \frac{1}{1+e^{-net_{h1}}}
\frac {\partial (out_{h1} )}{\partial (net_{h1})} = out_{h1} *(1-out_{h1}) = 0.2413
Then compute $\frac {\partial (net_{h1} )}{\partial (w_{1})}$:
net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1
\frac {\partial (net_{h1} )}{\partial (w_{1})} = i_1 =0.05
Finally, multiply the three factors:
\frac {\partial (E_{total} )}{\partial (w_{1})} = \frac {\partial (E_{total} )}{\partial (out_{h1})} * \frac {\partial (out_{h1} )}{\partial (net_{h1})} * \frac {\partial (net_{h1} )}{\partial (w_{1})}
\frac {\partial (E_{total} )}{\partial (w_{1})} = 0.03635 * 0.2413 * 0.05 = 0.0004386
Update the value of $w_1$:
w_1^+ = w_1 - \eta * \frac {\partial (E_{total} )}{\partial (w_{1})} = 0.1498
Similarly, update $w_2$, $w_3$, $w_4$:
w_2^+ = 0.1996
w_3^+ = 0.2498
w_4^+ = 0.2995
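Here is a rough sketch of the hidden-layer updates for all four weights. It assumes a learning rate of 0.5 and the wiring $w_1$, $w_2$ -> $h1$, $w_3$, $w_4$ -> $h2$, $w_5$, $w_6$ -> $o1$, $w_7$, $w_8$ -> $o2$ (the figure is not visible here). It uses the rounded forward-pass values from the post, and the backward pass uses the original $w_5$..$w_8$, not the updated ones.

```python
eta = 0.5  # assumed learning rate

i1, i2 = 0.05, 0.10
out_h1, out_h2 = 0.5932, 0.5968  # rounded hidden outputs
out_o1, out_o2 = 0.7514, 0.7729  # rounded network outputs
w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}

# Output-layer deltas.
delta_o1 = -(0.01 - out_o1) * out_o1 * (1.0 - out_o1)
delta_o2 = -(0.99 - out_o2) * out_o2 * (1.0 - out_o2)

# Hidden-layer deltas: error arriving over both outgoing connections,
# multiplied by the sigmoid derivative of the hidden neuron.
delta_h1 = (delta_o1 * w["w5"] + delta_o2 * w["w7"]) * out_h1 * (1.0 - out_h1)
delta_h2 = (delta_o1 * w["w6"] + delta_o2 * w["w8"]) * out_h2 * (1.0 - out_h2)

grads = {"w1": delta_h1 * i1, "w2": delta_h1 * i2,
         "w3": delta_h2 * i1, "w4": delta_h2 * i2}

updated = {name: w[name] - eta * g for name, g in grads.items()}
for name, value in updated.items():
    print(f"{name}+ = {value:.4f}")
```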
That completes one pass of error backpropagation. We then run the forward pass again with the updated weights and keep iterating.
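To illustrate the iteration, here is a minimal training-loop sketch for this one example. It is not the code from the linked notebook. The `train_step` helper is hypothetical, $b_2 = 0.60$ and $\eta = 0.5$ are assumptions, the wiring is assumed as $w_1$, $w_2$ -> $h1$, $w_3$, $w_4$ -> $h2$, $w_5$, $w_6$ -> $o1$, $w_7$, $w_8$ -> $o2$, and the biases are left fixed as in the walkthrough.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, inputs, targets, b1, b2, eta):
    """One forward pass plus one weight update; returns (new_weights, error)."""
    i1, i2 = inputs
    t1, t2 = targets
    # Forward pass.
    out_h1 = sigmoid(w["w1"] * i1 + w["w2"] * i2 + b1)
    out_h2 = sigmoid(w["w3"] * i1 + w["w4"] * i2 + b1)
    out_o1 = sigmoid(w["w5"] * out_h1 + w["w6"] * out_h2 + b2)
    out_o2 = sigmoid(w["w7"] * out_h1 + w["w8"] * out_h2 + b2)
    error = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
    # Backward pass: output deltas, then hidden deltas (using the old weights).
    d_o1 = -(t1 - out_o1) * out_o1 * (1.0 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1.0 - out_o2)
    d_h1 = (d_o1 * w["w5"] + d_o2 * w["w7"]) * out_h1 * (1.0 - out_h1)
    d_h2 = (d_o1 * w["w6"] + d_o2 * w["w8"]) * out_h2 * (1.0 - out_h2)
    grads = {"w1": d_h1 * i1, "w2": d_h1 * i2, "w3": d_h2 * i1, "w4": d_h2 * i2,
             "w5": d_o1 * out_h1, "w6": d_o1 * out_h2,
             "w7": d_o2 * out_h1, "w8": d_o2 * out_h2}
    new_w = {name: w[name] - eta * grads[name] for name in w}
    return new_w, error

w = {"w1": 0.15, "w2": 0.20, "w3": 0.25, "w4": 0.30,
     "w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
errors = []
for step in range(1000):
    # b2 = 0.60 and eta = 0.5 are assumptions; biases are not updated here.
    w, err = train_step(w, (0.05, 0.10), (0.01, 0.99), b1=0.35, b2=0.60, eta=0.5)
    errors.append(err)

print(f"error before first update: {errors[0]:.6f}")
print(f"error before last update:  {errors[-1]:.6f}")
```

Each entry in `errors` is measured before that step's update, so `errors[0]` is the initial total error and later entries show how it shrinks as the iteration continues.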
Full code (view on a desktop browser): http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app
Mo (website: momodel.cn) is an online AI modeling platform that supports Python and helps you quickly develop, train, and deploy AI applications. We look forward to having you join.
1
nical 2019-01-21 19:23:54 +08:00
Awesome, very helpful.
2
MoModel OP @nical Sorry, many of the formulas came out garbled. Please open http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app on a desktop browser to view the source.