Machine Learning Notes: Linear Models

1. Basic Form

For an instance x=(x_1;x_2;\cdots;x_d) described by d attributes, where x_i is the value of x on the i-th attribute, a linear model is a function that makes predictions via a linear combination of the attributes, i.e.

f(x)=w_1x_1+w_2x_2+\cdots+w_dx_d+b

In vector form,

                f(x)=w^Tx+b

where w=(w_1;w_2;\cdots;w_d).
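
As a quick illustration (not from the original note; all names and values here are made up), a minimal NumPy sketch of this prediction function:

```python
import numpy as np

def linear_model(x, w, b):
    """Compute f(x) = w^T x + b for one instance x with d attributes."""
    return np.dot(w, x) + b

# Hypothetical example with d = 3 attributes
w = np.array([0.5, -1.2, 2.0])  # weight vector w
b = 0.3                         # bias b
x = np.array([1.0, 0.0, 2.0])   # one instance
print(linear_model(x, w, b))    # 0.5*1 - 1.2*0 + 2.0*2 + 0.3 = 4.8
```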

2. Linear Regression

Given a dataset D={(x_1,y_1), (x_2, y_2), \cdots, (x_m, y_m)}, where x_i=(x_{i1};x_{i2};\cdots;x_{id}) and y_i\in \mathbb{R}.

Simple (univariate) linear regression:

                        f(x_i)=wx_i + b, such that f(x_i)\simeq y_i

We determine w and b via the mean squared error, trying to minimize it:

(w^*, b^*)=\arg\min_{(w,b)} \sum_{i=1}^m (f(x_i)-y_i)^2 =\arg\min_{(w,b)} \sum_{i=1}^m (y_i - wx_i - b)^2

The process of solving for the w and b that minimize E_{(w,b)}=\sum_{i=1}^m (y_i - wx_i - b)^2 is called least-squares "parameter estimation" of the linear regression model. Taking partial derivatives of E_{(w,b)} with respect to w and b gives:

\begin{align*} \frac{\partial E_{(w,b)}}{\partial w} &= \frac{\partial}{\partial w}\left[ \sum_{i=1}^m (y_i - wx_i - b)^2\right ] \\ & = \sum_{i=1}^m \frac{\partial}{\partial w} \left [ (y_i - wx_i - b)^2 \right ] \\ &= \sum_{i=1}^m \left[ 2 \cdot \left(y_i - wx_i - b \right ) \cdot \left(-x_i \right )\right ] \\ &= \sum_{i=1}^m \left[ 2 \cdot \left(wx_i^2 - x_iy_i + bx_i \right ) \right ] \\ &= 2 \cdot \left(w \sum_{i=1}^m x_i^2 - \sum_{i=1}^m x_iy_i + b\sum_{i=1}^m x_i\right ) \\ &= 2 \left( w \sum_{i=1}^m x_i^2 - \sum_{i=1}^m (y_i-b) x_i \right ) \end{align*}

\begin{align*} \frac{\partial E_{(w,b)}}{\partial b} &= \frac{\partial}{\partial b} \left [\sum_{i=1}^m\left(y_i - wx_i - b \right )^2 \right ] \\ &= \sum_{i=1}^m \frac{\partial}{\partial b} \left[\left(y_i-wx_i - b \right )^2 \right ] \\ &= \sum_{i=1}^m \left[2 \cdot \left(y_i - wx_i - b \right ) \cdot \left (-1 \right )\right ] \\ &= \sum_{i=1}^m \left[2 \cdot \left(b + wx_i - y_i \right ) \right ] \\ &= 2 \cdot \left(mb + w\sum_{i=1}^m x_i - \sum_{i=1}^m y_i \right ) \\ &= 2 \cdot \left(mb - \sum_{i=1}^m \left(y_i - wx_i \right ) \right ) \end{align*}

Setting the derivatives to zero, we obtain:

b=\frac{1}{m}\sum_{i=1}^m\left(y_i - wx_i \right ) = \bar{y} - w\bar{x}

w\sum_{i=1}^m x_i^2 = \sum_{i=1}^m y_ix_i - \sum_{i=1}^m bx_i

Substituting b = \bar{y} - w\bar{x}:

w\sum_{i=1}^m x_i^2 = \sum_{i=1}^m y_ix_i - \sum_{i=1}^m \left(\bar{y} - w \bar{x} \right ) x_i

w\sum_{i=1}^m x_i^2 = \sum_{i=1}^m y_ix_i - \bar{y} \sum_{i=1}^m x_i + w\bar{x}\sum_{i=1}^m x_i

w\left(\sum_{i=1}^m x_i^2 - \bar{x} \sum_{i=1}^m x_i\right ) = \sum_{i=1}^my_ix_i - \bar{y} \sum_{i=1}^m x_i

w=\frac{\sum_{i=1}^m y_ix_i - \bar{y}\sum_{i=1}^m x_i}{\sum_{i=1}^m x_i^2 - \bar{x} \sum_{i=1}^m x_i}

Since \bar{y}\sum_{i=1}^m x_i = \bar{x}\sum_{i=1}^m y_i and \bar{x}\sum_{i=1}^m x_i = \frac{1}{m}\left(\sum_{i=1}^m x_i \right )^2, this simplifies to

w=\frac{\sum_{i=1}^m y_i \left(x_i-\bar{x} \right )}{\sum_{i=1}^m x_i^2 - \frac{1}{m}\left(\sum_{i=1}^m x_i \right )^2}
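
A minimal sketch of this closed-form solution (the function and variable names are my own, not from the note), computing w from the formula above and then b = \bar{y} - w\bar{x}:

```python
import numpy as np

def fit_simple_linear(x, y):
    """Closed-form least squares for f(x) = w*x + b.

    Implements w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / m)
    and b = y_bar - w * x_bar, as derived above.
    """
    m = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    w = np.sum(y * (x - x_bar)) / (np.sum(x**2) - np.sum(x)**2 / m)
    b = y_bar - w * x_bar
    return w, b

# Made-up data lying exactly on y = 2x + 1, so we expect w = 2, b = 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(fit_simple_linear(x, y))  # (2.0, 1.0)
```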

In the more general case, the samples in dataset D are described by d attributes:

        f(\boldsymbol {x_i})=\boldsymbol w^T \boldsymbol {x_i} + b, such that f(\boldsymbol {x_i}) \simeq y_i. This is called multivariate linear regression.

As above, we use least squares. Let \hat {\boldsymbol w} = \left(\boldsymbol w;b \right ), and represent the dataset D by an m\times \left(d+1 \right ) matrix \boldsymbol {X}, where each row corresponds to one instance: the first d elements of a row are the instance's d attribute values, and the last element is fixed to 1. That is,

\boldsymbol {X} = \left( \begin{matrix} x_{11} & x_{12} & \cdots & x_{1d} & 1\\ x_{21} & x_{22} & \cdots & x_{2d} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} & 1 \end{matrix} \right ) =\left( \begin{matrix} \boldsymbol {x_1^T} & 1\\ \boldsymbol {x_2^T} & 1\\ \vdots & \vdots \\ \boldsymbol {x_m^T} & 1 \end{matrix} \right )

Writing y as a column vector

\boldsymbol {y} = \left(y_1;y_2;\cdots;y_m \right ), the least-squares objective becomes

\hat {\boldsymbol w}^* = \arg\min_{\hat {\boldsymbol w}} (\boldsymbol y - \boldsymbol X \hat {\boldsymbol w})^T(\boldsymbol y - \boldsymbol X \hat {\boldsymbol w})

Let E_{\hat {\boldsymbol w}} = \left(\boldsymbol y - \boldsymbol X \hat {\boldsymbol w} \right ) ^T\left(\boldsymbol y - \boldsymbol X \hat {\boldsymbol w} \right ). To minimize it, we differentiate with respect to \hat {\boldsymbol w}. First expand:

E_{\hat w} = \left( \boldsymbol y ^T - \hat {\boldsymbol w}^T \boldsymbol X^T\right )\left( \boldsymbol y - \boldsymbol X \hat {\boldsymbol w}\right ) = \boldsymbol y^T \boldsymbol y - \hat {\boldsymbol w}^T\boldsymbol X^T\boldsymbol y - \boldsymbol y ^T\boldsymbol X \hat {\boldsymbol w}+ \hat {\boldsymbol w}^T \boldsymbol X ^T \boldsymbol X \hat {\boldsymbol w}, then differentiate term by term:

\frac {\partial E_{\hat w}}{\partial {\hat {\boldsymbol w}}} = \frac {\partial \boldsymbol y ^T \boldsymbol y}{\partial {\hat {\boldsymbol w}}} - \frac {\partial \hat {\boldsymbol w}^T\boldsymbol X^T\boldsymbol y}{\partial {\hat {\boldsymbol w}}}- \frac {\partial \boldsymbol y ^T \boldsymbol X \hat {\boldsymbol w}}{\partial \hat {\boldsymbol w}} + \frac {\partial \hat {\boldsymbol w}^T \boldsymbol X ^T \boldsymbol X \hat {\boldsymbol w}}{\partial \hat {\boldsymbol w}}

By the matrix differentiation identities \frac {\partial \boldsymbol a^T \boldsymbol x}{\partial \boldsymbol x} = \frac {\partial \boldsymbol x ^T \boldsymbol a}{\partial \boldsymbol x} = \boldsymbol a and \frac {\partial \boldsymbol x^T \boldsymbol A \boldsymbol x}{\partial \boldsymbol x} = \left(\boldsymbol A + \boldsymbol A ^T \right ) \boldsymbol x, we have

\frac {\partial E_{\hat w}}{\partial \hat {\boldsymbol w}} = 0 - \boldsymbol X ^T\boldsymbol y -\boldsymbol X ^T\boldsymbol y + \left(\boldsymbol X^T \boldsymbol X + \boldsymbol X^T\boldsymbol X\right )\hat {\boldsymbol w}=2\boldsymbol X^T\left(\boldsymbol X\hat{\boldsymbol w} - \boldsymbol y\right )

Setting this to zero and assuming \boldsymbol X^T \boldsymbol X is invertible (i.e., full rank), we get

\hat {\boldsymbol w}^*=\left(\boldsymbol X^T \boldsymbol X \right )^{-1}\boldsymbol X^T \boldsymbol y
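
A minimal numerical sketch of this solution (my own naming, not from the note). In practice one calls a least-squares solver instead of forming the inverse explicitly; this is more stable and also returns a solution when \boldsymbol X^T \boldsymbol X is singular:

```python
import numpy as np

def fit_multivariate(X_raw, y):
    """Least-squares fit of f(x) = w^T x + b.

    X_raw is m x d; a column of ones is appended so that w_hat = (w; b),
    matching the augmented matrix X in the text.
    """
    m = X_raw.shape[0]
    X = np.hstack([X_raw, np.ones((m, 1))])  # m x (d+1) augmented matrix
    # lstsq minimizes ||y - X w_hat||^2; unlike inv(X.T @ X) @ X.T @ y,
    # it still works (minimum-norm solution) when X^T X is not invertible.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_hat[:-1], w_hat[-1]  # (w, b)

# Made-up data generated from y = 1*x1 + 2*x2 + 3
X_raw = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 1.]])
y = X_raw @ np.array([1., 2.]) + 3.
w, b = fit_multivariate(X_raw, y)
print(w, b)  # approximately [1. 2.] 3.0
```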

Generalized linear model, where g(\cdot) is a monotone, differentiable link function:

y=g^{-1}\left(\boldsymbol w^T \boldsymbol x + b \right )

3. Logistic Regression (Log-Odds Regression)

Unit step function:

y=\begin{cases} 0, & z < 0 \\ 0.5, & z = 0 \\ 1, & z > 0 \end{cases}

Logistic (log-odds) function:

y=\frac{1}{1+e^{-z}}

This is a sigmoid function: it converts z into a y value close to 0 or 1. Substituting the logistic function as g^{-1}(\cdot) into the generalized linear model:

y=\frac{1}{1+e^{-(\boldsymbol w^T \boldsymbol x + b)}}

\ln \frac{y}{1-y} = \boldsymbol w ^T \boldsymbol x + b

If we regard y as the probability that sample x is positive and 1-y as the probability that it is negative, the ratio \frac {y}{1-y} is called the odds, and its logarithm is the log-odds (logit). For example, y=0.8 gives odds 0.8/0.2=4 and log-odds \ln 4 \approx 1.39.

Regarding y as the class posterior probability estimate p\left(y=1|\boldsymbol x \right )=\frac{e^{\boldsymbol w^T \boldsymbol x + b}}{1 + e^{\boldsymbol w ^T \boldsymbol x + b}}, we have 1-y=p\left(y=0|\boldsymbol x \right ) = \frac{1}{1 + e^{\boldsymbol w^T \boldsymbol x + b}}
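
A small sketch of the sigmoid and the two posteriors above (names and parameter values are hypothetical); the branching form avoids overflow in exp for large |z|:

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    # For z >= 0 this is 1/(1+e^{-|z|}); for z < 0 it is the algebraically
    # equal e^{-|z|}/(1+e^{-|z|}). exp(-|z|) never overflows.
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

# p(y=1|x) = sigmoid(w^T x + b), p(y=0|x) = 1 - p(y=1|x)
w, b = np.array([1.0, -2.0]), 0.5  # hypothetical parameters
x = np.array([0.2, 0.1])
p1 = sigmoid(w @ x + b)
print(p1, 1.0 - p1)
```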

Estimating \boldsymbol w and b by maximum likelihood, the log-likelihood is

                        \ell \left(\boldsymbol w, b \right ) = \sum_{i=1}^m\ln p\left(y_i|\boldsymbol x_i; \boldsymbol w, b \right )

Let \boldsymbol \beta = \left(\boldsymbol w; b \right ) and \boldsymbol {\hat x} = \left(\boldsymbol x;1 \right ); then \boldsymbol w ^T \boldsymbol x + b = \boldsymbol \beta^T \boldsymbol {\hat x}

Write p_1\left(\boldsymbol {\hat {x}};\boldsymbol \beta \right ) = p\left(y=1| \boldsymbol {\hat{x}}; \boldsymbol \beta \right ) and p_0\left(\boldsymbol {\hat {x}};\boldsymbol \beta \right ) = p\left(y=0| \boldsymbol {\hat{x}}; \boldsymbol \beta \right ) = 1-p_1\left(\boldsymbol {\hat x} ; \boldsymbol \beta\right ). Since y is either 0 or 1, we have:

        p\left(y_i|\boldsymbol x_i; \boldsymbol w, b \right ) = y_ip_1\left(\boldsymbol {\hat x_i}; \boldsymbol \beta \right ) + \left(1-y_i \right ) p_0\left(\boldsymbol {\hat x_i} ; \boldsymbol \beta \right )

\ell\left(\boldsymbol \beta \right ) = \sum_{i=1}^m \ln \left( y_ip_1 \left( \boldsymbol {\hat x_i};\boldsymbol \beta \right ) + \left(1 - y_i \right ) p_0 \left( \boldsymbol {\hat x_i}; \boldsymbol \beta \right ) \right ), where

p_1\left(\boldsymbol {\hat x_i}; \boldsymbol \beta \right ) = \frac{e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}}}{1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}}}, \qquad p_0\left(\boldsymbol {\hat x_i}; \boldsymbol \beta \right ) = \frac{1}{1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}}}

\begin{align*} \ell \left(\boldsymbol \beta \right ) &= \sum_{i=1}^m \ln \left( \frac{y_i e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} + 1 - y_i}{1+e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}}} \right ) \\ &= \sum_{i=1}^m \left( \ln \left( y_ie^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} + 1 - y_i \right ) - \ln \left( 1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} \right ) \right ) \end{align*}

Since y_i is either 0 or 1, we have

\ell \left(\boldsymbol \beta \right ) = \begin{cases} \sum_{i=1}^m \left(-\ln \left(1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} \right ) \right ), & y_i=0 \\ \sum_{i=1}^m\left( \boldsymbol \beta ^T \boldsymbol {\hat x_i} - \ln \left(1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} \right ) \right ), & y_i = 1\end{cases}

Combining the two cases:

        \ell \left(\boldsymbol \beta \right ) = \sum_{i=1}^m \left( y_i \boldsymbol \beta^T \boldsymbol {\hat x_i} - \ln \left(1 + e^{\boldsymbol \beta ^T \boldsymbol {\hat x_i}} \right ) \right )

This is to be maximized; equivalently, one minimizes -\ell(\boldsymbol \beta), which is convex, using numerical methods such as gradient descent or Newton's method.
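
As an illustration (not part of the original note; the learning rate and data are arbitrary), a minimal gradient-ascent sketch that maximizes this log-likelihood. It uses the standard gradient \partial \ell / \partial \boldsymbol \beta = \sum_{i=1}^m \left(y_i - p_1(\boldsymbol {\hat x_i}; \boldsymbol \beta)\right) \boldsymbol {\hat x_i}:

```python
import numpy as np

def fit_logistic(X_raw, y, lr=0.1, n_iter=5000):
    """Maximize ell(beta) = sum_i (y_i * beta^T x_i - ln(1 + e^{beta^T x_i}))
    by gradient ascent; the gradient is sum_i (y_i - p1_i) * x_hat_i."""
    m = X_raw.shape[0]
    X_hat = np.hstack([X_raw, np.ones((m, 1))])  # append 1 so beta = (w; b)
    beta = np.zeros(X_hat.shape[1])
    for _ in range(n_iter):
        p1 = 1.0 / (1.0 + np.exp(-X_hat @ beta))  # p(y=1 | x_hat; beta)
        beta += lr * X_hat.T @ (y - p1)           # ascend the gradient
    return beta[:-1], beta[-1]  # (w, b)

# Made-up 1-D data: positive iff x > 1
X_raw = np.array([[0.0], [0.5], [1.5], [2.0]])
y = np.array([0., 0., 1., 1.])
w, b = fit_logistic(X_raw, y)
print(w, b)  # decision boundary w*x + b = 0 lands near x = 1
```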

