SVM (Part 4): Support Vector Regression

4 Support Vector Regression

4.1 Problem Definition

Given a training set $D = \{(\boldsymbol x_1,y_1), (\boldsymbol x_2,y_2), \dots, (\boldsymbol x_m,y_m)\}$, $y_i \in \Bbb R$, we want to learn a model of the form $f(\boldsymbol x) = \boldsymbol w^T \boldsymbol x + b$ such that $f(\boldsymbol x)$ is as close to $y$ as possible, where $\boldsymbol w$ and $b$ are the parameters to be determined.

Support Vector Regression (SVR) assumes we can tolerate a deviation of at most $\epsilon$ between $f(\boldsymbol x)$ and $y$; that is, a loss is incurred only when the absolute difference between $f(\boldsymbol x)$ and $y$ exceeds $\epsilon$. This amounts to building a band of width $2\epsilon$ centered on $f(\boldsymbol x) = \boldsymbol w^T \boldsymbol x + b$: training samples that fall inside this band are considered correctly predicted.
The SVR optimization objective is:

$$\min\;\; \frac{1}{2}\|\boldsymbol w\|^2 + C\sum_{i=1}^{m} \ell_{\epsilon}\big(f(\boldsymbol x_i) - y_i\big)$$

where $C>0$ is a regularization constant and $\ell_{\epsilon}$ is the $\epsilon$-insensitive loss function:

$$\ell_{\epsilon}(z) = \begin{cases} 0 & \text{if } |z| \leq \epsilon \\ |z|-\epsilon & \text{otherwise} \end{cases}$$
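As a quick illustration, the $\epsilon$-insensitive loss is a few lines of Python (a minimal sketch; the function name is mine, and numpy is assumed available):

```python
import numpy as np

def epsilon_insensitive_loss(z, epsilon):
    """l_eps(z): zero inside the [-epsilon, epsilon] band, |z| - epsilon outside."""
    return np.maximum(0.0, np.abs(z) - epsilon)

# residuals z = f(x) - y; only the two samples outside the band incur loss
residuals = np.array([-0.3, -0.05, 0.0, 0.08, 0.5])
print(epsilon_insensitive_loss(residuals, epsilon=0.1))  # [0.2 0.  0.  0.  0.4]
```

Note that, unlike the squared loss, samples inside the band contribute exactly zero, which is what makes the solution sparse in the training samples.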

4.2 The Dual Problem

Introducing slack variables $\xi_i^{\lor} \geq 0, \xi_i^{\land} \geq 0$ (the slacks on the two sides of the band may differ), the optimization problem becomes:

$$\begin{aligned} \min \;\; & \frac{1}{2}\|\boldsymbol w\|_2^2 + C\sum_{i=1}^{m}(\xi_i^{\lor}+ \xi_i^{\land}) \\ \text{s.t.} \;\; & y_i - (\boldsymbol w^T \boldsymbol x_i + b) \leq \epsilon + \xi_i^{\land}, \\ & \boldsymbol w^T \boldsymbol x_i + b - y_i \leq \epsilon + \xi_i^{\lor}, \\ & \xi_i^{\lor} \geq 0, \;\; \xi_i^{\land} \geq 0 \quad (i = 1,2,\dots,m) \end{aligned}$$
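To make the role of the two slacks concrete, here is a small sketch (function names are my own, for illustration only) that computes, for a candidate $(\boldsymbol w, b)$, the slack each sample needs at the optimum and the resulting primal objective value:

```python
import numpy as np

def svr_slacks(w, b, X, y, epsilon):
    """Slack needed on each side of the epsilon band: xi_up for samples
    above the band (y - f > eps), xi_lo for samples below it (f - y > eps)."""
    f = X @ w + b
    xi_up = np.maximum(0.0, y - f - epsilon)  # xi^  (upper-side violation)
    xi_lo = np.maximum(0.0, f - y - epsilon)  # xi^v (lower-side violation)
    return xi_lo, xi_up

def primal_objective(w, b, X, y, epsilon, C):
    xi_lo, xi_up = svr_slacks(w, b, X, y, epsilon)
    return 0.5 * w @ w + C * np.sum(xi_lo + xi_up)

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.5])  # last sample lies 0.5 above the fit
w = np.array([1.0])
print(primal_objective(w, b=0.0, X=X, y=y, epsilon=0.1, C=1.0))
```

Here only the third sample leaves the band (by $0.5 - 0.1 = 0.4$), so the objective is $\tfrac12\|w\|^2 + C \cdot 0.4 = 0.9$. At the optimum each slack takes exactly this smallest feasible value, which is why the inequality constraints can be relaxed this way.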

Introduce Lagrange multipliers $\mu_i^{\lor} \geq 0, \mu_i^{\land} \geq 0, \alpha_i^{\lor} \geq 0, \alpha_i^{\land} \geq 0$:

$$\begin{aligned} L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}) = {} & \frac{1}{2}\|\boldsymbol w\|_2^2 + C\sum_{i=1}^{m}(\xi_i^{\lor}+ \xi_i^{\land}) - \sum_{i=1}^{m}\mu_i^{\lor}\xi_i^{\lor} - \sum_{i=1}^{m}\mu_i^{\land}\xi_i^{\land} \\ & + \sum_{i=1}^{m}\alpha_i^{\lor}\big(f(\boldsymbol x_i)-y_i -\epsilon - \xi_i^{\lor}\big) + \sum_{i=1}^{m}\alpha_i^{\land}\big(y_i - f(\boldsymbol x_i) -\epsilon - \xi_i^{\land}\big) \end{aligned}$$

where $f(\boldsymbol x_i) = \boldsymbol w^T \boldsymbol x_i + b$.

The optimization objective is:

$$\min_{\boldsymbol w,b,\boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}}\; \max_{\boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}, \boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$$

Since the KKT conditions are satisfied, the dual problem is:

$$\max_{\boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}, \boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}}\; \min_{\boldsymbol w,b,\boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$$
First, compute the inner minimum by taking partial derivatives with respect to $\boldsymbol w, b, \xi_i^{\lor}, \xi_i^{\land}$:

$$\frac{\partial L}{\partial \boldsymbol w} = 0 \;\Rightarrow\; \boldsymbol w = \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})\boldsymbol x_i$$

$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = 0$$

$$\frac{\partial L}{\partial \xi_i^{\lor}} = 0 \;\Rightarrow\; C = \alpha_i^{\lor} + \mu_i^{\lor}$$

$$\frac{\partial L}{\partial \xi_i^{\land}} = 0 \;\Rightarrow\; C = \alpha_i^{\land} + \mu_i^{\land}$$

Substituting these back into $L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land})$ gives:

$$\begin{aligned} \min_{\boldsymbol w,b,\boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}}\; L(\boldsymbol w,b,\boldsymbol \alpha^{\lor}, \boldsymbol \alpha^{\land}, \boldsymbol \xi^{\lor}, \boldsymbol \xi^{\land}, \boldsymbol \mu^{\lor}, \boldsymbol \mu^{\land}) = {} & \sum_{i=1}^{m}\big[y_i(\alpha_i^{\land}- \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\big] \\ & - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})(\alpha_j^{\land} - \alpha_j^{\lor}) \boldsymbol x_i^T \boldsymbol x_j \end{aligned}$$

The original problem is thus transformed into the following dual problem:

$$\begin{aligned} \max_{\boldsymbol \alpha^{\land},\boldsymbol \alpha^{\lor}}\;\; & \sum_{i=1}^{m}\big[y_i(\alpha_i^{\land}- \alpha_i^{\lor}) - \epsilon(\alpha_i^{\land} + \alpha_i^{\lor})\big] - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor})(\alpha_j^{\land} - \alpha_j^{\lor}) \boldsymbol x_i^T \boldsymbol x_j \\ \text{s.t.} \;\; & \sum_{i=1}^{m}(\alpha_i^{\land} - \alpha_i^{\lor}) = 0 \\ & 0 \leq \alpha_i^{\lor},\alpha_i^{\land} \leq C \quad (i =1,2,\dots,m) \end{aligned}$$
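Before handing the problem to a solver, it helps to see the dual objective written out in code. A minimal numpy sketch for a linear kernel (my notation: `a_up` is $\boldsymbol\alpha^{\land}$, `a_lo` is $\boldsymbol\alpha^{\lor}$; the constraints are assumed, not enforced):

```python
import numpy as np

def svr_dual_objective(a_up, a_lo, X, y, epsilon):
    """Dual objective for multipliers satisfying sum(a_up - a_lo) = 0
    and 0 <= a <= C; feasibility is the caller's responsibility."""
    d = a_up - a_lo            # the combination alpha^ - alpha^v
    K = X @ X.T                # linear-kernel Gram matrix x_i^T x_j
    return y @ d - epsilon * np.sum(a_up + a_lo) - 0.5 * d @ K @ d

X = np.array([[0.0], [1.0]])
y = np.array([0.0, 1.0])
a_up = np.array([0.0, 0.5])   # feasible: (a_up - a_lo) sums to zero
a_lo = np.array([0.5, 0.0])
print(svr_dual_objective(a_up, a_lo, X, y, epsilon=0.1))
```

Only the combination $\alpha_i^{\land} - \alpha_i^{\lor}$ enters the quadratic term, which is why replacing $\boldsymbol x_i^T \boldsymbol x_j$ with any kernel $\kappa(\boldsymbol x_i, \boldsymbol x_j)$ kernelizes SVR with no other change.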

At this point the objective has only $\boldsymbol \alpha^{\land}, \boldsymbol \alpha^{\lor}$ as parameters, so it can be solved with SMO (Sequential Minimal Optimization), after which $\boldsymbol w$ and $b$ are recovered.
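In practice one rarely implements SMO by hand; scikit-learn's `SVR` solves this dual directly. A short sketch (the synthetic data is made up for illustration, and scikit-learn is assumed installed) that also recovers $\boldsymbol w$ via the stationarity condition $\boldsymbol w = \sum_i(\alpha_i^{\land}-\alpha_i^{\lor})\boldsymbol x_i$:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.8 * X.ravel() + 0.05 * rng.normal(size=60)  # true slope 0.8, small noise

# epsilon is the tube half-width, C the regularization constant from above
model = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)

# dual_coef_ stores (alpha^ - alpha^v) for the support vectors only, so the
# stationarity condition reduces to a dot product over the support vectors
w = (model.dual_coef_ @ model.support_vectors_).ravel()
b = model.intercept_[0]
print(w, b)  # w should land close to the true slope 0.8
```

Samples strictly inside the $\epsilon$-band have $\alpha_i^{\land} = \alpha_i^{\lor} = 0$ and drop out of `dual_coef_` entirely; only the samples on or outside the band (the support vectors) shape the model.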



Reposted from blog.csdn.net/apr15/article/details/104830038