Boosted Tree

Definition:

$$\hat{y} = \sum_{k=1}^{K} f_k(x)$$

where $f_k(x)$ is one of the $K$ regression trees.
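As a sketch of what this means in code, the ensemble prediction is just the sum of all $K$ trees' outputs; here two toy lambda functions stand in for fitted trees (a hypothetical setup, not a real library API):

```python
import numpy as np

# Hypothetical stand-ins for K fitted regression trees: each maps x -> a real score.
trees = [
    lambda x: 0.5 * x[0],                  # f_1
    lambda x: 1.0 if x[1] > 0 else -1.0,   # f_2
]

def predict(x, trees):
    """y_hat = sum_{k=1}^{K} f_k(x): add up the scores of all K trees."""
    return sum(f(x) for f in trees)

print(predict(np.array([2.0, -1.0]), trees))  # 0.5*2.0 + (-1.0) = 0.0
```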

Loss:

$$\mathrm{Loss} = \sum_{i=1}^{n} L(y_i, \hat{y}_i)$$

Add some regularization:

$$\mathrm{Loss} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
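A minimal sketch of this objective, assuming squared error for the per-example loss $L$ and precomputed per-tree complexity values for $\Omega$ (the concrete form of $\Omega$ appears further below):

```python
import numpy as np

def squared_error(y, y_hat):
    # One common choice for L(y_i, y_hat_i); any twice-differentiable loss works.
    return (y - y_hat) ** 2

def objective(y, y_hat, omegas):
    """Loss = sum_i L(y_i, y_hat_i) + sum_k Omega(f_k).
    `omegas` holds one precomputed complexity value per tree."""
    return np.sum(squared_error(y, y_hat)) + np.sum(omegas)

y     = np.array([1.0, 0.0, 2.0])
y_hat = np.array([0.8, 0.1, 1.7])
print(objective(y, y_hat, omegas=[0.3, 0.2]))  # data loss + regularization
```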

Additive Training:

$$\hat{y}^{(0)} = 0$$

$$\hat{y}^{(t)} = \hat{y}^{(t-1)} + f_t(x)$$
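In code, additive training is a loop that carries the running prediction $\hat{y}^{(t-1)}$ and adds each new tree's scores to it. A sketch, where `fit_tree` is a hypothetical stand-in for the greedy tree search derived below:

```python
import numpy as np

def boost(X, y, n_rounds, fit_tree):
    """Additive training: y_hat^(0) = 0, y_hat^(t) = y_hat^(t-1) + f_t(x)."""
    y_hat = np.zeros(len(y))          # y_hat^(0) = 0
    trees = []
    for t in range(n_rounds):
        f_t = fit_tree(X, y, y_hat)   # fit the next tree against current predictions
        y_hat = y_hat + f_t(X)        # y_hat^(t) = y_hat^(t-1) + f_t(x)
        trees.append(f_t)
    return trees, y_hat
```

A concrete `fit_tree` built from the leaf-weight formulas appears in the final sketch at the end of this post.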

$$\mathrm{Loss}^{(t)} = \sum_{i=1}^{n} L\!\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k)$$

$$= \sum_{i=1}^{n} L\!\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t-1} \Omega(f_k) + \Omega(f_t)$$

$$= \sum_{i=1}^{n} L\!\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C$$

Expanding $L$ to second order around $\hat{y}_i^{(t-1)}$ (a Taylor approximation):

$$\approx \sum_{i=1}^{n} \left[ L\!\left(y_i, \hat{y}_i^{(t-1)}\right) + f_t(x_i) \frac{\partial L}{\partial \hat{y}_i^{(t-1)}} + \frac{1}{2} f_t^2(x_i) \frac{\partial^2 L}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} \right] + \Omega(f_t) + C$$

$$= \sum_{i=1}^{n} \left[ L\!\left(y_i, \hat{y}_i^{(t-1)}\right) + f_t(x_i) G_i + \frac{1}{2} f_t^2(x_i) H_i \right] + \Omega(f_t) + C$$

$$= \sum_{i=1}^{n} \left[ f_t(x_i) G_i + \frac{1}{2} f_t^2(x_i) H_i \right] + \Omega(f_t) + C$$

with $G_i = \frac{\partial L}{\partial \hat{y}_i^{(t-1)}}$ and $H_i = \frac{\partial^2 L}{\partial (\hat{y}_i^{(t-1)})^2}$; in the last step, $L(y_i, \hat{y}_i^{(t-1)})$ is a constant at round $t$ and is absorbed into $C$.
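$G_i$ and $H_i$ need only the first and second derivatives of $L$ at the current prediction. As a worked example for one particular loss, squared error $L = (y - \hat{y})^2$ gives $G_i = 2(\hat{y}_i - y_i)$ and $H_i = 2$:

```python
import numpy as np

def grad_hess_squared_error(y, y_hat):
    """G_i = dL/dy_hat, H_i = d2L/dy_hat2 for L(y, y_hat) = (y - y_hat)^2."""
    G = 2.0 * (y_hat - y)          # first derivative at y_hat^(t-1)
    H = 2.0 * np.ones_like(y)      # second derivative is constant for this loss
    return G, H

y     = np.array([1.0, 0.0, 2.0])
y_hat = np.array([0.8, 0.1, 1.7])
G, H = grad_hess_squared_error(y, y_hat)
print(G, H)   # [-0.4  0.2 -0.6] [2. 2. 2.]
```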

Loss at time t is:

$$\mathrm{Loss}^{(t)} = \sum_{i=1}^{n} \left[ f_t(x_i) G_i + \frac{1}{2} f_t^2(x_i) H_i \right] + \Omega(f_t) + C$$

Use:

$$f_t(x) = w_{q(x)}, \qquad q: \mathbb{R}^d \to \{1, 2, \ldots, M\}, \qquad w_j \in \mathbb{R}$$

$$\Omega(f) = \frac{1}{2} \lambda \sum_{j=1}^{M} w_j^2 + \gamma M$$
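Concretely, a tree is a pair $(q, w)$: $q$ routes a sample to one of the $M$ leaves and $w$ holds one weight per leaf. A sketch with a hand-coded, purely illustrative $q$ (0-based leaf indices):

```python
import numpy as np

M = 3                                   # number of leaves
w = np.array([-0.4, 0.1, 0.7])          # one weight per leaf, w_j in R

def q(x):
    """Illustrative structure map q: R^d -> {0, ..., M-1} (0-based here)."""
    if x[0] < 0.0:
        return 0
    return 1 if x[1] < 5.0 else 2

def f_t(x):
    return w[q(x)]                      # f_t(x) = w_{q(x)}

def omega(w, lam=1.0, gamma=0.1):
    """Omega(f) = 1/2 * lambda * sum_j w_j^2 + gamma * M."""
    return 0.5 * lam * np.sum(w ** 2) + gamma * len(w)

print(f_t(np.array([1.0, 2.0])), omega(w))
```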

We get:

$$\mathrm{Loss}^{(t)} = \sum_{i=1}^{n} \left[ f_t(x_i) G_i + \frac{1}{2} f_t^2(x_i) H_i \right] + \Omega(f_t) + C$$

$$= \sum_{i=1}^{n} \left[ w_{q(x_i)} G_i + \frac{1}{2} w_{q(x_i)}^2 H_i \right] + \frac{1}{2} \lambda \sum_{j=1}^{M} w_j^2 + \gamma M + C$$

With $I_j = \{\, i \mid q(x_i) = j \,\}$:

$$\sum_{i=1}^{n} w_{q(x_i)} G_i = \sum_{j=1}^{M} \left[ w_j \sum_{i \in I_j} G_i \right]$$

$$\sum_{i=1}^{n} \frac{1}{2} w_{q(x_i)}^2 H_i = \sum_{j=1}^{M} \left[ w_j^2 \sum_{i \in I_j} \frac{1}{2} H_i \right]$$
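The regrouping is easy to check numerically: summing $w_{q(x_i)} G_i$ over samples gives the same number as first collecting each leaf's gradients into $\sum_{i \in I_j} G_i$ and then summing over leaves. A small self-contained check with made-up leaf assignments:

```python
import numpy as np

q_x = np.array([0, 1, 1, 2, 0])          # leaf index q(x_i) for each of n=5 samples
G   = np.array([0.5, -1.0, 0.3, 0.8, -0.2])
w   = np.array([0.1, -0.4, 0.25])        # weights for M=3 leaves

# Left-hand side: sum over samples.
lhs = np.sum(w[q_x] * G)

# Right-hand side: group samples into I_j = {i | q(x_i) = j}, then sum over leaves.
rhs = sum(w[j] * G[q_x == j].sum() for j in range(len(w)))

print(lhs, rhs)   # identical up to floating point
```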

So:

$$\mathrm{Loss}^{(t)} = \sum_{j=1}^{M} \left[ w_j \sum_{i \in I_j} G_i + w_j^2 \sum_{i \in I_j} \frac{1}{2} H_i + \frac{1}{2} \lambda w_j^2 \right] + \gamma M + C$$

$$= \sum_{j=1}^{M} \left[ w_j \sum_{i \in I_j} G_i + \frac{1}{2} w_j^2 \left( \lambda + \sum_{i \in I_j} H_i \right) \right] + \gamma M + C$$

With $G_j = \sum_{i \in I_j} G_i$ and $H_j = \sum_{i \in I_j} H_i$:

$$\mathrm{Loss}^{(t)} = \sum_{j=1}^{M} \left[ w_j G_j + \frac{1}{2} w_j^2 \left( \lambda + H_j \right) \right] + \gamma M + C$$

Finally:

$$w_j^{\ast} = \operatorname*{arg\,min}_{w_j} \left( w_j G_j + \frac{1}{2} w_j^2 (\lambda + H_j) \right) = -\frac{G_j}{\lambda + H_j}$$

$$\mathrm{Obj}^{(t)} = \min \mathrm{Loss}^{(t)} = -\frac{1}{2} \sum_{j=1}^{M} \frac{G_j^2}{H_j + \lambda} + \gamma M + C$$
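In code, given the per-leaf sums $G_j$ and $H_j$, the optimal weights and the minimized objective (dropping the constant $C$, which does not affect the minimizer) follow directly:

```python
import numpy as np

def leaf_weights_and_obj(G_leaf, H_leaf, lam=1.0, gamma=0.1):
    """w_j* = -G_j / (lambda + H_j);
    Obj = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * M  (constant C dropped)."""
    w_star = -G_leaf / (lam + H_leaf)
    obj = -0.5 * np.sum(G_leaf ** 2 / (H_leaf + lam)) + gamma * len(G_leaf)
    return w_star, obj

G_leaf = np.array([0.3, -0.7, 0.8])     # G_j for M=3 leaves
H_leaf = np.array([4.0, 4.0, 2.0])      # H_j for the same leaves
print(leaf_weights_and_obj(G_leaf, H_leaf))
```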

So at each training iteration $t$, we greedily search for the regression tree $f_t(x_i) = w_{q(x_i)}$, with leaf weights $w_j^{\ast} = -\frac{G_j}{\lambda + H_j}$, that attains the minimum $\mathrm{Obj}^{(t)}$, and add it to the model.
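To tie the pieces together, here is a minimal end-to-end sketch: depth-1 trees (stumps) searched greedily by scoring every candidate split with $\mathrm{Obj}^{(t)}$, using squared error so $G_i = 2(\hat{y}_i - y_i)$ and $H_i = 2$. This is an illustrative toy under those assumptions, not XGBoost's actual implementation (which adds shrinkage, split pruning, subsampling, and more):

```python
import numpy as np

LAM, GAMMA = 1.0, 0.1

def obj(G_leaf, H_leaf):
    # Obj = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * M  (constant C dropped)
    return -0.5 * np.sum(G_leaf**2 / (H_leaf + LAM)) + GAMMA * len(G_leaf)

def fit_stump(X, G, H):
    """Greedy search over single-feature threshold splits for the stump minimizing Obj."""
    best = None
    for d in range(X.shape[1]):
        for thr in np.unique(X[:, d]):
            left = X[:, d] < thr
            if not left.any() or left.all():
                continue                           # skip degenerate splits
            G_leaf = np.array([G[left].sum(), G[~left].sum()])
            H_leaf = np.array([H[left].sum(), H[~left].sum()])
            score = obj(G_leaf, H_leaf)
            if best is None or score < best[0]:
                w = -G_leaf / (LAM + H_leaf)       # w_j* = -G_j / (lambda + H_j)
                best = (score, d, thr, w)
    _, d, thr, w = best
    return lambda X_: np.where(X_[:, d] < thr, w[0], w[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

y_hat = np.zeros(200)                     # y_hat^(0) = 0
for t in range(20):
    G = 2.0 * (y_hat - y)                 # G_i for squared error
    H = 2.0 * np.ones(200)                # H_i for squared error
    f_t = fit_stump(X, G, H)
    y_hat += f_t(X)                       # y_hat^(t) = y_hat^(t-1) + f_t(x)

print("MSE:", np.mean((y - y_hat) ** 2))
```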


Reposted from blog.csdn.net/gaofeipaopaotang/article/details/81392449