Definition:

$$\hat{y} = \sum_{k=1}^{K} f_k(x)$$

in which $f_k(x)$ is one of $K$ regression trees.
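As a minimal sketch of this additive model (the `predict` interface and `ConstTree` helper below are hypothetical, not from the source), the ensemble output is just the sum of the individual trees' outputs:

```python
def ensemble_predict(trees, x):
    # Additive ensemble: y_hat is the sum of the K trees' outputs on x.
    return sum(tree.predict(x) for tree in trees)


class ConstTree:
    """Toy stand-in for a regression tree that predicts a constant."""
    def __init__(self, c):
        self.c = c

    def predict(self, x):
        return self.c
```

For example, an ensemble of two constant trees with outputs 1.0 and 2.5 predicts 3.5 for any input.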
Loss:

$$\text{Loss} = \sum_{i=1}^{n} L(y_i, \hat{y}_i)$$
Add some regularization:

$$\text{Loss} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
Additive Training:

$$\hat{y}^{(0)} = 0$$

$$\hat{y}^{(t)} = \hat{y}^{(t-1)} + f_t(x)$$

$$\begin{aligned}
\text{Loss}^{(t)} &= \sum_{i=1}^{n} L\big(y_i, \hat{y}^{(t)}_i\big) + \sum_{k=1}^{t} \Omega(f_k) \\
&= \sum_{i=1}^{n} L\big(y_i, \hat{y}^{(t-1)}_i + f_t(x_i)\big) + \sum_{k=1}^{t-1} \Omega(f_k) + \Omega(f_t) \\
&= \sum_{i=1}^{n} L\big(y_i, \hat{y}^{(t-1)}_i + f_t(x_i)\big) + \Omega(f_t) + C \\
&\approx \sum_{i=1}^{n} \Big[ L\big(y_i, \hat{y}^{(t-1)}_i\big) + f_t(x_i)\,\frac{\partial L}{\partial \hat{y}^{(t-1)}_i} + \frac{1}{2} f_t^2(x_i)\,\frac{\partial^2 L}{\partial \big(\hat{y}^{(t-1)}_i\big)^2} \Big] + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \Big[ L\big(y_i, \hat{y}^{(t-1)}_i\big) + G_i f_t(x_i) + \frac{1}{2} H_i f_t^2(x_i) \Big] + \Omega(f_t) + C \\
&= \sum_{i=1}^{n} \Big[ G_i f_t(x_i) + \frac{1}{2} H_i f_t^2(x_i) \Big] + \Omega(f_t) + C'
\end{aligned}$$

The approximation is a second-order Taylor expansion of $L$ around $\hat{y}^{(t-1)}_i$, with $G_i = \partial L / \partial \hat{y}^{(t-1)}_i$ and $H_i = \partial^2 L / \partial \big(\hat{y}^{(t-1)}_i\big)^2$; $C$ and $C'$ absorb terms that do not depend on $f_t$.
Loss at iteration $t$ is:

$$\text{Loss}^{(t)} = \sum_{i=1}^{n} \Big[ G_i f_t(x_i) + \frac{1}{2} H_i f_t^2(x_i) \Big] + \Omega(f_t) + C'$$
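To make $G_i$ and $H_i$ concrete, here is a small sketch (function name my own) that computes them for one common choice of loss, the squared error $L = \frac{1}{2}(y - \hat{y})^2$:

```python
import numpy as np

def squared_loss_grad_hess(y, y_pred):
    # For L = 0.5 * (y - y_pred)^2:
    #   G_i = dL/dy_pred   = y_pred - y   (first derivative at the current prediction)
    #   H_i = d2L/dy_pred2 = 1            (second derivative is constant)
    G = y_pred - y
    H = np.ones_like(y_pred)
    return G, H
```

With a different differentiable loss (e.g. logistic), only this gradient/hessian computation changes; the rest of the derivation goes through unchanged.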
Use:

$$f_t(x) = w_{q(x)}, \qquad q: \mathbb{R}^d \to \{1, 2, \ldots, M\}, \qquad w_j \in \mathbb{R}$$

$$\Omega(f) = \frac{1}{2}\lambda \sum_{j=1}^{M} w_j^2 + \gamma M$$

Here $q$ maps an input to one of the tree's $M$ leaves, and $w_j$ is the weight (output value) of leaf $j$.
We get:

$$\begin{aligned}
\text{Loss}^{(t)} &= \sum_{i=1}^{n} \Big[ G_i f_t(x_i) + \frac{1}{2} H_i f_t^2(x_i) \Big] + \Omega(f_t) + C' \\
&= \sum_{i=1}^{n} \Big[ G_i w_{q(x_i)} + \frac{1}{2} H_i w_{q(x_i)}^2 \Big] + \frac{1}{2}\lambda \sum_{j=1}^{M} w_j^2 + \gamma M + C'
\end{aligned}$$
With $I_j = \{\, i \mid q(x_i) = j \,\}$:

$$\sum_{i=1}^{n} G_i w_{q(x_i)} = \sum_{j=1}^{M} \Big[ w_j \sum_{i \in I_j} G_i \Big]$$

$$\sum_{i=1}^{n} \frac{1}{2} H_i w_{q(x_i)}^2 = \sum_{j=1}^{M} \Big[ \frac{1}{2} w_j^2 \sum_{i \in I_j} H_i \Big]$$
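Regrouping the per-instance sums by leaf is a simple aggregation; a sketch (function name my own), assuming the leaf assignment $q(x_i)$ is given as an integer array:

```python
import numpy as np

def aggregate_per_leaf(leaf_idx, G, H, M):
    # G'_j = sum of G_i over I_j = {i | q(x_i) = j}, and H'_j likewise.
    # leaf_idx holds q(x_i) as integers in [0, M).
    G_prime = np.bincount(leaf_idx, weights=G, minlength=M)
    H_prime = np.bincount(leaf_idx, weights=H, minlength=M)
    return G_prime, H_prime
```

`np.bincount` with `weights` sums the gradient of every instance falling into each leaf in one pass.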
So:

$$\begin{aligned}
\text{Loss}^{(t)} &= \sum_{j=1}^{M} \Big[ w_j \sum_{i \in I_j} G_i + \frac{1}{2} w_j^2 \sum_{i \in I_j} H_i + \frac{1}{2}\lambda w_j^2 \Big] + \gamma M + C' \\
&= \sum_{j=1}^{M} \Big[ w_j \sum_{i \in I_j} G_i + \frac{1}{2} w_j^2 \Big(\lambda + \sum_{i \in I_j} H_i\Big) \Big] + \gamma M + C'
\end{aligned}$$
With $G'_j = \sum_{i \in I_j} G_i$ and $H'_j = \sum_{i \in I_j} H_i$:

$$\text{Loss}^{(t)} = \sum_{j=1}^{M} \Big[ G'_j w_j + \frac{1}{2} w_j^2 \big(\lambda + H'_j\big) \Big] + \gamma M + C'$$
Finally, minimizing over each $w_j$ independently (the objective is a sum of per-leaf quadratics):

$$w_j^* = \arg\min_{w_j} \Big( G'_j w_j + \frac{1}{2} w_j^2 \big(\lambda + H'_j\big) \Big) = -\frac{G'_j}{\lambda + H'_j}$$

$$\text{Obj}^{(t)} = \min \text{Loss}^{(t)} = -\frac{1}{2} \sum_{j=1}^{M} \frac{G_j'^2}{H'_j + \lambda} + \gamma M + C'$$
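The closed-form solution above can be sketched directly (function name my own, dropping the constant $C'$):

```python
import numpy as np

def optimal_leaves(G_prime, H_prime, lam, gamma):
    # Optimal leaf weights: w_j* = -G'_j / (H'_j + lambda), per leaf j.
    w = -G_prime / (H_prime + lam)
    # Resulting objective, up to C':
    #   Obj(t) = -0.5 * sum_j G'_j^2 / (H'_j + lambda) + gamma * M
    M = len(G_prime)
    obj = -0.5 * np.sum(G_prime ** 2 / (H_prime + lam)) + gamma * M
    return w, obj
```

For example, with $G' = [2, -4]$, $H' = [3, 1]$, $\lambda = 1$, $\gamma = 0$, the optimal weights are $[-0.5, 2.0]$ and the objective is $-4.5$.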
So, at each iteration $t$ of training, we greedily search for a regression tree $f_t(x_i) = w_{q(x_i)}$ with leaf weights $w_j = -\frac{G'_j}{\lambda + H'_j}$ that minimizes $\text{Obj}^{(t)}$, and add it to the model.
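In practice the greedy search is done one split at a time: each candidate split is scored by how much it decreases $\text{Obj}^{(t)}$, which gives the standard gain formula (the extra leaf costs $\gamma$, hence the subtraction). A sketch of exact greedy search on a single feature, with my own function name:

```python
import numpy as np

def best_split(x, G, H, lam, gamma):
    # Score every threshold on feature x by the reduction in Obj(t):
    #   gain = 0.5 * ( GL^2/(HL+lam) + GR^2/(HR+lam)
    #                  - (GL+GR)^2/(HL+HR+lam) ) - gamma
    # where (GL, HL) / (GR, HR) are gradient/hessian sums on each side.
    order = np.argsort(x)
    xs, Gs, Hs = x[order], G[order], H[order]
    G_tot, H_tot = Gs.sum(), Hs.sum()
    best_gain, best_thr = 0.0, None
    GL = HL = 0.0
    for i in range(len(xs) - 1):          # scan left-to-right, growing the left side
        GL += Gs[i]
        HL += Hs[i]
        GR, HR = G_tot - GL, H_tot - HL
        gain = 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                      - G_tot**2 / (H_tot + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (xs[i] + xs[i + 1])
    return best_thr, best_gain
```

Applying this recursively to each resulting leaf grows one tree $f_t$; its leaves then get the closed-form weights $w_j^*$ derived above.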