Multi-Task Learning for Recommendation: ESMM

This article describes the ESMM paper published by Alibaba at SIGIR'18: "Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate". The paper proposes a multi-task learning solution to the sample selection bias and data sparsity problems in CVR (conversion rate) estimation.

  • Background

In applications such as recommendation systems and online advertising, CVR estimation matters even more than CTR estimation: CTR estimation predicts whether a user will click, but the ultimate goal is the conversion that happens after the click. Traditional CVR estimation usually borrows techniques from CTR estimation; however, this approach has two major problems: 1) sample selection bias; 2) sparse training data.

1. Sample selection bias

A conversion can only happen after a click, so traditional CVR models are trained on clicked samples but applied to the entire sample space at inference time. The training samples and the inference data therefore do not follow the same distribution, violating the i.i.d. assumption between training and test data in machine learning. Intuitively, a user who would convert has not necessarily clicked; training only on clicked samples biases the learned CVR model. See the original paper [1] for details.

2. Sparse training data

The data sparsity problem is obvious: clicked samples make up only a small fraction of the entire sample space, and converted samples are even rarer. Such highly sparse training data makes model learning quite difficult.

  • ESMM

First, define CTR, CVR, and CTCVR. CTR is the probability of a click; CVR is the probability of a conversion given that the user has clicked; CTCVR is the probability that the user both clicks and converts.

How does ESMM solve these problems? It introduces the click-through rate (CTR) and the click-and-conversion rate (CTCVR) as auxiliary tasks, treating CVR as an intermediate variable. Their relationship is as follows:
\[ \underbrace{p(y=1, z=1 | x)}_{pCTCVR} = \underbrace{p(y=1 | x)}_{pCTR} \times \underbrace{p(z=1 | y=1, x)}_{pCVR} \]
As can be seen, \(pCTR\) and \(pCTCVR\) are both learned over the entire sample space, just with different labels, while \(pCVR\) is an intermediate variable; this resolves the sample selection bias problem. The model architecture is as follows.

As shown in the ESMM architecture, the CVR and CTR tasks share embedding parameters. Thanks to this parameter-sharing mechanism, the CVR network in ESMM can learn from samples that were never clicked, which alleviates the data sparsity problem to some extent.
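The shared-embedding structure can be sketched as below. This is a minimal numpy illustration, not the paper's exact configuration: the feature sizes, pooling, and single-linear-layer "towers" are simplifying assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Shared embedding table: both towers look up the same parameters,
# so the CVR tower receives gradient signal from the full sample space.
n_features, emb_dim = 1000, 8
embedding = rng.normal(scale=0.1, size=(n_features, emb_dim))

# Separate tower weights for CTR and CVR (a single linear layer here).
w_ctr = rng.normal(scale=0.1, size=emb_dim)
w_cvr = rng.normal(scale=0.1, size=emb_dim)

def esmm_forward(feature_ids):
    """feature_ids: (batch, n_fields) integer ids; returns pCTR, pCVR, pCTCVR."""
    x = embedding[feature_ids].mean(axis=1)  # pooled shared embeddings
    p_ctr = sigmoid(x @ w_ctr)               # click probability
    p_cvr = sigmoid(x @ w_cvr)               # conversion-given-click (intermediate)
    p_ctcvr = p_ctr * p_cvr                  # supervised over the entire space
    return p_ctr, p_cvr, p_ctcvr

p_ctr, p_cvr, p_ctcvr = esmm_forward(rng.integers(0, n_features, size=(4, 5)))
```

Note that \(pCVR\) is only ever multiplied into \(pCTCVR\); it is never trained directly on clicked-only data.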

At this point both problems are solved. Next, let's look at how ESMM is trained; the model's loss function is:
\[ L\left(\theta_{cvr}, \theta_{ctr}\right)=\sum_{i=1}^{N} l\left(y_{i}, f\left(x_{i} ; \theta_{ctr}\right)\right)+\sum_{i=1}^{N} l\left(y_{i} \& z_{i}, f\left(x_{i} ; \theta_{ctr}\right) \times f\left(x_{i} ; \theta_{cvr}\right)\right) \]
The first term is the CTR loss, whose label is 1 for a click and 0 otherwise; the second term is the CTCVR loss, whose label is 1 only for a click followed by a conversion, and 0 otherwise.
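A toy computation of this loss, taking \(l(\cdot,\cdot)\) to be binary cross-entropy (the common choice; the predicted probabilities below are made-up illustrative values):

```python
import numpy as np

def bce(label, p, eps=1e-7):
    """Binary cross-entropy l(label, p), averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-label * np.log(p) - (1 - label) * np.log(1 - p)))

# y: click label, z: conversion label (z can be 1 only when y is 1).
y = np.array([1, 1, 0, 0])
z = np.array([1, 0, 0, 0])

# Tower outputs f(x; theta_ctr) and f(x; theta_cvr) -- illustrative values.
p_ctr = np.array([0.8, 0.6, 0.3, 0.1])
p_cvr = np.array([0.5, 0.2, 0.4, 0.3])

# First term: CTR loss on the click label over all impressions.
# Second term: CTCVR loss on the "click & convert" label, against pCTR * pCVR.
loss = bce(y, p_ctr) + bce(y & z, p_ctr * p_cvr)
```

Both terms range over all \(N\) impressions, which is exactly why the model trains over the entire sample space.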

One question remains: since \(pCTCVR\) can be obtained by multiplying \(pCTR\) and \(pCVR\), in theory the division form should work just as well:
\[ p(z=1 | y=1, x)=\frac{p(y=1, z=1 | x)}{p(y=1 | x)} \]
That is, one could train two separate models, CTR and CTCVR, and still recover CVR from them. The paper analyzed this approach experimentally and found that, because \(pCTR\) is typically very small, the division causes numerical instability.
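A small numerical illustration (with made-up values) of why the division form is fragile when \(pCTR\) is tiny: a modest estimation error in the denominator can push the derived CVR above 1, whereas the multiplication form yields a valid probability by construction since each tower outputs a value in (0, 1).

```python
# Made-up magnitudes typical of display advertising: tiny click rate.
true_p_ctr, true_p_cvr = 0.001, 0.2
true_p_ctcvr = true_p_ctr * true_p_cvr        # 0.0002

# Suppose a separately trained CTR model underestimates pCTR a little.
est_p_ctr = 0.0001

# Division form: the derived "CVR" blows up past 1 -- not a probability.
derived_cvr = true_p_ctcvr / est_p_ctr
```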

  • Discussion
  1. ESMM cleverly introduces two auxiliary tasks for CVR estimation: it eliminates the sample selection bias problem, and parameter sharing alleviates data sparsity.
  2. Multi-task learning already has many applications in other fields, but in recommendation it should be designed around the specific data and tasks. For example, in cross-domain tasks the sample environments differ across domains while user information may overlap; designing multi-task learning for such scenarios requires more careful thought about how sample information is used.

References:

[1] Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, and Kun Gai. 2018. Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate. SIGIR (2018).

[2] https://github.com/alibaba/x-deeplearning/wiki/%E5%85%A8%E7%A9%BA%E9%97%B4%E5%A4%9A%E4%BB%BB%E5%8A%A1%E6%A8%A1%E5%9E%8B(ESMM)


Origin www.cnblogs.com/gongyanzh/p/12162480.html