1. Overview
torch-rechub (github.com/datawhalech…) is an open-source recommender-system toolkit released by the DataWhale team. It currently provides components for classic models such as LR, FM, and MLP, but still lacks GCN-based components.
This post walks through implementing the classic DIN model with the torch-rechub toolkit.
2. A Brief Introduction to DIN
- Published by Alibaba in 2018 at KDD, a CCF-A conference
- Contribution: introduces a local activation unit (a self-designed attention-like mechanism) that assigns weights to the items in the behavior sequence, so the model can learn the user's dynamic interests
- Contribution: two techniques for training industrial-scale deep networks: (a) a mini-batch aware regularizer and (b) a data-adaptive activation function (Dice; a sketch follows the paper link below)
Paper: "Deep Interest Network for Click-Through Rate Prediction"
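Dice generalizes PReLU by letting the rectification point follow the mini-batch statistics of the input: p(s) = sigmoid((s − E[s]) / sqrt(Var[s] + ε)) and f(s) = p(s)·s + (1 − p(s))·α·s. Below is a minimal PyTorch sketch of this idea; the class name and the use of a non-affine BatchNorm1d to estimate E[s] and Var[s] are my assumptions (torch-rechub ships its own Dice implementation).

```python
import torch
import torch.nn as nn

class Dice(nn.Module):
    """Sketch of the data-adaptive Dice activation from the DIN paper."""

    def __init__(self, num_features, eps=1e-8):
        super().__init__()
        # Non-affine BatchNorm estimates (s - E[s]) / sqrt(Var[s] + eps) per mini-batch.
        self.bn = nn.BatchNorm1d(num_features, eps=eps, affine=False)
        self.alpha = nn.Parameter(torch.zeros(num_features))  # learnable slope for the "off" side

    def forward(self, s):                      # s: (batch_size, num_features)
        p = torch.sigmoid(self.bn(s))          # soft indicator that switches between the two channels
        return p * s + (1 - p) * self.alpha * s
```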
2.1 The Base Model of DIN
(Figure: the base model of DIN)
Baseline approach (Embedding + MLP): map the high-dimensional sparse features to low-dimensional embeddings, pool each group of embeddings into a fixed-length vector, concatenate these vectors, and feed the result into fully connected (FC) layers.
Drawback: for a given user, this representation vector stays the same no matter which candidate ad is shown, so a user vector of limited dimensionality becomes the bottleneck for expressing the user's diverse interests. Simply enlarging the embedding size greatly increases the number of parameters to learn, which causes overfitting under limited training data and is unacceptable for an industrial online system.
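For concreteness, here is a minimal sketch of this Embedding + sum-pooling + MLP baseline (the feature sizes and layer widths are arbitrary, not from the paper): the pooled user vector is identical for every candidate ad.

```python
import torch
import torch.nn as nn

class BaseModel(nn.Module):
    """Embedding + sum pooling + MLP baseline: the user vector does not depend on the candidate ad."""

    def __init__(self, num_items, emb_dim=8):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim, padding_idx=0)
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, hist_items, target_item):
        user_vec = self.item_emb(hist_items).sum(dim=1)   # (batch, emb_dim): fixed-length user vector
        ad_vec = self.item_emb(target_item)               # (batch, emb_dim)
        return torch.sigmoid(self.mlp(torch.cat([user_vec, ad_vec], dim=1)).squeeze(1))

model = BaseModel(num_items=1000)
scores = model(torch.randint(1, 1000, (4, 20)), torch.randint(1, 1000, (4,)))
print(scores.shape)  # torch.Size([4])
```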
2.2 Introducing the Local Activation Unit
Goal: represent the user's diverse interests with a single vector of limited dimensionality, by letting that vector adapt to the candidate ad.
(Figure: the DIN model and the detailed structure of the activation unit)
Each historical item embedding (Item_Emb) from the embedding layer is combined with the candidate ad embedding (AD_Emb) through an "out product" interaction; the result is concatenated with the original Item_Emb and AD_Emb and fed through a PReLU/Dice network to produce a relevance score. This follows the idea of the attention mechanism: the score expresses how relevant the historical item is to the candidate ad, except that the scores are not normalized to sum to 1. (In the implementation below, the interaction is realized as the element-wise difference and product of the two embeddings.)
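In formula form (from the paper), the adaptive user representation for a candidate ad $A$ is a weighted sum pooling of the behavior embeddings:

$$
v_U(A) = f(v_A, e_1, \dots, e_H) = \sum_{j=1}^{H} a(e_j, v_A)\, e_j
$$

where $e_1, \dots, e_H$ are the embeddings of the user's $H$ historical behaviors, $v_A$ is the embedding of the candidate ad, and $a(\cdot, \cdot)$ is the activation unit. Unlike standard attention, the weights $a(e_j, v_A)$ are not passed through a softmax, so they need not sum to 1; this preserves the intensity of the user's interest.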
2.3 Additional Notes
- Pooling layer: turns the user's variable-length history of behavior embeddings into a fixed-length vector, because the FC layers require fixed-length input
- Concat layer: concatenates all feature embeddings to form the MLP input
- Context features: the relevant contextual features of the request
3. Code Implementation
The DIN model and its activation unit, as implemented in torch-rechub (tensor-shape comments included):
```python
import torch
import torch.nn as nn

# EmbeddingLayer and MLP are torch-rechub building blocks
# (in current versions they live in torch_rechub.basic.layers).
from torch_rechub.basic.layers import EmbeddingLayer, MLP


class DIN(nn.Module):
    """Deep Interest Network.

    features: user/context features; history_features: behavior-sequence features;
    target_features: candidate-item features, aligned one-to-one with history_features.
    mlp_params / attention_mlp_params are passed to the final MLP and to each
    ActivationUnit's MLP, respectively.
    """

    def __init__(self, features, history_features, target_features, mlp_params, attention_mlp_params):
        super().__init__()
        self.features = features
        self.history_features = history_features
        self.target_features = target_features
        self.num_history_features = len(history_features)
        self.all_dims = sum([fea.embed_dim for fea in features + history_features + target_features])

        self.embedding = EmbeddingLayer(features + history_features + target_features)
        self.attention_layers = nn.ModuleList(
            [ActivationUnit(fea.embed_dim, **attention_mlp_params) for fea in self.history_features])
        self.mlp = MLP(self.all_dims, activation="dice", **mlp_params)

    def forward(self, x):
        embed_x_features = self.embedding(x, self.features)  # (batch_size, num_features, emb_dim)
        embed_x_history = self.embedding(x, self.history_features)  # (batch_size, num_history_features, seq_length, emb_dim)
        embed_x_target = self.embedding(x, self.target_features)  # (batch_size, num_target_features, emb_dim)

        # Attention pooling: weight each behavior sequence by its relevance to the
        # corresponding target feature, then sum over the sequence dimension.
        attention_pooling = []
        for i in range(self.num_history_features):
            attention_seq = self.attention_layers[i](embed_x_history[:, i, :, :], embed_x_target[:, i, :])
            attention_pooling.append(attention_seq.unsqueeze(1))  # (batch_size, 1, emb_dim)
        attention_pooling = torch.cat(attention_pooling, dim=1)  # (batch_size, num_history_features, emb_dim)

        mlp_in = torch.cat([
            attention_pooling.flatten(start_dim=1),
            embed_x_target.flatten(start_dim=1),
            embed_x_features.flatten(start_dim=1)
        ], dim=1)  # (batch_size, N)

        y = self.mlp(mlp_in)
        return torch.sigmoid(y.squeeze(1))


class ActivationUnit(nn.Module):
    """DIN's local activation unit: scores each history item against the target item."""

    def __init__(self, emb_dim, dims=[36], activation="dice", use_softmax=False):
        super(ActivationUnit, self).__init__()
        self.emb_dim = emb_dim
        self.use_softmax = use_softmax
        self.attention = MLP(4 * self.emb_dim, dims=dims, activation=activation)

    def forward(self, history, target):
        seq_length = history.size(1)
        target = target.unsqueeze(1).expand(-1, seq_length, -1)  # (batch_size, seq_length, emb_dim)
        # Interaction features: the raw embeddings plus their element-wise
        # difference and product (used in place of the paper's "out product").
        att_input = torch.cat([target, history, target - history, target * history],
                              dim=-1)  # (batch_size, seq_length, 4*emb_dim)
        att_weight = self.attention(att_input.view(-1, 4 * self.emb_dim))  # (batch_size*seq_length, 1)
        att_weight = att_weight.view(-1, seq_length)  # (batch_size, seq_length)
        if self.use_softmax:
            att_weight = att_weight.softmax(dim=-1)
        # Weighted sum pooling: (batch_size, seq_length, 1) * (batch_size, seq_length, emb_dim)
        output = (att_weight.unsqueeze(-1) * history).sum(dim=1)  # (batch_size, emb_dim)
        return output
```
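A hedged usage sketch, following the pattern of torch-rechub's DIN examples: it assumes the SparseFeature/SequenceFeature classes in torch_rechub.basic.features and that the model reads its inputs from a dict keyed by feature name; exact parameter names may differ across library versions.

```python
import torch
from torch_rechub.basic.features import SparseFeature, SequenceFeature

# Candidate-item feature and the behavior sequence that shares its embedding table.
target_features = [SparseFeature("target_item", vocab_size=1000, embed_dim=8)]
history_features = [
    SequenceFeature("hist_item", vocab_size=1000, embed_dim=8,
                    pooling="concat", shared_with="target_item")
]
features = [SparseFeature("user_id", vocab_size=500, embed_dim=8)]

model = DIN(features=features,
            history_features=history_features,
            target_features=target_features,
            mlp_params={"dims": [64, 32]},
            attention_mlp_params={"dims": [36]})

batch = {
    "user_id": torch.randint(0, 500, (4,)),
    "target_item": torch.randint(0, 1000, (4,)),
    "hist_item": torch.randint(0, 1000, (4, 20)),  # padded behavior sequence of length 20
}
print(model(batch).shape)  # expected: torch.Size([4]) of CTR probabilities
```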