Click-Through Rate (CTR) Prediction with FM (Factorization Machines): Theory and Practice

FM(Factorization Machines)

FM (Factorization Machines) is widely used for CTR prediction. It extends the LR (Logistic Regression) model with feature crosses, usually limited to second-order (pairwise) interactions.

The FM paper is available at https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
FM's key idea: add pairwise feature interactions on top of the LR model.
Advantages of FM

  1. It estimates parameters reliably even on very sparse data;
  2. It can be viewed as LR plus second-order feature crosses, and prediction runs in time linear in the number of features;
  3. It is a general model that works whenever the features are real-valued; models such as MF, SVD++, PITF, and FPMC can be expressed as special cases of it.

Disadvantages of FM

  1. Each feature gets only one latent vector, so its interactions with features from different fields are not distinguished. FFM takes exactly this point as its starting point for improvement.

Algorithm principles

LR (Logistic Regression)

A plain linear model considers each feature in isolation. Its expressive power is limited: it captures no relationships between features and supports no feature crossing or feature selection.
The general linear model:

$$y(x) = w_0 + \sum_{i=1}^{n} w_i x_i$$

To express correlations between features and strengthen the model, feature crosses are introduced; FM improves on the linear model from exactly this angle.

FM(Factorization Machines)

The FM model:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

Compared with POLY2, the main difference in the second-order part is that the single weight $w_{h(j_1,j_2)}$ assigned to each feature pair is replaced by the inner product of two latent vectors, $\langle v_{j_1}, v_{j_2} \rangle$. The second-order part therefore becomes:

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$

In essence, FM's latent vectors play the same role as the user and item latent vectors in matrix factorization; FM simply extends MF's latent vectors from users and items to every feature.
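The MF analogy can be checked directly. A toy NumPy sketch (illustrative data, not from the original post): with only one-hot user and item features, FM's pairwise term collapses to the plain matrix-factorization inner product ⟨v_u, v_i⟩.

```python
import numpy as np

# Toy setup: one latent vector per feature, features = users ++ items.
rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3
n = n_users + n_items
V = rng.normal(size=(n, k))

u, i = 2, 1                          # an arbitrary user and item
x = np.zeros(n)
x[u] = 1.0                           # one-hot user feature
x[n_users + i] = 1.0                 # one-hot item feature

# FM second-order term: sum over feature pairs of <v_a, v_b> x_a x_b
pairwise = sum(V[a] @ V[b] * x[a] * x[b]
               for a in range(n) for b in range(a + 1, n))

mf_score = V[u] @ V[n_users + i]     # matrix-factorization interaction
print(np.isclose(pairwise, mf_score))  # True
```

Only one feature pair is simultaneously nonzero, so the double sum reduces to a single inner product, exactly MF.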

Rewriting the formula

To help FM cope with sparse data, latent vectors are introduced. Compared with POLY2, FM gives up some ability to memorize specific feature combinations exactly, but its ability to generalize improves greatly.
The second-order part can be rewritten as:

$$\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f} x_i\right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2\right]$$

Benefits of the latent vectors:

  1. The number of second-order parameters drops from $\frac{n(n-1)}{2}$ to $kn$, and with the rewrite above the term can be evaluated in $O(kn)$ time, speeding up inference.
  2. Parameters that were previously independent are now coupled through the shared latent vectors, so the weight of a feature pair can be estimated even when that exact pair never co-occurs in the training data.
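The equivalence behind benefit 1 can be verified numerically. A small NumPy sketch (random toy data, assumed for illustration) compares the naive O(kn²) pairwise sum with the O(kn) rewrite:

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 8, 4
x = rng.normal(size=n)
V = rng.normal(size=(n, k))          # latent vectors, one row per feature

# Naive form: sum_{i<j} <v_i, v_j> x_i x_j
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# Rewritten form: 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
rewritten = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))

print(np.isclose(naive, rewritten))  # True
```

The rewritten expression is exactly what the TensorFlow `inference` method below computes in its `interaction_layer`.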

Model training

The parameters $w_0$, $w_i$, and $v_{i,f}$ are learned by gradient descent. The gradient of the model output with respect to each parameter is:

$$\frac{\partial \hat{y}}{\partial \theta} =
\begin{cases}
1, & \text{if } \theta = w_0 \\
x_i, & \text{if } \theta = w_i \\
x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, & \text{if } \theta = v_{i,f}
\end{cases}$$
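As a sanity check, the $v_{i,f}$ gradient can be compared against a central finite difference of the rewritten second-order term. A sketch on random toy data (not part of the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
x = rng.normal(size=n)
V = rng.normal(size=(n, k))

def second_order(V):
    # 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
    return 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))

i, f = 2, 1
# Closed form: x_i * sum_j v_jf x_j - v_if * x_i^2
analytic = x[i] * (x @ V[:, f]) - V[i, f] * x[i] ** 2

# Central finite difference in the (i, f) coordinate
eps = 1e-6
Vp, Vm = V.copy(), V.copy()
Vp[i, f] += eps
Vm[i, f] -= eps
numeric = (second_order(Vp) - second_order(Vm)) / (2 * eps)

print(np.isclose(analytic, numeric, atol=1e-5))  # True
```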

Implementation

The data is MovieLens 100K. To keep the demonstration of the FM procedure simple, only uid and itemId are used as input features, with rating as the label.

Dataset

u.item: movie metadata

 movie id | movie title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western
 
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0

u.user: user metadata

user id | age | gender | occupation | zip code

1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067

ua.base: training set
ua.test: test set

user id | item id | rating | timestamp
1	1	5	874965758
1	2	3	876893171
1	3	4	878542960

Data preprocessing

uid and itemId are one-hot encoded, and rating becomes the label: ratings fall in [1-5], and a rating greater than 3 maps to 1 (the user is interested) while 3 or below maps to 0 (not interested).
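A minimal pandas sketch of these two steps (toy ids, not the actual MovieLens files): one-hot encoding with `get_dummies` and binarizing the ratings the same way `loadData()` does.

```python
import pandas as pd

df = pd.DataFrame({'uid': [1, 2, 3, 1], 'rating': [5, 3, 4, 1]})

# One column per distinct id
uid_onehot = pd.get_dummies(df['uid'], prefix='uid')

# Rating > 3 -> 1 (interested), otherwise 0
df['label'] = df['rating'].apply(lambda r: 1 if r > 3 else 0)

print(list(uid_onehot.columns))  # ['uid_1', 'uid_2', 'uid_3']
print(df['label'].tolist())      # [1, 0, 1, 0]
```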

# Load the data
def loadData():

    # user info (only uid is used)
    userInfo = pd.read_csv('../data/u.user', sep='|', names=['uid', 'age', 'gender', 'occupation', 'zip code'])
    uid_ = userInfo['uid']
    userId_dum = pd.get_dummies(userInfo['uid'], prefix='uid_')
    userId_dum['uid'] = uid_

    # item info (only item_id is used)
    header = ['item_id', 'title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown', 'Action', 'Adventure', 'Animation', 'Children',
              'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi',
              'Thriller', 'War', 'Western']
    ItemInfo = pd.read_csv('../data/u.item', sep='|', names=header, encoding="ISO-8859-1")
    ItemInfo = ItemInfo.drop(columns=['title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown'])
    item_id_ = ItemInfo['item_id']
    item_Id_dum = pd.get_dummies(ItemInfo['item_id'], prefix='item_id_')
    item_Id_dum['item_id'] = item_id_

    # training data
    trainData = pd.read_csv('../data/ua.base', sep='\t', names=['uid', 'item_id', 'rating', 'time'])
    trainData = trainData.drop(columns=['time'])

    # binarize the rating: > 3 -> 1 (interested), otherwise 0
    trainData['rating'] = trainData.rating.apply(lambda x: 1 if int(x) > 3 else 0)

    Y_train = pd.get_dummies(trainData['rating'], prefix='y_')

    # left-join the one-hot user and item columns, then drop the raw ids
    X_train = pd.merge(trainData, userId_dum, how='left')
    X_train = pd.merge(X_train, item_Id_dum, how='left')
    X_train = X_train.drop(columns=['uid', 'item_id', 'rating'])

    # test data
    testData = pd.read_csv('../data/ua.test', sep='\t', names=['uid', 'item_id', 'rating', 'time'])
    testData = testData.drop(columns=['time'])

    testData['rating'] = testData.rating.apply(lambda x: 1 if int(x) > 3 else 0)
    Y_test = pd.get_dummies(testData['rating'], prefix='y_')

    X_test = pd.merge(testData, userId_dum, how='left')
    X_test = pd.merge(X_test, item_Id_dum, how='left')
    X_test = X_test.drop(columns=['uid', 'item_id', 'rating'])

    return X_train.values, Y_train.values, X_test.values, Y_test.values
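On toy data (hypothetical ids, not the real files), the same merge-based construction shows that the final feature width is the number of users plus the number of items:

```python
import pandas as pd

users = pd.DataFrame({'uid': [1, 2]})
items = pd.DataFrame({'item_id': [10, 20, 30]})

uid_dum = pd.get_dummies(users['uid'], prefix='uid_')
uid_dum['uid'] = users['uid']
item_dum = pd.get_dummies(items['item_id'], prefix='item_id_')
item_dum['item_id'] = items['item_id']

ratings = pd.DataFrame({'uid': [1, 2, 1], 'item_id': [10, 20, 30],
                        'rating': [1, 0, 1]})
X = pd.merge(ratings, uid_dum, how='left')   # joins on the shared 'uid' column
X = pd.merge(X, item_dum, how='left')        # joins on 'item_id'
X = X.drop(columns=['uid', 'item_id', 'rating'])

print(X.shape)  # (3, 5): 2 user columns + 3 item columns
```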

The FM model


class FM():
    def __init__(self, vec_dim, learning_rate, feature_length):
        """
        Initialize hyper-parameters.
        :param vec_dim: size of the latent vectors
        :param learning_rate: learning rate
        :param feature_length: number of input features
        """
        self.vec_dim = vec_dim
        self.learning_rate = learning_rate
        self.feature_length = feature_length

    # input placeholders
    def add_input(self):
        self.X = tf.placeholder(shape=[None, self.feature_length], dtype=tf.float32, name='input_X')
        self.Y = tf.placeholder(shape=[None, 2], dtype=tf.float32, name='input_y')

    # forward computation
    def inference(self):
        with tf.variable_scope('linear_layer'):
            w0 = tf.get_variable(name='w0', shape=[2], dtype=tf.float32)
            self.w = tf.get_variable(name='w', shape=[self.feature_length, 2], dtype=tf.float32)
            self.linear_layer = tf.add(tf.matmul(self.X, self.w), w0)
        with tf.variable_scope('interaction_layer'):
            self.v = tf.get_variable(name='v', shape=[self.feature_length, self.vec_dim], dtype=tf.float32)
            # O(kn) form of the second-order term:
            # 0.5 * sum_f [ (Xv)^2 - X(v^2) ]
            self.interaction_layer = tf.multiply(0.5,
                                                 tf.reduce_sum(
                                                     tf.subtract(
                                                         tf.pow(tf.matmul(self.X, self.v), 2),
                                                         tf.matmul(self.X, tf.pow(self.v, 2))),
                                                     1, keepdims=True))
        self.y_out = tf.add(self.linear_layer, self.interaction_layer)

    # loss
    def add_loss(self):
        self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=self.Y, logits=self.y_out))

    # accuracy
    def add_accuracy(self):
        self.correct_prediction = tf.equal(tf.argmax(self.y_out, 1), tf.argmax(self.Y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))

    # optimizer
    def train(self):
        optimizer = tf.train.FtrlOptimizer(self.learning_rate, l1_regularization_strength=2e-2,
                                           l2_regularization_strength=0)
        self.train_op = optimizer.minimize(self.loss)

    # assemble the graph
    def build_graph(self):
        self.add_input()
        self.inference()
        self.add_loss()
        self.add_accuracy()
        self.train()

Training and testing

def train_model(sess, model, X_train, Y_train, batch_size, epochs=100):
    num = len(X_train) // batch_size + 1
    for step in range(epochs):
        print("epoch {0}:".format(step + 1))
        for i in range(num):
            # sample a random mini-batch
            index = np.random.choice(len(X_train), batch_size)
            batch_x = X_train[index]
            batch_y = Y_train[index]
            feed_dict = {model.X: batch_x,
                         model.Y: batch_y}
            sess.run(model.train_op, feed_dict=feed_dict)

            if (i + 1) % 100 == 0:
                loss, accuracy = sess.run([model.loss, model.accuracy], feed_dict=feed_dict)
                print("Iteration {0}: minibatch training loss = {1}, accuracy = {2}"
                      .format(step + 1, loss, accuracy))

def test_model(sess, model, X_test, Y_test, batch_size):
    # batch_size is unused here; the whole test set is evaluated in one pass
    loss, accuracy = sess.run([model.loss, model.accuracy],
                              feed_dict={model.X: X_test, model.Y: Y_test})
    print("test loss = {0}, accuracy = {1}".format(loss, accuracy))

Full code

import pandas as pd
import numpy as np
import tensorflow as tf

# Load the data
def loadData():

    # user info (only uid is used)
    userInfo = pd.read_csv('../data/u.user', sep='|', names=['uid', 'age', 'gender', 'occupation', 'zip code'])
    uid_ = userInfo['uid']
    userId_dum = pd.get_dummies(userInfo['uid'], prefix='uid_')
    userId_dum['uid'] = uid_

    # item info (only item_id is used)
    header = ['item_id', 'title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown', 'Action', 'Adventure', 'Animation', 'Children',
              'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi',
              'Thriller', 'War', 'Western']
    ItemInfo = pd.read_csv('../data/u.item', sep='|', names=header, encoding="ISO-8859-1")
    ItemInfo = ItemInfo.drop(columns=['title', 'release_date', 'video_release_date', 'IMDb_URL', 'unknown'])
    item_id_ = ItemInfo['item_id']
    item_Id_dum = pd.get_dummies(ItemInfo['item_id'], prefix='item_id_')
    item_Id_dum['item_id'] = item_id_

    # training data
    trainData = pd.read_csv('../data/ua.base', sep='\t', names=['uid', 'item_id', 'rating', 'time'])
    trainData = trainData.drop(columns=['time'])

    # binarize the rating: > 3 -> 1 (interested), otherwise 0
    trainData['rating'] = trainData.rating.apply(lambda x: 1 if int(x) > 3 else 0)

    Y_train = pd.get_dummies(trainData['rating'], prefix='y_')

    # left-join the one-hot user and item columns, then drop the raw ids
    X_train = pd.merge(trainData, userId_dum, how='left')
    X_train = pd.merge(X_train, item_Id_dum, how='left')
    X_train = X_train.drop(columns=['uid', 'item_id', 'rating'])

    # test data
    testData = pd.read_csv('../data/ua.test', sep='\t', names=['uid', 'item_id', 'rating', 'time'])
    testData = testData.drop(columns=['time'])

    testData['rating'] = testData.rating.apply(lambda x: 1 if int(x) > 3 else 0)
    Y_test = pd.get_dummies(testData['rating'], prefix='y_')

    X_test = pd.merge(testData, userId_dum, how='left')
    X_test = pd.merge(X_test, item_Id_dum, how='left')
    X_test = X_test.drop(columns=['uid', 'item_id', 'rating'])

    return X_train.values, Y_train.values, X_test.values, Y_test.values

class FM():
    def __init__(self, vec_dim, learning_rate, feature_length):
        """
        Initialize hyper-parameters.
        :param vec_dim: size of the latent vectors
        :param learning_rate: learning rate
        :param feature_length: number of input features
        """
        self.vec_dim = vec_dim
        self.learning_rate = learning_rate
        self.feature_length = feature_length

    # input placeholders
    def add_input(self):
        self.X = tf.placeholder(shape=[None, self.feature_length], dtype=tf.float32, name='input_X')
        self.Y = tf.placeholder(shape=[None, 2], dtype=tf.float32, name='input_y')

    # forward computation
    def inference(self):
        with tf.variable_scope('linear_layer'):
            w0 = tf.get_variable(name='w0', shape=[2], dtype=tf.float32)
            self.w = tf.get_variable(name='w', shape=[self.feature_length, 2], dtype=tf.float32)
            self.linear_layer = tf.add(tf.matmul(self.X, self.w), w0)
        with tf.variable_scope('interaction_layer'):
            self.v = tf.get_variable(name='v', shape=[self.feature_length, self.vec_dim], dtype=tf.float32)
            # O(kn) form of the second-order term:
            # 0.5 * sum_f [ (Xv)^2 - X(v^2) ]
            self.interaction_layer = tf.multiply(0.5,
                                                 tf.reduce_sum(
                                                     tf.subtract(
                                                         tf.pow(tf.matmul(self.X, self.v), 2),
                                                         tf.matmul(self.X, tf.pow(self.v, 2))),
                                                     1, keepdims=True))
        self.y_out = tf.add(self.linear_layer, self.interaction_layer)

    # loss
    def add_loss(self):
        self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=self.Y, logits=self.y_out))

    # accuracy
    def add_accuracy(self):
        self.correct_prediction = tf.equal(tf.argmax(self.y_out, 1), tf.argmax(self.Y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))

    # optimizer
    def train(self):
        optimizer = tf.train.FtrlOptimizer(self.learning_rate, l1_regularization_strength=2e-2,
                                           l2_regularization_strength=0)
        self.train_op = optimizer.minimize(self.loss)

    # assemble the graph
    def build_graph(self):
        self.add_input()
        self.inference()
        self.add_loss()
        self.add_accuracy()
        self.train()

def train_model(sess, model, X_train, Y_train, batch_size, epochs=100):
    num = len(X_train) // batch_size + 1
    for step in range(epochs):
        print("epoch {0}:".format(step + 1))
        for i in range(num):
            # sample a random mini-batch
            index = np.random.choice(len(X_train), batch_size)
            batch_x = X_train[index]
            batch_y = Y_train[index]
            feed_dict = {model.X: batch_x,
                         model.Y: batch_y}
            sess.run(model.train_op, feed_dict=feed_dict)

            if (i + 1) % 100 == 0:
                loss, accuracy = sess.run([model.loss, model.accuracy], feed_dict=feed_dict)
                print("Iteration {0}: minibatch training loss = {1}, accuracy = {2}"
                      .format(step + 1, loss, accuracy))

def test_model(sess, model, X_test, Y_test, batch_size):
    # batch_size is unused here; the whole test set is evaluated in one pass
    loss, accuracy = sess.run([model.loss, model.accuracy],
                              feed_dict={model.X: X_test, model.Y: Y_test})
    print("test loss = {0}, accuracy = {1}".format(loss, accuracy))

if __name__ == '__main__':
    X_train, Y_train, X_test, Y_test = loadData()

    learning_rate = 0.001
    batch_size = 64
    vec_dim = 10
    feature_length = X_train.shape[1]
    model = FM(vec_dim, learning_rate, feature_length)
    model.build_graph()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print('start training...')
        train_model(sess, model, X_train, Y_train, batch_size, epochs=10)
        print('start testing...')
        test_model(sess, model, X_test, Y_test, batch_size)


Reposted from blog.csdn.net/weixin_41044112/article/details/107772355