Recommender Systems | Study Notes: Field-aware Factorization Machines for CTR Prediction

ABSTRACT

  • First, we propose efficient implementations for training
    FFMs.
  • Then we comprehensively analyze FFMs and compare
    this approach with competing models. Experiments
    show that FFMs are very useful for certain classification
    problems.
  • Finally, we have released a package of FFMs for
    public use.

1. INTRODUCTION

Code used for experiments in this paper and the package LIBFFM are respectively available at:
http://www.csie.ntu.edu.tw/~cjlin/ffm/exps
http://www.csie.ntu.edu.tw/~cjlin/libffm


2. POLY2 AND FM

FMs can be better than Poly2 when the data set is sparse: each FM latent vector is learned from every pair involving its feature, while Poly2 can learn a pair's weight only from instances where both features co-occur. Both models are written out below for reference.
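As defined in the paper, where $h(j_1, j_2)$ hashes the feature pair $(j_1, j_2)$ into an index of the parameter vector:

$\phi_{Poly2}(\mathbf{w}, \mathbf{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} w_{h(j_1, j_2)} \, x_{j_1} x_{j_2}$

$\phi_{FM}(\mathbf{w}, \mathbf{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \langle \mathbf{w}_{j_1}, \mathbf{w}_{j_2} \rangle \, x_{j_1} x_{j_2}$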


3. FFM

  • In FMs, every feature has only one latent vector to learn the latent effects with all other features; in FFMs, each feature has several latent vectors, and the one used for an interaction depends on the field of the other feature.

  • The FFM model pairs the field-specific latent vectors of each feature pair; it is written out after this list.

  • Usually $k_{FFM} \ll k_{FM}$, since each FFM latent vector only needs to learn the effect within one specific field. (In terms of model size: FMs have $nk$ variables, FFMs have $nfk$, for $n$ features, $f$ fields, and latent dimension $k$.)
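From the paper, with $f_{j_2}$ denoting the field of feature $j_2$, the FFM model is

$\phi_{FFM}(\mathbf{w}, \mathbf{x}) = \sum_{j_1=1}^{n} \sum_{j_2=j_1+1}^{n} \langle \mathbf{w}_{j_1, f_{j_2}}, \mathbf{w}_{j_2, f_{j_1}} \rangle \, x_{j_1} x_{j_2}$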


3.1 Solving the Optimization Problem

The optimization problem is the regularized logistic loss

$\min_{\mathbf{w}} \; \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2 + \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \, \phi_{FFM}(\mathbf{w}, \mathbf{x}_i))\right)$

solved with stochastic gradient descent using AdaGrad per-coordinate learning rates: each step samples one instance and, for every pair of its features, updates the two involved latent vectors.
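A minimal Python sketch of one such update, following the gradient and AdaGrad rules of this section; the dict-of-ndarrays layout and the names (`W`, `G`, `eta`, `lam`) are illustrative, not LIBFFM's actual internals:

```python
import numpy as np

def ffm_sgd_step(W, G, x, y, eta=0.2, lam=2e-5):
    """One AdaGrad-SGD step on a single instance.

    W[j][f] : k-dim latent vector of feature j for field f
              (the paper initializes entries from U(0, 1/sqrt(k)))
    G[j][f] : accumulated squared gradients (the paper initializes to 1)
    x       : list of (field, feature, value) triples for this instance
    y       : label in {-1, +1}
    """
    # Forward pass: phi = sum over pairs of <w_{j1,f2}, w_{j2,f1}> x1 x2
    phi = 0.0
    for a in range(len(x)):
        f1, j1, v1 = x[a]
        for b in range(a + 1, len(x)):
            f2, j2, v2 = x[b]
            phi += np.dot(W[j1][f2], W[j2][f1]) * v1 * v2
    # kappa: derivative of log(1 + exp(-y * phi)) w.r.t. phi
    kappa = -y / (1.0 + np.exp(y * phi))
    # Backward pass: per-pair gradients, then AdaGrad accumulate-and-update
    for a in range(len(x)):
        f1, j1, v1 = x[a]
        for b in range(a + 1, len(x)):
            f2, j2, v2 = x[b]
            g1 = lam * W[j1][f2] + kappa * v1 * v2 * W[j2][f1]
            g2 = lam * W[j2][f1] + kappa * v1 * v2 * W[j1][f2]
            G[j1][f2] += g1 ** 2
            G[j2][f1] += g2 ** 2
            W[j1][f2] -= eta * g1 / np.sqrt(G[j1][f2])
            W[j2][f1] -= eta * g2 / np.sqrt(G[j2][f1])
    return phi
```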


3.2 Parallelization on Shared-memory Systems

Because each SG update touches only the latent vectors of the features present in one instance, the paper applies HOGWILD!-style lock-free parallelization: threads apply updates concurrently without locking. In Section 4.4, extensive experiments investigate the effectiveness of this parallelization.


3.3 Adding Field Information

To apply FFMs, every instance is converted to the format "label field1:feat1:val1 field2:feat2:val2 ...", so each feature must be assigned a field. How to do this depends on the type of feature.

Categorical Features

Each categorical feature is treated as a field, and every possible value becomes a binary feature within that field. The paper's example: "Yes P:ESPN A:Nike G:Male" becomes "Yes P:P-ESPN:1 A:A-Nike:1 G:G-Male:1".

Numerical Features

Two treatments: (a) dummy fields, where each numerical feature forms its own field and keeps its real value (the field information is then trivial, since each field contains exactly one feature); (b) discretization, where each numerical feature is binned into a categorical one and then transformed as above, at the cost of losing some information.

Single-field Features

Some data sets have only a single field (e.g., sentence classification, where every feature is a word). Putting all features in that one field makes FFMs degenerate to FMs, while making every feature its own field is impractical: the number of fields then equals $n$, giving $O(n^2 k)$ model variables. A sketch of the categorical and dummy-field transformations follows.
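A minimal sketch of these transformations into the LIBFFM text format; the index maps, column names, and example record are made up for illustration:

```python
def to_ffm_line(label, categorical, numerical, field_idx, feat_idx):
    """categorical: {column: value}; numerical: {column: float}.
    Categorical columns become one-hot binary features; numerical
    columns use the dummy-field treatment (one field per column)."""
    tokens = [str(label)]
    for col, val in categorical.items():
        f = field_idx[col]            # one field per categorical column
        j = feat_idx[(col, val)]      # one binary feature per (column, value)
        tokens.append(f"{f}:{j}:1")
    for col, val in numerical.items():
        f = field_idx[col]            # dummy field: the column itself
        j = feat_idx[(col, col)]      # a single feature carrying the real value
        tokens.append(f"{f}:{j}:{val}")
    return " ".join(tokens)

# Hypothetical index maps and one instance:
field_idx = {"Publisher": 0, "Advertiser": 1, "Price": 2}
feat_idx = {("Publisher", "ESPN"): 0, ("Advertiser", "Nike"): 1,
            ("Price", "Price"): 2}
print(to_ffm_line(1, {"Publisher": "ESPN", "Advertiser": "Nike"},
                  {"Price": 9.99}, field_idx, feat_idx))
# -> 1 0:0:1 1:1:1 2:2:9.99
```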


4. EXPERIMENTS

  • We first provide details of the experimental settings in Section 4.1.
  • Then, we investigate the impact of parameters in Section 4.2.
  • In Section 4.3, we discuss in detail why FFMs are sensitive to the number of epochs, before proposing an early-stopping trick.
  • The speedup of parallelization is studied in Section 4.4.
  • In Sections 4.5-4.6, we compare FFMs with other models, including Poly2 and FMs.

4.1 Experiment Settings

Data Sets

[Tables: statistics of the two Kaggle CTR competition data sets, Criteo and Avazu]

Platform

Evaluation
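The metric is logloss; in the paper's notation, for $m$ test instances:

$\text{logloss} = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \, \phi(\mathbf{w}, \mathbf{x}_i))\right)$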

Implementation

  • use SSE instructions to boost the efficiency of inner products
  • The parallelization discussed in Section 3.2 is implemented by OpenMP

4.2 Impact of Parameters

  • k does not affect the logloss much.
  • If λ is too large, the model cannot achieve good performance; with a small λ, the model gets better results but easily over-fits the data.
  • With a small η, FFMs reach their best performance slowly; with a large η, FFMs quickly reduce the logloss, but over-fitting then occurs.

4.3 Early Stopping

The early-stopping trick used in the paper:

  • Split the data into a training set and a validation set.
  • At the end of each epoch, compute the logloss on the validation set.
  • If the logloss goes up, record the number of epochs and stop.
  • Re-train the model on the full data with the recorded number of epochs.


4.4 Speedup


4.5 Comparison with LMs, Poly2, and FMs on Two CTR Competition Data Sets

  • FFMs outperform the other models in terms of logloss, but they also require longer training time than LMs and FMs.
  • Though the logloss of LMs is worse than that of the other models, they are significantly faster.
  • Poly2 is the slowest among all models.
  • FMs offer a good balance between logloss and speed.

4.6 Comparison on More Data Sets

  • When a data set contains only numerical features, FFMs may not have an obvious advantage.
  • If we use dummy fields, FFMs do not outperform FMs, indicating that the field information is not helpful there.
  • On the other hand, if we discretize numerical features, FFMs are the best among all the models compared, but the logloss is much worse than that obtained with dummy fields.
  • FFMs should be effective for data sets that contain categorical features transformed to binary features.
  • If the transformed set is not sparse enough, FFMs seem to bring less benefit.
  • It is more difficult to apply FFMs on numerical data sets.

5. CONCLUSIONS AND FUTURE WORKS

FFMs are most effective on large, sparse data sets whose categorical features have been transformed to binary features; their training time is longer than that of LMs and FMs, and the number of epochs must be controlled with early stopping. On dense or purely numerical data, FFMs bring less benefit.


Reposted from blog.csdn.net/cat_xing/article/details/88757788