Evolution of the Dewu Community Recommendation Fine-Ranking Model

1. Background

Dewu Community is a fashion and lifestyle community where a large number of young people get trend information and share their daily lives. The personalized distribution of the content users browse is handled by the recommendation system. At present, multiple scenarios in the Dewu community use recommendation algorithms, including the dual-column feed on the homepage, the immersive video feed, the category-tab feed, and the live-broadcast feed. To provide users with better service and experience, we have done a great deal of optimization for these businesses across the entire recommendation system. Mainstream recommendation systems today consist of multiple modules such as recall, rough ranking, fine ranking, and re-ranking mechanisms. This article mainly introduces the work and thinking we have done during the evolution of the fine-ranking stage.

1.png

2. Challenges and solutions

While interacting with the feed, users generate behaviors such as clicks, reads, likes, follows, bookmarks, comments, and negative feedback. These are generally the core metrics the business cares about, and they can also serve as modeling signals for algorithm engineers. Among them, the click is the entry point of the whole chain of user behavior: it is relatively dense, and it is often one of the most important targets in the early stage of a feed recommendation system. How to model user interests accurately has been a hot topic throughout the years in which industrial recommender systems matured from fledgling efforts into showcases of real capability. A sound modeling paradigm in the industry is to iteratively optimize the system toward business goals under given resource constraints. For a recommendation system, this means weighing the serving engine, computing resources, and the cost of model iteration and maintenance; the interplay of human resources, systems, and models, together with multi-team cooperation, drives the whole system toward the business goals. At the fine-ranking level specifically, we need to solve the challenges brought by multiple scenarios, multiple user groups, and multiple targets in order to estimate user interests accurately. The following describes our concrete solutions to these challenges in the Dewu community, covering features, samples, multi-objective modeling, and new-user cold start.

2.png

2.1 Features

The technical evolution of the single-objective CTR model can be observed from two perspectives: feature engineering and model-structure complexity. In the early days of CTR modeling, limited by computing resources, model structures were often relatively simple, with LR the most widely used. At that stage, algorithm engineers spent most of their time hand-crafting features, iterating against different business backgrounds to obtain metric gains.

The fine-ranking model of a recommendation system is essentially a model that estimates the probability of user behavior. We hope the model can, on the one hand, memorize the user's historical behavior (fitting ability) and, on the other hand, extrapolate reasonably from historical data (generalization ability). In the era of traditional machine learning, models such as LR, SVM, and GBDT already had good fitting capability and could perform extremely well on training sets. In real business, however, the true difficulty lies in accurately predicting future behavior from past data. From a mathematical perspective, modeling is essentially the abstraction and simulation, in a digital space, of part of the real world's operating laws; how faithfully real behavior is represented in that space largely determines the effect of modeling. Fortunately, with the development of deep learning, Embedding-based representation has matured and has basically removed the representation bottleneck of modeling. This mapping space is often called the feature vector space.

For the fine-ranking model, the basic unit with real-world meaning in this vector space is the feature, which reveals how important feature-oriented work is for the whole modeling effort. Designing features for each business scenario requires algorithm engineers to understand the business deeply and have rich relevant experience. Feature engineering is also one of the most resource-intensive parts of algorithm work and requires continuous polishing and optimization.

2.1.1 Feature Design

The features used by the model can be divided along different axes. By source, they can be divided into user features, item features, context features, cross features, and cascaded-model features; by structure, they are generally distinguished as Dense or Sparse; by timeliness, they are often divided into offline features and real-time features. For a specific business scenario, the features of each domain can be designed holistically by source according to the table below, and continuously optimized and upgraded during iteration.

Feishu 20230111143234.jpg

Each feature should be designed in conjunction with the business. For example, statistical features need a suitable aggregation time window, and sequence features need a suitable sequence length; these can be chosen according to the actual situation.

3.png

On top of feature design, algorithm engineers also need to work with upstream and downstream teams to open up the data links, verify feature quality, and introduce the features into existing models for offline research. If a small-traffic AB experiment shows a statistically confident gain, the new feature version can be fully launched. A common feature-mining method is based on content-understanding algorithms: natural language processing, computer vision, speech recognition, and so on are used to mine content deeply and produce high-quality features, making it easier for the model to capture user interest points. Driven by business needs, effective new features are continuously added during iteration while old, ineffective features are gradually retired. In our business scenario, the number of features used by the model has grown by 30% over the iterations, and the distribution efficiency of the system has also improved greatly. The importance of a feature to model estimation can be evaluated via auc-diff. For system stability, the coverage and value distribution of each feature must also be monitored online in real time, so that abnormal data does not affect overall traffic.

2.1.2 Feature Processing

All the features used in recommender systems can be divided into four categories according to their structure and processing method.

Numerical features: the raw value is a continuous value within some interval, such as a post's posterior CTR, video duration, or number of likes. Common processing methods include:

  • Discretization of continuous features: it increases robustness to feature outliers, improves nonlinear capability, speeds up processing, and makes feature crossing easier; the cost is some information loss, and value jumps at bin boundaries can affect the stability of model estimates. Equal-width, equal-frequency, clustering-based, decision-tree-based, and chi-square binning can all be used.
  • Normalization, such as max-min normalization and standardization
  • Non-linear transformations, such as the commonly used log(x+1)
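As a minimal illustration of the processing methods above (data and bin counts are made up for the example), a long-tailed counter such as the number of likes can be log-transformed and binned by equal frequency:

```python
import numpy as np

def log_transform(x):
    # Non-linear transform log(x + 1) for long-tailed counters such as like counts.
    return np.log1p(x)

def equal_freq_bins(values, n_bins=4):
    # Equal-frequency binning: bin edges are inner quantiles of the data,
    # so each bin receives roughly the same number of samples.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

likes = np.array([0, 3, 5, 80, 120, 4000])
print(log_transform(likes).round(2))   # squashed into a compact range
print(equal_freq_bins(likes))          # discrete bin ids usable as categories
```

In practice the bin edges are computed once on training data and frozen, so online and offline feature values stay consistent.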

Single-valued discrete features: each sample has exactly one discrete value, such as phone model or user gender.

  • One-Hot encoding
  • Look up the embedding (LookUp) table to get a vector representation

Multi-valued discrete features: a sample can have multiple discrete values, such as a user's click sequence or an item's tags.

  • Artificially generated cross features
  • Look up the embedding (LookUp) table to get multiple vectors, then fuse them into a single vector representation via concatenation, pooling, attention, etc.

KV features: a sample's Key can have multiple discrete values, each with a corresponding Value (weight).

  • Discretize Key and Value, then use them with the Value as a weight
  • Concatenate Key and Value, then discretize the result before use
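A small sketch of the lookup-then-fuse pattern for multi-valued features (table size, embedding dimension, and the ids are hypothetical; mean pooling stands in for the fancier fusion options such as attention):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 1000, 8
emb_table = rng.normal(size=(VOCAB, DIM))   # the "LookUp table"

def pool_multivalued(ids, mode="mean"):
    # Look up one vector per id, then fuse the variable-length list
    # into a single fixed-size vector the model can consume.
    vecs = emb_table[np.asarray(ids)]       # (len(ids), DIM)
    if mode == "mean":
        return vecs.mean(axis=0)
    if mode == "sum":
        return vecs.sum(axis=0)
    raise ValueError(f"unknown mode: {mode}")

click_seq = [17, 42, 305]                   # hypothetical item-id click sequence
fused = pool_multivalued(click_seq)
print(fused.shape)                          # always (DIM,) regardless of list length
```

The fixed output shape is the point: downstream layers see the same input size whether the user clicked three items or three hundred.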

In the field of recommendation systems, among the feature types discussed above, two deserve special attention; in different businesses, algorithm engineers often invest heavily in them and generally obtain good returns.

2.1.3 High-dimensional sparse category features

The first is high-dimensional sparse categorical features. Thanks to their high-dimensional sparsity, such features are more linearly separable in the vector space, which makes it easier for the model to memorize samples. In a relatively mature recommendation system, the dimensionality of such features can reach hundreds of millions, or even billions.

For the model to use such high-dimensional features successfully, a lot of deep optimization is needed where the algorithm meets the engineering. A commonly used solution is the dynamic elastic feature (EmbeddingVariable), which solves the problems of a static feature vocabulary whose size is hard to predict, feature conflicts, and memory and IO redundancy; measures such as feature admission, feature eviction, lock-free underlying hash tables, and fine-grained memory layout improve storage and access efficiency. After introducing the dynamic elastic EV feature, we obtained good gains across the Dewu community's scenarios.

2.1.4 Cross Feature

The other is the well-known cross feature. Cross features are obtained by combining multiple features, which effectively enhances the expressive power of the model. Over the years algorithm engineers have tried many approaches to feature crossing; they can generally be divided into explicit crossing and implicit crossing.

Explicit crossing is based on prior knowledge: algorithm engineers construct the cross features by hand. There are generally three types of crossing that can be used. Among them, the Cartesian product is the most common because of its better effect, but it can explode in dimensionality, so it should be constructed in light of actual business data analysis. For example, in our scenario each user has different preferences across categories; to make the system pay more attention to this when serving users, we can introduce cross features of user preference and post category into the model to improve user experience.

Feishu 20230111143804.jpg
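One way to materialize an explicit cross like the user-preference × category example above is to hash the Cartesian product of the two values into a bounded bucket space, so the crossed vocabulary cannot explode. A hedged sketch (field names and bucket count are made up):

```python
import hashlib

def cross_feature(a, b, buckets=10**6):
    # Cartesian-product cross of two categorical values, hashed into a fixed
    # bucket space. A stable hash (not Python's built-in hash(), which varies
    # per process) keeps the feature id consistent online and offline.
    key = f"{a}|x|{b}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % buckets

fid = cross_feature("user_pref=sneakers", "post_category=shoes")
print(fid)   # a stable id in [0, buckets)
```

The bucket count trades memory for collision rate, which is exactly the dimensionality-explosion concern mentioned above.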

Implicit crossing lets the model learn crossing automatically through the network structure. As crossing techniques developed, algorithm engineers increasingly favored implicit crossing, which both reduces the dependence on manual experience and keeps optimizing itself as the model trains. The classic works in this area in recent years are structures such as FM, FFM, Wide&Deep, DeepFM, DCN, and CAN. Among them, DeepFM, with its simple structure and strong performance, serves as a solid baseline in many recommendation scenarios.

As a classic synthesis of feature-cross structures, DeepFM achieves end-to-end fusion of low-order and high-order feature crosses. The FM component performs low-order feature crossing to improve the memorization ability of the model; the Deep component performs high-order cross fusion to improve generalization. In the earliest days of the Dewu community, the fine-ranking model only modeled CTR, and the architecture adopted the relatively mature DeepFM.

4.png
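For context on the FM half of DeepFM: its pairwise crossing admits the well-known identity sum over i<j of <v_i, v_j> = ½(‖Σ v_i‖² − Σ ‖v_i‖²), which brings the cost down from O(n²·d) to O(n·d). A minimal sketch:

```python
import numpy as np

def fm_second_order(emb):
    # emb: (num_active_fields, dim) embeddings of the active features.
    # FM identity: sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2)
    sum_then_square = emb.sum(axis=0) ** 2
    square_then_sum = (emb ** 2).sum(axis=0)
    return 0.5 * (sum_then_square - square_then_sum).sum()

v = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
print(fm_second_order(v))   # <v0,v1> + <v0,v2> + <v1,v2> = 2 + 0 + 3 = 5.0
```

In the real model this scalar (or its per-dimension version) feeds into the final logit alongside the Deep tower's output.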

2.2 Sample

For a recommendation system, the training samples and features determine the upper bound of the model's effect, and a high-quality training set can effectively improve the predictive power of the fine-ranking model. Sample generation relies on online logs; an excellent sample-stream pipeline involves many parties, including client-side event tracking, the recommendation engine, the prediction service, the data warehouse, and so on. To be accountable for business results, algorithm engineers must monitor sample quality in addition to the model itself, and work with upstream and downstream teams to keep high-quality sample production stable.

2.2.1 Real-time sample streaming architecture

In the early days of the Dewu community, training samples were joined from offline feature tables and offline user-behavior tables. Besides the obvious timeliness problem, this can also cause online/offline inconsistency in sample features, hurting the overall distribution efficiency and distribution effect of the system.

To solve the problem of high-quality sample production, we coordinated resources, designed the system, and pushed multiple parties to build a real-time sample-streaming framework. Producing samples through the real-time stream greatly improves their timeliness, from days to minutes, which supports launching real-time models and lays a solid foundation for the rapid iteration of subsequent models.

The real-time data-flow architecture can be summarized as the production, attribution, and joining of three log streams.

  • The first data stream is the client log stream, based on client-side event tracking that reports on triggered events. The tracked data includes the (reqid, userid, itemid) triple sent by the server to the client, among other information. As users browse the feed, they continuously trigger behaviors such as impressions, clicks, and likes, so the client log stream keeps producing data.
  • The second data stream is the server engine log stream: the important information the engine leaves behind while handling a user request from the client, producing the recommendation results through the whole recommendation engine, and returning them via the server. It also contains the (reqid, userid, itemid) triple, the recommendation results, and ranking information.
  • The last data stream is the prediction log stream dumped by the prediction service. The engine sends user profiles and recall or rough-ranking results to the prediction machines, where the fine-ranking model scores them; during this process, the item features, user features, and other feature information used by the model are dumped. This feature stream has the largest data volume of the three, so an ACK mechanism is often used to reduce the number of dumped items, saving resources considerably.

The three log streams can be effectively joined via the (reqid, userid, itemid) triple to form a real-time attributed wide table. The client log stream provides the user's real feedback labels, the server engine log stream provides information about each stage of the recommendation engine, and the prediction log stream provides the feature information used by the model, guaranteeing online/offline feature consistency.

5.jpeg
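The triple join described above can be sketched as a plain inner join over the three streams (all field names beyond the triple are illustrative, not the production schema):

```python
def join_streams(client_logs, engine_logs, predict_logs):
    # Index each stream by the (reqid, userid, itemid) triple, then inner-join.
    def index(rows):
        return {(r["reqid"], r["userid"], r["itemid"]): r for r in rows}
    c, e, p = index(client_logs), index(engine_logs), index(predict_logs)
    wide = []
    for key in c.keys() & e.keys() & p.keys():
        # Client-side labels win on field-name conflicts.
        wide.append({**e[key], **p[key], **c[key]})
    return wide

sample = join_streams(
    [{"reqid": "r1", "userid": "u1", "itemid": "i1", "click": 1}],    # labels
    [{"reqid": "r1", "userid": "u1", "itemid": "i1", "rank": 3}],     # engine info
    [{"reqid": "r1", "userid": "u1", "itemid": "i1", "pctr": 0.12}],  # features/scores
)
print(sample)   # one joined wide row with click, rank and pctr
```

In production this join runs in a streaming engine with state and watermarks rather than in-memory dicts, but the keying logic is the same.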

When producing real-time samples from a live sample stream, a classic problem arises: delayed user feedback. There is often a time lag between the impression event being reported and the user's click or deeper interactions on a post. For example, a user watching a video may only like or comment on it after watching for several minutes; if attribution is designed unreasonably, that real-time sample ends up negative. Attribution of user feedback labels generally considers an attribution time window. The attribution window of the offline table can be understood as one day, but real-time computation happens in memory, and for cost reasons it is hard to set a large window; the right balance between cost, timeliness, and label accuracy should be found through analysis of real business data. In our scenario, by choosing an appropriate threshold, the positive labels in the real-time sample table finally reached 95% of those in the offline table. For delayed samples, an effective solution is to design sample replay mechanisms that correct the sample distribution based on importance sampling.

6.png
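The attribution-window idea can be sketched in a few lines (the 600-second window is a made-up threshold for illustration, not our production value):

```python
def attribute_label(exposure_ts, interaction_ts, window_s=600):
    # A delayed interaction turns the sample positive only if it arrives
    # within the attribution window after the impression; otherwise the
    # sample stays negative (to be corrected later, e.g. by replay with
    # importance sampling).
    if interaction_ts is None:
        return 0
    return int(0 <= interaction_ts - exposure_ts <= window_s)

print(attribute_label(1000, 1300))    # within window  -> 1
print(attribute_label(1000, 2000))    # too late       -> 0
print(attribute_label(1000, None))    # never happened -> 0
```

Enlarging `window_s` raises label accuracy but also raises streaming-state cost and delays sample emission, which is exactly the trade-off discussed above.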

2.2.2 Sampling

The CTR model is a binary classifier that estimates the click probability of the impressions a user browses. Modeled naively, a user's click is a positive sample and an impression without a click is a negative sample. However, because clicks are relatively sparse, building the training set this way causes a severe imbalance between positive and negative samples; in some scenarios the ratio can be below 1:100, and training quality often suffers.

To solve class imbalance, a common practice is to sample the negatives: only negatives selected by some strategy are used to train the model. There are many ways to implement negative sampling; judged by sampling quality, bias, and efficiency, they can be roughly divided into rule-based sampling and model-based sampling. Commonly used rule-based methods are random negative sampling and popularity-based negative sampling. Model-based sampling essentially improves negative-sample quality through model iteration, typically mining hard negatives with Boosting or GAN-style adversarial learning; a good recent work is SRNS.

In our scenario, sampling is currently done by randomly discarding negative samples. The pctr estimated by a model trained after sampling deviates from the actual posterior click-through rate, so when using the estimated pctr online, it must first be corrected with the conversion formula below before ranking. Besides sampling, another option is to reweight the Loss of different samples during training, which also alleviates class imbalance; however, weight tuning is laborious, it may take a while to reach a good result, and the estimated scores are then hard to restore to probabilities.

9.jpeg
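One standard form of this correction, assuming negatives are kept with probability w during sampling, maps the raw output p back via q = p / (p + (1 − p)/w). A sketch:

```python
def calibrate_pctr(p, neg_keep_rate):
    # Undo the bias from negative downsampling: with only a fraction
    # neg_keep_rate of negatives kept in training, the model over-predicts,
    # and q = p / (p + (1 - p) / neg_keep_rate) restores a usable probability.
    return p / (p + (1.0 - p) / neg_keep_rate)

print(calibrate_pctr(0.5, 1.0))             # no sampling -> unchanged, 0.5
print(round(calibrate_pctr(0.5, 0.1), 4))   # heavy sampling -> much smaller, 0.0909
```

Calibration matters whenever the score is used as an absolute probability (e.g. in fusion formulas), not just for ordering.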

A business scenario usually cares about multiple metrics. Besides clicks, the other important signals are what users do after clicking. In e-commerce these are deep behaviors such as adding to favorites and placing orders; in feed scenarios they are more about watch time and interactions such as likes and comments. These conversion behaviors happen after the click: if interaction is modeled on the click sample space but used directly online over the impression space, a bias arises, known as sample selection bias. It can be addressed by multi-objective joint modeling with a specifically designed model structure.

10.png

In the Dewu community scenario, based on problems encountered and discovered online, we have also made other explorations and practices at the sample level.

  • User conversion signals such as comments, follows, and shares are generally sparse. Modeled alone, the model is under-trained and struggles to perform well; jointly trained with clicks, it gets biased toward the denser click signal. An effective solution is to aggregate signals of the same type and resample them to alleviate the dominance of clicks.
  • Random negative sampling is unfriendly to low-activity users and may even cause users who were exposed but did not click to gradually churn. Negative sampling should take the exposed-but-unclicked samples of low-activity users into account, and exposed-but-unclicked sequences can also be added at the feature level.
  • The ideal sample retains and extracts as much effective information about the real scenario as possible, using prior knowledge, while removing noise. One potentially useful piece of information is the user's session, so building samples on a per-session basis is worth trying.

2.3 Multi-target

Compared with modeling a single goal, modeling multiple business goals brings more challenges; one of the most common is the seesaw phenomenon between metrics. To alleviate these problems, years of industry practice and technical development have produced many excellent models, such as ESMM, MMOE, PLE, and ESCM. Among them, ESMM and MMOE are the most important and widely used, with good results in many business scenarios. Our multi-objective modeling in the Dewu community also draws on the ideas of these models.

11.png

2.3.1 Model structure

2.3.1.1 Home page dual column flow

With the development of the business, the fine-ranking model for the homepage feed of the Dewu community has been iteratively upgraded, and its personalization capability has continuously improved. Broadly speaking, there have been four stages.

The first stage is the early period, when only the user click-through rate was modeled and the fine-ranking layer had only a CTR model. After several iterations, from the initial DeepFM structure to a DLRM structure adapted to business characteristics, the feature-crossing capability was significantly improved, and a DIN module for extracting users' deep interests was added, achieving good gains.

  • CTR model

12.png

The second stage added separate modeling of user duration, to improve the system's ability to estimate how long users will consume content; the fine-ranking layer then had a CTR model and a duration model. The first version of the duration model adopted the relatively mature DeepFM structure and, with an acceptable trade-off in CTR, brought a relative +3% increase in overall average duration.

  • duration model

13.png

The third stage jointly modeled user interaction behaviors such as likes, follows, favorites, comments, and shares together with user duration, using interaction signals to better capture user interest points. The fine-ranking layer had two models: the CTR model and a duration-interaction two-tower model. After effectively tuning the parameters of the multi-objective score-fusion formula, with other metrics roughly flat, the overall interactive-user penetration rate increased by +6%.

  • Duration-interaction two-tower model

14.png

The fourth stage unified user clicks, user duration, and user interaction in one multi-objective model, with user negative feedback modeled separately, to better integrate the fine-ranking layer's modeling of user interests. The fine-ranking layer then has two models: the multi-objective model (click, duration, interaction) and the negative-feedback model. Compared with the two-tower model, the multi-objective model must structurally accommodate more targets, and in particular must handle the interplay between the CTR task and the sparse tasks. By building the loss on the pct_time and pct_inte nodes during training and applying gradient blocking to the pctr node, multiple targets can be modeled uniformly on the impression space. Online, ptime and pinte are used as the estimated duration and interaction scores, so the fusion formula stays consistent online and offline, which helps translate offline research gains into online gains. After launch, overall ctr rose a relative +2.3%, per-capita duration a relative +0.33%, and interactive-user penetration a relative +4.5%. The negative-feedback model takes effect at the mechanism level through smooth down-weighting, and the overall negative-feedback rate dropped a relative 16%.

  • multi-objective model

15.png

  • Negative Feedback Smooth Downweight

16.jpeg

  • negative feedback model

17.png
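The chaining-plus-gradient-blocking idea from the fourth stage can be sketched as follows. This is a hedged simplification of the training graph, not the production model: the heads and labels are illustrative, and in numpy the stop-gradient is a forward no-op (in TF/PyTorch it would be `tf.stop_gradient(pctr)` / `pctr.detach()`).

```python
import numpy as np

def multi_objective_losses(pctr, ptime, pinte, y_click, y_time, y_inte):
    # Chain the duration and interaction heads behind the CTR head so all
    # targets are defined on the impression (exposure) space, and block
    # gradients through the pctr node so the sparse tasks cannot disturb
    # the CTR tower.
    pctr_sg = pctr                  # stand-in for stop_gradient(pctr)
    pct_time = pctr_sg * ptime      # expected duration on impression space
    pct_inte = pctr_sg * pinte      # click-and-interact probability
    eps = 1e-7
    loss_ctr = -np.mean(y_click * np.log(pctr + eps)
                        + (1 - y_click) * np.log(1 - pctr + eps))
    loss_time = np.mean((pct_time - y_time) ** 2)
    loss_inte = -np.mean(y_inte * np.log(pct_inte + eps)
                         + (1 - y_inte) * np.log(1 - pct_inte + eps))
    return loss_ctr + loss_time + loss_inte

total = multi_objective_losses(
    pctr=np.array([0.8, 0.1]), ptime=np.array([0.5, 0.2]),
    pinte=np.array([0.3, 0.05]), y_click=np.array([1, 0]),
    y_time=np.array([0.4, 0.0]), y_inte=np.array([1, 0]),
)
print(total > 0)   # a finite, positive scalar loss
```

The key property is that ptime and pinte, read directly at serving time, already live on the impression space, so the online fusion formula needs no extra correction.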

2.3.1.2 Immersive Video Single Column Streaming

Unlike the dual-column product form on the homepage, the immersive video feed is a single-column scenario: users watch different videos by continuously scrolling down. Given these characteristics, the initial modeling idea started from video completion. The model estimates pfinish_rate, the ratio of the user's expected watch time to the video's own duration videoTime. When used online, videoTime is clamped at both ends to alleviate the bias introduced by video length itself, and pfinish_rate * truncated(videoTime) is used as the ranking score. As in the main homepage scenario, later iterations added modeling of user interaction behavior. When fusing the interaction score pinte with the completion score pfinish_rate, we unsurprisingly ran into the seesaw phenomenon; after continuous experiments and attempts, cascade ranking finally yielded gains.
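The truncated completion score can be sketched as follows (the clamp thresholds are hypothetical, not our production values):

```python
def finish_rank_score(pfinish_rate, video_time, lo=5.0, hi=60.0):
    # Clamp videoTime at both ends so that very short or very long videos
    # do not dominate the ranking purely through their own length, then
    # score by the implied expected watch time.
    truncated = min(max(video_time, lo), hi)
    return pfinish_rate * truncated

print(finish_rank_score(0.5, 120.0))   # long video clamped to 60 -> 30.0
print(finish_rank_score(0.9, 2.0))     # short video clamped to 5 -> 4.5
```

Without the clamp, a 10-minute video with a mediocre completion rate would beat every short clip, which is the length bias the truncation removes.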

Through continuous iterative optimization over several versions, the core metrics of the scenario improved significantly: the average duration per visiting uv increased +46% and the impression interaction rate increased +15%. Given the particularities of video scenarios and our analysis of business metrics, we are currently considering modeling users' skip (short-play) and long-play behaviors, to better capture user interest points and provide a more considerate video feed service.

  • multi-objective model

18.png

2.3.2 Multi-target fusion

Beyond the model itself, the other major challenge of multi-objective modeling is how to use multiple target scores effectively online. We hope that, through appropriate ranking objectives and mechanism design, the goals the business focuses on can gain and multiple targets can improve together. To this end we have made several attempts in our scenarios.

The first and most direct class of solutions is to design a formula that fuses multiple target scores into the final ranking score. Its advantage is simplicity and transparency: the weights show how each target score affects the final ranking. One commonly used trick: because the score distributions of different targets differ greatly, and changes in absolute score values affect tuning, one can rank items by each single target score, normalize the rank positions reasonably, and then fuse the targets. The disadvantages are that different models must be tuned by hand, which is laborious, and the fusion formula is not personalized per user, which limits the overall ranking effect. In the Dewu community we have designed two versions of the fusion formula; the second, additive version achieved better returns, with an effectively reduced number of parameters.

  • artificial formula fusion

19.jpeg
20.jpeg
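The rank-then-normalize trick described above can be sketched like this (targets and weights are illustrative):

```python
import numpy as np

def rank_fusion(scores_by_target, weights):
    # Replace each target's raw score with its rank position (robust to the
    # very different score distributions of different targets), normalize
    # the ranks to [0, 1], then fuse additively with per-target weights.
    n = len(next(iter(scores_by_target.values())))
    fused = np.zeros(n)
    for name, scores in scores_by_target.items():
        ranks = np.argsort(np.argsort(scores))        # 0 = lowest score
        fused += weights[name] * ranks / max(n - 1, 1)
    return fused

fused = rank_fusion(
    {"pctr": np.array([0.10, 0.90, 0.50]),
     "ptime": np.array([12.0, 3.0, 40.0])},           # wildly different scales
    weights={"pctr": 1.0, "ptime": 0.5},
)
print(fused)   # comparable scores despite the scale gap between targets
```

Because only rank positions enter the fusion, retraining a model whose score distribution shifts does not silently change the effective weights.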

The second class of solutions uses a deep model to produce the final ranking score end-to-end, avoiding manual tuning and allowing personalization during fusion. The idea is to take some important basic user-side and item-side features, plus the estimated scores of the multiple models, as the input of a simple network, and use an offline-trained model to produce the final fused score. A key point is constructing the offline Label, generally a weighted aggregation of multiple targets whose weights must be tuned against business effects and online experiments. The disadvantage is that the fine-ranking layer must call one more model, consuming more resources; in addition, when the business needs ecosystem adjustments, updating the fusion model is not as fast as updating a formula.

  • independent fusion model

21.png

The third class, currently being tried, is a personalized-fusion multi-objective model architecture. On top of the multi-objective model, we build a fusion module so that multi-objective prediction and score fusion live in one complete network. The training loss has two parts: the main-network loss, which optimizes the prediction of each target score, and the fusion-network loss, which optimizes the fused ranking as a whole; staged training and gradient blocking keep the two from interfering with each other. In theory this solution combines the strengths of the previous two while avoiding their shortcomings. We hope that after tuning we can fully deploy it in our scenario and further integrate the capabilities of the fine-ranking model.

22.jpeg

  • Personalized fusion multi-objective model

Personality.png

2.4 New user cold start

New-user cold start has always been a difficulty in the industry, mainly reflected in the following three points. To solve these problems, there are many classic works in the industry, such as the meta-learning-based MeLU and FORM models for new users. These solutions try to give new users a more reliable initialization and to converge quickly via dynamically adjusted learning rates, but they are often ineffective in practical applications.

  • New users' behavior is sparse, and they are more sensitive to recommendation results.
  • The distribution of new- and old-user samples in the training set is uneven; new-user samples often account for less than 1%.
  • The characteristics of the new-user and old-user populations differ greatly; because old users dominate, it is difficult for the model to capture new users' behavior patterns.

We also tried new-user cold start in the homepage dual-column feed of the Dewu community to improve its efficiency. Based on analysis and judgment of business data, we decoupled the entire pipeline from the main scenario and iterated it independently, from the candidate pool and recall through to fine ranking; at the fine-ranking level, both the features and the model structure were designed specifically around the particularities of new users.

For new-user cold-start tasks, I believe the following techniques are worth trying; they may yield different benefits in different business scenarios.

Resample new-user examples or weight their loss to give new-user samples more influence during training.

Construct features that characterize the new-user population, such as a new-user ID and the user's first-visit time.

Replace new-user IDs with user-group IDs to mitigate under-trained new-user ID embeddings.

Emphasize new-user-related features in the model structure to increase their influence.
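The first of these techniques, loss weighting, can be sketched in a few lines. The weight value and toy numbers below are illustrative assumptions, not tuned production settings:

```python
import numpy as np

def weighted_bce(p, y, is_new_user, new_user_weight=5.0):
    """Binary cross-entropy where new-user samples get a larger weight,
    increasing their influence on the gradient relative to old users."""
    w = np.where(is_new_user, new_user_weight, 1.0)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.sum(w * loss) / np.sum(w))

# Toy batch: two old-user samples and one poorly predicted new-user sample.
p = np.array([0.9, 0.2, 0.6])            # model predictions
y = np.array([1.0, 0.0, 1.0])            # click labels
is_new = np.array([False, False, True])  # new-user indicator

plain = weighted_bce(p, y, is_new, new_user_weight=1.0)
boosted = weighted_bce(p, y, is_new, new_user_weight=5.0)
```

Resampling achieves a similar effect at the data level, by repeating (or re-drawing) new-user examples so they make up a larger share of each training batch.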

In our scenario, the first version was a CTR model trained on new users' effective clicks with samples weighted by dwell time, so the model pays more attention to content that users spend longer consuming, which helps it learn new users' points of interest. To further improve the model's ability to capture the interests of different new users, we designed a multi-objective POSO model to alleviate the sparsity of new-user behavior and samples. This personalization at the model-structure level brought a better experience to the target population: after full rollout, new-user CTR rose 2.69% relative, per-capita recommendation time 3.08%, per-capita interactions 18%, and new users' first-visit duration 1.23%.

  • Multi-objective poso model

multi-target.png
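A minimal sketch of the POSO-style gating idea, assuming a gate of the form C·sigmoid(·) driven by a personalization code such as a new-user flag. Shapes and names are illustrative; the actual model applies such gates inside a multi-objective network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def poso_gate(hidden, pc, W_gate, C=2.0):
    """POSO-style gating: a personalization code `pc` (e.g. a new-user
    indicator plus first-visit features) produces per-unit scales in
    (0, C) that modulate the hidden activations, so new and old users
    effectively route through differently weighted sub-networks."""
    gate = C * sigmoid(pc @ W_gate)  # shape: (batch, hidden)
    return hidden * gate

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))            # shared hidden-layer output
pc = np.array([[1.0], [0.0], [1.0], [0.0]])  # 1 = new user, 0 = old user
W_gate = rng.normal(size=(1, 16))            # gate parameters

out = poso_gate(hidden, pc, W_gate)
```

Because the gate depends only on the personalization code and not on the dominant behavioral features, new-user samples cannot be drowned out by old-user patterns when computing the gate values.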

3. Outlook

This article introduced the concrete solutions and progress we have made on features, samples, multi-objective modeling, and new-user cold start in the face of the challenges that keep emerging from the business. Beyond these deployed technologies, we have also explored other directions, including popularity debiasing, deep user-interest modeling, FeatureStore, and ultra-large-scale distributed sparse models. We hope to unlock further algorithmic gains in the future to secure and drive business growth.

4. References

[1] Chen Y, Jin J, Zhao H, et al. Asymptotically Unbiased Estimation for Delayed Feedback Modeling via Label Correction[J]. 2022.

[2] Lee H, Im J, Jang S, et al. MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation[J]. ACM, 2019.

[3] Sun X, Shi T, Gao X, et al. FORM: Follow the Online Regularized Meta-Leader for Cold-Start Recommendation[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021: 1177-1186.

[4] Ma X, Zhao L, Huang G, et al. Entire space multi-task model: An effective approach for estimating post-click conversion rate[C]//The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 2018: 1137-1140.

[5] Ma J, Zhao Z, Yi X, et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts[C]//Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018: 1930-1939.

[7] Guo H, Tang R, Ye Y, et al. DeepFM: a factorization-machine based neural network for CTR prediction[J]. arXiv preprint arXiv:1703.04247, 2017.

[8] Naumov M, Mudigere D, Shi H J M, et al. Deep learning recommendation model for personalization and recommendation systems[J]. arXiv preprint arXiv:1906.00091, 2019.

[9] Zhang W, Qin J, Guo W, et al. Deep learning for click-through rate estimation[J]. arXiv preprint arXiv:2104.10584, 2021.

Text: Zhao Jun

This article is an original article of Dewu Technology.

Author: Dewu Technology
Link: juejin.cn/post/724957…
