Introduction to machine learning, ALS, LR, GBDT [reproduced]

[Reposted] https://blog.csdn.net/haozi_rou/article/details/104845317

In most mainstream apps today, different users see different pages: a recommendation component is at work behind the scenes.

So if we want to build recommendations, the first goal is personalization ("a thousand people, a thousand faces"): different users should be shown different content, and the recommendations should also adapt to the scenario.

Recommendation methods
Rule-based recommendation: items can be recommended according to simple rules such as sales volume or category.

Recommendation based on traditional machine learning: driven by the historical behavior of a large number of users, or by user and merchant features. The recommendations discussed in the rest of this article are based on machine learning algorithms.

Recommendation based on deep learning: neural-network-based algorithms can dig deeper into user needs, learning through repeated backpropagation.

Recommendation models
Rule model: has clearly defined rules and simple arithmetic formulas.

Machine learning model training: the formula is obtained by training on data. Although there is still a tuning process, it depends mainly on historical behavior data; as long as the data is good enough, the formula can be learned from it.

Machine learning model prediction: the result obtained by feeding new data into the formula produced by training. For example, given users' basic data and historical behavior, we can train on them together with our store and item features to obtain the formula, and then use it to predict.

Model evaluation indicators
Offline indicators: recall, precision, AUC (the probability that a positive sample is ranked ahead of a negative one), etc.

Online indicators: click-through rate, transaction conversion rate, etc.

A/B test: group A and group B run different formulas at the same time, and whichever performs better is adopted.
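To make the offline indicators above concrete, here is a minimal sketch using scikit-learn; the labels and scores are made-up toy values, not real evaluation data:

```python
# Toy offline evaluation: precision, recall and AUC on made-up labels/scores.
from sklearn.metrics import precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # hypothetical ground-truth clicks
y_score = [0.9, 0.3, 0.6, 0.8, 0.4, 0.2, 0.7, 0.5]   # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # threshold at 0.5

print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
# AUC: probability that a random positive sample is scored higher than a random negative one
print("auc:", roc_auc_score(y_true, y_score))
```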

Recommendation system architecture

[Figure: overall recommendation system architecture]

Let me briefly explain recall, ranking, and re-ranking. Recall selects, say, the top 100 items out of a large pool according to the user's habits; ranking then orders those 100 items with another algorithm; finally, re-ranking adjusts the order once more, for example to move advertised items forward. The three stages generally use different algorithms.
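A highly simplified sketch of that recall → ranking → re-ranking flow; the functions, item fields, and scoring rules below are hypothetical placeholders, not the production models:

```python
# Hypothetical three-stage pipeline: recall -> ranking -> re-ranking.
def recall_stage(all_items, user, top_n=100):
    # e.g. pick candidates that match the user's habits/categories
    candidates = [i for i in all_items if i["category"] in user["preferred_categories"]]
    return candidates[:top_n]

def ranking_stage(candidates):
    # e.g. sort candidates by a (placeholder) relevance score from another model
    return sorted(candidates, key=lambda i: i["score"], reverse=True)

def rerank_stage(ranked):
    # e.g. business rules: move advertised items to the front
    ads = [i for i in ranked if i.get("is_ad")]
    organic = [i for i in ranked if not i.get("is_ad")]
    return ads + organic

user = {"preferred_categories": {"books"}}
items = [{"id": 1, "category": "books", "score": 0.8, "is_ad": False},
         {"id": 2, "category": "books", "score": 0.5, "is_ad": True}]
print(rerank_stage(ranking_stage(recall_stage(items, user))))
```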

Below that is the data layer. Online data collection refers to user behavior such as clicks and transactions; offline data collection refers to data from the business databases and so on. All of this data flows into a big data platform for machine learning. The offline recall model is trained from the data extracted from that platform. Collaborative filtering roughly means that if two users A and B behave very similarly, and one day B interacts with a new product, we recommend that product to A. The model's results are then written to a storage layer, so that when a user visits, the coarse-ranking stage can read candidates from it.
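That collaborative-filtering idea can be sketched very roughly as a cosine-similarity comparison between two users' interaction vectors; the numbers below are made up purely for illustration:

```python
# Toy user-based collaborative filtering with cosine similarity.
import numpy as np

# rows = users A, B; columns = items; values = interaction strength (made up)
interactions = np.array([
    [1.0, 3.0, 1.0, 0.0],   # user A
    [1.0, 3.0, 1.0, 2.0],   # user B (also interacted with item 4)
])

a, b = interactions
similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("similarity(A, B) =", round(float(similarity), 3))

# If A and B are similar, items B has that A hasn't seen become candidates for A.
if similarity > 0.8:
    candidates = np.where((a == 0) & (b > 0))[0]
    print("recommend to A: items", candidates)
```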

 

Personalized recall algorithm ALS
ALS stands for alternating least squares. It uses matrix factorization to approximate the existing data more and more closely, extracting hidden (latent) features, and then reuses those latent features to predict the remaining, unknown entries.

First look at the table: there are 4 users. User2 browsed product 2 and product 3; no matter how many times a product is browsed, browsing counts as 1 point. User1 bought product 1, so 2 points are added on top of the 1 browsing point. This gives us our matrix table.

[Figure: user × item score matrix with some empty cells]

What the recommendation system needs to do is tap into users' potential needs, that is, predict the cells of the matrix that have no value yet. This is exactly what ALS does.
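Here is a minimal sketch of assembling such a user × item score matrix from browse/purchase events, following the scoring described above (1 point for browsing, plus 2 for a purchase); the user and item indices are illustrative:

```python
# Build a user x item score matrix from browse/purchase events.
import numpy as np

n_users, n_items = 4, 4
scores = np.zeros((n_users, n_items))

# (user, item, event) tuples; weights: browse -> 1 point, buy -> +2 points
events = [(1, 1, "browse"), (1, 2, "browse"),   # user2 browsed products 2 and 3 (0-based indices)
          (0, 0, "browse"), (0, 0, "buy")]      # user1 browsed and bought product 1
weights = {"browse": 1.0, "buy": 2.0}

for user, item, event in events:
    if event == "browse":
        scores[user, item] = max(scores[user, item], weights["browse"])  # repeated browsing still counts as 1
    else:
        scores[user, item] += weights["buy"]

print(scores)  # unfilled entries stay 0 and are what ALS will try to predict
```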

How does ALS work?

There are two tables:

[Figure: a user feature matrix and a product feature matrix]

Each user and each product has five features. The features in the two tables may differ in meaning, but their number must be the same, and each feature has a corresponding score. Then:

[Figure: the user matrix multiplied by the transpose of the product matrix]

The user matrix is multiplied by the transpose of the product matrix: feature 1 of user1 is multiplied by feature 1 of product 1, and so on, and the products are summed, reproducing a table of the first kind. ALS then repeatedly fits the two factor matrices so that their product approximates the scores already present in the table, which in turn predicts the numbers in the empty cells.
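Here is a minimal numpy sketch of the alternating least squares idea: fix the item factors and solve for the user factors by regularized least squares, then swap, so that the product of the two factor matrices approaches the known scores. The matrix values, the latent dimension k = 5 (echoing the five features above), and the regularization strength are illustrative choices, not tuned settings:

```python
# Minimal alternating least squares sketch on a small explicit score matrix.
import numpy as np

R = np.array([[3.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])      # 0 = unobserved
observed = R > 0
k, reg = 5, 0.1                           # 5 latent features per user/item, L2 regularization
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

for _ in range(20):
    # fix V, solve each user's factors by regularized least squares, then do the symmetric step for items
    for u in range(R.shape[0]):
        mask = observed[u]
        A = V[mask].T @ V[mask] + reg * np.eye(k)
        U[u] = np.linalg.solve(A, V[mask].T @ R[u, mask])
    for i in range(R.shape[1]):
        mask = observed[:, i]
        A = U[mask].T @ U[mask] + reg * np.eye(k)
        V[i] = np.linalg.solve(A, U[mask].T @ R[mask, i])

print(np.round(U @ V.T, 2))   # predicted scores, including the previously empty cells
```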

Personalized ranking algorithm LR
LR is short for logistic regression. Look at a formula:

Y = sigmoid(a*x1 + b*x2 + c*x3 + d*x4 + ...), where the sigmoid squashes the weighted sum into the range (0, 1).

In some sense the ranking problem can be regarded as click-through rate estimation. In the formula, x1, x2, x3, ... are user features; for example x1 is age, x2 is gender, and so on. Each feature has a weight a, b, c, d, ..., and the result is a value Y: the closer Y is to 1, the greater the probability of a click. This is the prediction process.
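As a tiny illustration of the formula, with made-up feature names and weights (age, gender, hour_of_day, past_clicks and their weights are purely hypothetical):

```python
# Click-probability score: weighted feature sum passed through a sigmoid.
import math

weights = {"age": 0.02, "gender": 0.5, "hour_of_day": -0.01, "past_clicks": 0.3}  # a, b, c, d (made up)
features = {"age": 30, "gender": 1, "hour_of_day": 21, "past_clicks": 4}          # x1..x4 (made up)

z = sum(weights[name] * features[name] for name in weights)
y = 1 / (1 + math.exp(-z))   # the closer to 1, the higher the estimated click probability
print(round(y, 3))
```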
[Figure: positive and negative samples separated by the learned decision boundary]

Training the LR algorithm means computing the weights a, b, c, d, e. The big data platform collects labeled samples whose Y is either 1 or 0. In the figure, blue points are positive samples and purple points are negative samples; LR needs to compute the red line. Once the red line is found, prediction is straightforward: when a new point x comes in, the red line tells us whether it is more likely to be a positive or a negative sample. That is the principle of this relatively simple logistic regression ranking algorithm.
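A minimal sketch of this training-and-prediction process using scikit-learn, with made-up 2-D samples standing in for the blue (positive) and purple (negative) points:

```python
# Fit a logistic regression "red line" on toy positive/negative samples, then score a new point.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],     # positive samples (label 1)
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]])    # negative samples (label 0)
y = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(X, y)

new_point = np.array([[1.8, 1.2]])
print("P(click) =", model.predict_proba(new_point)[0, 1])   # probability of the positive class
```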

Decision tree algorithm
A decision tree is, in fact, the combined result of multiple classification selectors: at each node a parameter is fed in and, depending on its value, the tree branches one way or another. You can picture it as the psychological quizzes in old magazines: after several choices, you arrive at a result.

So how do we decide which feature goes at each node of the decision tree? In principle, the better a feature separates the samples, the higher it should be placed. How do we measure that? Mathematics gives us information entropy, a measure of the amount of information, or equivalently of the uncertainty of a random variable: the greater the entropy, the greater the uncertainty.
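A small sketch of computing information entropy for a label distribution (the example labels are arbitrary):

```python
# Information entropy of a label distribution: higher entropy = more uncertainty.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0, maximally uncertain for two classes
print(entropy(["yes", "yes", "yes", "no"]))   # ~0.81, less uncertain
```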

For example, the weather may be sunny, cloudy, or rainy; the temperature cold, hot, or moderate; the humidity high or medium; the wind present or absent; and finally there is a result: whether to go out and play.

So how do we construct this decision tree? We select the feature that reduces entropy the most (the largest information gain) and place it at the top, then continue downward in the same way. Discrete features can be used directly as classification selectors, and continuous features can be split into two, three, or more intervals, such as under 20, 20-40, and so on.
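A small sketch of choosing the root split by information gain on a made-up miniature version of the "go out and play" table (all rows and feature values are hypothetical):

```python
# Choose the root split by information gain on a tiny made-up "go out and play" table.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# columns: weather, temperature, play? (all rows are hypothetical)
rows = [("sunny", "hot", "no"), ("sunny", "mild", "yes"), ("cloudy", "hot", "yes"),
        ("rainy", "mild", "yes"), ("rainy", "cold", "no"), ("cloudy", "cold", "yes")]
labels = [r[-1] for r in rows]

def information_gain(feature_index):
    gain = entropy(labels)
    for value in {r[feature_index] for r in rows}:
        subset = [r[-1] for r in rows if r[feature_index] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

for idx, name in enumerate(["weather", "temperature"]):
    print(name, round(information_gain(idx), 3))   # the feature with the larger gain goes higher in the tree
```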

Disadvantages of decision trees: when there are too many sample features, the tree becomes too deep; and when the sample features themselves are problematic, the tree overfits and the predictions become biased.

To avoid these shortcomings, two algorithms were derived from decision trees: random forest and GBDT.

Random forest: randomly select samples with replacement (bootstrap sampling), that is, randomly pick a subset of samples and a subset of features to grow one decision tree, put the samples back, and repeat to grow another tree; in this way a whole forest of random trees is generated. Finally, at prediction time, the results of all the decision trees are aggregated and averaged.
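A minimal sketch with scikit-learn's RandomForestClassifier on randomly generated toy data; bootstrap sampling and random feature subsets correspond to the random selection described above, and predict_proba returns the trees' averaged vote:

```python
# Random forest: many trees trained on bootstrap samples / random feature subsets, predictions averaged.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # made-up samples with 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # made-up labels

forest = RandomForestClassifier(n_estimators=50, max_features="sqrt", bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.predict_proba(X[:3])[:, 1])    # averaged vote of the 50 trees
```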

Of course, the downside of randomness is uncertainty, which is both an advantage and a disadvantage. From this, another algorithm is derived: GBDT.

Regarding GBDT, the procedure is:

1. Obtain a base learner from the initial training set.

2. Use the base learner to predict the training samples and increase the weights of the samples it predicts incorrectly.

3. Iterate in this way to generate T learners.

4. Combine the T learners serially with a weighted combination to produce the final prediction.

GBDT uses the first tree as a baseline and then gradually corrects it: each new tree is fitted to the errors left by the trees before it, so the combined model that comes out is more accurate.
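A minimal scikit-learn sketch of gradient-boosted trees (the data is randomly generated just for illustration); each new tree corrects the remaining error of the ensemble, and staged_predict shows the training error dropping as trees are added:

```python
# GBDT: trees are added one after another, each correcting the errors of the current ensemble.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)   # made-up noisy target

gbdt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
gbdt.fit(X, y)

# staged_predict shows the serial combination improving as more trees are added
for i, pred in enumerate(gbdt.staged_predict(X), start=1):
    if i in (1, 10, 100):
        print(f"{i:>3} trees, training MSE = {mean_squared_error(y, pred):.4f}")
```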


Origin: www.cnblogs.com/linkmust/p/12708275.html