https://www.cnblogs.com/futurehau/p/6184585.html
1. CTR prediction pipeline
Data -> preprocessing -> feature extraction -> model training -> post-processing
Features determine the upper bound on how well the model can perform; the model only determines how close you get to that bound.
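As a rough illustration, the stages above can be chained as plain functions. This is a minimal sketch on toy data. Everything in it is hypothetical: the record fields (`query_id`, `label`), the single query_id feature, the add-one smoothing, and the "model" itself, which is only a smoothed per-query click rate standing in for a real CTR model.

```python
from collections import defaultdict


def preprocess(rows):
    # Placeholder cleaning step: drop rows that carry no 0/1 label.
    return [r for r in rows if r.get("label") in (0, 1)]


def extract_features(rows):
    # Placeholder feature extraction: query_id as the only categorical feature.
    return [({"query_id": r["query_id"]}, r["label"]) for r in rows]


def train(examples):
    # Stand-in "model": smoothed click rate per query_id, (clicks + 1) / (shows + 2).
    # This is NOT a real CTR model; it only gives the pipeline something to output.
    clicks, shows = defaultdict(int), defaultdict(int)
    for feats, label in examples:
        shows[feats["query_id"]] += 1
        clicks[feats["query_id"]] += label
    return {q: (clicks[q] + 1) / (shows[q] + 2) for q in shows}


def postprocess(model, default=0.5):
    # Placeholder post-processing: wrap the table as a predict function
    # that falls back to a default CTR for unseen query_ids.
    return lambda query_id: model.get(query_id, default)


def run_pipeline(raw_rows):
    # Data -> preprocessing -> feature extraction -> model training -> post-processing
    return postprocess(train(extract_features(preprocess(raw_rows))))
```

For example, with `rows = [{"query_id": 1, "label": 1}, {"query_id": 1, "label": 1}, {"query_id": 1, "label": 0}, {"query_id": 2, "label": 0}, {"query_id": 3}]`, `run_pipeline(rows)(1)` gives 0.6, query 2 gives 1/3, and query 3 (dropped in preprocessing for lacking a label) falls back to 0.5.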
2. Data Preprocessing
Label matching: join the impression (show) logs with the click logs, so that every impression gets a clicked / not-clicked label.
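A minimal sketch of label matching as a left join of impressions with clicks. The shared key `impression_id` is a hypothetical field name; real logs may join on a request id plus position, or similar.

```python
def match_labels(impressions, clicks):
    """Left-join impression records with click records on a shared key.

    Assumes (hypothetically) that both logs carry an 'impression_id' field.
    An impression with at least one matching click gets label 1, otherwise 0.
    """
    # Build a set of clicked ids once, so the join is a single pass over impressions.
    clicked_ids = {c["impression_id"] for c in clicks}
    labeled = []
    for imp in impressions:
        row = dict(imp)  # copy so the input log is not mutated
        row["label"] = 1 if imp["impression_id"] in clicked_ids else 0
        labeled.append(row)
    return labeled
```

Because it is a left join on impressions, clicks with no matching impression are ignored, and repeated clicks on the same impression still produce a single positive example.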
Sampling: because CTR is low, negative samples (impressions without a click) vastly outnumber positive ones, so a portion of the negative samples is randomly dropped.
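A minimal sketch of random negative downsampling: every positive is kept, and each negative is kept independently with probability `keep_rate`. The function name and the `label` field are illustrative assumptions.

```python
import random


def downsample_negatives(rows, keep_rate, seed=None):
    """Keep every positive row; keep each negative row with probability keep_rate."""
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    rng = random.Random(seed)  # a local generator makes the sample reproducible
    # The random draw only happens for negatives, thanks to short-circuit 'or'.
    return [r for r in rows if r["label"] == 1 or rng.random() < keep_rate]
```

Passing a fixed `seed` makes the sampled training set reproducible across runs; `keep_rate=1.0` keeps everything, since `random()` returns values in [0, 1).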
Information combining: the related information you need often lives in another file and has to be combined with the training data. For example, to see which query a query_id stands for, you have to look up its token ids in a separate file: cat queryid_tokensid.txt | awk '$1 == 14092 {print $0}' | head
Doing this lookup by hand every time is tedious, so this information should be merged directly into the training data. This is the combination step of data preprocessing: a join.
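A minimal sketch of that join in Python, replacing the repeated awk lookup with a one-time hash join. The file layout is an assumption inferred from the file name and the awk filter on `$1`: each line is taken to be `query_id token_id token_id ...`, separated by whitespace. The field name `query_tokens` is hypothetical.

```python
def load_query_tokens(lines):
    """Parse 'query_id token_id token_id ...' lines into {query_id: [token_id, ...]}.

    The column layout is an assumption. If a query_id appears on several lines,
    the last one wins (unlike awk, which would print every match).
    """
    table = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            table[fields[0]] = fields[1:]
    return table


def join_query_tokens(rows, table):
    """Attach each row's query token ids; unknown query_ids get an empty list."""
    out = []
    for r in rows:
        row = dict(r)  # copy so the input rows are not mutated
        row["query_tokens"] = table.get(str(r["query_id"]), [])
        out.append(row)
    return out
```

In practice the table would be built once, e.g. `with open("queryid_tokensid.txt") as f: table = load_query_tokens(f)`, and then every training row is joined in a single pass instead of grepping the file per query.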