In today's programmer interviews, SQL questions are usually not difficult, but they are almost always asked. Leave a like if this helps~
PS: For data developer roles, I recommend raising the difficulty of the question yourself.
Difficulty: Moderate
Interview frequency: Very high (I have run into similar questions 3+ times)
Indicator background
The retention rate is a core statistical indicator that reflects how well a website, Internet application, or online game is operating. Specifically, it is the proportion of a day's active users who still launch the app on the Nth day, averaged over the statistical period (week/month). N usually takes the values 2, 4, 8, 15, and 31, which correspond to the next-day, three-day, weekly, semi-monthly, and monthly retention rates.
In short, retention is a very important indicator for toC (consumer-facing) companies, and we often use it to validate strategies. For example, in A/B tests of a recommendation system, we compare retention rates across algorithms and strategies in different channels to check whether a change is actually an improvement.
For data or big-data interviews at toC companies, I think it is well worth being able to write the common user retention SQL by hand.
Companies generally present retention as a retention calendar, which makes the trend clear at a glance.
Indicator calculation
Retention rate = the average proportion of daily active users who still launch the app on the Nth day (the statistical period is usually a week or a month)
For example, 100 users log in (or register) on the 1st; these 100 users are the active users of the 1st. On the 2nd, 100 users log in as well, but only 20 of them also logged in the day before. The retention rate on the 2nd is then 20/100 = 20%.
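The arithmetic above can be checked with a minimal Python sketch. The user IDs are made up purely for illustration; only the counts (100, 100, and an overlap of 20) come from the example.

```python
# Hypothetical user IDs: 100 users active on the 1st, 100 on the 2nd,
# and exactly 20 users (IDs 81..100) active on both days.
day1_users = set(range(1, 101))    # IDs 1..100
day2_users = set(range(81, 181))   # IDs 81..180

# Retained users are those active on the 1st who come back on the 2nd.
retained = day1_users & day2_users

# The denominator is the base day's active users, not the 2nd's.
retention_rate = len(retained) / len(day1_users)
print(f"{len(retained)}/{len(day1_users)} = {retention_rate:.0%}")
```

Note that the denominator is the number of active users on the base day (the 1st), which is why the 100 users on the 2nd do not enter the formula directly.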
PS: Strictly speaking, it is more accurate to use newly registered users as the base~~
A company's real retention report (note that the channel and specific number of users have been hidden/desensitized):
Real question
Typically you are simply given a login log table with only two fields:
- The user field
- The login time field
You are then asked to compute the N-day retention rate, for example 7-day retention.
Simple mock log data looks like this:
Ideas
For this kind of problem, work out the approach first; the code follows from the approach.
- Self-join the table on user_id, pairing each login record (A) with the same user's later login records (B), and take out B's login time
- Calculate the difference in days between the login times of A and B, which tells you how many days later the user logged in again
- Count users grouped by that day difference, for differences from 1 to N
Details to watch: deduplication and null values
See the code below
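The steps above can be sketched as follows, using Python's built-in sqlite3 module so the SQL can be run as is. This is one possible implementation under stated assumptions, not the original author's code: the table name `user_login`, the column names `user_id` and `login_time`, and all rows are hypothetical, and the date functions (`DATE`, `JULIANDAY`) are SQLite's, so other engines would need their own equivalents such as `DATEDIFF`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical login log table with the two fields from the question.
conn.execute("CREATE TABLE user_login (user_id TEXT, login_time TEXT)")
conn.executemany(
    "INSERT INTO user_login VALUES (?, ?)",
    [
        ("u1", "2024-01-01 08:00:00"),
        ("u1", "2024-01-01 21:30:00"),  # second login on the same day
        ("u1", "2024-01-02 09:00:00"),
        ("u1", "2024-01-08 10:00:00"),
        ("u2", "2024-01-01 12:00:00"),
        ("u2", "2024-01-02 12:00:00"),
        ("u3", "2024-01-01 13:00:00"),
        ("u4", "2024-01-02 07:00:00"),
        ("u4", "2024-01-03 07:00:00"),
        (None, "2024-01-01 00:00:00"),  # null user_id
        ("u3", None),                   # null login_time
    ],
)

RETENTION_SQL = """
WITH daily AS (
    -- Deduplicate to one row per user per day and drop null values.
    SELECT DISTINCT user_id, DATE(login_time) AS login_date
    FROM user_login
    WHERE user_id IS NOT NULL AND login_time IS NOT NULL
),
pairs AS (
    -- Self-join: A is the base day, B is a later login by the same user.
    -- LEFT JOIN keeps users who never come back (day_diff is NULL).
    SELECT a.login_date AS base_date,
           a.user_id,
           CAST(JULIANDAY(b.login_date) - JULIANDAY(a.login_date) AS INTEGER)
               AS day_diff
    FROM daily a
    LEFT JOIN daily b
      ON a.user_id = b.user_id
     AND b.login_date > a.login_date
)
SELECT base_date,
       COUNT(DISTINCT user_id) AS active_users,
       COUNT(DISTINCT CASE WHEN day_diff = 1 THEN user_id END) AS retained_d1,
       COUNT(DISTINCT CASE WHEN day_diff = 7 THEN user_id END) AS retained_d7,
       ROUND(1.0 * COUNT(DISTINCT CASE WHEN day_diff = 1 THEN user_id END)
             / COUNT(DISTINCT user_id), 4) AS rate_d1,
       ROUND(1.0 * COUNT(DISTINCT CASE WHEN day_diff = 7 THEN user_id END)
             / COUNT(DISTINCT user_id), 4) AS rate_d7
FROM pairs
GROUP BY base_date
ORDER BY base_date
"""

rows = conn.execute(RETENTION_SQL).fetchall()
for row in rows:
    print(row)
```

With this mock data, the row for 2024-01-01 reports 3 active users, 2 of them retained the next day and 1 retained at a day difference of 7. Other values of N only need one more `CASE WHEN day_diff = N` column. The `1.0 *` factor forces floating-point division, since integer division would otherwise truncate the rate to 0.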
The code can be written in whatever way fits these ideas, and readers can adapt it to their own needs.
Do you have a better solution or idea?