Back to the series index: "32 Days SQL Foundation"
Article directory
- 0. Preface
- 1. Exercise questions
- 2. SQL ideas
- Conditional function: SQL26 count the number of users over and under 25
- Date function: SQL28 calculates the number of daily practice questions of the user in August
- Date function: SQL29 calculates average next-day retention rate for users
- Text function: SQL32 extract age
- Window function: SQL33 find out the students with the lowest GPA in each school
0. Preface
Today is day 5 of the SQL study check-in. Every day I share one article for group members to read (no subscription or payment required).
I hope you will try each problem on your own first. If you really have no idea, read the problem-solving ideas below and then implement them yourself. Check in at the corresponding [Punch Sticker] in the Xiaoxuzhu JAVA community to complete today's task, and build the good habit of checking in every day.
Brother Xuzhu organizes everyone to study the same article together, so you can ask any question in the group and the other members can help you quickly. One person can go fast, but a group of people can go far. How fortunate it is to have comrades-in-arms to learn and exchange ideas with.
My learning strategy is very simple: solve a large volume of problems, plus the Feynman learning method. If you conscientiously work through these problems yourself, you will have built a solid SQL foundation. For more advanced learning later, you can keep following me and walk the road to architect together.
Today's learning content: must-know common functions
1. Exercise questions
2. SQL ideas
Conditional function: SQL26 count the number of users over and under 25
Initialization data
drop table if exists `user_profile`;
drop table if exists `question_practice_detail`;
drop table if exists `question_detail`;
CREATE TABLE `user_profile` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`gender` varchar(14) NOT NULL,
`age` int ,
`university` varchar(32) NOT NULL,
`gpa` float,
`active_days_within_30` int ,
`question_cnt` int ,
`answer_cnt` int
);
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id`int NOT NULL,
`result` varchar(32) NOT NULL
);
CREATE TABLE `question_detail` (
`id` int NOT NULL,
`question_id`int NOT NULL,
`difficult_level` varchar(32) NOT NULL
);
INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12);
INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25);
INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30);
INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2);
INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70);
INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13);
INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52);
INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong');
INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong');
INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong');
INSERT INTO question_practice_detail VALUES(4,6543,111,'right');
INSERT INTO question_practice_detail VALUES(5,2315,115,'right');
INSERT INTO question_practice_detail VALUES(6,2315,116,'right');
INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong');
INSERT INTO question_practice_detail VALUES(8,5432,117,'wrong');
INSERT INTO question_practice_detail VALUES(9,5432,112,'wrong');
INSERT INTO question_practice_detail VALUES(10,2131,113,'right');
INSERT INTO question_practice_detail VALUES(11,5432,113,'wrong');
INSERT INTO question_practice_detail VALUES(12,2315,115,'right');
INSERT INTO question_practice_detail VALUES(13,2315,116,'right');
INSERT INTO question_practice_detail VALUES(14,2315,117,'wrong');
INSERT INTO question_practice_detail VALUES(15,5432,117,'wrong');
INSERT INTO question_practice_detail VALUES(16,5432,112,'wrong');
INSERT INTO question_practice_detail VALUES(17,2131,113,'right');
INSERT INTO question_practice_detail VALUES(18,5432,113,'wrong');
INSERT INTO question_practice_detail VALUES(19,2315,117,'wrong');
INSERT INTO question_practice_detail VALUES(20,5432,117,'wrong');
INSERT INTO question_practice_detail VALUES(21,5432,112,'wrong');
INSERT INTO question_practice_detail VALUES(22,2131,113,'right');
INSERT INTO question_practice_detail VALUES(23,5432,113,'wrong');
INSERT INTO question_detail VALUES(1,111,'hard');
INSERT INTO question_detail VALUES(2,112,'medium');
INSERT INTO question_detail VALUES(3,113,'easy');
INSERT INTO question_detail VALUES(4,115,'easy');
INSERT INTO question_detail VALUES(5,116,'medium');
INSERT INTO question_detail VALUES(6,117,'easy');
Solution
Requirements:
- Divide users into two age groups: under 25, and 25 and over
- Count the number of users in each of the two age groups
- Note: if age is null, the user is also counted as under 25
Analysis:
- To divide users into the two age groups, build the group label with a case expression and group on it with the keyword group by
SELECT
case
when age < 25 then 'under 25'
when age >= 25 then '25 and over'
end age_cut
FROM
user_profile
group by
age_cut
- To view the number of users in each of the two age groups, count the rows in each group with the function count
SELECT
case
when age < 25 then 'under 25'
when age >= 25 then '25 and over'
end age_cut,
count(1) as number
FROM
user_profile
group by
age_cut
- Because a null age is also counted as under 25, add one more condition to the first branch: age is null
SELECT
case
when age < 25
or age is null then 'under 25'
when age >= 25 then '25 and over'
end age_cut,
count(1) as number
FROM
user_profile
group by
age_cut
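To see why the extra "age is null" condition matters, here is a minimal sketch that runs the query with and without it. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, keeps only the two columns the query needs, and uses a hypothetical helper named age_groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_profile (device_id INTEGER, age INTEGER)")
# device_id and age values copied from the initialization data (one age is NULL).
conn.executemany(
    "INSERT INTO user_profile VALUES (?, ?)",
    [(2138, 21), (3214, None), (6543, 20), (2315, 23),
     (5432, 25), (2131, 28), (4321, 28)],
)


def age_groups(with_null_rule):
    """Count users per age group, optionally treating NULL age as under 25."""
    null_rule = "OR age IS NULL" if with_null_rule else ""
    sql = f"""
        SELECT CASE
                 WHEN age < 25 {null_rule} THEN 'under 25'
                 WHEN age >= 25 THEN '25 and over'
               END AS age_cut,
               COUNT(*) AS number
        FROM user_profile
        GROUP BY age_cut
    """
    return dict(conn.execute(sql).fetchall())


without_rule = age_groups(False)  # the NULL age forms its own (None) group
with_rule = age_groups(True)      # the NULL age is folded into 'under 25'
print(without_rule)
print(with_rule)
```

Without the rule, the comparison "age < 25" is neither true nor false for a null age, so that user falls through both branches and ends up in a separate null group.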
Date function: SQL28 calculates the number of daily practice questions of the user in August
Initialization data
drop table if exists `user_profile`;
drop table if exists `question_practice_detail`;
drop table if exists `question_detail`;
CREATE TABLE `user_profile` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`gender` varchar(14) NOT NULL,
`age` int ,
`university` varchar(32) NOT NULL,
`gpa` float,
`active_days_within_30` int ,
`question_cnt` int ,
`answer_cnt` int
);
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id`int NOT NULL,
`result` varchar(32) NOT NULL,
`date` date NOT NULL
);
CREATE TABLE `question_detail` (
`id` int NOT NULL,
`question_id`int NOT NULL,
`difficult_level` varchar(32) NOT NULL
);
INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12);
INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25);
INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30);
INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2);
INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70);
INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13);
INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52);
INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong','2021-05-03');
INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong','2021-06-15');
INSERT INTO question_practice_detail VALUES(4,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(5,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(6,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(8,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(9,3214,113,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(10,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(11,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(12,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(13,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(14,3214,112,'wrong','2021-08-16');
INSERT INTO question_practice_detail VALUES(15,3214,113,'wrong','2021-08-18');
INSERT INTO question_practice_detail VALUES(16,6543,111,'right','2021-08-13');
INSERT INTO question_detail VALUES(1,111,'hard');
INSERT INTO question_detail VALUES(2,112,'medium');
INSERT INTO question_detail VALUES(3,113,'easy');
INSERT INTO question_detail VALUES(4,115,'easy');
INSERT INTO question_detail VALUES(5,116,'medium');
INSERT INTO question_detail VALUES(6,117,'easy');
Solution
Requirements:
- Count the number of questions users practiced on each day of August 2021
Analysis:
- The data must be per day within August 2021, so first restrict the rows to August 2021 and show the day of each date
SELECT DAY(date) AS day
FROM question_practice_detail
WHERE date LIKE '2021-08-%'
- The number of questions practiced each day is a count per date: use the function count to count, and the keyword group by to group by date
SELECT DAY(date) AS day , COUNT(date) AS question_cnt
FROM question_practice_detail
WHERE date LIKE '2021-08-%'
GROUP BY date
Solution 2:
Any date that is greater than or equal to 2021-8-1 and less than or equal to 2021-8-31 also meets the query condition. Use the date function date() to convert the date given in string format.
select
day(date) as day,
count(date) as question_cnt
from
question_practice_detail
where
date('2021-8-31') >= date
and date('2021-8-1') <= date
group by
date;
Solution 3:
Any date whose year is 2021 and whose month is 8 also meets the condition: the function YEAR() returns the year and the function MONTH() returns the month.
SELECT
DAY(date) AS day,
COUNT(date) AS question_cnt
FROM
question_practice_detail
WHERE
YEAR(date) = 2021
AND MONTH(date) = 08
GROUP BY
day
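The three filters above should select the same rows. The following is a rough sketch that checks this on the sample dates, assuming Python's built-in sqlite3 module in place of MySQL. SQLite has no DAY(), YEAR() or MONTH(), so strftime() is swapped in for them, and the helper name daily_counts is hypothetical.

```python
import sqlite3

# Practice dates copied from the initialization data, in id order (1..16).
dates = [
    "2021-05-03", "2021-05-09", "2021-06-15", "2021-08-13",
    "2021-08-13", "2021-08-14", "2021-08-15", "2021-05-09",
    "2021-08-15", "2021-08-13", "2021-08-13", "2021-08-14",
    "2021-08-15", "2021-08-16", "2021-08-18", "2021-08-13",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question_practice_detail (id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO question_practice_detail VALUES (?, ?)",
    list(enumerate(dates, start=1)),
)

# One WHERE condition per solution; strftime() replaces YEAR()/MONTH().
filters = {
    "like": "date LIKE '2021-08-%'",
    "range": "date >= '2021-08-01' AND date <= '2021-08-31'",
    "year_month": "strftime('%Y', date) = '2021' AND strftime('%m', date) = '08'",
}


def daily_counts(condition):
    """Return (day, question_cnt) rows for the given WHERE condition."""
    sql = f"""
        SELECT CAST(strftime('%d', date) AS INTEGER) AS day,
               COUNT(*) AS question_cnt
        FROM question_practice_detail
        WHERE {condition}
        GROUP BY date
        ORDER BY date
    """
    return conn.execute(sql).fetchall()


results = {name: daily_counts(cond) for name, cond in filters.items()}
for name, rows in results.items():
    print(name, rows)
```

The range and LIKE filters compare the stored text directly, which works here only because every date is written in the zero-padded YYYY-MM-DD form.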
Date function: SQL29 calculates average next-day retention rate for users
Initialization data
drop table if exists `user_profile`;
drop table if exists `question_practice_detail`;
drop table if exists `question_detail`;
CREATE TABLE `user_profile` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`gender` varchar(14) NOT NULL,
`age` int ,
`university` varchar(32) NOT NULL,
`gpa` float,
`active_days_within_30` int ,
`question_cnt` int ,
`answer_cnt` int
);
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id`int NOT NULL,
`result` varchar(32) NOT NULL,
`date` date NOT NULL
);
CREATE TABLE `question_detail` (
`id` int NOT NULL,
`question_id`int NOT NULL,
`difficult_level` varchar(32) NOT NULL
);
INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12);
INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25);
INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30);
INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2);
INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70);
INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13);
INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52);
INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong','2021-05-03');
INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong','2021-06-15');
INSERT INTO question_practice_detail VALUES(4,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(5,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(6,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(8,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(9,3214,113,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(10,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(11,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(12,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(13,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(14,3214,112,'wrong','2021-08-16');
INSERT INTO question_practice_detail VALUES(15,3214,113,'wrong','2021-08-18');
INSERT INTO question_practice_detail VALUES(16,6543,111,'right','2021-08-13');
INSERT INTO question_detail VALUES(1,111,'hard');
INSERT INTO question_detail VALUES(2,112,'medium');
INSERT INTO question_detail VALUES(3,113,'easy');
INSERT INTO question_detail VALUES(4,115,'easy');
INSERT INTO question_detail VALUES(5,116,'medium');
INSERT INTO question_detail VALUES(6,117,'easy');
Solution
Requirements:
- Find the average probability that a user who practices questions on a given day comes back to practice again the next day
- Hidden requirement: the returned field is named avg_ret and is rounded to 4 decimal places
Analysis:
- The condition is that a user who practiced on a certain day comes back the next day, that is, the user also has a practice record on the following day.
- Because there is only the one question_practice_detail table, join it to itself (treating the second copy as another table with the same structure and data). The join field is device_id
- The second record must fall on the user's next day, so the date difference has to be restricted. Available function: datediff. In MySQL, DATEDIFF(expr1, expr2) returns expr1 minus expr2 in days
SELECT DATEDIFF('2008-12-30','2008-12-29') AS DiffDate;
-- The result DiffDate value is 1
SELECT DATEDIFF('2008-12-29','2008-12-30') AS DiffDate;
-- The result DiffDate value is -1
SELECT
qpd1.*
FROM
question_practice_detail qpd1
LEFT JOIN question_practice_detail qpd2 ON qpd1.device_id = qpd2.device_id
and datediff(qpd2.date, qpd1.date) = 1
- How to calculate the average probability that a user practices again the next day: each day on which a user practices counts once; the denominator is the number of such (user, day) records; the numerator is the number of those records that are followed by a record on the next day. Users and dates must be deduplicated (because the same user may practice several questions on the same day)
- Deduplication keyword: distinct; counting function: COUNT
COUNT(distinct qpd2.device_id, qpd2.date) / COUNT(distinct qpd1.device_id, qpd1.date) as avg_ret
- The returned field is avg_ret, rounded to 4 decimal places, which the function round() handles
SELECT
round(COUNT(distinct qpd2.device_id, qpd2.date) / COUNT(distinct qpd1.device_id, qpd1.date),4) as avg_ret
FROM
question_practice_detail qpd1
LEFT JOIN question_practice_detail qpd2 ON qpd1.device_id = qpd2.device_id
and datediff(qpd2.date, qpd1.date) = 1
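The retention calculation can be checked on the sample data with a small sketch. It assumes Python's built-in sqlite3 module rather than MySQL, so two substitutions are made: date(x, '+1 day') stands in for datediff, and a concatenated key stands in for MySQL's multi-column COUNT(DISTINCT a, b). The final division is done in Python because SQLite divides two integers as integers.

```python
import sqlite3

# (device_id, date) pairs copied from the initialization data, in id order.
records = [
    (2138, "2021-05-03"), (3214, "2021-05-09"), (3214, "2021-06-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-05-09"), (3214, "2021-08-15"),
    (6543, "2021-08-13"), (2315, "2021-08-13"), (2315, "2021-08-14"),
    (2315, "2021-08-15"), (3214, "2021-08-16"), (3214, "2021-08-18"),
    (6543, "2021-08-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE question_practice_detail (device_id INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO question_practice_detail VALUES (?, ?)", records)

# Self-join: qpd2 is the same user's record on the day after qpd1.
# Unmatched qpd1 rows get NULLs from qpd2, which COUNT(DISTINCT ...) ignores.
sql = """
    SELECT COUNT(DISTINCT qpd1.device_id || '|' || qpd1.date) AS active_days,
           COUNT(DISTINCT qpd2.device_id || '|' || qpd2.date) AS retained_days
    FROM question_practice_detail qpd1
    LEFT JOIN question_practice_detail qpd2
           ON qpd1.device_id = qpd2.device_id
          AND qpd2.date = date(qpd1.date, '+1 day')
"""
active_days, retained_days = conn.execute(sql).fetchone()
avg_ret = round(retained_days / active_days, 4)
print(active_days, retained_days, avg_ret)
```

On this data there are 10 distinct (user, day) records, 3 of which are followed by a record on the next day.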
Text function: SQL32 extract age
Initialization data
drop table if exists user_submit;
CREATE TABLE `user_submit` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`profile` varchar(100) NOT NULL,
`blog_url` varchar(100) NOT NULL
);
INSERT INTO user_submit VALUES(1,2138,'180cm,75kg,27,male','http:/url/bisdgboy777');
INSERT INTO user_submit VALUES(1,3214,'165cm,45kg,26,female','http:/url/dkittycc');
INSERT INTO user_submit VALUES(1,6543,'178cm,65kg,25,male','http:/url/tigaer');
INSERT INTO user_submit VALUES(1,4321,'171cm,55kg,23,female','http:/url/uhsksd');
INSERT INTO user_submit VALUES(1,2131,'168cm,45kg,22,female','http:/url/sysdney');
Solution
Requirements:
- Count the number of contestants of each age
Analysis:
- The age is stored inside the profile field; to cut out part of a text value, use the function substring_index
substring_index(str, delim, count)
Parameters:
str: the string to process
delim: the delimiter
count: how many delimiters to count
If count is positive, counting runs from left to right and everything before the count-th delimiter is returned
select substring_index(profile, ',', 2)
from user_submit
If count is negative, counting runs from right to left and everything after the count-th delimiter from the end is returned. Taking the last two fields and then the first of those yields the age.
Result:
SELECT
substring_index(substring_index(profile, ',', -2), ',', 1) as age,
count(device_id) as number
from
user_submit
group by
age;
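To make the two counting directions concrete, here is a rough Python stand-in for substring_index. It is an illustration written for this article, not MySQL's implementation, and it assumes a non-empty delimiter.

```python
from collections import Counter


def substring_index(text, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX(str, delim, count)."""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


# profile values copied from the initialization data.
profiles = [
    "180cm,75kg,27,male",
    "165cm,45kg,26,female",
    "178cm,65kg,25,male",
    "171cm,55kg,23,female",
    "168cm,45kg,22,female",
]

first_two = substring_index(profiles[0], ",", 2)   # left to right
last_two = substring_index(profiles[0], ",", -2)   # right to left

# Nested call, as in the query: last two fields, then the first of those.
ages = [substring_index(substring_index(p, ",", -2), ",", 1) for p in profiles]
age_counts = Counter(ages)
print(first_two, last_two, dict(age_counts))
```

For the first profile, the inner call leaves "27,male" and the outer call keeps "27".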
Window function: SQL33 find out the students with the lowest GPA in each school
Initialization data
drop table if exists user_profile;
CREATE TABLE `user_profile` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`gender` varchar(14) NOT NULL,
`age` int ,
`university` varchar(32) NOT NULL,
`gpa` float,
`active_days_within_30` int ,
`question_cnt` int ,
`answer_cnt` int
);
INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12);
INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25);
INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30);
INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2);
INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70);
INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13);
INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52);
Solution
Requirements:
- Find the student with the lowest gpa in each school
- Return the lowest gpa for each school
- Sort the output by university in ascending order
Solution 1: first implement it without a window function
Analysis:
- Find the minimum gpa of each school
select university,min(gpa) min_gpa from user_profile group by university
- Then join the user_profile table to the result above, using an inner join to keep the rows whose university and gpa match the university and min(gpa).
select
up1.device_id,
up1.university,
up1.gpa
from
user_profile up1
inner join (
select
university,
min(gpa) min_gpa
from
user_profile
group by
university
) up2 On up1.university = up2.university
and up1.gpa = up2.min_gpa
- Finally sort by university in ascending order
select
up1.device_id,
up1.university,
up1.gpa
from
user_profile up1
inner join (
select
university,
min(gpa) min_gpa
from
user_profile
group by
university
) up2 On up1.university = up2.university
and up1.gpa = up2.min_gpa
order by up1.university asc
This query works, but it has a loophole. Have you spotted it? If you have, let me know in the comments.
Solution 2:
Analysis:
- Use a window function: partition the calculation by the university field and, at the same time, sort by the gpa field
Note: MySQL has supported window functions since version 8.0
Syntax of a window function:
<window function> over (partition by <column name for grouping>
order by <column name for sorting>)
The <window function> position can hold either of the following two kinds of function:
1) Dedicated window functions, such as rank, dense_rank and row_number, described below.
2) Aggregate functions, such as sum, avg, count, max, min, etc.
- Description of the dedicated window function rank: the RANK function returns the rank of the current row within its partition. If rows tie, the ranks that follow are skipped.
Tied rows occupy the positions of the ranks after them. For example, the normal ranking is 1, 2, 3, 4, but if the top 3 are tied, the result is: 1, 1, 1, 4.
- Description of the dedicated window function ROW_NUMBER: the ROW_NUMBER function assigns a sequence number to each row in the partition, starting from 1.
Ties are not taken into account. For example, even if the top 3 are tied, the numbering is still the normal 1, 2, 3, 4.
- Description of the dedicated window function dense_rank: rows with the same value share a rank, and the ranks that follow continue without gaps.
For example, the normal ranking is 1, 2, 3, 4, but if the top 3 are tied, the result is: 1, 1, 1, 2.
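The three tie behaviours described above can be compared side by side. This is a minimal sketch that assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first version with window functions); the scores table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
# Hypothetical data: the top three rows are tied.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 90), ("d", 80)],
)

rows = conn.execute("""
    SELECT RANK()       OVER (ORDER BY score DESC) AS rk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS drk,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
    FROM scores
    ORDER BY rn
""").fetchall()

ranks = [r[0] for r in rows]        # gaps after ties
dense_ranks = [r[1] for r in rows]  # no gaps after ties
row_numbers = [r[2] for r in rows]  # ties ignored
print(ranks, dense_ranks, row_numbers)
```

Which of the three tied rows receives row number 1, 2 or 3 is not determined by the ORDER BY alone, which is worth keeping in mind when filtering on rn = 1.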
SELECT
*,
ROW_NUMBER() over (
PARTITION BY university
ORDER BY
gpa
) AS rn
FROM
user_profile
- At this point, you only need to take out the rows whose rn is 1.
SELECT
device_id,
university,
gpa
FROM
(
SELECT
*,
ROW_NUMBER() over (
PARTITION BY university
ORDER BY
gpa
) AS rn
FROM
user_profile
) AS temp
WHERE
temp.rn = 1
ORDER BY
university
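As a quick check, the join-based and window-based approaches can be run against the sample rows and compared. The sketch below assumes Python's built-in sqlite3 module with SQLite 3.25 or later instead of MySQL 8.0, keeps only the three columns the queries use, and adds an ORDER BY on university to both queries so the results are directly comparable.

```python
import sqlite3

# (device_id, university, gpa) copied from the initialization data.
users = [
    (2138, "北京大学", 3.4), (3214, "复旦大学", 4.0), (6543, "北京大学", 3.2),
    (2315, "浙江大学", 3.6), (5432, "山东大学", 3.8), (2131, "山东大学", 3.3),
    (4321, "复旦大学", 3.6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profile (device_id INTEGER, university TEXT, gpa REAL)"
)
conn.executemany("INSERT INTO user_profile VALUES (?, ?, ?)", users)

# Solution 1: join each row to its school's minimum gpa.
join_sql = """
    SELECT up1.device_id, up1.university, up1.gpa
    FROM user_profile up1
    INNER JOIN (
        SELECT university, MIN(gpa) AS min_gpa
        FROM user_profile
        GROUP BY university
    ) up2 ON up1.university = up2.university AND up1.gpa = up2.min_gpa
    ORDER BY up1.university
"""

# Solution 2: number the rows per school by gpa and keep the first one.
window_sql = """
    SELECT device_id, university, gpa
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY university ORDER BY gpa) AS rn
        FROM user_profile
    ) AS temp
    WHERE temp.rn = 1
    ORDER BY university
"""

join_rows = conn.execute(join_sql).fetchall()
window_rows = conn.execute(window_sql).fetchall()
print(join_rows)
print(window_rows)
```

The two queries return the same four rows on this sample; that agreement is a property of this particular data set and is not guaranteed for every input.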
Summary of window functions
1) They provide both grouping (partition by) and sorting (order by) at the same time
2) They do not reduce the number of rows of the original table
I'm Xuzhu, see you tomorrow~