[SQL Must-Knows] Row-Column Conversion (1) • MySQL Version


Welcome to the blog of "The Programmer Who Loves Books, Not Losing". This blog is dedicated to sharing knowledge, learning, and exchanging ideas with more people.

This article is part of the column "SQL Must-Knows". The column records my study notes on databases, from basic to advanced topics, covering both MySQL and Oracle.


1. MySQL row and column conversion

1. Preparation

  • First, create a table
create table table_grade(
id int,
user_name varchar(20),
course varchar(10),
score decimal(5,2)
);
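  • Practice three ways of inserting rows (multi-row VALUES with a column list, VALUES without a column list, and INSERT ... SET)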
insert into table_grade(id,user_name,course,score) values (1,'张龙','语文','78'),(2,'张龙','数学','95'),(3,'张龙','英语','81');
insert into table_grade values(4,'赵虎','语文','97'),(5,'赵虎','数学','78'),(6,'赵虎','英语','91');
insert into table_grade set id = 7,user_name = '王五',course = '语文',score = '81';
insert into table_grade set id = 8,user_name = '王五',course = '数学',score = '55';
insert into table_grade set id = 9,user_name = '王五',course = '英语',score = '75';
insert into table_grade values(10,'马六','语文','87'),(11,'马六','数学','65'),(12,'马六','英语','75');
  • Modify the column type
alter table table_grade modify score decimal;  -- a bare DECIMAL means DECIMAL(10,0): 10 digits, no decimal places

2. Row to column

2.1 Why convert rows to columns?

  • Some real-world data is detail data (transaction records), for example:
    Username | Product | Purchase time
    A | Mobile phone | 4.20
    A | USB flash drive | 4.21
    B | ... | ...
    B | ... | ...

    • As shown above, this is detail data: each user has several purchase records. Data of this kind is called a [multi-user set]. We often want to display it differently: since each user has several records, we turn this data into one row per user, i.e., a [single-user set], as follows:
      Username | Most recent purchase | Second most recent purchase
      A | USB flash drive | Mobile phone
      B | ... | ...
      • A single-user set shows each user's information at a glance. In the multi-user set above, a user with many records is spread over many rows and looks messy, which is why it is converted into a single-user set
      • The single-user set has a drawback: the number of columns keeps growing, and so does the storage overhead, because space must be allocated for many columns
      • Detail data is usually laid out vertically as above, so it is also called a [vertical table]: its records are stacked (one user has several rows). A single-user set spreads the data across columns and is also called a [horizontal table]. Turning a multi-user set into a single-user set is therefore called [row to column] or [unstack]
      • The single-user set produced by row-to-column conversion is better suited to data analysis. In the single-user set above you can directly see each user's most recent and second most recent purchases, whereas in the multi-user set you would have to compare records by time to uncover this information
      • Column-to-row conversion is the reverse: turning a single-user set back into a multi-user set
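A minimal sketch of the unstack idea in plain Python, outside the database: it groups detail records by user and keeps each user's two most recent items as two columns. The record layout, the "MM-DD" date strings, and B's purchases are hypothetical stand-ins for the example above.

```python
from collections import defaultdict

# Detail records (multi-user set): one row per purchase.
# B's rows are hypothetical placeholders for the "..." in the example.
purchases = [
    ("A", "Mobile phone", "04-20"),
    ("A", "USB flash drive", "04-21"),
    ("B", "Keyboard", "04-18"),
    ("B", "Mouse", "04-22"),
]

def unstack_latest_two(rows):
    """Row to column: one output row per user holding the two most recent items."""
    by_user = defaultdict(list)
    for user, product, date in rows:
        by_user[user].append((date, product))
    result = {}
    for user, items in by_user.items():
        items.sort(reverse=True)  # newest first; "MM-DD" strings sort chronologically
        latest = [product for _, product in items[:2]]
        latest += [None] * (2 - len(latest))  # pad users with fewer than two purchases
        result[user] = {"most_recent": latest[0], "second_most_recent": latest[1]}
    return result

wide = unstack_latest_two(purchases)
print(wide["A"])
```

Each user now occupies one row, and adding a "third most recent" column would mean widening every row, which is the storage trade-off mentioned above.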

2.2 Row-to-column conversion has two meanings: (1) row to column within a table; (2) row to column across tables

  • Cross-table conversion: suppose that besides the purchase records there are other records keyed by the same user ID, such as a user points table. These can also be joined to the single-user set above to add more columns to it. Converting rows to columns across tables to add columns like this is called widening
  • The shopping records above form a [fact table]; joining several fact tables on their keys widens the set of fields (dimensions + metrics)
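A sketch of widening, assuming a hypothetical user_points table keyed by the same user_id as the grade table. It uses Python's built-in sqlite3 module with an in-memory SQLite database in place of MySQL so it runs without a server; the SQL itself is plain conditional aggregation plus a LEFT JOIN. Pivoting first keeps exactly one row per user, so the join adds columns without multiplying rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Fact table 1: scores (same shape as table_grade, trimmed to two students).
cur.execute("CREATE TABLE table_grade (user_id INTEGER, user_name TEXT, course TEXT, score REAL)")
cur.executemany("INSERT INTO table_grade VALUES (?, ?, ?, ?)", [
    (1, "张龙", "语文", 78), (1, "张龙", "数学", 95), (1, "张龙", "英语", 81),
    (2, "赵虎", "语文", 97), (2, "赵虎", "数学", 78), (2, "赵虎", "英语", 91),
])
# Fact table 2: a hypothetical points table keyed by the same user_id.
cur.execute("CREATE TABLE user_points (user_id INTEGER, points INTEGER)")
cur.executemany("INSERT INTO user_points VALUES (?, ?)", [(1, 120), (2, 80)])

# Pivot scores to one row per user, then LEFT JOIN the second fact table on the key.
wide_sql = """
SELECT g.user_id, g.chinese, g.math, g.english, p.points
FROM (SELECT user_id,
             MAX(CASE WHEN course = '语文' THEN score END) AS chinese,
             MAX(CASE WHEN course = '数学' THEN score END) AS math,
             MAX(CASE WHEN course = '英语' THEN score END) AS english
      FROM table_grade
      GROUP BY user_id) AS g
LEFT JOIN user_points AS p ON p.user_id = g.user_id
ORDER BY g.user_id
"""
rows = cur.execute(wide_sql).fetchall()
for row in rows:
    print(row)
```

Each further fact table joined the same way adds its own columns to the single-user set.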
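  • Adjust the table structure:
update table_grade set id = 4 where user_name = '马六';  -- reuse id as the student's ID

alter table table_grade add oid varchar(10);

alter table table_grade modify oid varchar(50);  -- uuid() returns a 36-character string, so varchar(10) is too short

update table_grade set oid = uuid();  -- set a surrogate key

alter table table_grade rename column id to user_id;  -- rename a column (RENAME COLUMN requires MySQL 8.0 or later)

alter table table_grade modify oid varchar(50) first;  -- move a column to the first position

show columns from table_grade;   -- list the table's columns

show keys from table_grade;   -- list the table's indexes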

3. The idea behind row-to-column conversion: fewer rows, more columns

3.1 How to convert rows to columns: add columns and aggregate (fewer rows)

  • Aggregation turns a multi-user set into a single-user set; which aggregate function to use depends on the data

  • For example, use max() to pick the largest value

  • The current table looks like this:
    user_name | course | score
    Zhang Long | Chinese | 78
    Zhang Long | Mathematics | 95
    Zhang Long | English | 81
  • We want to display it as:
    user_name | Chinese | Mathematics | English
    Zhang Long | 78 | 95 | 81
  • This assumes Zhang Long has only one Chinese score. If he has several, there are a few options:
    • Idea 1: use a list to put the multiple scores together
      user_name | Chinese | Mathematics | English
      Zhang Long | 78,88 | 95 | 81
    • Idea 2: take the most recent exam result (this requires adding a time column to the table)
    • Idea 3: if the two scores are duplicates, for example both are 100, you can use max or min
      user_name | Chinese | Mathematics | English
      Zhang Long | 100 | 95 | 81
      • If you use sum instead, the result becomes 200
        user_name | Chinese | Mathematics | English
        Zhang Long | 200 | 95 | 81
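The three ideas can be compared side by side. This sketch uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the grades table, its exam_date values, and the duplicate scores are hypothetical. GROUP_CONCAT plays the role of the list in Idea 1 (in MySQL you can write GROUP_CONCAT(score ORDER BY exam_date) to fix the order; in this SQLite sketch the order is not guaranteed). Idea 2 keeps only each student's latest exam per course before pivoting, and Idea 3 contrasts MAX with SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (user_id INTEGER, user_name TEXT, course TEXT, "
    "score INTEGER, exam_date TEXT)"
)
# Hypothetical data: 张龙 took Chinese twice (78, then 88);
# 赵虎 scored 100 in Chinese twice (the duplicate case of Idea 3).
conn.executemany("INSERT INTO grades VALUES (?, ?, ?, ?, ?)", [
    (1, "张龙", "语文", 78, "2023-01-10"),
    (1, "张龙", "语文", 88, "2023-03-05"),
    (1, "张龙", "数学", 95, "2023-01-12"),
    (1, "张龙", "英语", 81, "2023-01-15"),
    (2, "赵虎", "语文", 100, "2023-02-01"),
    (2, "赵虎", "语文", 100, "2023-02-20"),
    (2, "赵虎", "数学", 78, "2023-02-03"),
    (2, "赵虎", "英语", 91, "2023-02-05"),
])

# Idea 2 filter: keep only each student's latest exam for each course.
LATEST_ONLY = """
    WHERE exam_date = (SELECT MAX(exam_date) FROM grades AS g2
                       WHERE g2.user_id = g.user_id AND g2.course = g.course)
"""

def pivot(agg: str, where: str = "") -> list:
    """Pivot scores with the given aggregate; agg/where are trusted constants, not user input."""
    sql = f"""
        SELECT user_id,
               {agg}(CASE WHEN course = '语文' THEN score END) AS chinese,
               {agg}(CASE WHEN course = '数学' THEN score END) AS math,
               {agg}(CASE WHEN course = '英语' THEN score END) AS english
        FROM grades AS g {where}
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()

print(pivot("GROUP_CONCAT"))      # Idea 1: scores collected into a comma-separated list
print(pivot("MAX", LATEST_ONLY))  # Idea 2: latest exam per course
print(pivot("MAX"))               # Idea 3: duplicates collapse to one value
print(pivot("SUM"))               # SUM counts the duplicate 100 twice
```

For 赵虎, MAX returns 100 while SUM returns 200, which is exactly the pitfall described in Idea 3.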
  • Add an exam date column (needed for Idea 2)
alter table table_grade add column exam_date date after score;
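  • Set a random exam date for each row
update table_grade set exam_date = date_format(from_unixtime(  -- from_unixtime() converts a Unix timestamp to a datetime; date_format() formats it as a date
unix_timestamp('2023-01-01')    -- start from the timestamp (in seconds) of 2023-01-01 and add a random number of seconds, so the result never goes past 2023-04-28
+ floor( rand() *               -- rand() returns a random number in [0, 1); floor() rounds down
		(unix_timestamp('2023-04-28') - unix_timestamp('2023-01-01') + 1) -- seconds from Jan 1 to Apr 28; because rand() < 1, the +1 makes 2023-04-28 00:00:00 itself reachable
	) 
	),'%Y-%m-%d');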
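The arithmetic in this UPDATE can be checked outside MySQL. The sketch below mirrors the expression in Python for a given value r of rand(), working in UTC (MySQL uses the session time zone); the function names are hypothetical. Because the offset is counted in seconds, not days, 2023-04-28 is reached only when the offset lands exactly on 2023-04-28 00:00:00, so that date almost never appears. Drawing a whole number of days instead, as in the second function, gives every date from Jan 1 to Apr 28 an equal chance; in MySQL this would be something like date_add('2023-01-01', interval floor(rand() * 118) day).

```python
import math
from datetime import date, datetime, timedelta, timezone

# Hypothetical constants matching the UPDATE statement, computed in UTC.
START = datetime(2023, 1, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2023, 4, 28, tzinfo=timezone.utc).timestamp()

def exam_date_by_seconds(r: float) -> str:
    """Mirror of the UPDATE expression; r stands for one value of rand() in [0, 1)."""
    offset = math.floor(r * (END - START + 1))  # whole seconds in [0, END - START]
    ts = datetime.fromtimestamp(START + offset, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d")

DAYS = (date(2023, 4, 28) - date(2023, 1, 1)).days + 1  # 118 days, both ends included

def exam_date_by_days(r: float) -> str:
    """Alternative: draw a whole number of days so every date is equally likely."""
    return (date(2023, 1, 1) + timedelta(days=math.floor(r * DAYS))).isoformat()

print(exam_date_by_seconds(0.0))             # 2023-01-01
print(exam_date_by_seconds(0.999999999999))  # 2023-04-28, only at the very top of the range
print(exam_date_by_seconds(0.9999))          # 2023-04-27
print(exam_date_by_days(0.999))              # 2023-04-28
```

Without the +1, the largest offset would be one second short of Apr 28, so that date could never be produced.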

4. Row-to-column conversion in practice

  • Premise: assume each student (e.g., Zhang Long) took each subject's exam only once, then convert rows to columns

    • To make the difference visible, delete Wang Wu's English score and Ma Liu's Chinese score; those cells will then show NULL in the converted result
    delete from table_grade where user_name = '王五' and course = '英语';
    delete from table_grade where user_name = '马六' and course = '语文';

4.1 Generic row-to-column conversion (works in both MySQL and Oracle)

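select user_id "学生ID",  -- double-quoted aliases work in both MySQL and Oracle; Oracle rejects single-quoted aliases
       max(case when course = '语文' then score else null end) "语文",  -- with one score per course, max and min give the same result; "else null" can be omitted because NULL is the default
       max(case when course = '数学' then score end) "数学",
       max(case when course = '英语' then score end) "英语"
from table_grade
group by user_id;  -- an aggregate function is used, so group by user_id to get one row per student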
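To see what this query returns after the two deletions, here is a sketch that runs the same conditional aggregation in SQLite through Python's sqlite3 module (a stand-in for MySQL). It assumes one user_id per student (1 张龙, 2 赵虎, 3 王五, 4 马六); these IDs are hypothetical and only need to be distinct per student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_grade (user_id INTEGER, user_name TEXT, course TEXT, score INTEGER)")
# Hypothetical user_id values: one ID per student.
conn.executemany("INSERT INTO table_grade VALUES (?, ?, ?, ?)", [
    (1, "张龙", "语文", 78), (1, "张龙", "数学", 95), (1, "张龙", "英语", 81),
    (2, "赵虎", "语文", 97), (2, "赵虎", "数学", 78), (2, "赵虎", "英语", 91),
    (3, "王五", "语文", 81), (3, "王五", "数学", 55), (3, "王五", "英语", 75),
    (4, "马六", "语文", 87), (4, "马六", "数学", 65), (4, "马六", "英语", 75),
])
# The two deletions from the premise above.
conn.execute("DELETE FROM table_grade WHERE user_name = '王五' AND course = '英语'")
conn.execute("DELETE FROM table_grade WHERE user_name = '马六' AND course = '语文'")

cur = conn.execute("""
    SELECT user_id "学生ID",
           MAX(CASE WHEN course = '语文' THEN score END) "语文",
           MAX(CASE WHEN course = '数学' THEN score END) "数学",
           MAX(CASE WHEN course = '英语' THEN score END) "英语"
    FROM table_grade
    GROUP BY user_id
    ORDER BY user_id
""")
headers = [col[0] for col in cur.description]
rows = cur.fetchall()
print(headers)
for row in rows:
    print(row)  # a deleted score comes back as None (SQL NULL)
```

Wang Wu's English and Ma Liu's Chinese come back as None, while every other cell holds the single score for that course.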

4.1.1 Adding student names to the result

  • Adding user_name directly to group by works: the grouping becomes finer (user_id + user_name), and a finer-grained grouping can still select the coarser-grained column user_id
  • Adding user_name only to the select clause (and not to group by) is accepted by older MySQL versions but not by Oracle. It failed on MySQL 8 here: since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects such queries. Even where MySQL accepts it, the query groups by user_id and then asks for the finer user_id + user_name detail; one user_id may correspond to several user_names, so which one should be returned? MySQL simply returns an arbitrary name from the group (often the first one it encounters), and that choice is not guaranteed
  • Do not add the name to group by: names may be duplicated or inconsistent, and if one user_id has more than one name, that student's scores are split across several rows
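The splitting effect is easy to reproduce. In this sketch (SQLite via Python's sqlite3 module standing in for MySQL), user_id 1 is hypothetically stored under two spellings, 张龙 and "Zhang Long". Grouping by user_id alone keeps one row, while adding user_name to group by splits the student's scores across two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_grade (user_id INTEGER, user_name TEXT, course TEXT, score INTEGER)")
# Hypothetical inconsistent data: the English row was entered under a different name.
conn.executemany("INSERT INTO table_grade VALUES (?, ?, ?, ?)", [
    (1, "张龙", "语文", 78), (1, "张龙", "数学", 95), (1, "Zhang Long", "英语", 81),
])

PIVOT = """
    MAX(CASE WHEN course = '语文' THEN score END),
    MAX(CASE WHEN course = '数学' THEN score END),
    MAX(CASE WHEN course = '英语' THEN score END)
"""
by_id = conn.execute(
    f"SELECT user_id, {PIVOT} FROM table_grade GROUP BY user_id"
).fetchall()
by_id_and_name = conn.execute(
    f"SELECT user_id, user_name, {PIVOT} FROM table_grade "
    "GROUP BY user_id, user_name ORDER BY user_name"
).fetchall()
print(by_id)           # one row per student
print(by_id_and_name)  # the same student split over two rows
```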
4.1.1.1 Adding the name, method 1: a correlated subquery
select user_id "学生ID",
       (select max(user_name) from table_grade where user_id = t.user_id) user_name,  -- correlated subquery
       max(case when course = '语文' then score end) "语文",
       max(case when course = '数学' then score end) "数学",
       max(case when course = '英语' then score end) "英语"
from table_grade t
group by user_id;
  • The aggregate function in the correlated subquery works differently from the outer ones. The outer max() calls are evaluated once per user_id group produced by group by. The subquery receives a value (t.user_id) from the outer query and runs its own query with it
  • Ordinary left and right joins can multiply rows (a Cartesian-product effect) when the join key is not unique. A scalar correlated subquery in the select list cannot do this: conceptually it runs once per outer row and returns a single value, so the number of outer rows never changes
  • The max() in the subquery above prevents an error: the same ID can correspond to several names in the table, and without max() the subquery returns several rows and MySQL reports "Subquery returns more than 1 row". An aggregate function is therefore needed here; which one to use depends on the requirement, which will be discussed later
  • Another way to write it is select ... from ... where ... in (select ...). It is best to reserve in for a short list of literal values such as in (1, 2); nesting select queries inside in is hard to read and adds too many layers
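Method 1 can be tried on inconsistent data. The sketch below (SQLite via Python's sqlite3 module as a stand-in for MySQL, hypothetical rows) shows that max(user_name) in the correlated subquery returns exactly one name per user_id, while the outer max() calls still produce one row per student. One difference to keep in mind: SQLite does not raise "Subquery returns more than 1 row" for a scalar subquery; it silently uses the first row, so in SQLite the aggregate is what makes the choice explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_grade (user_id INTEGER, user_name TEXT, course TEXT, score INTEGER)")
conn.executemany("INSERT INTO table_grade VALUES (?, ?, ?, ?)", [
    # Hypothetical: user_id 1 appears under two names.
    (1, "张龙", "语文", 78), (1, "张龙", "数学", 95), (1, "Zhang Long", "英语", 81),
    (2, "赵虎", "语文", 97), (2, "赵虎", "数学", 78), (2, "赵虎", "英语", 91),
])

rows = conn.execute("""
    SELECT user_id "学生ID",
           (SELECT MAX(user_name) FROM table_grade WHERE user_id = t.user_id) AS user_name,
           MAX(CASE WHEN course = '语文' THEN score END) "语文",
           MAX(CASE WHEN course = '数学' THEN score END) "数学",
           MAX(CASE WHEN course = '英语' THEN score END) "英语"
    FROM table_grade t
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
for row in rows:
    print(row)
```

Student 1 keeps a single row with all three scores; max() picks 张龙 because it sorts after "Zhang Long" under SQLite's default binary collation.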



Source: blog.csdn.net/qq_40332045/article/details/131406975