- 单列索引和复合索引的选择
问题SQL:
mysql>explain SELECT
product_subject_rule_id,
property_type,
property_value,
product_subject_id,
submited_num,
created_at,
updated_at,
enabled
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
AND submited_num = 1
AND enabled = 1
);
可以看到执行计划是全表扫描,性能并不好,加索引可以提高性能。但是sql中where过滤条件有三个需要怎样判断加单列的索引还是复合索引比较好呢?单列索引应该加在哪个列上呢?
复合索引相比单列索引会占用更多的字段也相应的会浪费更多的资源。所以在索引选择上需要考虑待加索引的列的对于整个SQL来讲的过滤性如何。
mysql>select count(*) from product_subject_rule
+--------------------+
| count(*) |
+--------------------+
| 73768 |
+--------------------+
返回行数:[1],耗时:18 ms.
mysql>select count(distinct product_subject_id) from product_subject_rule;
+----------------------------------------------+
| count(distinct product_subject_id) |
+----------------------------------------------+
| 14682 |
+----------------------------------------------+
返回行数:[1],耗时:33 ms.
可以看到总数据量为7万行左右,我们的目标列去重后为1万5千行左右。过滤性还是很好的。或者使用group by可以更加直观的看出来过滤性能。
mysql>select product_subject_id,count(*) from product_subject_rule group by product_subject_id order by count(*) desc limit 10;
+------------------------------+--------------------+
| product_subject_id | count(*) |
+------------------------------+--------------------+
| 14634 | 95 |
| 26 | 82 |
| 14630 | 60 |
| 14657 | 45 |
| 14627 | 29 |
| 14650 | 26 |
| 31 | 25 |
| 14658 | 21 |
| 14659 | 20 |
| 27 | 17 |
+------------------------------+--------------------+
返回行数:[50],耗时:44 ms.
现在基本可以确定product_subject_id是索引列的候选列之一,接下来确定一下是否需要复合索引
mysql>select submited_num,count(*) from product_subject_rule group by submited_num order by count(*) desc limit 10;
+------------------------+--------------------+
| submited_num | count(*) |
+------------------------+--------------------+
| 1 | 73212 |
| 2 | 137 |
| 3 | 77 |
| 4 | 62 |
| 5 | 41 |
| 6 | 37 |
| 8 | 24 |
| 7 | 23 |
| 9 | 18 |
| 10 | 16 |
+------------------------+--------------------+
返回行数:[10],耗时:34 ms.
mysql>select enabled,count(*) from product_subject_rule group by enabled order by count(*) desc limit 10;
+-------------------+--------------------+
| enabled | count(*) |
+-------------------+--------------------+
| 1 | 73371 |
| 0 | 397 |
+-------------------+--------------------+
返回行数:[2],耗时:40 ms.
#可以看到存在比较严重的数据倾斜,成为索引列其效果也不是很理想
还有一个简单的办法就是分别测试带剩余条件和不带剩余条件的查询返回数据量的对比,如果相差不大那么说明其过滤性并不是很好,也就不需要成为索引列。
如下所示:
mysql>SELECT
count(*)
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
AND submited_num = 1
AND enabled = 1
);
+--------------------+
| count(*) |
+--------------------+
| 1 |
+--------------------+
返回行数:[1],耗时:28 ms.
mysql>SELECT
count(*)
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
);
+--------------------+
| count(*) |
+--------------------+
| 1 |
+--------------------+
返回行数:[1],耗时:19 ms.
#前后结果没变化也可以判断出来
最终选择增加单列索引,
alter table `newtest-product`.`product_subject_rule` add index `pro_sub_rul_idx` (`product_subject_id`)