MySQL 5.7: Pitfalls When Combining ORDER BY with LIMIT

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/wolfchenxing/article/details/88947577

Background

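During recent development I ran into a bug. A table was being paginated with ORDER BY + LIMIT, and when the sort column contained duplicate values, the same rows showed up on different pages.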

Cause

Starting with MySQL 5.6, the optimizer applies an optimization to ORDER BY ... LIMIT queries: it uses a priority queue.

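The purpose of the priority queue is this: when the ordering of an index cannot be used and the rows must be sorted, a query with LIMIT n only needs to keep n rows while sorting. Every row still has to pass through the sort, so that cost is not removed, but the sort can be completed with only a small amount of sort buffer memory.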
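A minimal Python sketch of the bounded priority queue idea, meant as an illustration of the technique and not as MySQL's filesort code: while scanning, keep at most n rows in a max-heap whose root is the worst row kept so far, and let each new row either be discarded or replace that root. The function name top_n_by_category is hypothetical, the rows mirror the ratings table from the documentation example quoted later, and reading them in id order is an assumption.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, int, float]  # (id, category, rating), mirroring the ratings example


def top_n_by_category(rows: List[Row], n: int) -> List[Row]:
    """Keep only the n rows with the smallest category while scanning (hypothetical helper).

    heapq is a min-heap, so the category is negated to get a max-heap whose
    root is the current worst of the n kept rows. The sequence number in each
    entry only stops heapq from comparing the row tuples themselves.
    """
    if n <= 0:
        return []
    heap: List[Tuple[int, int, Row]] = []
    for seq, row in enumerate(rows):
        entry = (-row[1], seq, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif row[1] < -heap[0][0]:
            # The new row beats the worst kept row: pop the root and push the new row.
            heapq.heapreplace(heap, entry)
    # Only n rows are left, so this final sort is cheap.
    return sorted((row for _, _, row in heap), key=lambda r: r[1])


# Assumed read order: id order.
RATINGS: List[Row] = [
    (1, 1, 4.5), (2, 3, 5.0), (3, 2, 3.7), (4, 2, 3.5),
    (5, 1, 3.2), (6, 2, 3.5), (7, 3, 2.7),
]
print([r[1] for r in top_n_by_category(RATINGS, 5)])  # [1, 1, 2, 2, 2]
```

Memory is bounded by n instead of by the table size, which is exactly why the optimizer prefers this path for small LIMIT values.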

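Paginated results can contain duplicates in 5.6 and later because the priority queue uses heapsort, and heapsort is not a stable sort: rows with equal sort values may come out in a different order from the one in which they were read. Each page query (LIMIT offset, n) keeps a queue of offset + n rows, so queries for different pages can order the tied rows differently; as a result a row may appear on two pages while another row never appears at all.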
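The following Python sketch reproduces the effect on a toy table. It is a simulation under stated assumptions, not MySQL's implementation: every row has the same category (for example, rows inserted in one batch), rows are read in id order, a simplified bounded buffer with a linear scan stands in for the priority queue (ties keep the row read first), and a textbook heapsort that compares only category orders the kept rows. The function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (id, category); hypothetical toy table


def heap_sort_by_category(rows: List[Row]) -> List[Row]:
    """Textbook in-place heapsort that compares ONLY the category column.

    Heapsort is not stable, so rows with equal category can end up in an
    order that differs from the order in which they were read.
    """
    a = list(rows)

    def sift_down(root: int, end: int) -> None:
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1][1] > a[child][1]:
                child += 1
            if a[child][1] > a[root][1]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # build a max-heap on category
        sift_down(start, n)
    for end in range(n - 1, 0, -1):  # move the current maximum to the back
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a


def order_by_category_limit(rows: List[Row], offset: int, count: int) -> List[Row]:
    """Hypothetical simulation of: ORDER BY category LIMIT offset, count."""
    keep = offset + count  # the queue must hold every row up to the end of the page
    if keep <= 0:
        return []
    kept: List[Row] = []
    for row in rows:
        if len(kept) < keep:
            kept.append(row)
        else:
            # Simplified queue: linear scan for the current worst kept row.
            worst = max(range(keep), key=lambda i: kept[i][1])
            if row[1] < kept[worst][1]:  # ties keep the row that was read first
                kept[worst] = row
    return heap_sort_by_category(kept)[offset:offset + count]


# Every row has the same category, e.g. rows inserted in one batch.
ROWS: List[Row] = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
pages = [[r[0] for r in order_by_category_limit(ROWS, off, 2)] for off in (0, 2, 4)]
print(pages)  # [[2, 1], [4, 1], [1]]
```

With three pages of two rows, row 1 shows up on every page while rows 3 and 5 never appear. The exact permutation comes from the heap layout, and that layout changes with the queue size offset + count, which is why different pages disagree about the order of tied rows.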

See the MySQL 5.7 reference manual: https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html

If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

The example from the documentation:

mysql> SELECT * FROM ratings ORDER BY category;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
|  2 |        3 |    5.0 |
|  7 |        3 |    2.7 |
+----+----------+--------+

mysql> SELECT * FROM ratings ORDER BY category LIMIT 5;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  4 |        2 |    3.5 |
|  3 |        2 |    3.7 |
|  6 |        2 |    3.5 |
+----+----------+--------+

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

Solution

The documentation also gives the fix:

If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic. For example, if id values are unique, you can make rows for a given category value appear in id order by sorting like this:

mysql> SELECT * FROM ratings ORDER BY category, id;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
|  2 |        3 |    5.0 |
|  7 |        3 |    2.7 |
+----+----------+--------+

mysql> SELECT * FROM ratings ORDER BY category, id LIMIT 5;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
+----+----------+--------+
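Adding id works because (category, id) is unique per row, so no two rows compare equal and there are no ties left for an unstable sort to reorder. A small Python sketch of that argument, with heapq.nsmallest standing in for a bounded top-N sort (an illustration only, not MySQL's code); the helper names sort_key and page are hypothetical, and the rows are the ratings rows from the documentation example.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, int, float]  # (id, category, rating)

RATINGS: List[Row] = [
    (1, 1, 4.5), (2, 3, 5.0), (3, 2, 3.7), (4, 2, 3.5),
    (5, 1, 3.2), (6, 2, 3.5), (7, 3, 2.7),
]


def sort_key(row: Row) -> Tuple[int, int]:
    # ORDER BY category, id: id is unique, so the order is total.
    return (row[1], row[0])


def page(rows: List[Row], offset: int, count: int) -> List[Row]:
    # Simulates ORDER BY category, id LIMIT offset, count with a bounded top-N.
    return heapq.nsmallest(offset + count, rows, key=sort_key)[offset:offset + count]


pages = [page(RATINGS, off, 3) for off in range(0, len(RATINGS), 3)]
ids = [r[0] for p in pages for r in p]
print(ids)  # [1, 5, 3, 4, 6, 2, 7]
```

The concatenated pages match the full ORDER BY category, id result, and they stay the same if the rows are read in a different order, because the result no longer depends on how ties are broken.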
