The difference between IN and EXISTS in SQL statements, and when to use each

Replace IN with EXISTS and NOT IN with NOT EXISTS:


In many queries against a base table, another table has to be joined just to satisfy a single condition. In such cases, using EXISTS (or NOT EXISTS) usually makes the query more efficient. In a subquery, a NOT IN clause performs an internal sort and merge. In any case, NOT IN is the least efficient choice, because it performs a full table scan of the tables in the subquery. To avoid NOT IN, rewrite it as an outer join or as NOT EXISTS.
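The rewrite suggested above can be sketched with Python's built-in sqlite3 module. The name column, the order id column and every row are hypothetical sample data, not the article's tables. The sketch runs the NOT IN query and both rewrites (NOT EXISTS, and a LEFT JOIN ... IS NULL outer join) and checks that they return the same users. It only demonstrates that the results match, not which form is faster (SQLite's planner may behave differently from other databases), and the equivalence relies on order.user_id being declared NOT NULL.

```python
import sqlite3

# Hypothetical sample data (not the article's tables): user 2 has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user`  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE `order` (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
    INSERT INTO `user`  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO `order` VALUES (101, 1), (102, 1), (103, 3);
""")

queries = {
    # Original form.
    "not_in": """
        SELECT `user`.id FROM `user`
        WHERE `user`.id NOT IN (SELECT `order`.user_id FROM `order`)
        ORDER BY `user`.id""",
    # Rewrite 1: correlated NOT EXISTS.
    "not_exists": """
        SELECT `user`.id FROM `user`
        WHERE NOT EXISTS (
            SELECT 1 FROM `order` WHERE `order`.user_id = `user`.id)
        ORDER BY `user`.id""",
    # Rewrite 2: outer join, keeping users for which no order row matched.
    "left_join": """
        SELECT `user`.id FROM `user`
        LEFT JOIN `order` ON `order`.user_id = `user`.id
        WHERE `order`.id IS NULL
        ORDER BY `user`.id""",
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results)  # {'not_in': [2], 'not_exists': [2], 'left_join': [2]}
```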


  Sample tables

    First, here are the two tables involved in the query, a user table and an order table. Their contents are as follows:

    user table:

    

    order table:

    

 

  in

    Determines whether a given value matches any value in a subquery or list. When the query runs, the subquery's table is queried first; then a Cartesian product of the inner and outer tables is formed and filtered by the condition. Therefore, IN is faster when the inner table is relatively small.

    The specific sql statement is as follows:

        SELECT
            *
        FROM
            `user`
        WHERE
            `user`.id IN (
                SELECT
                    `order`.user_id
                FROM
                    `order`
            )
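A runnable version of this query, sketched with Python's sqlite3 module and hypothetical data (the name column, the order id column and every row are assumptions). User 1 deliberately has two orders: IN still returns that user only once, because it tests membership instead of joining row by row. The second query shows the literal-list form of IN mentioned in the definition above. ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: user 1 has two orders, user 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user`  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE `order` (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO `user`  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO `order` VALUES (101, 1), (102, 1), (103, 3);
""")

# The article's IN query with a subquery.
in_subquery = conn.execute("""
    SELECT * FROM `user`
    WHERE `user`.id IN (SELECT `order`.user_id FROM `order`)
    ORDER BY `user`.id
""").fetchall()

# IN also accepts a literal list of values instead of a subquery.
in_list = conn.execute(
    "SELECT * FROM `user` WHERE `user`.id IN (1, 3) ORDER BY `user`.id"
).fetchall()

print(in_subquery)  # [(1, 'alice'), (3, 'carol')]
print(in_list)      # [(1, 'alice'), (3, 'carol')]
```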

    This statement is simple: each user_id returned by the subquery is matched against the id column of the user table, and the matching rows form the result. The execution result of this statement is as follows:

    

    What does its execution process look like? Let's walk through it.

    First, the database executes the subquery on its own:

        SELECT
            `order`.user_id
        FROM
            `order`

    After execution, the result is as follows:

    

    Next, the database forms the Cartesian product of this result and the original user table, giving the following:

    

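    Then the rows are filtered by our condition user.id IN order.user_id (that is, the values of the id and user_id columns are compared, and the rows where they are not equal are discarded). Finally, two matching rows remain.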
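The three steps above can be mimicked in plain Python. This is a sketch of the conceptual model only, with hypothetical rows; real database engines are not required to materialize the full Cartesian product. One detail the step-by-step description leaves implicit: if a user has several orders, the filtered product contains that user several times, while IN returns each outer row at most once, so the sketch drops duplicates.

```python
from itertools import product

# Hypothetical rows: (id, name) for user, (id, user_id) for order.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1), (102, 1), (103, 3)]

# Step 1: run the subquery, i.e. SELECT `order`.user_id FROM `order`.
subquery_result = [user_id for (_order_id, user_id) in orders]

# Step 2: Cartesian product of the user table with the subquery result.
cartesian = list(product(users, subquery_result))

# Step 3: keep pairs where user.id equals user_id, dropping duplicates
# (user 1 has two orders, but IN must return that user only once).
matched = []
for user, user_id in cartesian:
    if user[0] == user_id and user not in matched:
        matched.append(user)

print(len(cartesian))  # 9
print(matched)         # [(1, 'alice'), (3, 'carol')]
```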
    

  exists

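    Specifies a subquery and tests whether rows exist. The outer table is traversed in a loop, and for each record in the outer table the database checks whether matching data exists in the inner table. If there is a match, the record is added to the result set.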

    The specific SQL statement is as follows:

        SELECT
            `user`.*
        FROM
            `user`
        WHERE
            EXISTS (
                SELECT
                    `order`.user_id
                FROM
                    `order`
                WHERE
                    `user`.id = `order`.user_id
            )

    This SQL statement returns the same result as the IN query above.
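To check this with hypothetical data, the sketch below runs the EXISTS query and the IN query on the same in-memory SQLite database and compares the rows. The name column, the order id column and all rows are assumptions; ORDER BY is added only so the comparison is deterministic.

```python
import sqlite3

# Hypothetical sample data: user 1 has two orders, user 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user`  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE `order` (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO `user`  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO `order` VALUES (101, 1), (102, 1), (103, 3);
""")

EXISTS_SQL = """
    SELECT `user`.* FROM `user`
    WHERE EXISTS (
        SELECT `order`.user_id FROM `order`
        WHERE `user`.id = `order`.user_id
    )
    ORDER BY `user`.id
"""
IN_SQL = """
    SELECT * FROM `user`
    WHERE `user`.id IN (SELECT `order`.user_id FROM `order`)
    ORDER BY `user`.id
"""

exists_rows = conn.execute(EXISTS_SQL).fetchall()
in_rows = conn.execute(IN_SQL).fetchall()
print(exists_rows == in_rows)  # True
print(exists_rows)             # [(1, 'alice'), (3, 'carol')]
```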

    

    However, the two are executed in completely different ways:

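    When a query uses the EXISTS keyword, the database does not start with the subquery; it first queries the table of the main query. In other words, the first SQL statement executed is: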

     SELECT `user`.* FROM `user` 

    The result is as follows:

    

    Then, for each record in that table, the following expression is evaluated in turn to check whether the condition after WHERE holds:

        EXISTS (
            SELECT
                `order`.user_id
            FROM
                `order`
            WHERE
                `user`.id = `order`.user_id
        )

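    If the condition holds, it returns true; otherwise it returns false. If it returns true, the row is kept; if it returns false, the row is discarded. Finally, the remaining rows are returned.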
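The row-by-row evaluation described above can be sketched in Python by issuing the correlated subquery once per outer row. The helper exists_by_loop is hypothetical and the data is made up; a real engine performs this loop internally (and its optimizer may pick a different plan), so this only illustrates the logic. The probe uses SELECT 1 ... LIMIT 1 because EXISTS only cares whether at least one row exists, not what that row contains.

```python
import sqlite3

# Hypothetical sample data: user 1 has two orders, user 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user`  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE `order` (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO `user`  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO `order` VALUES (101, 1), (102, 1), (103, 3);
""")

def exists_by_loop(conn):
    """Hypothetical helper mimicking the EXISTS flow described above."""
    kept = []
    # Step 1: run the main query first: SELECT `user`.* FROM `user`.
    outer_rows = conn.execute("SELECT * FROM `user` ORDER BY id").fetchall()
    for user_row in outer_rows:
        # Step 2: evaluate the correlated subquery for this row's id.
        hit = conn.execute(
            "SELECT 1 FROM `order` WHERE `order`.user_id = ? LIMIT 1",
            (user_row[0],),
        ).fetchone()
        # Step 3: EXISTS is true if at least one row came back, so keep the row.
        if hit is not None:
            kept.append(user_row)
    return kept

print(exists_by_loop(conn))  # [(1, 'alice'), (3, 'carol')]
```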

  Differences and use cases

    The difference between IN and EXISTS: if the subquery returns few records while the main query's table is large and indexed, use IN. Conversely, if the outer main query has few records while the subquery's table is large and indexed, use EXISTS. The reason we distinguish IN from EXISTS at all is that they change the driving order, and this is the key to the performance difference. With EXISTS, the outer table is the driving table and is accessed first; with IN, the subquery is executed first. So the first goal is for the driving table to return quickly, and only then are the indexes and the relationship between the result sets considered. Note also that IN does not process NULL: a NULL value never matches anything.

    IN performs a hash join between the outer and inner tables, whereas EXISTS loops over the outer table and queries the inner table on every iteration. The long-standing belief that EXISTS is always more efficient than IN is therefore inaccurate.
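The two strategies described above can be written out in pure Python. This is a sketch with hypothetical data and hypothetical helper names (in_as_hash_semi_join, exists_as_nested_loop); whether a given database actually executes IN as a hash join or EXISTS as a loop depends on its optimizer.

```python
# Hypothetical rows: (id, name) for user, (id, user_id) for order.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(101, 1), (102, 1), (103, 3)]

def in_as_hash_semi_join(users, orders):
    """IN-style plan: evaluate the inner side once, then probe it per outer row."""
    # Build phase: hash the subquery's user_id values (NULL/None never matches).
    inner_keys = {user_id for (_order_id, user_id) in orders if user_id is not None}
    # Probe phase: one average O(1) set lookup per outer row.
    return [user for user in users if user[0] in inner_keys]

def exists_as_nested_loop(users, orders):
    """EXISTS-style plan: loop over the outer (driving) table and
    query the inner table again for every outer row."""
    result = []
    for user in users:
        # Without an index this scans `orders` each time; any() stops at the first hit.
        if any(user_id == user[0] for (_order_id, user_id) in orders):
            result.append(user)
    return result

hash_result = in_as_hash_semi_join(users, orders)
loop_result = exists_as_nested_loop(users, orders)
print(hash_result)  # [(1, 'alice'), (3, 'carol')]
print(loop_result)  # [(1, 'alice'), (3, 'carol')]
```

The hash version touches each table once, roughly O(|user| + |order|), while the unindexed loop costs roughly O(|user| × |order|). An index on order.user_id turns each inner scan into a cheap lookup, which is why the advice for EXISTS assumes the subquery's table is indexed.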

  not in and not exists

    If a query uses NOT IN, both the inner and outer tables are fully scanned and no index is used, whereas the subquery of NOT EXISTS can still use the index on its table. So regardless of table size, NOT EXISTS is faster than NOT IN.
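Apart from speed, NOT IN and NOT EXISTS can return different results when the subquery produces a NULL, which is where the earlier remark that IN does not process NULL matters. The sketch below uses SQLite with hypothetical data in which one order has a NULL user_id. Because x NOT IN (1, NULL) evaluates to unknown rather than true for any x other than 1, the NOT IN query returns no rows, while NOT EXISTS still returns the users without orders.

```python
import sqlite3

# Hypothetical sample data: order 102 has a NULL user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE `user`  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE `order` (id INTEGER PRIMARY KEY, user_id INTEGER);  -- nullable on purpose
    INSERT INTO `user`  VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO `order` VALUES (101, 1), (102, NULL);
""")

# The subquery yields (1, NULL): for users 2 and 3 the NOT IN test is unknown, not true.
not_in = conn.execute("""
    SELECT id FROM `user`
    WHERE id NOT IN (SELECT user_id FROM `order`)
    ORDER BY id
""").fetchall()

# NOT EXISTS only asks whether a matching order row exists; NULL = id never matches.
not_exists = conn.execute("""
    SELECT id FROM `user`
    WHERE NOT EXISTS (SELECT 1 FROM `order` WHERE `order`.user_id = `user`.id)
    ORDER BY id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```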


Origin http://43.154.161.224:23101/article/api/json?id=324598114&siteId=291194637