EXISTS and IN efficiency issues

select * from A where id in (select id from B);
select * from A where exists (select 1 from B where A.id=B.id);


1. select * from A where id in (select id from B) ;
Conceptually, in() is executed only once: it finds all the id values in table B and caches them. It then checks, for each row of table A, whether its id equals one of the cached ids from table B. If a match is found, the row from table A is added to the result set, and this continues until every row of table A has been traversed.
Its query process is similar to the following:
Array A=(select * from A);
Array B=(select id from B);
for(int i=0;i<A.length;i++) {
   for(int j= 0;j<B.length;j++) {
      if(A[i].id==B[j].id) {
         resultSet.add(A[i]);
         break;
      }
   }
}

It can be seen that in() is not suitable when table B holds a large amount of data, because the cached ids of table B may be traversed in full for every row of table A. For example, if table A has 10,000 records and table B has 1,000,000 records, up to 10,000*1,000,000 comparisons may be needed, which is inefficient.
Another example: table A has 10,000 records and table B has 100 records, so at most 10,000*100 comparisons are needed. The number of traversals is greatly reduced, and efficiency improves accordingly.
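
The worst-case counts can be checked with a small sketch of the same nested loop. The function name in_nested_loop and the sample data are made up for illustration, and the sizes are scaled down so the script runs quickly. It models the loop described here, not what a particular database engine actually does.

```python
def in_nested_loop(a_rows, b_ids):
    """Model of the in() loop: scan the cached ids of B for each row of A."""
    result = []
    comparisons = 0
    for row in a_rows:
        for b_id in b_ids:
            comparisons += 1
            if row["id"] == b_id:
                result.append(row)
                break  # stop at the first match, as in the pseudocode
    return result, comparisons


# Hypothetical data: ids in A and B never overlap, which forces the worst case.
a_rows = [{"id": i} for i in range(100)]
small_b = list(range(1_000, 1_010))   # 10 ids
large_b = list(range(1_000, 2_000))   # 1,000 ids

_, small_count = in_nested_loop(a_rows, small_b)
_, large_count = in_nested_loop(a_rows, large_b)
print(small_count)  # 100 * 10 comparisons
print(large_count)  # 100 * 1,000 comparisons
```

With table A fixed, the comparison count grows in direct proportion to the size of table B, which is the effect the two examples describe.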


Conclusion: in() is suitable when table B holds less data than table A.




2. select * from A where exists (select 1 from B where A.id=B.id);
exists() is executed A.length times. It does not cache the result set of the subquery, because the content of that result set is not important. What matters is whether the result set of the query inside it is empty or not: if it is empty, exists() returns false; if it is not empty, it returns true.
Its query process is similar to the following process:
Array A=(select * from A);
for(int i=0;i<A.length;i++) {
   if(exists(A[i].id)) { // Execute select 1 from B where B.id=A[i].id and check whether any record is returned
       resultSet.add(A[i]);
   }
}
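
The loop above can be mirrored against a real database. The following is a minimal sketch using Python's built-in sqlite3 module. The table contents and the name column are invented for illustration, and the manual loop only imitates the model described here. SQLite is free to execute the single EXISTS or IN statement in whatever way its planner chooses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4');
    INSERT INTO B VALUES (2), (4), (6);
    """
)

# Manual version of the loop: one probe of B per row of A.
probes = 0
manual_result = []
for row in conn.execute("SELECT id, name FROM A ORDER BY id").fetchall():
    probes += 1
    hit = conn.execute(
        "SELECT 1 FROM B WHERE B.id = ? LIMIT 1", (row[0],)
    ).fetchone()
    if hit is not None:  # non-empty result set, so exists() is true
        manual_result.append(row)

# The same filter written as a single EXISTS query and as an IN query.
exists_result = conn.execute(
    "SELECT id, name FROM A "
    "WHERE EXISTS (SELECT 1 FROM B WHERE A.id = B.id) ORDER BY id"
).fetchall()
in_result = conn.execute(
    "SELECT id, name FROM A WHERE id IN (SELECT id FROM B) ORDER BY id"
).fetchall()

print(probes)  # one probe per row of A
print(manual_result == exists_result == in_result)
```

The number of probes depends only on the number of rows in A. Each probe is cheap here because B.id is a primary key and can be looked up through its index.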

When table B holds more data than table A, exists() is suitable, because it does not perform so many traversal operations; it only needs to execute the subquery once per row of table A.
For example: table A has 10,000 records and table B has 1,000,000 records; exists() is then executed 10,000 times to determine whether each id in table A has an equal id in table B.
For example: table A has 10,000 records and table B has 100,000,000 records; exists() is still executed 10,000 times, because it only executes A.length times. The more data table B holds, the greater the advantage of exists().
Another example: table A has 10,000 records and table B has 100 records; exists() is still executed 10,000 times. Here it is better to use in() and traverse 10,000*100 times, because in() traverses and compares in memory, whereas exists() needs to query the database. Querying the database is comparatively expensive, while memory access is relatively fast.

Conclusion: exists() is suitable when table B holds more data than table A.

 

 

About EXISTS:
EXISTS checks whether a subquery returns at least one row. The subquery does not actually return any data to the outer query; EXISTS yields only the value TRUE or FALSE.
EXISTS specifies a subquery that checks for the existence of rows.
Syntax: EXISTS subquery
Parameters: subquery is a restricted SELECT statement (the COMPUTE clause and the INTO keyword are not allowed).
Result type: Boolean. Returns TRUE if the subquery contains rows, FALSE otherwise.
Conclusion: select * from A where exists (select 1 from B where A.id=B.id)
The EXISTS (and NOT EXISTS) clause returns a boolean value. EXISTS contains a subquery (SELECT ... FROM ...), which I call the inner query of EXISTS. The inner query returns a result set, and the EXISTS clause returns a boolean value based on whether that result set is empty or not.
Put simply: each row of the outer query's table is substituted into the inner query as a test. If the inner query returns a non-empty result, the EXISTS clause returns TRUE and the row can be used as a result row of the outer query; otherwise it cannot.
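
The boolean behaviour can be observed directly. This is a minimal sketch using Python's sqlite3 module; the helper names b_has_id and b_lacks_id and the sample ids are made up for illustration. SQLite has no separate boolean type, so it reports the value of EXISTS as the integer 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    INSERT INTO B VALUES (2), (4);
    """
)


def b_has_id(value):
    """Value of EXISTS for one candidate id (SQLite reports 1 or 0)."""
    (flag,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM B WHERE id = ?)", (value,)
    ).fetchone()
    return bool(flag)


def b_lacks_id(value):
    """Value of NOT EXISTS for the same test."""
    (flag,) = conn.execute(
        "SELECT NOT EXISTS (SELECT 1 FROM B WHERE id = ?)", (value,)
    ).fetchone()
    return bool(flag)


# The select list of the inner query is irrelevant: even NULL counts,
# because only the presence of a row is tested.
(null_flag,) = conn.execute(
    "SELECT EXISTS (SELECT NULL FROM B WHERE id = 2)"
).fetchone()

print(b_has_id(2), b_has_id(3))      # row present, row absent
print(b_lacks_id(2), b_lacks_id(3))  # the negation of the line above
print(null_flag)
```

This is why the inner query is usually written as select 1: the value it selects is never inspected.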
The analyzer first looks at the first word of the statement. When it finds that the first word is the SELECT keyword, it jumps to the FROM keyword, uses it to find the table name, and loads the table into memory. Next it looks for the WHERE keyword. If WHERE is not found, it returns to SELECT to analyze the fields. If WHERE is found, it analyzes the conditions and then returns to SELECT to analyze the fields. Finally, the virtual table we want is formed.
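
The FROM, WHERE, SELECT order described here can be imitated for a single table in a few lines. This is only a sketch of the logical order of evaluation. The function run_query and the sample rows are hypothetical, and a real engine parses the whole statement and picks its own execution plan.

```python
def run_query(table, where=None, columns=None):
    """Evaluate a single-table query in the order FROM, WHERE, SELECT."""
    rows = list(table)                       # FROM: load the table
    if where is not None:                    # WHERE: keep rows that pass the condition
        rows = [r for r in rows if where(r)]
    if columns is None:                      # SELECT *: keep every field
        return [dict(r) for r in rows]
    return [{c: r[c] for c in columns} for r in rows]  # SELECT: pick the fields


table_a = [
    {"id": 1, "name": "a1"},
    {"id": 2, "name": "a2"},
    {"id": 3, "name": "a3"},
]
b_ids = {2, 3}  # stands in for the ids found in table B

virtual_table = run_query(
    table_a, where=lambda r: r["id"] in b_ids, columns=["name"]
)
print(virtual_table)
```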
WHERE clause
