One: Scenario
Tables A and B have the same structure:
ID | Name | Delete flag |
id | name | delete |
The task: find the rows of table A whose name also appears in table B, identify the names that occur more than once among those rows, and logically delete every row with a repeated name except the one with the largest id. (delete = 0 marks an active row; setting it to 1 deletes the row logically.)
Two: Find the duplicate rows
SELECT * FROM A WHERE A.NAME IN( SELECT A.NAME FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1 );
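To see what the query above returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python's standard library. SQLite only stands in for the original database, the sample rows are invented for illustration, and `delete` is double-quoted because it is a reserved word. One caveat: the count is taken over the A-B join, so a name that appears several times in B would be counted more than once; `COUNT(DISTINCT A.ID)` is the stricter duplicate test.

```python
import sqlite3


def make_db():
    """Build tables A and B (id, name, delete flag) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # "delete" is a reserved word in SQL, so the column name is double-quoted.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        INSERT INTO A (id, name) VALUES
            (1, 'tom'), (2, 'tom'), (3, 'tom'), (4, 'ann'),
            (5, 'ann'), (6, 'bob'), (7, 'joe'), (8, 'joe');
        INSERT INTO B (id, name) VALUES (1, 'tom'), (2, 'ann'), (3, 'bob');
    """)
    return conn


# Rows of A whose name is shared with B and occurs more than once among live rows.
FIND_DUPLICATES = """
    SELECT * FROM A WHERE A.name IN (
        SELECT A.name FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
    ORDER BY A.id
"""


def find_duplicates(conn):
    return conn.execute(FIND_DUPLICATES).fetchall()


if __name__ == "__main__":
    print(find_duplicates(make_db()))
    # → [(1, 'tom', 0), (2, 'tom', 0), (3, 'tom', 0), (4, 'ann', 0), (5, 'ann', 0)]
```

'bob' is in B but occurs only once in A, and 'joe' is repeated in A but absent from B, so neither is returned.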
Three: Find the row with the largest ID for each duplicated name
SELECT * FROM A WHERE A.NAME IN( SELECT A.NAME FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1) AND A.ID IN( SELECT MAX(A.ID) FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1 );
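The query above returns the rows that must survive: one row per duplicated name, the one holding MAX(ID). The rows to flag are the other duplicates, which is the same condition with `A.ID NOT IN`. A minimal SQLite sketch with invented sample data (SQLite stands in for the original database; `delete` is quoted as a reserved word):

```python
import sqlite3


def make_db():
    """Build tables A and B (id, name, delete flag) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # "delete" is a reserved word in SQL, so the column name is double-quoted.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        INSERT INTO A (id, name) VALUES
            (1, 'tom'), (2, 'tom'), (3, 'tom'), (4, 'ann'),
            (5, 'ann'), (6, 'bob'), (7, 'joe'), (8, 'joe');
        INSERT INTO B (id, name) VALUES (1, 'tom'), (2, 'ann'), (3, 'bob');
    """)
    return conn


# For each duplicated name, the row with the largest id (the row to keep).
FIND_KEEPERS = """
    SELECT * FROM A WHERE A.name IN (
        SELECT A.name FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
    AND A.id IN (
        SELECT MAX(A.id) FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
    ORDER BY A.id
"""


def find_keepers(conn):
    return conn.execute(FIND_KEEPERS).fetchall()


if __name__ == "__main__":
    print(find_keepers(make_db()))  # → [(3, 'tom', 0), (5, 'ann', 0)]
```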
Four: Update with a plain UPDATE statement
UPDATE A SET A.DELETE=1 WHERE A.NAME IN( SELECT A.NAME FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1) AND A.ID NOT IN( SELECT MAX(A.ID) FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1 );
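Because the goal is to keep the row with the largest ID, the update must target IDs that are not in the MAX(ID) set; reusing the step-three condition with `IN` would flag exactly the rows that should survive. This minimal SQLite sketch (invented sample data, `delete` quoted as a reserved word, SQLite standing in for the original database) applies the `NOT IN` form and checks which rows end up flagged:

```python
import sqlite3


def make_db():
    """Build tables A and B (id, name, delete flag) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # "delete" is a reserved word in SQL, so the column name is double-quoted.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        INSERT INTO A (id, name) VALUES
            (1, 'tom'), (2, 'tom'), (3, 'tom'), (4, 'ann'),
            (5, 'ann'), (6, 'bob'), (7, 'joe'), (8, 'joe');
        INSERT INTO B (id, name) VALUES (1, 'tom'), (2, 'ann'), (3, 'bob');
    """)
    return conn


# Flag every duplicated row except the one with the largest id for its name.
# SQLite does not allow a table prefix on the SET column, hence SET "delete".
DELETE_DUPLICATES = """
    UPDATE A SET "delete" = 1
    WHERE A.name IN (
        SELECT A.name FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
    AND A.id NOT IN (
        SELECT MAX(A.id) FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
"""


def delete_duplicates(conn):
    conn.execute(DELETE_DUPLICATES)


def flagged_ids(conn):
    return [r[0] for r in conn.execute('SELECT id FROM A WHERE "delete" = 1 ORDER BY id')]


if __name__ == "__main__":
    conn = make_db()
    delete_duplicates(conn)
    print(flagged_ids(conn))  # → [1, 2, 4]
```

Rows 3 and 5 (the largest IDs for 'tom' and 'ann') stay active, as do 'bob' and 'joe'.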
This method is extremely inefficient: updating 30,000 out of 300,000 rows took 80 minutes.
Five: If you have permission to create tables, you can perform the update through an intermediate table. The steps are:
1. Use the query above to select the rows that need to be updated, that is, the duplicates other than the one with the largest ID (the step-three query with A.ID NOT IN instead of A.ID IN)
2. Create a new table C with the same structure as table A and load it with the rows returned in step 1
3. Update table A from table C. The code is as follows:
UPDATE A SET A.DELETE=1 WHERE A.ID IN( SELECT C.ID FROM C );
With the intermediate table, the update runs in 8 s, but it requires permission to create tables.
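The three steps can be sketched end to end with SQLite's `CREATE TABLE ... AS SELECT`, which copies A's columns and loads the selected rows in one statement (Oracle also supports `CREATE TABLE ... AS SELECT`). This is a minimal sketch under assumptions: SQLite stands in for the original database, the sample rows are invented, and `delete` is quoted as a reserved word. Once C holds the IDs, the final update is a simple ID lookup instead of re-running the grouped subqueries for table A.

```python
import sqlite3


def make_db():
    """Build tables A and B (id, name, delete flag) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # "delete" is a reserved word in SQL, so the column name is double-quoted.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        INSERT INTO A (id, name) VALUES
            (1, 'tom'), (2, 'tom'), (3, 'tom'), (4, 'ann'),
            (5, 'ann'), (6, 'bob'), (7, 'joe'), (8, 'joe');
        INSERT INTO B (id, name) VALUES (1, 'tom'), (2, 'ann'), (3, 'bob');
    """)
    return conn


# Steps 1 and 2: copy the rows to be flagged into a new table C shaped like A.
STAGE_ROWS = """
    CREATE TABLE C AS
    SELECT * FROM A
    WHERE A.name IN (
        SELECT A.name FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
    AND A.id NOT IN (
        SELECT MAX(A.id) FROM A, B
        WHERE A.name = B.name AND A."delete" = 0
        GROUP BY A.name HAVING COUNT(A.name) > 1)
"""

# Step 3: flag rows of A by id lookup in C.
APPLY_FROM_STAGE = 'UPDATE A SET "delete" = 1 WHERE A.id IN (SELECT C.id FROM C)'


def update_via_staging(conn):
    """Stage, apply, drop C; return the staged ids."""
    conn.execute(STAGE_ROWS)
    staged = [r[0] for r in conn.execute("SELECT id FROM C ORDER BY id")]
    conn.execute(APPLY_FROM_STAGE)
    conn.execute("DROP TABLE C")  # the intermediate table is no longer needed
    return staged


def flagged_ids(conn):
    return [r[0] for r in conn.execute('SELECT id FROM A WHERE "delete" = 1 ORDER BY id')]


if __name__ == "__main__":
    conn = make_db()
    print(update_via_staging(conn))  # → [1, 2, 4]
    print(flagged_ids(conn))  # → [1, 2, 4]
```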
Six: Update with MERGE INTO
MERGE INTO was the fastest of the three approaches in practice: it uses no extra space and needs no extra privileges, but its syntax is more complex and it is easy to confuse the target table with the tables in the source query. The code is as follows:
MERGE INTO A USING( SELECT A.NAME,MAX(A.ID) AS ID FROM A,B WHERE A.NAME=B.NAME AND A.DELETE=0 GROUP BY A.NAME HAVING COUNT(A.NAME)>1 ) Q ON(A.NAME=Q.NAME AND A.ID<>Q.ID) WHEN MATCHED THEN UPDATE SET A.DELETE=1;
On the same amount of data, it runs in 3 s.
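SQLite has no MERGE INTO, so this sketch expresses the same matching rule as an UPDATE with a correlated EXISTS: a row of A is flagged when the grouped source Q contains a row with the same name and a different ID, mirroring the `ON(A.NAME=Q.NAME AND A.ID<>Q.ID)` clause. It is an equivalent formulation of the logic, not the MERGE statement itself, and it says nothing about performance. The aliases `a2` and `b2` are hypothetical names chosen here so the tables inside the source query cannot be confused with the target A; the sample rows are invented, and `delete` is quoted as a reserved word.

```python
import sqlite3


def make_db():
    """Build tables A and B (id, name, delete flag) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # "delete" is a reserved word in SQL, so the column name is double-quoted.
    conn.executescript("""
        CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT, "delete" INTEGER DEFAULT 0);
        INSERT INTO A (id, name) VALUES
            (1, 'tom'), (2, 'tom'), (3, 'tom'), (4, 'ann'),
            (5, 'ann'), (6, 'bob'), (7, 'joe'), (8, 'joe');
        INSERT INTO B (id, name) VALUES (1, 'tom'), (2, 'ann'), (3, 'bob');
    """)
    return conn


# Q: one row per duplicated name with its largest id (same as the MERGE source).
# The outer A in "Q.name = A.name" is the UPDATE target, as in the MERGE ON clause.
FLAG_VIA_SOURCE = """
    UPDATE A SET "delete" = 1
    WHERE EXISTS (
        SELECT 1 FROM (
            SELECT a2.name AS name, MAX(a2.id) AS id
            FROM A AS a2, B AS b2
            WHERE a2.name = b2.name AND a2."delete" = 0
            GROUP BY a2.name HAVING COUNT(a2.name) > 1
        ) AS Q
        WHERE Q.name = A.name AND Q.id <> A.id)
"""


def flag_via_source(conn):
    conn.execute(FLAG_VIA_SOURCE)


def flagged_ids(conn):
    return [r[0] for r in conn.execute('SELECT id FROM A WHERE "delete" = 1 ORDER BY id')]


if __name__ == "__main__":
    conn = make_db()
    flag_via_source(conn)
    print(flagged_ids(conn))  # → [1, 2, 4]
```

Because Q is grouped by name, each row of A matches at most one source row, which is also what MERGE requires of its source.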