5. Conditionally delete duplicate rows in a table

One: Scenario description

Tables A and B both have the following structure:

identifier  name  deletion status
id          name  delete

We want to find the values of the name field in table A that also appear in table B, pick out the names that occur more than once, and logically delete (set delete to 1) every row that shares a duplicated name except the one with the largest id.

Two: Find duplicate content

SELECT * 
FROM A
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);
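
To make the result of this query concrete, here is a minimal sketch that runs it against a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for the real database; the sample names and ids are assumptions, B.NAME is assumed to be unique, and the column is quoted as "DELETE" because DELETE is an SQL keyword.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the article's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
CREATE TABLE B (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
INSERT INTO A VALUES (1,'alice',0),(2,'alice',0),(3,'alice',0),
                     (4,'bob',0),(5,'carol',0),(6,'carol',0),
                     (7,'dave',0),(8,'dave',0);
-- B.NAME is assumed unique; repeated names in B would inflate the COUNT.
INSERT INTO B VALUES (1,'alice',0),(2,'bob',0),(3,'carol',0);
""")

# Same query as in the article, reduced to the ID column for readability.
duplicate_ids = [row[0] for row in conn.execute("""
SELECT A.ID
FROM A
WHERE A.NAME IN (
  SELECT A.NAME
  FROM A, B
  WHERE A.NAME = B.NAME
  AND A."DELETE" = 0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME) > 1
)
ORDER BY A.ID
""")]

print(duplicate_ids)  # [1, 2, 3, 5, 6]
```

In this sample, 'bob' is skipped because it appears only once in A, and 'dave' is skipped because it does not appear in B at all.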

Three: Find the row with the largest ID in the duplicate content

SELECT * 
FROM A
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1)
AND A.ID IN(
  SELECT MAX(A.ID)
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);

Four: Update with an UPDATE statement

UPDATE A
SET A.DELETE=1
WHERE A.NAME IN(
  SELECT A.NAME
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1)
AND A.ID NOT IN(
  SELECT MAX(A.ID)
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
);

This method is extremely inefficient: updating 30,000 rows in a table of 300,000 rows took 80 minutes.
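
The goal stated in the scenario is to flag every duplicated row except the one with the largest id. The following is a minimal sketch of an UPDATE that does this on made-up rows, using NOT IN on the MAX(ID) subquery so that the largest id in each group is left alone. SQLite (via Python's sqlite3) stands in for the real database, and the sample data and the make_db helper are assumptions for illustration.

```python
import sqlite3

def make_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    INSERT INTO A VALUES (1,'alice',0),(2,'alice',0),(3,'alice',0),
                         (4,'bob',0),(5,'carol',0),(6,'carol',0),
                         (7,'dave',0),(8,'dave',0);
    INSERT INTO B VALUES (1,'alice',0),(2,'bob',0),(3,'carol',0);
    """)
    return conn

conn = make_db()

# Flag rows whose name is duplicated, but skip the largest ID of each group.
conn.execute("""
UPDATE A
SET "DELETE" = 1
WHERE NAME IN (
  SELECT A.NAME FROM A, B
  WHERE A.NAME = B.NAME AND A."DELETE" = 0
  GROUP BY A.NAME HAVING COUNT(A.NAME) > 1)
AND ID NOT IN (
  SELECT MAX(A.ID) FROM A, B
  WHERE A.NAME = B.NAME AND A."DELETE" = 0
  GROUP BY A.NAME HAVING COUNT(A.NAME) > 1)
""")

flagged = [r[0] for r in conn.execute(
    'SELECT ID FROM A WHERE "DELETE" = 1 ORDER BY ID')]
kept = [r[0] for r in conn.execute(
    'SELECT ID FROM A WHERE "DELETE" = 0 ORDER BY ID')]

print(flagged)  # [1, 2, 5]
print(kept)     # [3, 4, 6, 7, 8]
```

Both subqueries repeat the same join and grouping, which is a plausible reason for the slow run time reported above.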

Five: If you have permission to create tables, the update can instead go through a new transfer table. The steps are as follows:

       1. Query the rows that need to be updated, using the conditions from the statements above

       2. Create a new table C with the same structure as table A, and load it with the rows returned in step 1

       3. Update table A through table C. The code is as follows:

UPDATE A
SET A.DELETE=1
WHERE A.ID IN(
  SELECT C.ID
  FROM C
);

With the transfer table, the update runs in 8 seconds, but it requires permission to create a table.
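
The three steps can be sketched end to end as follows. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for the real database, the sample rows are made up, and CREATE TABLE ... AS SELECT is used here as one way to create and fill table C in a single statement.

```python
import sqlite3

def make_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    INSERT INTO A VALUES (1,'alice',0),(2,'alice',0),(3,'alice',0),
                         (4,'bob',0),(5,'carol',0),(6,'carol',0),
                         (7,'dave',0),(8,'dave',0);
    INSERT INTO B VALUES (1,'alice',0),(2,'bob',0),(3,'carol',0);
    """)
    return conn

conn = make_db()

# Steps 1 and 2: run the expensive query once and store its rows in C.
conn.execute("""
CREATE TABLE C AS
SELECT A.*
FROM A
WHERE A.NAME IN (
  SELECT A.NAME FROM A, B
  WHERE A.NAME = B.NAME AND A."DELETE" = 0
  GROUP BY A.NAME HAVING COUNT(A.NAME) > 1)
AND A.ID NOT IN (
  SELECT MAX(A.ID) FROM A, B
  WHERE A.NAME = B.NAME AND A."DELETE" = 0
  GROUP BY A.NAME HAVING COUNT(A.NAME) > 1)
""")

# Step 3: the update itself is now a plain lookup against C.
conn.execute('UPDATE A SET "DELETE" = 1 WHERE ID IN (SELECT C.ID FROM C)')

# Drop the transfer table once the update has been applied.
conn.execute("DROP TABLE C")

flagged = [r[0] for r in conn.execute(
    'SELECT ID FROM A WHERE "DELETE" = 1 ORDER BY ID')]
print(flagged)  # [1, 2, 5]
```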

Six: Update with MERGE INTO

In practice, MERGE INTO was the most efficient option: it takes up no extra space and needs no additional permissions. Its syntax is more complex, however, and it is easy to modify the table by mistake. The code is as follows:

MERGE INTO A
USING(
  SELECT A.NAME,MAX(A.ID) AS ID
  FROM A,B
  WHERE A.NAME=B.NAME
  AND A.DELETE=0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME)>1
) Q
ON(A.NAME=Q.NAME AND A.ID<>Q.ID)
WHEN MATCHED THEN
UPDATE SET A.DELETE=1;

On the same amount of data, this statement ran in 3 seconds.
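
Because a mistake in the ON condition changes the wrong rows, it can help to preview what the condition matches before running the MERGE. The sketch below runs the same USING subquery and ON condition as an ordinary join. SQLite has no MERGE statement, so this only previews the matched rows and does not perform the update; the sample data and the KEPT_ID alias are assumptions for illustration.

```python
import sqlite3

def make_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, NAME TEXT, "DELETE" INTEGER);
    INSERT INTO A VALUES (1,'alice',0),(2,'alice',0),(3,'alice',0),
                         (4,'bob',0),(5,'carol',0),(6,'carol',0),
                         (7,'dave',0),(8,'dave',0);
    INSERT INTO B VALUES (1,'alice',0),(2,'bob',0),(3,'carol',0);
    """)
    return conn

conn = make_db()

# Q holds one row per duplicated name with the ID to keep; the join
# condition mirrors the ON clause of the MERGE statement.
preview = conn.execute("""
SELECT A.ID, A.NAME, Q.ID AS KEPT_ID
FROM A
JOIN (
  SELECT A.NAME, MAX(A.ID) AS ID
  FROM A, B
  WHERE A.NAME = B.NAME
  AND A."DELETE" = 0
  GROUP BY A.NAME
  HAVING COUNT(A.NAME) > 1
) Q
ON A.NAME = Q.NAME AND A.ID <> Q.ID
ORDER BY A.ID
""").fetchall()

for row in preview:
    print(row)
# (1, 'alice', 3)
# (2, 'alice', 3)
# (5, 'carol', 6)
```

Each printed row is one that the MERGE would flag, shown next to the id that survives for its name.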
