codestings codestings - 26 days ago 9
MySQL Question

very slow search and update database operation

i have a table "table1" which has almost 400,000 records. There is another table "table2" which has around 450,000 records.

I need to delete all the rows in table1 which are duplicate in table2. I been trying to do it with php and the script was running for hours and not completed yet. Does it really takes that much time?


field asin is varchar(20) in table1

field ASIN is Index and char(10) in table2


$duplicat = 0;
$sql="SELECT asin from asins";
$result = $conn->query($sql);

if ($result->num_rows > 0) {
while($row = $result->fetch_assoc()) {
$ASIN = $row['asin'];
$sql2 = "select id from asins_chukh where ASIN='$ASIN' limit 1";
$result2 = $conn->query($sql2);
if ($result2->num_rows > 0) {
$duplicat++;
$sql3 = "UPDATE `asins` SET `duplicate` = '1' WHERE `asins`.`asin` = '$ASIN';";
$result3 = $conn->query($sql3);
if($result3) {
echo "duplicate = $ASIN <br/>";
}
}
}
}

echo "totaal :$duplicat";

Answer

u can run one single sql command, instead of a loop, something like:

update table_2 t2 
set t2.duplicate = 1
where exists (
    select id 
    from table_1 t1 
    where t1.id = t2.id);

Warning! i didn't test the sql above, so you may need to verify the syntax.

For such kind of database operation, using php to loop and join is never a good idea. Most of the time will be wasted on network data transfer between your php server and mysql server.

If even the the above sql takes too long, you can consider limiting the query set with some range. Something like:

update table_2 t2 
set t2.duplicate = 1
where exists (
    select id 
    from table_1 t1 
    where t1.id = t2.id
      and t2.id > [range_start] and t2.id < [range_end] );

This way, you can kick of several updates running in parallel