MySQL performance: which of these queries will take more time?
I have two tables:
1. user table with around 10 million data columns: token_type, cust_id(Primary)
2. pm_tmp table with 200k data columns: id(Primary | AutoIncrement), user_id
user_id is a foreign key referencing cust_id.
1st Approach/Query:
update user set token_type='PRIME' where cust_id in (select user_id from pm_tmp where id between 1 AND 60000);
2nd Approach/Query: Here we run the query below individually for each cust_id, 60000 times in total:
update user set token_type='PRIME' where cust_id='1111110';
Asked by Prashant Mudgal on December 27, 2017
In theory the first query should take less time, since it involves fewer commits and, in turn, fewer index rebuilds. Still, I would recommend the second option: it is more controlled, it will appear to take less time, and you can even consider running two separate sets in parallel.
Note: The first query needs sufficient memory provisioned for the MySQL buffers to run quickly. The second approach is a set of independent single-statement transactions, which need comparatively less memory and so will appear faster in memory-constrained environments.
You could also rewrite the first query as a multi-table update:
update user u, pm_tmp p set u.token_type='PRIME' where u.cust_id=p.user_id and p.id <= 60000;
Posted on December 27, 2017
Some versions of MySQL have trouble optimizing IN with a subquery. I would recommend:
update user u join pm_tmp pt on u.cust_id = pt.user_id and pt.id between 1 AND 60000 set u.token_type = 'PRIME' ;
(Note: This assumes that no user_id (the cust_id value) is repeated in pm_tmp. If that is possible, you will want a select distinct subquery.)
Your second version would normally be considerably slower, because it requires executing thousands of queries instead of one. One consideration might be the UPDATE itself. Perhaps the logging and locking get more complicated as the number of updates increases. I don't actually know enough about MySQL internals to know if this would have a significant impact on performance.
IN ( SELECT ... ) is poorly optimized. (I can't provide specifics because both UPDATE and IN have been better optimized in some recent version(s) of MySQL.) Suffice it to say "avoid IN ( SELECT ... )".
Your first sentence should say "rows" instead of "columns".
Back to the rest of the question. 60K is too big of a chunk. I recommend only 1000. Aside from that, Gordon's Answer is probably the best.
But... You did not use OFFSET. Do not be tempted to use it; it will kill performance as you go farther and farther into the table.
COMMIT after each chunk. Else you build up a huge undo log; this adds to the cost. (And is a reason why 1K is possibly faster than 60K.)
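The chunking advice above (small id ranges, no OFFSET, COMMIT after each chunk) could be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in so it runs without a MySQL server, the helper names (id_ranges, flag_prime_in_chunks, make_chunk_demo_db) and the 'BASIC' token type are made up for the example, and the statement uses the IN form only because it is portable to SQLite. Against MySQL you would more likely plug in the JOIN form of the UPDATE with the same BETWEEN bounds.

```python
import sqlite3


def id_ranges(first_id, last_id, chunk_size):
    """Yield inclusive (lo, hi) id ranges that cover first_id..last_id."""
    lo = first_id
    while lo <= last_id:
        hi = min(lo + chunk_size - 1, last_id)
        yield lo, hi
        lo = hi + 1


def flag_prime_in_chunks(conn, first_id, last_id, chunk_size=1000):
    """Run one UPDATE per pm_tmp id range and commit after each range."""
    total = 0
    for lo, hi in id_ranges(first_id, last_id, chunk_size):
        cur = conn.execute(
            "UPDATE user SET token_type = 'PRIME' "
            "WHERE cust_id IN "
            "(SELECT user_id FROM pm_tmp WHERE id BETWEEN ? AND ?)",
            (lo, hi),
        )
        conn.commit()  # keep each transaction (and its undo data) small
        total += cur.rowcount
    return total


def make_chunk_demo_db(n_users=5000, n_tmp=2500):
    """Build a tiny in-memory copy of the two tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (cust_id INTEGER PRIMARY KEY, token_type TEXT)"
    )
    conn.execute(
        "CREATE TABLE pm_tmp "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO user VALUES (?, 'BASIC')",
        [(i,) for i in range(1, n_users + 1)],
    )
    # pm_tmp ids 1..n_tmp point at the even cust_ids 2, 4, 6, ...
    conn.executemany(
        "INSERT INTO pm_tmp (user_id) VALUES (?)",
        [(2 * i,) for i in range(1, n_tmp + 1)],
    )
    conn.commit()
    return conn
```

Walking the primary key by range keeps every chunk an index range scan of roughly the same cost, which is the property OFFSET loses as it goes deeper into the table.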
But wait! Why are you updating a huge table? That is usually a sign of bad schema design. Please explain the data flow.
Perhaps you have computed which items to flag as 'prime'? Well, you could keep that list around and do JOINs in the SELECTs to discover prime-ness when reading. This completely eliminates the UPDATE in question. Sure, the JOIN costs something, but not much.
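A rough sketch of that read-time JOIN idea, again using Python's sqlite3 as a stand-in for MySQL. It assumes pm_tmp is the list that is kept around and that user_id is unique (and indexed) in it. The names READ_SQL, token_type_for and make_read_demo_db, and the 'BASIC' token type, are invented for the example.

```python
import sqlite3

# Derive the token type when reading: a user counts as PRIME if it appears
# in pm_tmp; otherwise the stored token_type is returned unchanged.
READ_SQL = """
SELECT u.cust_id,
       CASE WHEN p.user_id IS NOT NULL THEN 'PRIME' ELSE u.token_type END
FROM user AS u
LEFT JOIN pm_tmp AS p ON p.user_id = u.cust_id
WHERE u.cust_id = ?
"""


def token_type_for(conn, cust_id):
    """Return the effective token type for one user, or None if unknown."""
    row = conn.execute(READ_SQL, (cust_id,)).fetchone()
    return None if row is None else row[1]


def make_read_demo_db():
    """Three users, one of which is listed in pm_tmp (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (cust_id INTEGER PRIMARY KEY, token_type TEXT)"
    )
    conn.execute(
        "CREATE TABLE pm_tmp "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO user VALUES (?, 'BASIC')", [(1,), (2,), (3,)]
    )
    conn.execute("INSERT INTO pm_tmp (user_id) VALUES (2)")
    conn.commit()
    return conn
```

With this shape the user table is never rewritten: token_type_for(conn, 2) reports PRIME for the listed user while the stored column stays as it was.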