Improving performance of updating large table with join

user976315

Currently I have a table with schema as follows:

 CREATE TABLE `mData` (
   `m1` mediumint(8) unsigned DEFAULT NULL,
   `m2` smallint(5) unsigned DEFAULT NULL,
   `m3` bigint(20) DEFAULT NULL,
   `m4` tinyint(4) DEFAULT NULL,
   `m5` date DEFAULT NULL,
   KEY `m_m1` (`m1`) USING HASH,
   KEY `m_date` (`m5`),
   KEY `m_m2` (`m2`),
   KEY `m_combined` (`m1`,`m2`,`m5`),
   KEY `m1_tradeday` (`m1`,`m5`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
 /*!50100 PARTITION BY RANGE ( YEAR(m5))
 SUBPARTITION BY HASH (MONTH(m5))
 (PARTITION p2013 VALUES LESS THAN (2014)
  (SUBPARTITION dec_2013 ENGINE = InnoDB,
   SUBPARTITION jan_2013 ENGINE = InnoDB,
   SUBPARTITION feb_2013 ENGINE = InnoDB,
   SUBPARTITION mar_2013 ENGINE = InnoDB,
   SUBPARTITION apr_2013 ENGINE = InnoDB,
   SUBPARTITION may_2013 ENGINE = InnoDB,
   SUBPARTITION jun_2013 ENGINE = InnoDB,
   SUBPARTITION jul_2013 ENGINE = InnoDB,
   SUBPARTITION aug_2013 ENGINE = InnoDB,
   SUBPARTITION sep_2013 ENGINE = InnoDB,
   SUBPARTITION oct_2013 ENGINE = InnoDB,
   SUBPARTITION nov_2013 ENGINE = InnoDB),
  PARTITION p2014 VALUES LESS THAN (2015)
  (SUBPARTITION dec_2014 ENGINE = InnoDB,
   SUBPARTITION jan_2014 ENGINE = InnoDB,
   SUBPARTITION feb_2014 ENGINE = InnoDB,
   SUBPARTITION mar_2014 ENGINE = InnoDB,
   SUBPARTITION apr_2014 ENGINE = InnoDB,
   SUBPARTITION may_2014 ENGINE = InnoDB,
   SUBPARTITION jun_2014 ENGINE = InnoDB,
   SUBPARTITION jul_2014 ENGINE = InnoDB,
   SUBPARTITION aug_2014 ENGINE = InnoDB,
   SUBPARTITION sep_2014 ENGINE = InnoDB,
   SUBPARTITION oct_2014 ENGINE = InnoDB,
   SUBPARTITION nov_2014 ENGINE = InnoDB),
  PARTITION p2015 VALUES LESS THAN (2016)
  (SUBPARTITION dec_2015 ENGINE = InnoDB,
   SUBPARTITION jan_2015 ENGINE = InnoDB,
   SUBPARTITION feb_2015 ENGINE = InnoDB,
   SUBPARTITION mar_2015 ENGINE = InnoDB,
   SUBPARTITION apr_2015 ENGINE = InnoDB,
   SUBPARTITION may_2015 ENGINE = InnoDB,
   SUBPARTITION jun_2015 ENGINE = InnoDB,
   SUBPARTITION jul_2015 ENGINE = InnoDB,
   SUBPARTITION aug_2015 ENGINE = InnoDB,
   SUBPARTITION sep_2015 ENGINE = InnoDB,
   SUBPARTITION oct_2015 ENGINE = InnoDB,
   SUBPARTITION nov_2015 ENGINE = InnoDB),
  PARTITION p2016 VALUES LESS THAN (2017)
  (SUBPARTITION dec_2016 ENGINE = InnoDB,
   SUBPARTITION jan_2016 ENGINE = InnoDB,
   SUBPARTITION feb_2016 ENGINE = InnoDB,
   SUBPARTITION mar_2016 ENGINE = InnoDB,
   SUBPARTITION apr_2016 ENGINE = InnoDB,
   SUBPARTITION may_2016 ENGINE = InnoDB,
   SUBPARTITION jun_2016 ENGINE = InnoDB,
   SUBPARTITION jul_2016 ENGINE = InnoDB,
   SUBPARTITION aug_2016 ENGINE = InnoDB,
   SUBPARTITION sep_2016 ENGINE = InnoDB,
   SUBPARTITION oct_2016 ENGINE = InnoDB,
   SUBPARTITION nov_2016 ENGINE = InnoDB),
  PARTITION pmax VALUES LESS THAN MAXVALUE
  (SUBPARTITION dec_max ENGINE = InnoDB,
   SUBPARTITION jan_max ENGINE = InnoDB,
   SUBPARTITION feb_max ENGINE = InnoDB,
   SUBPARTITION mar_max ENGINE = InnoDB,
   SUBPARTITION apr_max ENGINE = InnoDB,
   SUBPARTITION may_max ENGINE = InnoDB,
   SUBPARTITION jun_max ENGINE = InnoDB,
   SUBPARTITION jul_max ENGINE = InnoDB,
   SUBPARTITION aug_max ENGINE = InnoDB,
   SUBPARTITION sep_max ENGINE = InnoDB,
   SUBPARTITION oct_max ENGINE = InnoDB,
   SUBPARTITION nov_max ENGINE = InnoDB)) */

m1, m2 and m5 are indexed in this table; a UNIQUE or PRIMARY KEY is not applicable in my case.

As the data grows (100,000 new rows a day), the UPDATE is getting very slow.

I would like to know whether there is any way to improve the following statement:

update mData as a join (select * from mData
                        where m1 = 326 and m5 = '2015-07-06' ) as b
            on  a.m5 > b.m5 and a.m1 = b.m1
            and a.m2 = b.m2 and a.m3 = b.m3
    set a.m4 = 0;

I am quite sure that in the equivalent SELECT statement, replacing mData as a with (select * from mData where m1 = 326) would greatly reduce the execution time (from 5 sec to less than 1 sec).

However, I cannot do the same in the UPDATE statement, because a derived table cannot be the target of an UPDATE.

Is there any way to speed up this UPDATE?

P.S. The table is partitioned by YEAR(m5) and subpartitioned by MONTH(m5).

Here is the EXPLAIN PARTITIONS output for my join query. It is very messy; I hope you don't mind. Adding and a.m5 > '2015-07-06' does improve performance: query time drops from 0.68 sec to 0.2 sec.

explain partitions (select * from (select * from mData where m1 = 326) as a join (select * from mData where m1 = 326 and m5= '2015-07-06') as b on  a.m5 > b.m5 and a.m1 = b.m1 and a.m2 = b.m2 and a.m3 = b.m3 and a.m5 > '2015-07-06');

| id | select_type | table | partitions         | type | possible_keys              | key   | key_len | ref  | rows | Extra                          |
+----+-------------+-------+--------------------+------+----------------------------+-------+---------+------+------+--------------------------------+
|  1 | PRIMARY     |       | NULL               | ALL  | NULL                       | NULL  | NULL    | NULL |  358 |                                |
|  1 | PRIMARY     |       | NULL               | ALL  | NULL                       | NULL  | NULL    | NULL | 1073 | Using where; Using join buffer |
|  3 | DERIVED     | mData | p2015_jul_2015     | ref  | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8       |      |  357 | Using where                    |
|  2 | DERIVED     | mData | all 60 (see below) | ref  | m_m1,m_combined,m1_m5      | m_m1  | 4       |      | 1074 | Using where                    |

Partitions for the id 2 row (every subpartition of the table): p2013_dec_2013,p2013_jan_2013,p2013_feb_2013,p2013_mar_2013,p2013_apr_2013,p2013_may_2013,p2013_jun_2013,p2013_jul_2013,p2013_aug_2013,p2013_sep_2013,p2013_oct_2013,p2013_nov_2013,p2014_dec_2014,p2014_jan_2014,p2014_feb_2014,p2014_mar_2014,p2014_apr_2014,p2014_may_2014,p2014_jun_2014,p2014_jul_2014,p2014_aug_2014,p2014_sep_2014,p2014_oct_2014,p2014_nov_2014,p2015_dec_2015,p2015_jan_2015,p2015_feb_2015,p2015_mar_2015,p2015_apr_2015,p2015_may_2015,p2015_jun_2015,p2015_jul_2015,p2015_aug_2015,p2015_sep_2015,p2015_oct_2015,p2015_nov_2015,p2016_dec_2016,p2016_jan_2016,p2016_feb_2016,p2016_mar_2016,p2016_apr_2016,p2016_may_2016,p2016_jun_2016,p2016_jul_2016,p2016_aug_2016,p2016_sep_2016,p2016_oct_2016,p2016_nov_2016,pmax_dec_max,pmax_jan_max,pmax_feb_max,pmax_mar_max,pmax_apr_max,pmax_may_max,pmax_jun_max,pmax_jul_max,pmax_aug_max,pmax_sep_max,pmax_oct_max,pmax_nov_max

Below is the EXPLAIN output requested by Rick James:

EXPLAIN PARTITIONS select * from mData where m1 = 326 and m5 = '2015-07-06';

| id | select_type | table | partitions     | type | possible_keys              | key   | key_len | ref         | rows | Extra       |
+----+-------------+-------+----------------+------+----------------------------+-------+---------+-------------+------+-------------+
|  1 | SIMPLE      | mData | p2015_jul_2015 | ref  | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8       | const,const |  357 | Using where |

Rick James

For starters, add INDEX(m1, m5). After I see the output of SHOW CREATE TABLE mData, I may have other recommendations.

EDIT

Adding AND a.m5 > '2015-07-06' may get partition pruning to kick in. I don't have enough experience with UPDATE on subpartitioned tables to predict whether it will.
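A sketch of the question's UPDATE with that predicate added, plus a.m1 = 326 stated explicitly. The second condition is already implied by a.m1 = b.m1, but spelling it out may help the optimizer pick an index that starts with m1 on a; that is an assumption, not a measured result. It assumes the mData table and the date literal from the question, and whether MySQL actually prunes partitions for this UPDATE is untested here.

```sql
-- Sketch: the original UPDATE (derived table b kept) with two extra predicates on a.
UPDATE mData AS a
JOIN (SELECT m1, m2, m3, m5
        FROM mData
       WHERE m1 = 326
         AND m5 = '2015-07-06') AS b
  ON  a.m1 = b.m1
  AND a.m2 = b.m2
  AND a.m3 = b.m3
  AND a.m5 > b.m5
SET a.m4 = 0
WHERE a.m1 = 326              -- implied by the join; stated so an m1-leading index on a can be used
  AND a.m5 > '2015-07-06';    -- constant range on the partitioning column; may enable partition pruning
```

Only the columns used in the join are selected into b, which keeps the materialized derived table smaller than select * would.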

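InnoDB needs a PRIMARY KEY; if you don't declare one, it uses the first UNIQUE NOT NULL index or, failing that, a hidden 6-byte row ID as the clustered key. Would (m1, m2, m3, m5) work as a PK?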
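A sketch of what adding that PRIMARY KEY could look like, assuming the existing rows contain no NULLs in m1, m2, m3 or m5 and no duplicate (m1, m2, m3, m5) combinations; the first query checks the duplicate assumption. Including m5 matters here: on a partitioned table, every PRIMARY or UNIQUE key must include all columns used in the partitioning expression. PRIMARY KEY columns must be NOT NULL, so the sketch redeclares them. On a table of this size the ALTER rebuilds the table and can take a long time.

```sql
-- 1) Check the uniqueness assumption first; this should return no rows.
SELECT m1, m2, m3, m5, COUNT(*) AS n
  FROM mData
 GROUP BY m1, m2, m3, m5
HAVING n > 1
 LIMIT 10;

-- 2) Make the key columns NOT NULL (required for a PRIMARY KEY) and add the key
--    in a single ALTER so the table is rebuilt only once.
ALTER TABLE mData
  MODIFY m1 MEDIUMINT UNSIGNED NOT NULL,
  MODIFY m2 SMALLINT UNSIGNED NOT NULL,
  MODIFY m3 BIGINT NOT NULL,
  MODIFY m5 DATE NOT NULL,
  ADD PRIMARY KEY (m1, m2, m3, m5);
```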

USING HASH is ignored, since InnoDB does not implement it. It will be a BTree, which is nearly as good, anyway.

KEY `m_m1` (`m1`)

is redundant and can be dropped, since two other indexes, m_combined (m1, m2, m5) and m1_tradeday (m1, m5), start with it.
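Dropping it is a one-line ALTER; the index name m_m1 is taken from the posted CREATE TABLE. Lookups on m1 alone can still use the leftmost column of either remaining index.

```sql
-- m_m1 (m1) only duplicates the leftmost column of m_combined and m1_tradeday.
ALTER TABLE mData DROP INDEX m_m1;

-- Optional check: possible_keys would be expected to list only the remaining m1-leading indexes.
EXPLAIN SELECT * FROM mData WHERE m1 = 326;
```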

Can't you do a JOIN instead of using a subquery? (That would avoid a tmp table.)
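A sketch of the UPDATE written as a plain self-join, so b is read directly from mData instead of being materialized into a temporary table first. MySQL's multiple-table UPDATE accepts the same table twice under different aliases, as long as only a is modified in SET. The constants from the question are repeated on a so an m1-leading index can be used on both sides; this is untested against the asker's data.

```sql
-- Self-join form: b is a plain alias of mData, not a derived table.
UPDATE mData AS a
JOIN mData AS b
  ON  b.m1 = a.m1
  AND b.m2 = a.m2
  AND b.m3 = a.m3
  AND b.m5 < a.m5
SET a.m4 = 0
WHERE b.m1 = 326
  AND b.m5 = '2015-07-06'
  AND a.m1 = 326              -- redundant with the join, but lets a use an m1-leading index
  AND a.m5 > '2015-07-06';    -- same constant range as in the EDIT above
```

If several b rows match the same a row, that row is still updated only once, which is harmless here because SET assigns a constant.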


Edited at 2021-03-31
