Technical Blog Post
Abstract
Reducing Backup Runtimes by Redistributing Tables over Table Spaces
Body
Problem Description
DB2 Backup is able to use IO parallelism by reading and backing up multiple table spaces at the same time. Number of concurrent threads reading the table spaces (called "buffer manipulators", with db2bm.X EDU name, EDU stands for Engine Dispatchable Unit) is controlled by "PARALLELISM" keyword. By default the value computes based on the size of utility heap, number of table spaces and CPUs.
Each table space in a DB2 database is read by one single buffer manipulator (db2bm EDU). The backup starts with the largest table space. As soon as the backup of one table space is completed successfully, the next table space is read until all table spaces are backed up.
It is possible that the largest table space has reached a size that is larger than all other table spaces together. This leads to a situation that this table space determines the complete backup runtime.
2016-09-07-08.44.18.108926+120 E5729808E1726 LEVEL: Info
PID : 8829 TID : 139796638852864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-37 APPID: *LOCAL.db2inst1.160907064019
AUTHID : DB2INST1 HOSTNAME: XXX.XXX.XXX
EDUID : 22 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1212 bytes
Parallelism = 5
Number of buffers = 6
Buffer size = 3674112 (897 4kB pages)
BM# Total I/O MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 238.01 230.23 0.21 0.01 16701 58443
001 238.01 4.30 0.00 233.65 67 233
002 238.01 0.02 0.00 237.93 1 1
003 238.01 0.09 0.00 237.89 6 19
004 238.01 16.84 0.01 220.63 1180 4129
--- -------- -------- -------- -------- -------- --------
TOT 1190.06 251.50 0.22 930.14 17955 62826
MC# Total I/O MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 238.01 0.05 237.81 0.00 5988 20974
001 238.01 0.05 237.79 0.01 5985 20967
002 238.01 0.05 237.79 0.01 5985 20967
--- -------- -------- -------- -------- -------- --------
TOT 714.04 0.17 713.40 0.02 17958 62909
As you can see buffer manipulator 000 is doing most of the I/Os (note "I/O" column - time spent doing IO and "MBytes" - number of megabytes read) whereas the other manipulators are waiting the most of the time ("WaitQ", time when given db2bm thread is just waiting for others to complete)
The reason for this in this example is the following imbalance of the sizes of the table spaces:
$ db2 "select substr(tbsp_name,1,16) tbsp_name, tbsp_total_size_kb from sysibmadm.mon_tbsp_utilization order by tbsp_total_size_kb desc"
TBSP_NAME TBSP_TOTAL_SIZE_KB
---------------- --------------------
USERSPACE1 59867136
SYSCATSPACE 262144
IBMDB2SAMPLEXML 32768
SYSTOOLSPACE 32768
IBMDB2SAMPLEREL 13056
TEMPSPACE1 8
6 record(s) selected.
As the result, there is little benefit from backup parallelism because there is a single table space that contains majority of the data.
The following tables reside in the table space USERSPACE1:
$ db2 "select substr(tabname,1,20) as tabname, fpages from syscat.tables where tbspace = 'USERSPACE1'"
TABNAME FPAGES
-------------------- --------------------
DATA1 3085586
DATA2 6426
DATA3 44978
DATA4 96384
DATA5 128513
DATA6 3599640
6 record(s) selected.
To relax this situation slightly, this table space can be split and some of the tables, namely DATA5 and DATA3 could be moved to a new table space. This would have the size of the table space USERSPACE1 and should limit the backup runtime. In this example it would even be better to split the table space into three.
Create a New Table Space
db2look -d <dbname> -l
CREATE LARGE TABLESPACE "USERSPACE1" IN DATABASE PARTITION GROUP IBMDEFAULTGROUP
PAGESIZE 8192 MANAGED BY AUTOMATIC STORAGE
USING STOGROUP "IBMSTOGROUP"
AUTORESIZE YES
INITIALSIZE 32 M
MAXSIZE NONE
EXTENTSIZE 4
PREFETCHSIZE AUTOMATIC
BUFFERPOOL "IBMDEFAULTBP"
DATA TAG INHERIT
OVERHEAD INHERIT
TRANSFERRATE INHERIT
DROPPED TABLE RECOVERY ON;
It may be possible that the table space was part of the create database statement. If so you can find the parameters in the output of the following command:
db2look -d <dbname> -createdb
Now the table space USERSPACE2 can be created in the same way as USERSPACE1.
Moving Tables
Now it is time to move tables from the old table space USERSPACE1 to the new table space USERSPACE2. For this purpose the procedure ADMIN_MOVE_TABLE is used. This procedure takes the following parameters:
>>-ADMIN_MOVE_TABLE--(--tabschema--,--tabname--,---------------->
>--data_tbsp--,--index_tbsp--,--lob_tbsp--,--------------------->
>--organize_by_clause--,--partkey_cols--,--data_part--,--------->
.---------.
V |
---coldef--,----options-+--,--operation--)---------------------><
So in this case for table DATA5 the following command would be issued:
$ db2 “CALL SYSPROC.ADMIN_MOVE_TABLE(
'TAB1',
'DATA5',
'USERSPACE2',
'USERSPACE2',
'USERSPACE2',
'',
'',
'',
'',
'',
'')”
This command moves table DATA5 to the new table space USERSPACE2. Note that other table space can be specified for index and lob table spaces. The table DATA3 is moved accordingly.
CALL SYSPROC.ADMIN_MOVE_TABLE( 'TAB1', 'DATA6', 'USERSPACE2', 'USERSPACE2', 'USERSPACE2', '', '', '', '', '', 'MOVE')
Result set 1
--------------
KEY VALUE
-------------------------------- ----------------------------------------------------
AUTHID DB2INST1
CLEANUP_END 2016-09-07-12.28.04.948662
CLEANUP_START 2016-09-07-12.28.04.295206
COPY_END 2016-09-07-12.00.20.117581
COPY_OPTS
COPY_START 2016-09-07-09.23.54.112229
COPY_TOTAL_ROWS 95724175
INDEXNAME
INDEXSCHEMA
INIT_END 2016-09-07-09.23.54.014399
INIT_START 2016-09-07-09.23.52.670441
ORIGINAL_TBLSIZE 29491200
REPLAY_END 2016-09-07-12.28.04.037688
REPLAY_START 2016-09-07-12.00.20.118824
REPLAY_TOTAL_ROWS 0
REPLAY_TOTAL_TIME 1
STATUS COMPLETE
SWAP_END 2016-09-07-12.28.04.217737
SWAP_RETRIES 0
SWAP_START 2016-09-07-12.28.04.079908
UTILITY_INVOCATION_ID 0100000009000000080000000000000000002016090709235401723700000000
VERSION 11.01.0000
22 record(s) selected.
Return Status = 0
Shrinking the old table space
db2 alter tablespace USERSPACE1 reduce max
$ db2 "SELECT * FROM TABLE(SYSPROC.MON_GET_EXTENT_MOVEMENT_STATUS('', -1))"
TBSP_NAME TBSP_ID CURRENT_EXENT LAST_EXTENT NUM_EXTENTS_MOVED NUM_EXTENTS_LEFT TOTAL_MOVE_TIME
--------------- ------- ------------- ----------- ----------------- ---------------- ---------------
SYSCATSPACE 0 -1 -1 -1 -1 -1
TEMPSPACE1 1 -1 -1 -1 -1 -1
USERSPACE1 2 137038 137039 308 15761 7607
IBMDB2SAMPLEREL 3 -1 -1 -1 -1 -1
IBMDB2SAMPLEXML 4 -1 -1 -1 -1 -1
SYSTOOLSPACE 5 -1 -1 -1 -1 -1
USERSPACE2 6 -1 -1 -1 -1 -1
7 record(s) selected.
Results
The sizes of the table spaces of the database look like the following now:
$ db2 "select substr(tbsp_name,1,16) tbsp_name, tbsp_total_size_kb from sysibmadm.mon_tbsp_utilization order by tbsp_total_size_kb desc"
TBSP_NAME TBSP_TOTAL_SIZE_KB
---------------- --------------------
USERSPACE1 31047424
USERSPACE2 29523968
SYSCATSPACE 262144
IBMDB2SAMPLEXML 32768
SYSTOOLSPACE 32768
IBMDB2SAMPLEREL 13056
TEMPSPACE1 8
7 record(s) selected.
The backup statistics show the following picture now:
2016-09-07-13.37.26.511820+120 E5956251E1726 LEVEL: Info
PID : 8829 TID : 139794881439488 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-448 APPID: *LOCAL.db2inst1.160907113403
AUTHID : DB2INST1 HOSTNAME: XXX.XXX.XXX
EDUID : 651 EDUNAME: db2agent (SAMPLE) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1212 bytes
Parallelism = 5
Number of buffers = 6
Buffer size = 3674112 (897 4kB pages)
BM# Total I/O MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 202.43 2.94 0.00 199.42 69 233
001 202.43 117.76 0.08 80.94 8663 30319
002 202.43 198.90 0.08 0.02 8232 28811
003 202.43 0.10 0.00 202.29 6 19
004 202.43 0.11 0.00 202.26 4 13
--- -------- -------- -------- -------- -------- --------
TOT 1012.15 319.82 0.16 684.95 16974 59397
MC# Total I/O MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 202.43 0.03 202.29 0.00 5659 19821
001 202.43 0.03 202.28 0.01 5658 19821
002 202.43 0.03 202.28 0.01 5660 19828
--- -------- -------- -------- -------- -------- --------
TOT 607.29 0.11 606.85 0.02 16977 59471
At the beginning, the backup was taken in 238 seconds. The runtime was decreased to 202 seconds only be re-balancing the data in the table spaces. The impact is even higher on enterprise server systems.
References
UID
ibm11140676