Technical Blog Post
Abstract
Performance considerations for using Declared Global Temporary Tables.
Body
DGTTs (Declared Global Temporary Tables) are frequently used in stored procedure code to store and manipulate an intermediate result set.
There are, however, a few performance considerations which I wanted to illustrate and share,
in particular issues related to highly concurrent execution of procedures that use DGTTs.
First of all, I will start off with the baseline example, a simple stored procedure creating five DGTTs :
create or replace procedure dgtt1 ( out level int )
language sql
begin atomic
declare a integer;
declare b integer;
declare c integer;
declare d integer;
declare e integer;
set level = 1;
declare global temporary table t1 like sysibm.syscoldist on commit preserve rows not logged with replace;
create index session.idx1 on session.t1( name, tbname ) collect detailed statistics ;
declare global temporary table t2 like sysibm.syscoldist on commit preserve rows not logged with replace;
create index session.idx2 on session.t2( name, tbname ) collect detailed statistics ;
declare global temporary table t3 like sysibm.syscoldist on commit preserve rows not logged with replace;
create index session.idx3 on session.t3( name, tbname ) collect detailed statistics ;
declare global temporary table t4 like sysibm.syscoldist on commit preserve rows not logged with replace;
create index session.idx4 on session.t4( name, tbname ) collect detailed statistics ;
declare global temporary table t5 like sysibm.syscoldist on commit preserve rows not logged with replace;
create index session.idx5 on session.t5( name, tbname ) collect detailed statistics ;
insert into session.t5 select * from sysibm.syscoldist fetch first 5 rows only;
insert into session.t4 select * from sysibm.syscoldist fetch first 5 rows only;
insert into session.t3 select * from sysibm.syscoldist fetch first 5 rows only;
insert into session.t2 select * from sysibm.syscoldist fetch first 5 rows only;
insert into session.t1 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into a from session.t5;
select max( seqno ) into b from session.t4;
select max( seqno ) into c from session.t3;
select max( seqno ) into d from session.t2;
select max( seqno ) into e from session.t1;
end
The test I executed runs 50 simultaneous threads and each of these threads calls the procedure above 100 times.
I used these columns : TOTAL_CPU_TIME, TOTAL_WAIT_TIME, TOTAL_SECTION_TIME, TOTAL_IMPLICIT_COMPILE_PROC_TIME and TOTAL_APP_SECTION_EXECUTIONS
from the SYSPROC.MON_GET_DATABASE monitoring table function to give me an indication of performance.
Sampled per second, these give me the CPU usage, the time spent executing sections, the time spent waiting, the time spent compiling statements and the number of section executions
( a stored procedure contains multiple statements, hence multiple sections ).
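The monitoring output below was produced by sampling these counters about once per second and reporting the deltas between samples; the underlying query is a sketch along these lines (the sampling loop and delta computation around it are assumed, not shown in the original tests) :

```sql
-- Sample the database-level time-spent metrics ( -2 = all members ).
-- A monitoring script would run this roughly once per second and
-- report the difference between consecutive samples.
select total_cpu_time,
       total_section_time,
       total_wait_time,
       total_implicit_compile_proc_time,
       total_app_section_executions
from table( sysproc.mon_get_database( -2 ) ) as t;
```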
The first run gave me these numbers :
Monitoring starting at 12:08:56 PM
( all times reported in milliseconds )
cpu section wait comp time execs
661 3946 20391 863 1211
715 1076 30518 922 1294
828 1087 44350 994 1398
...
834 1125 1030 1056 1486
775 1041 980 996 1392
839 1175 1093 1060 1488
Monitoring ends at 12:10:10 PM
Summary:
58311 104786 201410 74729 104589
( The output has been abbreviated on purpose. )
We can see that about 1.3k individual section invocations are executed per second.
The number of section executions is typically larger than the number of statements, as a single statement can involve several section invocations,
e.g. when there is open / fetch / close processing. But it is a good metric for the throughput of the database server.
This is our baseline performance, executing in 74 seconds.
You will note that I created indexes on the DGTT tables in this example.
By default, indexes on DGTTs are compressed. This makes sense : it can save space and I/O for larger DGTTs.
However, here we created indexes on very small tables and are running this in a highly concurrent way.
This adds a lot of overhead, because index creation performs a license check to see if compression is allowed, and that check is serialized.
To illustrate the difference, I altered the index definition to avoid compression by using the "compress no" option :
create index session.idx1 on session.t1( name, tbname ) compress no collect detailed statistics ;
Monitoring starting at 12:11:34 PM
cpu section wait comp time execs
1513 10832 54656 4107 5760
1494 12562 68547 3813 5363
1729 13772 33338 3919 5482
...
1766 12395 33236 3814 5343
1732 12913 32958 3880 5445
1683 12720 30881 4084 5702
1679 10428 22089 4056 5659
879 4386 5333 2633 3651
Monitoring ends at 12:11:54 PM
Summary:
29886 239664 652748 71665 100296
The effect is enormous : execution has gone from 74 seconds to 20 seconds, and throughput has grown from about 1.3k section executions per second to about 5.4k.
The lesson to be learned is to understand the usage of the procedure and the volume of data in the DGTTs :
avoid creating compressed indexes on small temporary tables in procedures which are executed with high concurrency.
The next test involves the lifetime of the DGTT. These tables exist for the duration of a connection, or until they are explicitly dropped.
This is important for a couple of reasons.
1. If a dgtt is created using the "on commit preserve rows" option, then that DGTT will keep the rows until the connection is dropped.
With connection pools used in application servers, this could mean that the data hangs around for quite some time. This is likely not desirable.
2. If a DGTT is not explicitly dropped, it will also be dropped when the next "declare global temporary table ... with replace" statement for the same table is executed.
This means that before the new DGTT is created, the old one must first be dropped.
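In the modified version of the procedure, each DGTT is therefore dropped explicitly at the end of the body; a sketch of the added statements :

```sql
-- Explicit cleanup at the end of the procedure body :
-- dropping the DGTTs releases the rows and allows the package
-- cache entries for their statements to be removed.
drop table session.t1;
drop table session.t2;
drop table session.t3;
drop table session.t4;
drop table session.t5;
```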
The net result for the stored procedure execution time when dropping a DGTT explicitly is not that different, as seen in the following metrics :
Monitoring starting at 12:13:28 PM
cpu section wait comp time execs
2078 13961 94592 6951 9055
1587 10482 47026 4791 6257
1645 10479 36429 4731 6142
1645 10789 35693 4748 6180
1562 10729 36903 4780 6227
..
1786 10314 32615 5170 6706
1507 8849 22985 5051 6532
655 3019 4060 2550 3275
Monitoring ends at 12:13:50 PM
Summary:
31718 209994 751718 94785 123168
We get about the same runtime ( 2 seconds more, but that is within normal variation ).
The number of section executions per second increased though, from about 5.5k to over 6k.
The reason is that we added the explicit drop table session.t<n> statements to the procedure.
An important reason to explicitly drop a DGTT is that otherwise the statements referencing it stay in the package cache until either regular package cache maintenance
removes them, or until the DGTT is declared again. Not dropping the DGTT explicitly can lead to an inflated package cache.
This can in turn lead to more contention in a highly concurrent environment. In the initial version of the procedure, I declared all DGTTs at the start.
Upon re-execution of the procedure the database server will need to
1. drop the old dgtt
2. remove all statements from the package cache referencing this old dgtt
3. create the new dgtt
It is not hard to imagine that with more concurrency this can lead to bursty activity on the package cache and more contention.
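One way to observe this effect (a sketch; I am assuming the MON_GET_PKG_CACHE_STMT table function here, which was not part of the tests above) is to look for cached dynamic statements that still reference the session tables :

```sql
-- List cached dynamic statements referencing the SESSION schema.
-- 'D' = dynamic statements only, -2 = all members.
select num_executions, varchar( stmt_text, 200 ) as stmt
from table( mon_get_pkg_cache_stmt( 'D', null, null, -2 ) ) as t
where stmt_text like '%SESSION.T%';
```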
My next test was hence to declare each DGTT only when needed and to drop it as soon as it is no longer needed.
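Restructured this way, each DGTT lives only for the few statements that need it; for the first table the pattern looks like this, repeated analogously for t2 through t5 (a sketch of the restructured body, not the exact code I ran) :

```sql
-- Declare, use and drop the DGTT in one tight sequence,
-- so its package cache entries are released immediately.
declare global temporary table t1 like sysibm.syscoldist on commit preserve rows not logged with replace;
insert into session.t1 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from session.t1;
drop table session.t1;
```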
Monitoring starting at 12:15:31
cpu section wait comp time execs
1448 12106 36868 5508 7150
1546 19874 55385 5267 6833
1483 28233 50287 5316 6908
1742 28723 25332 5754 7482
..
1711 23567 20822 5920 7700
1609 18775 16236 6024 7840
927 6304 4862 3580 4674
Monitoring ends at 12:15:51 PM
Summary:
28056 461845 508047 97852 127214
The net result is about the same : roughly a 20 second execution time, slightly less cpu usage ( less contention ) and slightly improved throughput.
Now one thing that I have not commented on is the compilation time.
You might find it odd that there is compilation time at all, given that these are stored procedures and they should be using static SQL.
The key thing to understand is that DB2 can only compile a statement referencing a DGTT once that DGTT has been created.
So it is forced to compile this SQL while executing the procedure.
Moreover, each invocation redeclares the DGTT, so the cached statements cannot be reused.
In order to break this 20 second barrier, we need to get rid of these compilations to benefit from static SQL.
One way to do this is to avoid using DGTT's completely.
In my simple test, this can easily be achieved :
insert into session.t1 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from session.t1;
==> replaced by
select max( seqno ) into e from ( select seqno from sysibm.syscoldist fetch first 5 rows only ) as t;
But if this is not easily done, then consider using Created Global Temporary Tables (CGTTs) instead of DGTTs.
These are created just once, up front, outside of the procedure code :
create global temporary table t1 like sysibm.syscoldist on commit preserve rows not logged ;
create global temporary table t2 like sysibm.syscoldist on commit preserve rows not logged ;
create global temporary table t3 like sysibm.syscoldist on commit preserve rows not logged ;
create global temporary table t4 like sysibm.syscoldist on commit preserve rows not logged ;
create global temporary table t5 like sysibm.syscoldist on commit preserve rows not logged ;
The procedure body now becomes :
set level = 1;
insert into t1 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from t1;
insert into t2 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from t2;
insert into t3 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from t3;
insert into t4 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from t4;
insert into t5 select * from sysibm.syscoldist fetch first 5 rows only;
select max( seqno ) into e from t5;
Note that I do not need to reference the CGTT tables with the "SESSION" schema.
If I run this now, the result is phenomenal, it completes in less than 2 seconds.
( this includes the time creating 50 connections )
Monitoring starting at 12:17:14
cpu section wait comp time execs
570 444 43875 141 15708
Monitoring ends at 12:17:16 PM
Summary:
570 444 43875 141 15708
CPU usage is very low, and compilation time is very low as well.
To summarize, when using DGTTs in a highly concurrent environment, be aware of :
* compressed indexes
* package cache contention
* retention of data in the temp tables
* a high number of compilations
My advice :
* use no indexes, or uncompressed indexes, for low data volumes
* use CGTTs instead of DGTTs
* drop DGTTs explicitly in procedure code
UID
ibm13285885