Enabling advanced pruning of the Datamart database

You can enable advanced pruning of the Datamart database to control the size of the database.

About this task

Transaction tracking data is stored in multiple tables in the Datamart database. Pruning occurs automatically for the transaction instance data in the tables whose names start with F_ and for the aggregated transaction data in the tables whose names start with AGG_. However, pruning does not occur automatically for the static data that is stored in the tables whose names start with D_. In some environments, the data in a subset of the D_xxx tables is dynamic and causes the size of the Datamart database to grow continually. You can run a script to enable advanced pruning, which deletes unreferenced rows in the D_xxx tables to control the size of the Datamart database.
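Conceptually, the advanced pruning deletes dimension rows that no fact rows reference. The following statement is an illustrative sketch only: D_KEY is a hypothetical join column, and the real pruning script determines the actual table and key relationships itself.
  db2 "DELETE FROM ITMUSER.D_xxx d WHERE NOT EXISTS (SELECT 1 FROM ITMUSER.F_xxx f WHERE f.D_KEY = d.D_KEY)"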

Before you begin

The enable_dmart_pruning.sh script can be run at any time to enable pruning of the unreferenced rows in the D_xxx tables in the Datamart database. The pruning occurs nightly when the Datamart daily cron job runs between 3 am and 4 am based on the time zone of the Cloud APM server.

Note: Before you run the enable_dmart_pruning.sh script, you must install IBM® Cloud Application Performance Management V8.1.4.0 interim fix 10 or a later server interim fix. You can download the interim fixes for IBM Cloud Application Performance Management V8.1.4 from IBM Fix Central.

If there is not a large amount of data in the Datamart database tables, the enable_dmart_pruning.sh script automatically enables pruning of the D_xxx tables and no further action is required on your part. However, if there is a large amount of transaction tracking data in the Datamart database, the enable_dmart_pruning.sh script does not enable pruning and instead displays a message indicating that you should plan a maintenance window to run the initial pruning, to prevent a performance impact on your Cloud APM server and database server.

The enable_dmart_pruning.sh script indicates that the Datamart database contains a large amount of data if the D_xxx tables contain over 100 million rows, or if the D_xxx tables contain over 10 million rows and the F_xxx tables contain over 100 million rows. The script displays the number of rows in each D_xxx table and also writes the information to the latest /opt/ibm/dtl/datalayer/logs/enable_dmart_pruning.sh-*.log file.
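If you want to check the row counts yourself before or after running the script, you can query the tables directly. The following commands are a sketch that assumes the tables are in the ITMUSER schema (as the index names later in this topic suggest); the WITH UR clause avoids taking locks during the count:
  su - db2apm
  db2 connect to DATAMART
  db2 "select count(*) from ITMUSER.D_REQUEST with ur"
  db2 "select count(*) from ITMUSER.D_TRANSACTION with ur"
  db2 connect reset; db2 terminate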

During the pruning of the D_xxx tables, the D_xxx table and the F_xxx tables that reference it are locked, which prevents these tables from being updated with new data until the pruning is complete. By default, the pruning deletes up to 1 million rows in the D_xxx tables to prevent the tables from being locked for too long. Pruning 1 million rows might take 30 to 60 minutes. If there are more than 1 million rows in the D_xxx tables to prune, the next 1 million rows are pruned when the cron job runs the next day or when you manually run the Datamart daily cron job. The following procedure describes how to configure the maximum number of rows that are deleted per pruning job if you want to adjust how many rows are pruned at one time. For the initial pruning of a large Datamart database, the procedure also describes how to configure the pruning job to prune one large D_xxx table at a time.

When the enable_dmart_pruning.sh script indicates that there is a large amount of transaction tracking data in the Datamart database, the initial pruning of the D_xxx tables might also require you to increase the Datamart database transaction log size or the Db2 LOCKTIMEOUT value. The steps in the procedure show which Db2 error messages are displayed in those scenarios and how to increase the Db2 transaction log size or the LOCKTIMEOUT value.

Complete the following steps as the root user on the Cloud APM server:

Procedure

  1. Change to the /opt/ibm/dtl/datalayer/cron directory and run the following command:
    ./enable_dmart_pruning.sh
    Note: /opt/ibm/ is the default Cloud APM server install directory. If you installed Cloud APM into a different directory, use your install directory instead of /opt/ibm/.
  2. If your DATAMART database does not contain a large amount of data, the script automatically enables advanced pruning of the D_xxx tables and displays messages similar to the following ones. No further action is required. The pruning will run daily as part of the existing Datamart daily cron job.
    The DATAMART database has low usage for transaction tracking data.
    Editing /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg to enable datamart pruning.
    Changing _CHECK_FOR_UNREFERENCED_ROWS_ENABLED to Yes.
    DATAMART database pruning is already enabled for D_REQUEST.
    DATAMART database pruning is already enabled for D_TRANSACTION.
    DATAMART database pruning is already enabled for D_URL_DETAILS.
    DATAMART database pruning is already enabled for D_SOURCE_IP_ADDRESS.
    DATAMART database pruning is already enabled for D_USER.
    DB20000I  The TERMINATE command completed successfully.
    .... Done
  3. If your DATAMART database contains a large amount of transaction tracking data, the script does not enable advanced pruning and displays messages similar to the following ones:
    "The DATAMART database has high usage for transaction tracking data. 
    The D_xxx tables contain 12345678 rows for database DATAMART. 
    The F_xxx tables contain 87654321 rows for database DATAMART."
    "We recommend that you cleanup the unreferenced rows in the DATAMART D_xxx tables.  
    Edit the /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg file.  
    For detailed instructions refer to the topic  "Enable advanced pruning of the Datamart database" in the Knowledge Center at URL: 
    https://www.ibm.com/support/knowledgecenter/SSHLNR_8.1.4/com.ibm.pm.doc/welcome.html.
    If you see the messages, plan a maintenance window to perform the initial pruning of the D_xxx tables.
  4. During the maintenance window, complete the following steps to perform the pruning:
    1. Confirm that you configured the recommended Datamart buffer pool size for a large Cloud APM server environment. You can display the current buffer pool size by running these commands on the Cloud APM server if a local Db2 server is being used, or on the remote Db2 server:
      • Log on as the Db2 instance user:
        su - db2apm
        Where db2apm is the default instance user.
      • db2 connect to DATAMART
        Note: You must connect as the instance ID. The ITMUSER ID does not have the authority to alter buffer pools.
      • db2 "select * from syscat.bufferpools where BPNAME ='IBMDEFAULTBP' " 
        Note: If this command fails when you cut and paste it, try typing it manually.
      If the buffer pool size (the NPAGES value) is less than 300000 pages, set it to the recommended value by running these commands:
      db2 alter bufferpool IBMDEFAULTBP immediate size 300000
      db2 connect reset; db2 terminate
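      If the select * output is hard to read, this narrower query (a suggested variant, not part of the documented procedure) shows only the name, size in pages, and page size:
      db2 "select BPNAME, NPAGES, PAGESIZE from syscat.bufferpools"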
    2. On the Cloud APM server, log on as root, change to the /opt/ibm/dtl/datalayer/cron directory, and run the following command to get the latest count of rows in the D_xxx tables:
      ./enable_dmart_pruning.sh

      The script displays the number of rows in each D_xxx table that can be pruned.

    3. Prune only one D_xxx table with over 1 million rows at a time during the maintenance window to minimize the impact on Db2 resources. The /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg file has a line for each D_xxx table that can be pruned.
      • If the number of rows for a D_xxx table is less than 1 million, set the value for that table to YES in the delete_unreferenced_rows.cfg file.
      • Choose one D_xxx table with more than 1 million rows to prune and set the value for the other large D_xxx tables to NO. We recommend that you wait to enable pruning for the largest D_xxx table until the other D_xxx tables have been pruned.
      For example, if the enable_dmart_pruning.sh script displays these messages for the number of rows in the D_xxx tables:
      1200000 rows exist in table D_REQUEST for DB DATAMART
      1500000 rows exist in table D_TRANSACTION for DB DATAMART
      1400000 rows exist in table D_URL_DETAILS for DB DATAMART
      10000 rows exist in table D_SOURCE_IP_ADDRESS for DB DATAMART
      5000 rows exist in table D_USER for DB DATAMART
      Then edit /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg. Ensure that _DATAMART_clean_D_SOURCE_IP_ADDRESS and _DATAMART_clean_D_USER are set to YES because those tables have less than 1 million rows. Set _DATAMART_clean_D_REQUEST to YES as the first large D_xxx table to prune because it is not the table with the most rows. Then set _DATAMART_clean_D_TRANSACTION and _DATAMART_clean_D_URL_DETAILS to NO, as follows:
      _DATAMART_clean_D_REQUEST=YES
      _DATAMART_clean_D_TRANSACTION=NO
      _DATAMART_clean_D_URL_DETAILS=NO
      _DATAMART_clean_D_SOURCE_IP_ADDRESS=YES
      _DATAMART_clean_D_USER=YES
      

      Also, if you want to change the maximum number of rows to prune at a time, change the value of the _MAX_NUM_ROWS_TO_DELETE configuration property in the delete_unreferenced_rows.cfg file.
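      Because the file uses simple KEY=VALUE lines (as shown above), you can also script the change. For example, this suggested sed command (not part of the product tooling) lowers the limit to 500000 rows and then displays the result:
      sed -i 's/^_MAX_NUM_ROWS_TO_DELETE=.*/_MAX_NUM_ROWS_TO_DELETE=500000/' /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg
      grep _MAX_NUM_ROWS_TO_DELETE /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg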

    4. Run the following command to run the Datamart daily cron job to perform the pruning:
      /etc/cron.daily/datamart.daily

      The cron job prunes up to 1 million unreferenced rows from the D_xxx tables that are enabled for pruning. The cron job might take 30 to 60 minutes to perform the pruning when there are more than 1 million unreferenced rows to delete.

    5. In a separate console window, run the following command to view the latest log file for the cron job:
      ls -ltr /opt/ibm/dtl/datalayer/logs/partition-maintenance.sh-*.log | tail -1
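      To follow the pruning progress live instead of re-listing the directory, you can tail the newest log file; for example:
      tail -f "$(ls -t /opt/ibm/dtl/datalayer/logs/partition-maintenance.sh-*.log | head -1)"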
    6. Locate the following line and check for any error messages that occur after that line:
      ********************* Running Delete Unreferenced Rows ******************
      
      • If you see the following message, it means Db2 timed out before the pruning could be completed.
        DB21034E  The command was processed as an SQL statement because it was not a valid 
        Command Line Processor command.  
        During SQL processing it returned:
        SQL0911N  The current transaction has been rolled back because of a deadlock or timeout.  
        Reason code "2".  SQLSTATE=40001
        You should increase the value for the Db2 LOCKTIMEOUT setting by performing these steps:
        1. If the Db2 server is local, enter the following command to log on as the Db2 instance user:
          su - db2apm
          If the Db2 server is remote, log on to the remote Db2 server as the Db2 instance user.
        2. Connect to the database and display the current LOCKTIMEOUT value:
          db2 connect to DATAMART
          db2 get db cfg for DATAMART | grep LOCKTIMEOUT
        3. Enter this command to update the LOCKTIMEOUT value so that Db2 lock waits do not time out:
          db2 update db cfg for DATAMART using LOCKTIMEOUT -1

          db2 connect reset; db2 terminate
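        If you want to confirm the change, you can display the value again; a LOCKTIMEOUT of -1 means lock waits never time out:
          db2 get db cfg for DATAMART | grep LOCKTIMEOUT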
      • If you see the following message, the Db2 transaction log size should be increased.
        DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.  
        During SQL processing it returned:
        SQL0964C  The transaction log for the database is full.  SQLSTATE=57011
        Increase the Db2 LOGPRIMARY and LOGSECOND configuration values. Db2 does not provide an algorithm for determining the values to set. Start by doubling the LOGSECOND value (which controls temporary log space). If you still run out of transaction log space, double the LOGPRIMARY value as well.
        Perform the following steps to change the Db2 transaction log size:
        • If the Db2 server is local, enter the following command to log on as the Db2 instance user:
          su - db2apm
          If the Db2 server is remote, log on to the remote Db2 server as the Db2 instance user.
        • Connect to the database and record the current values:
          db2 connect to DATAMART
          db2 get db cfg for DATAMART | egrep "LOGFILSIZ|LOGPRIMARY|LOGSECOND"
        • To update the LOGSECOND config value, enter this command:
          db2 update db cfg for DATAMART using LOGSECOND 50
          Where you should replace 50 with 2 times the LOGSECOND value displayed by the db2 get db cfg command.
        • To update LOGPRIMARY, run:
          db2 update db cfg for DATAMART using LOGPRIMARY 20
          Where you should replace 20 with 2 times the LOGPRIMARY value displayed by the db2 get db cfg command.
        • The LOGFILSIZ configuration value is normally between 4K and 25K. You do not need to change it unless it is less than 4K. To update LOGFILSIZ, enter this command:
          db2 update db cfg for DATAMART using LOGFILSIZ 16000
          Where 16000 is larger than the current value.
        • After you have completed the Db2 configuration updates, enter this command:
          db2 connect reset; db2 terminate
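        As a rough sizing check: LOGFILSIZ is specified in 4 KB pages, so the total transaction log space is up to LOGFILSIZ × 4 KB × (LOGPRIMARY + LOGSECOND). For example, with LOGFILSIZ 16000, LOGPRIMARY 20, and LOGSECOND 50, each log file is about 62.5 MB, and 70 log files can consume roughly 4.3 GB. Confirm that the file system holding the Db2 logs has enough free space before you increase these values.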
      • Ignore the following Db2 messages in the /opt/ibm/dtl/datalayer/logs/partition-maintenance.sh-*.log file:
        • sqldEscalateLocks: Db2 locks several hundred thousand rows at a time. When the number of locked rows for a given table becomes high, Db2 escalates to locking the entire table. This is normal.
        • SQL0598W: Existing index ITMUSER.D_SRCIP_SRCIPADDID_IDX is used as the index for the primary key or a unique key. SQLSTATE=01550. This is normal.
        • SQL0598W: Existing index ITMUSER.D_USER_USERID_IDX is used as the index for the primary key or a unique key. SQLSTATE=01550. This is normal.
        • SQL fails when attempting to add the primary key or foreign key constraint. This happens because the tables are large and there is not enough Db2 memory to add the constraint. The constraints are dropped to improve performance when deleting rows. It takes very little memory to disable a constraint, but it takes a lot of memory to enable it. When the tables become smaller, the constraints are restored.
    7. If you saw error messages in the log file and made Db2 configuration updates, perform steps 4d through 4f again to rerun the Datamart daily cron job as root on the Cloud APM server.
    8. If the log file shows that pruning is successfully deleting rows, the Datamart daily cron job automatically reduces the number of unreferenced rows by up to 1 million rows each night until there are no more unreferenced rows for the tables that you enabled for pruning in step 4c. You can also manually run /etc/cron.daily/datamart.daily multiple times per day to reduce the number of rows faster.
    9. If you have additional D_xxx tables to enable for pruning, repeat steps 4b through 4h to enable the next large D_xxx table for pruning. You can either enable the next D_xxx table in the current maintenance window or schedule another maintenance window for the remaining large D_xxx tables. The Datamart daily cron job continues to prune the tables that are already enabled for pruning in the /opt/ibm/dtl/datalayer/cron/delete_unreferenced_rows.cfg file.
    10. If you have issues enabling the advanced pruning, contact IBM Cloud APM support for assistance.

What to do next

There are two additional settings in the delete_unreferenced_rows.cfg file that you can tune:
  • _MAX_NUM_ROWS_TO_DELETE=1000000: This setting controls how long the Datamart database tables are locked. It takes about 30 minutes for Db2 to delete 1 million rows. Check the timestamps in the latest /opt/ibm/dtl/datalayer/logs/partition-maintenance.sh-*.log file to determine when the deletes start and finish. If the nightly deletes finish well before your work day begins, you can increase this setting. A run time of 30 to 60 minutes is normal for large environments until the pruning stabilizes.
  • _SKIP_SMALL_TABLES=500000: This setting prevents the scan for unreferenced rows in the D_xxx tables from running every night. After the unreferenced rows in the D_xxx tables are cleaned up, you might not want the Datamart daily cron job to scan for new unreferenced rows until a large number of rows accumulates again. When there are no more unreferenced rows to delete, the latest /opt/ibm/dtl/datalayer/logs/partition-maintenance.sh-*.log file indicates that a low number (less than 1 million) of rows were deleted. At that point, you can set this configuration property to 2 million higher than the number of rows in your largest D_xxx table. The Datamart daily cron job then skips the scan for unreferenced rows until there are at least 2 million new rows to scan.
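    For example, if after cleanup your largest D_xxx table is D_TRANSACTION with about 1.5 million rows (as in the earlier example), you would set the property as follows so that the scan is skipped until roughly 2 million new rows accumulate:
    _SKIP_SMALL_TABLES=3500000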