Optimizing recovery performance

There are strategies that you can use to improve Db2® performance during database recovery and decrease the time that is required to recover from a Db2 service outage.

The following should be considered when thinking about recovery performance:

  • You can improve performance for databases that are frequently updated by placing the logs on a separate device. In the case of an online transaction processing (OLTP) environment, often more I/O is needed to write data to the logs than to store a row of data. Placing the logs on a separate device will minimize the disk arm movement that is required to move between a log and the database files.

    You should also consider what other files are on the disk. For example, moving the logs to the disk used for system paging in a system that has insufficient real memory will defeat your tuning efforts.

    Db2 database products automatically attempt to minimize the time it takes to complete a backup or restore operation by choosing an optimal value for the number of buffers, the buffer size and the parallelism settings. The values are based on the amount of utility heap memory available, the number of processors available and the database configuration.

  • To reduce the amount of time required to complete a restore operation, use multiple source devices.
  • If a table contains large amounts of long field and LOB data, restoring it could be very time consuming. If the database is enabled for rollforward recovery, the RESTORE command provides the capability to restore selected table spaces. If the long field and LOB data is critical to your business, restoring these table spaces should be considered against the time required to complete the backup task for these table spaces. By storing long field and LOB data in separate table spaces, the time required to complete the restore operation can be reduced by choosing not to restore the table spaces containing the long field and LOB data. If the LOB data can be reproduced from a separate source, choose the NOT LOGGED option when creating or altering a table to include LOB columns. If you choose not to restore the table spaces that contain long field and LOB data, but you need to restore the table spaces that contain the table, you must roll forward to the end of the logs so that all table spaces that contain table data are consistent.
    Note: If you back up a table space that contains table data without the associated long or LOB fields, you cannot perform point-in-time rollforward recovery on that table space. All the table spaces for a table must be rolled forward simultaneously to the same point in time.
  • The following apply for both backup and restore operations:
    • Multiple devices should be used.
    • Do not overload the I/O device controller bandwidth.
  • Some recovery settings can be tuned for performance through the DB2_RECOVERY_SETTINGS registry variable.

    Db2 uses multiple agents to process log records during database recovery operations, such as crash recovery, database rollforward, and HADR standby log replay. Multiple agents means better performance during these operations, particularly on symmetric multi-processor (SMP) machines where extra CPU power is available. Use the NUM_AGENTS option of the DB2_RECOVERY_SETTINGS registry variable to set the number of agents to use for log replay during database recovery.

    The default value of the NUM_AGENTS option is determined by the number of CPUs on the machine. The agents involved in log replay during recovery are db2redom and db2redow, where a single db2redom agent distributes log records to multiple db2redow agents so that they can be reapplied concurrently, where appropriate. For example, the processing of log records associated with INSERT, DELETE, UPDATE, ADD KEY, and DELETE KEY operations can be parallelized in this way. Log records are parallelized at the page level, meaning log records on the same data page are processed by the same agent. This method improves performance, even when all work is done on one table.

    Queues used to hold log records are organized in sets, where the records from different table spaces are hashed across different sets. A larger number of sets generally results in more records from each table space going into different sets. A larger number of queues within a set allows for more records from a table space to be processed in parallel. These values are set through NUM_QSETS and QSETSIZE options of the DB2_RECOVERY_SETTINGS registry variable. Using higher values for these settings can result in higher parallelization, however, the benefit diminishes as the values increase. Beyond a certain value, the overhead required to manage the larger number of agents and queues reduces performance.
    Note: In some scenarios, lowering the values can result in better performance. Experiment with the settings to determine the optimal values for your database. Factors that affect performance include the following:
    • Available CPUs
    • Available memory
    • Database layout
    • Workload type
    • Workload patter
    When you perform a recovery operation, Db2 automatically selects an optimal value for the number of buffers, the buffer size, and the parallelism settings (if these values have not already been set using the DB2_RECOVERY_SETTINGS registry variable). The values are based on the amount of utility heap memory available, the number of processors available and the database configuration. Depending on the amount of storage available on your system, consider allocating more memory by increasing the util_heap_sz configuration parameter.