
Configuring and using transient storage for IBM Db2 Analytics Accelerator for z/OS on Z



Abstract

The accelerator needs transient (temporary) storage to write transient data during various tasks. 'Transient' means that this data is deleted immediately after the task has finished. If a large amount of transient data must be written, this can degrade the performance of the executing task or cause resource shortages in the environment. With Accelerator on Z maintenance level 7.5.12 (or later), there are options to configure dedicated storage for transient data.
This document describes these options in detail, including the advantages of each option, sizing considerations, setup details, and operational aspects.

Content

The accelerator needs transient storage for various tasks, for example:
  • Writing temporary results of extensive sort operations during query processing that cannot be executed exclusively in the system memory.
  • Writing replication spill queues when a replication-enabled table is loaded to the accelerator and at the same time new data changes on this table are replicated to the accelerator.
  • Writing query results that tend to arrive faster than they can be picked up by the receiving client.
Prior to Accelerator on Z maintenance level 7.5.12, this transient storage was always part of the configured data storage pool. The JSON configuration parameter "temp_working_space" specified the amount of storage used for transient data (for example "unlimited", "automatic", or a fixed size).
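For illustration, a minimal sketch of this setting in the JSON configuration file might look as follows; the exact placement in the file and the syntax of a fixed-size value depend on your installation and the accelerator documentation:
"temp_working_space": "automatic"
A fixed size (for example 500 GB) or "unlimited" could be specified instead of "automatic".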

With Accelerator on Z maintenance level 7.5.12 or later, it is possible to define dedicated storage for transient data using one of the following options:
  • Define a dedicated transient storage pool on external storage (Option A)
  • Use local NVMe (non-volatile memory express) storage, LinuxONE only (Option B)
This document describes both options in detail and highlights the advantages of using dedicated storage for transient data. The content of this document is structured as follows:
1 Overview of dedicated storage options for transient data
    1.1 Define a transient storage pool on external storage (Option A)
    1.2 Use local NVMe (non-volatile memory express) storage (Option B)
2 Advantages and recommendations for using dedicated storage for transient data
3 Configuring and using a transient storage pool on external storage (Option A)
    3.1 Estimating the size of a transient storage pool
    3.2 Configuring a transient storage pool on external storage
    3.3 Operational considerations when using a transient storage pool
4 Configuring and using local NVMe storage (Option B)
    4.1 Estimating the size of local NVMe storage
    4.2 Configuring local NVMe storage
    4.3 Operational considerations when using local NVMe storage

1 Overview of dedicated storage options for transient data

With Accelerator maintenance level 7.5.12 or later, it is possible to define dedicated storage for transient data using one of the following options:
  • Define a dedicated transient storage pool on external storage (Option A)
  • Use local NVMe (non-volatile memory express) storage, LinuxONE only (Option B)
The following two subchapters provide an overview of each option.
Note that you can use only one of these options, not a combination of both. If both options are defined, option B takes precedence over option A.

1.1 Define a transient storage pool on external storage (Option A)

The new optional keyword "transient_devices" in the "storage_environment" section of the JSON configuration file allows you to define a separate storage pool for transient data.

A separate storage pool for transient data can be defined on FCP- or FICON-attached storage.

Note: Different storage pools can be deployed on different storage types. For example, the data pool could be deployed on FICON-attached storage and the transient pool on FCP-attached storage.
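For orientation, such a definition appears in the JSON configuration file roughly as follows (a shortened preview of the complete example in chapter 3.2; the device number is a placeholder):
"transient_devices": { "type": "dasd", "devices": [ "0.0.9b12" ] }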

1.2 Use local NVMe (non-volatile memory express) storage (Option B)

Adding the keyword "transient_storage":"NVMe" to the "runtime_environments" section of the JSON configuration file enables the use of all NVMe devices that are plugged into the LinuxONE machine as transient storage.
This option
  • Is only available for accelerator deployments on LinuxONE machines.
  • Is highly recommended for multi-node accelerator deployments. It should be configured directly during initial installation of each new multi-node accelerator.
  • Was already available as a technical preview with maintenance level 7.5.8 and is fully supported with maintenance level 7.5.12 or later.
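For orientation, the keyword is added to the runtime environment entry of an LPAR in the JSON configuration file, for example (a shortened preview of the complete example in chapter 4.2):
"transient_storage": "NVMe"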


2 Advantages and recommendations for using dedicated storage for transient data

Using dedicated storage for transient data has several advantages:
  • Separation of user data (loaded or replicated) from transient data
    This allows more precise calculation of the storage needs of each pool, especially of the data pool.
  • Improved query performance for queries that write transient data
    This is achieved by using faster storage for transient data than for user data.
  • Improved load and replication performance
    Loads, replication, and queries that run at the same time perform better if their storage needs are separated from each other. While queries and replication use transient storage as needed for their temporary files, loads use the data pool storage.
  • Improved stability
    Transient data files can get very large, up to multiple terabytes (TB). A correspondingly high temporary storage use can lead to an out-of-space condition in the data pool. When NVMe storage is used for transient data, this exceptional situation can be prevented.
  • Reduced cost
    In accelerator environments that have storage mirroring implemented (for example in active/passive accelerator environments) the storage for transient data can and should be excluded from any storage mirroring procedures. This reduces the amount of storage that needs to be mirrored and with that reduces costs.
Using local NVMe storage for transient data has the following additional advantages:
  • Protection of the storage system of the data pool (located on FCP or ECKD storage) from being utilized by heavy I/O activity
    • I/O for transient data files of multiple TB can take several minutes.
    • The storage bandwidth used during this time might impact other systems that use the same storage system.
  • Further improved query performance (compared to transient storage on external storage)
    First tests have shown an improvement in the high double-digit percentage range compared to using FCP-attached flash storage.
When you install Accelerator maintenance level 7.5.12 (either as an upgrade from a previous level or as a new install), it is recommended to utilize one of the dedicated storage options to improve performance and stability. Use the following guidelines to decide when to use a transient storage pool versus local NVMe storage.
Use a transient storage pool if one or more of the following applies:
  • Your accelerator is or will be installed on IBM zSystems (and not on LinuxONE), either as a single-node or multi-node deployment.
  • Your accelerator is a single-node deployment and you want to further improve performance and stability by isolating workloads and their storage usage.
  • Your accelerator is a multi-node deployment on LinuxONE, but your workload does not require the additional advantages (listed above) that local NVMe storage provides.
Use local NVMe storage if one or more of the following applies:
  • Your accelerator is a multi-node deployment on LinuxONE and your workload requires the additional advantages (listed above) that local NVMe storage provides.
    • Note: for new multi-node deployments on LinuxONE, it is highly recommended to use local NVMe storage from the initial installation on.
  • Your accelerator is a single-node deployment on LinuxONE and you have complex queries with high temporary storage demands whose performance you want to improve significantly.


3 Configuring and using a transient storage pool on external storage (Option A)

This chapter provides detailed information on the following topics:
  • Estimating the size of a transient storage pool
  • Configuring a transient storage pool on external storage
  • Operational considerations when using a transient storage pool

3.1 Estimating the size of a transient storage pool

The sizing approach to estimate the required amount of transient storage differs depending on whether the sizing is for a new accelerator installation or for an existing accelerator installation that migrates from using the data pool to a dedicated transient storage pool.

3.1.1 Estimating the size for a new accelerator installation

The accelerator sizing estimation spreadsheet that is used for new accelerator installations now includes the estimation of the transient storage. The estimated amount is based on the amount of Db2 for z/OS data to be synchronized with the accelerator and some assumptions about the workload on the data. Use the estimated amount as the minimum initial size of the transient storage pool.

3.1.2 Estimating the size for an existing accelerator installation

The initial transient storage pool size can be derived from the current setting of "temp_working_space" in the JSON configuration file:
  • "temp_working_space"= <Fixed Size>
    Use the amount specified in <Fixed Size>, for example 500 GB, as the initial size of the transient storage pool.
  • "temp_working_space"="automatic"
    This is the default setting with 7.5.11 or later if temp_working_space is not explicitly set in the JSON configuration file. With this setting, the initial size of the temporary tablespace in the data pool was the smaller of the following two values: 80% of the LPAR memory or 50% of the free space of the data pool. (For example, with 1 TB of LPAR memory and 3 TB of free data pool space, the initial size was the smaller of 0.8 TB and 1.5 TB, that is 0.8 TB.) The accelerator might have reduced this size over time if the data pool free space was low. To derive the initial size of the transient storage pool from this setting, take one of the following options:
    • Use 80% of the configured LPAR memory as the initial size per accelerator LPAR. The amount of configured LPAR memory can be determined from the LPAR configuration, for example in the HMC.
    • Open a support case with IBM support to determine the current size of the configured temporary tablespace and use it as the minimum initial size of the transient storage pool.
  • “temp_working_space”=”unlimited”
    This is the default setting for Accelerator maintenance levels lower than 7.5.11 if temp_working_space is not explicitly set in the JSON configuration file. If your accelerator installation uses the “unlimited” setting (either as default setting or explicitly set), take one of the following options:
    • Use 80% of the configured LPAR memory as the initial size per accelerator LPAR.
    • Open a support case with IBM support to monitor the temporary storage requirements of your workload and determine the initial size of the transient storage pool based on the results.

3.2 Configuring a transient storage pool on external storage

For configuring a transient storage pool on external storage, the same requirements exist as for any other storage pool, such as the data pool:
  • FCP- or FICON-attached storage is required.
  • The use of all-flash storage is strongly recommended.
  • For FICON-attached storage devices the use of HyperPAV and zHPF is mandatory.
  • For FCP-attached storage devices a connection to a Fibre Channel SAN using a switched fabric is required.
See the prerequisites page for more details.
Different storage pools can be deployed on different storage types. For example, the data pool can be deployed on FICON-attached storage while the transient pool is implemented on cheaper and possibly also faster FCP-attached storage.
Ensure that the storage devices are visible to the accelerator SSC LPARs before specifying the storage devices for the transient storage pool in the JSON configuration file and uploading it to the accelerator. For example, after completing IODF changes to define new storage devices to the system, reactivate the accelerator SSC LPAR(s) from the HMC to make the new disks visible to the SSC LPAR(s). See also this documentation chapter: https://www.ibm.com/docs/en/daafz/7.5?topic=z-adding-storage-devices
In the JSON configuration file define “transient_devices” in the “storage_environment” section, for example:
"transient_devices": {
      "type": "dasd",
      "devices": [
        "0.0.9b12"
      ]
    }
In a multi-node environment, define "transient_devices" for all LPARs. You must assign the same amount of storage to the transient pool on each LPAR.
Note: only newly added storage can be used for the transient storage pool. It is not possible to reassign storage devices that are currently part of the data pool.
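For illustration, a transient storage pool that consists of two newly added DASD devices might be defined as follows; the device numbers are placeholders, and in a multi-node environment a corresponding definition with the same total capacity is needed for each LPAR:
"transient_devices": {
  "type": "dasd",
  "devices": [
    "0.0.9b12",
    "0.0.9b13"
  ]
}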
See the following chapters in the Accelerator documentation for more information on the “transient_devices” configuration parameter:
https://www.ibm.com/docs/en/daafz/7.5?topic=z-installing-starting-appliance-single-node-setup
https://www.ibm.com/docs/en/daafz/7.5?topic=z-installing-starting-appliance-multi-node-setup
Remove any custom setting of the "temp_working_space" parameter because this parameter applies to transient data in the data pool only. As soon as transient data is processed on transient storage, the "temp_working_space" parameter becomes ineffective.

If “transient_devices” and “transient_storage” are defined in the JSON configuration file, the setting of “transient_storage” takes precedence and “transient_devices” is ignored.

After you have completed the changes in the JSON configuration file, upload it to the accelerator as described in this chapter: https://www.ibm.com/docs/en/daafz/7.5?topic=z-updating-existing-configuration
After the JSON configuration file has been successfully uploaded and applied, your accelerator uses the new transient storage pool for writing transient data as part of your workload execution.

3.3 Operational considerations when using a transient storage pool

This chapter describes how to monitor transient storage usage and what actions to take if transient storage runs short.

In addition, this chapter provides considerations for setting or unsetting the WLM threshold MAXTEMPSPACECONSUME in combination with a transient storage pool.

3.3.1 Monitoring transient storage pool usage

The following options are available to monitor how much transient storage is used by the workload:
  • The SMF counter Q8STTSA provides the current amount of disk space used by all paired Db2 subsystems for transient data. Note that the provided value is not a high-water mark but a current value. Thus, regular monitoring of this counter is required to get a good understanding of transient storage usage over time.
  • With Accelerator maintenance level 7.5.12, the accelerator-internal monitoring has been improved to track the high-water mark of temporary space usage in the accelerator trace file. Thus, IBM Support can help to determine the current high-water mark from the trace file.
Both options can help you to determine the initial size or to adjust the configured size of the transient storage pool at a later point in time.
If the transient storage pool is too small to handle the transient storage requirements of the workload, queries might fail. To avoid such failures, take one of these options:
  • Increase the transient storage pool size
    Only additional storage can be used to increase the transient storage pool. You cannot move storage from the data pool to the transient storage pool.
  • Set the Db2 Warehouse WLM threshold MAXTEMPSPACECONSUME
    This threshold is used to automatically cancel queries that require more transient storage than the configured threshold value. Open a support case with IBM support to set this threshold on the accelerator.

3.3.2 Considerations for setting or unsetting WLM threshold MAXTEMPSPACECONSUME

The Db2 Warehouse WLM threshold MAXTEMPSPACECONSUME can be used to cancel queries that require more transient storage than the configured threshold value.
Has the threshold MAXTEMPSPACECONSUME been set in your current accelerator installation?
If the answer is yes, it is suggested to remove the MAXTEMPSPACECONSUME threshold when moving to a transient storage pool. Queries that are currently canceled because of the threshold will then finish successfully.
Keep in mind: if you are facing so-called runaway queries that consume a lot of resources in general (not just transient storage) and impact the overall performance of the accelerator, consider keeping the threshold or reestablishing it later. For removing or recreating this threshold, open a support case with IBM support.


4 Configuring and using local NVMe storage (Option B)

This chapter provides detailed information on the following topics:
  • Estimating the size of local NVMe storage
  • Configuring local NVMe storage
  • Operational considerations when using local NVMe storage

4.1 Estimating the size of local NVMe storage

The sizing guidelines for local NVMe storage are straightforward, as outlined in the following. The guidelines apply to multi-node as well as single-node accelerator installations, and to new as well as existing accelerator installations.

As a general sizing guideline use two NVMe cards (Samsung PM1733A) of 15 TB each per accelerator LPAR.
Each NVMe card requires a carrier that is built into the LinuxONE machine; thus, in addition to the NVMe cards, two carriers are required per accelerator LPAR. For example, if your multi-node accelerator consists of 4 LPARs (1 head node and 3 data nodes), the sizing guideline results in 8 carriers and 8 NVMe cards (15 TB each).
If local NVMe storage is to be used for a single-node accelerator installation and the LPAR memory is less than 4 TB, consider using just one 15 TB NVMe card and one carrier.

If for any reason different NVMe card sizes (for example, smaller ones) are under consideration, open a support case with IBM support to monitor the transient storage requirements of your workload and determine the required size based on the results. Note, however, that NVMe cards of a different size or from a different vendor cannot be acquired from IBM; only the mentioned 15 TB card from Samsung can be acquired from IBM.

4.2 Configuring local NVMe storage

For configuring local NVMe storage for transient data, the following requirements exist:
  • One of the following IBM LinuxONE systems:
    • IBM LinuxONE™ Emperor 4 (LA1 - 3931)
    • IBM LinuxONE™ Rockhopper 4 (LA2 and AGL - 3932)
    • IBM LinuxONE™ III LT1 (8561)
    • IBM LinuxONE™ III LT2 (8562)
  • IBM Adapter for NVMe 1.1 with FC0448
    • IBM part number: P/N 02WN273
    • The number of adapters must be equal to the number of carriers calculated in the sizing process.
  • NVMe card(s) “Samsung PM1733A MZWLR15THBLA-00A07 (15.36TB)”
    • IBM part number: P/N 01CM547
    • Contact your IBM zHW representative to acquire the cards from IBM
    • The number of cards must be equal to the number of carriers calculated during the sizing process.
    • IBM provides support for this NVMe card and adapter FC0448 when used for an accelerator deployment. If a different NVMe card is used for your accelerator deployment, IBM only provides support for the adapter, but not for the NVMe card.
    • For the NVMe card specification, see this link
  • Minimum MCL level S84a for z15 based LinuxONE
  • Minimum MCL level S27 for z16 based LinuxONE
For more information on the installation of the adapter and NVMe card into a LinuxONE system, refer to the documentation for your LinuxONE system.
After the NVMe cards have been correctly installed into the LinuxONE system and are visible to the accelerator SSC LPAR(s) after reactivation of the SSC LPAR(s), specify the "transient_storage":"NVMe" keyword in the "runtime_environments" section. In a multi-node accelerator environment, define it for all data node LPARs. Define it also for the head node LPAR if NVMe cards have also been installed into the system to be used by the head node LPAR.
For example:
"runtime_environments": [
        {
            "cpc_name": "Z16_4",
            "lpar_name": "LPAR1",
            "transient_storage": "NVMe",
            "network_interfaces": [
                      ...
See the following chapters in the Accelerator documentation for more information on the “transient_storage” configuration parameter:
https://www.ibm.com/docs/en/daafz/7.5?topic=z-installing-starting-appliance-single-node-setup
https://www.ibm.com/docs/en/daafz/7.5?topic=z-installing-starting-appliance-multi-node-setup
Remove any custom setting of the "temp_working_space" parameter because this parameter applies to transient data in the data pool only. As soon as transient data is processed on local NVMe storage, the "temp_working_space" parameter becomes ineffective.

If “transient_devices” and “transient_storage” are defined in the JSON configuration file, the setting of “transient_storage” takes precedence and “transient_devices” is ignored.

After you have completed the changes in the JSON configuration file, upload it to the accelerator as described in this chapter: https://www.ibm.com/docs/en/daafz/7.5?topic=z-updating-existing-configuration
After the JSON configuration file has been successfully uploaded and applied, your accelerator uses all available local NVMe storage devices for writing transient data as part of your workload execution.

4.3 Operational considerations when using local NVMe storage

This chapter describes how to monitor the expected lifetime of an NVMe card and what to do if an NVMe card fails unexpectedly.

In addition, this chapter provides considerations for setting or unsetting the WLM threshold MAXTEMPSPACECONSUME in combination with using local NVMe storage.

4.3.1 Monitoring the lifetime of an NVMe card

An NVMe card has an expected lifetime of multiple years, depending on the size of the card and the I/O activity on it. The accelerator estimates the remaining lifetime and issues a warning message in the Admin UI when 50% of the expected lifetime has been reached. This message can be used to plan the replacement of the card early, before 90%-100% of the lifetime is reached or the NVMe card fails.

(Screenshot: example of this warning message in the Admin UI.)
The expected lifetime or endurance of an NVMe card is outlined in its specification and is expressed using the metric 'drive writes per day' (DWPD). For the Samsung PM1733A NVMe card with 15 TB, the DWPD is 1 for 5 years. This means that the endurance is 5 years, assuming that the card is completely overwritten once a day with 15 TB of data. If, for example, only 7.5 TB of transient data is written each day, the expected lifetime of the card is 10 years.

4.3.2 NVMe card failures

If an NVMe card fails for any reason, for example because the end of its lifetime has been reached and the card has not been exchanged in time, the accelerator gets into a 'hang' situation and manual recovery steps are required.
If an NVMe card failure occurs during startup of an accelerator, the symptoms are the following:
  • The accelerator hangs in the system state "starting". After several hours, when a timeout is reached, the system state becomes "failed".
  • In the Admin UI storage panel, no storage information is displayed for the LPAR with the failed NVMe card, or the storage panel is not shown at all. For other LPARs (in a multi-node deployment), the storage and NVMe card information is displayed correctly.
  • In the Admin UI logs panel, no log messages are shown for the LPAR with the failed NVMe card after the messages related to the network connection. For other LPARs (in a multi-node deployment), further log messages are written.
If an NVMe card failure occurs during operation of an accelerator, the symptoms are the following:
  • Queries that write transient data hang, but they do not fail. Internally, they wait (forever) for the transient storage to become available.
  • In the Admin UI, no storage information is displayed for the LPAR with the failed NVMe card; the display hangs. For other LPARs (in a multi-node deployment), the storage and NVMe card information is displayed correctly.
To recover from this situation:
  • Change the JSON configuration file: remove the "transient_storage":"NVMe" keyword for the LPAR with the failed card (as shown in the sketch below).
  • Upload the JSON configuration file to the accelerator.
  • Reactivate the accelerator SSC LPARs.
Any transient data on this LPAR will then be written to the data pool. This might impact performance.
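For illustration, based on the example in chapter 4.2, the runtime environment entry of the affected LPAR would then no longer contain the "transient_storage" keyword (CPC and LPAR names are placeholders):
"runtime_environments": [
        {
            "cpc_name": "Z16_4",
            "lpar_name": "LPAR1",
            "network_interfaces": [
                      ...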

As a final resolution, replace the failing NVMe card and reactivate the accelerator SSC LPARs afterwards. Do not forget to revert any temporary changes in the JSON configuration file and upload it so that local NVMe storage is used again for all LPARs.

4.3.3 Considerations for setting or unsetting WLM threshold MAXTEMPSPACECONSUME

The Db2 Warehouse WLM threshold MAXTEMPSPACECONSUME can be used to cancel queries that require more transient storage than the configured threshold value.
Has the threshold MAXTEMPSPACECONSUME been set in your current accelerator installation?
If the answer is yes, it is suggested to remove the MAXTEMPSPACECONSUME threshold when moving to local NVMe storage. Queries that are currently canceled because of the threshold will then finish successfully.
Keep in mind: if you are facing so-called runaway queries that consume a lot of resources in general (not just transient storage) and impact the overall performance of the accelerator, consider keeping the threshold or reestablishing it later. For removing or recreating this threshold, open a support case with IBM support.


[{"Type":"MASTER","Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS4LQ8","label":"Db2 Analytics Accelerator for z\/OS"},"ARM Category":[{"code":"a8m0z000000072oAAA","label":"Install and Migrate"}],"Platform":[{"code":"PF035","label":"z\/OS"}],"Version":"7.5.0"}]

Document Information

Modified date:
15 March 2024

UID

ibm17138629