Transform IBM Z data pattern

Incrementally build new modernized system-of-record (SOR) data stores by tapping into IBM Z® data traffic through a data adapter to transform to the modernized data format.

Overview


The transformation of SOR data by software processes to create a wholly new data set is a specialization of a broader copy-and-access use case. That use case also includes extract, transform, and load (ETL) processes, software replication processes, and virtualization and federation processes. All of those broader processes include some transformation capabilities, and it’s common to require light transformations during otherwise routine copy-and-access use cases.

This pattern involves heavy set-based transformations that create a data set composed of external feeds, derived transformations (such as aggregations, consolidations, and summations), and source SOR data sets. Currency requirements dictate the use of real-time mechanisms, such as virtualization; near-real-time mechanisms, such as software replication; or periodic batch mechanisms, such as ETL or extract, load, and transform (ELT).
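A set-based transformation of this kind can be sketched in plain Python. The record shapes, the `summarize` function, and the external feed are all hypothetical, for illustration only; in practice an ETL product would perform this work against real SOR sources.

```python
from collections import defaultdict

# Hypothetical SOR records, as they might be extracted from Db2 or VSAM.
sor_transactions = [
    {"account": "A-100", "region": "EU", "amount": 250.0},
    {"account": "A-100", "region": "EU", "amount": 125.5},
    {"account": "B-200", "region": "US", "amount": 900.0},
]

# Hypothetical external feed: a currency rate per region, not in the SOR.
fx_rates = {"EU": 1.08, "US": 1.00}

def summarize(transactions, rates):
    """Set-based transformation: consolidate per-account totals and
    derive a normalized amount by joining in the external feed."""
    totals = defaultdict(float)
    for txn in transactions:
        totals[txn["account"]] += txn["amount"] * rates[txn["region"]]
    return dict(totals)

per_account = summarize(sor_transactions, fx_rates)
```

The source records are untouched; the consolidation and the derived attribute exist only in the downstream data set, which is the point of the pattern.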

You might need combinations of those three mechanisms if set-based transformations, which are typically the province of ETL products, are required in addition to real-time or near-real-time currency. While this pattern applies to SOR data on any platform, its use with SOR data on IBM® z/OS® is relevant and valuable because many large enterprises use z/OS as an SOR.

In the following simplified depiction, the transformation processes are indicated by the Data Adapter box. In practice, these transformations can span both processes and time.
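The adapter's role can be sketched as a function that consumes change events captured from SOR data traffic and writes transformed records to the modernized store. Everything here is hypothetical (the event shape, the field names, and the in-memory target), intended only to show the shape of the flow.

```python
def data_adapter(change_events, transform, target_store):
    """Hypothetical data adapter: consumes change events captured from
    the SOR's data traffic (for example, by a replication capture
    process) and appends transformed records to the modernized store."""
    for event in change_events:
        target_store.append(transform(event))

# Illustrative use: map a legacy record layout to a modernized shape.
events = [{"ACCT-NO": "A-100", "BAL": 1250}]
modern_store = []
data_adapter(
    events,
    lambda e: {"account_id": e["ACCT-NO"], "balance_cents": e["BAL"]},
    modern_store,
)
```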

Solution and pattern for IBM Z®

On z/OS, it’s typical to have solutions that maintain logically related data across different stacks, such as IBM® Db2®, IMS, VSAM, or sequential files. The ability to view this data through federated queries or from an aggregated copy has value in and of itself. Add the ability to extend the aggregation to derive data (summations, transformations) and to add sources (distributed databases, external feeds), and the value of the original z/OS SOR data grows without disrupting the original workloads.

Beyond data creation, you can use this pattern to create schemas over data. One common use case is to convert normalized data models from the SOR to a de-normalized data model. A de-normalized data model might be one that is used in data warehouse solutions and dimensional data marts that are optimized for statistical and analytical processing. Another common use case is to create user-oriented schemas from what might be a product orientation at the SOR. This use case enables consumption by people who are less familiar with the internal view of the technology or products that underpin the data.
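As a minimal sketch of the first use case, the following joins two hypothetical normalized tables into one flat, user-oriented row per order, the way a warehouse or dimensional mart load step might. The table names, keys, and the `denormalize` function are assumptions, not part of any IBM product.

```python
# Hypothetical normalized SOR tables (product-oriented schema).
customers = {1: {"name": "Acme Corp", "segment": "Enterprise"}}
orders = [
    {"order_id": 10, "customer_id": 1, "product": "Widget", "qty": 5},
    {"order_id": 11, "customer_id": 1, "product": "Gadget", "qty": 2},
]

def denormalize(orders, customers):
    """Flatten the normalized model into one self-describing row per
    order, so that consumers need no knowledge of the source schema."""
    return [
        {
            "order_id": o["order_id"],
            "customer_name": customers[o["customer_id"]]["name"],
            "segment": customers[o["customer_id"]]["segment"],
            "product": o["product"],
            "qty": o["qty"],
        }
        for o in orders
    ]

rows = denormalize(orders, customers)
```

Each output row repeats the customer attributes, which is redundant by design: the de-normalized form trades storage for simpler, faster analytical queries.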

IBM has industry-leading product capabilities in the data transformation space. One such product is DataStage, which is available on premises, on IBM Cloud Pak for Data, and in cloud implementations.

Advantages

Creating data sets with extra attributes that are derived from production data sets allows for extensibility with minimal disruption. It also reduces the need to maintain the same data in multiple stores without synchronization.

  • No changes are required to the original SOR data sets. All of these methods apply transformations downstream from the original data.

  • These methods all provide remote access to SOR data in addition to the transformation of that SOR data.

  • Downstream data sets can be created or modified without impacting the source SOR data that they are derived from.

  • New applications can be created outside of the context of the original SOR data.

  • The impact to core SORs from downstream workload requirements is mitigated.

Considerations

When you create transformed data from SOR data, you need to consider a few factors. Any changes to the source SOR schema or content can have a downstream impact on the new, derived workload. Modify SOR procedures and processes to account for the impact on the new, derived workloads. Make sure that lineage and provenance are discoverable to ensure that the downstream processes are maintained correctly.

Contributors

Paul Cadarette
STSM, Data Replication, IBM Master Inventor, IBM

Greg Vance
STSM, IMS Development, IBM