Data and Analytics Data Fabric
Unlock the value of all of your accessible data to gain valuable insights. Discover, govern and secure your data.
Overview

Data fabric is an architectural pattern geared towards amplifying the use of data across an organization, irrespective of data format, data source, data location, and data usage. The aspects of the data lifecycle, from data access to consumption, that are covered by data fabric include data discovery, data governance, data quality, data classification, business context association, data lineage, self-service, and data operationalization, all to make the right data available in the right place at the right time.

The Reference Architecture for Data Fabric is a template that enterprises can use as guidance to implement the various components of a data fabric in their respective environments. The Data Fabric reference architecture has five key modules: Metadata Import, Metadata Enrichment, Metadata Cataloging, Data Curation and Transformation, and Data Consumption. These modules are key to realizing the benefits of data fabric stated above.

The reference architecture covers the key components, the steps involved, and the architecture decisions for each module, which help in realizing the objective of the five modules. It also covers the various technology options available in the IBM technology landscape to implement the components and the steps. For the Data Consumption module, the generic consumption pattern is covered, with the assumption that the details of each consumption use case are covered by the respective reference architecture for that use case.
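The flow through the five modules can be illustrated with a minimal sketch. All class and function names here are invented for illustration and are not part of any IBM product API; the sketch only models how an asset's metadata accumulates as it passes through each module.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the five data fabric modules as
# pipeline stages acting on a shared asset record.

@dataclass
class DataAsset:
    name: str
    source: str
    metadata: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)

def import_metadata(asset: DataAsset) -> DataAsset:
    # Metadata Import: harvest technical metadata from the source system.
    asset.metadata["format"] = "table"
    return asset

def enrich_metadata(asset: DataAsset) -> DataAsset:
    # Metadata Enrichment: classify data and attach business context.
    asset.tags.append("customer-data")
    return asset

def catalog(asset: DataAsset, registry: dict) -> DataAsset:
    # Metadata Cataloging: register the asset for self-service discovery.
    registry[asset.name] = asset
    return asset

def curate(asset: DataAsset) -> DataAsset:
    # Data Curation and Transformation: standardize for consumption.
    asset.metadata["quality_checked"] = True
    return asset

def consume(asset: DataAsset) -> dict:
    # Data Consumption: expose governed metadata to a consuming use case.
    return {"name": asset.name, **asset.metadata, "tags": asset.tags}

registry: dict = {}
asset = DataAsset(name="orders", source="DB2")
result = consume(curate(catalog(enrich_metadata(import_metadata(asset)), registry)))
```

The point of the sketch is the ordering: cataloging and curation only add value once import and enrichment have populated the metadata that downstream consumers rely on.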

Also of interest is the overall Data and AI Reference Architecture.

 

Overview for IBM Z®

The IBM Z systems Data Fabric Reference Architecture is a specialization of the broader IBM Data and Analytics Data Fabric architectural pattern described above, which amplifies the use of data across an organization irrespective of data format, source, location, and usage.

Specializing the broader Data Fabric architectural pattern with respect to IBM Z® systems drills down on two aspects:

• Governance of and access to various data sources on IBM Z systems (e.g., VSAM, IMS, DB2, …) and on Linux® on IBM Z or LinuxONE (e.g., MongoDB, …)
• Implementing components of the enterprise-wide Data Fabric architecture on IBM Z systems and Linux on IBM Z/LinuxONE. The solution includes components running on IBM zSystems / LinuxONE, outside those systems, or both.


Application modernization for IBM Z architecture further details architectural patterns for modern, easier access to system-of-record (SOR) data on IBM Z and LinuxONE, as well as various data-integration-centric patterns. This is essential in gaining insight for data-driven business value, as applications share SOR data through direct access, replication, caching, or data virtualization concepts that combine data assets across the enterprise.

Also of interest is the overall Data, Analytics and AI Reference Architecture.

 

Architectural decisions

AD01: Data location, gravity and sovereignty

Issue or problem statement: Proper control and data access methods need to be in place to support availability and regulatory requirements. Based on where data is located, determine, just in time, whether data should be moved or accessed virtually, based on workload, latency, and regulatory considerations.

Assumptions: Data movement and replication should be minimized to improve simplicity, governance, costs, and regulatory compliance, while at the same time providing an effective, resilient, and flexible platform for analytics (including deep analytics, decision optimization, and AI workloads).

Motivation: The selected implementation method will have a direct impact on costs, the viability of meeting latency requirements, regulatory adherence, and overall customer satisfaction.
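The move-versus-virtualize trade-off behind AD01 can be sketched as a small decision helper. The attributes, thresholds, and rule order below are illustrative assumptions, not part of the reference architecture:

```python
from dataclasses import dataclass

# Illustrative sketch of the AD01 decision: replicate data to the
# consumer's location versus access it virtually in place.
# All thresholds are invented for illustration only.

@dataclass
class Workload:
    latency_budget_ms: int          # how fast consumers need answers
    data_must_stay_in_region: bool  # data sovereignty constraint
    access_frequency_per_day: int

def placement_decision(w: Workload) -> str:
    if w.data_must_stay_in_region:
        # Sovereignty rules out replicating data to another location.
        return "virtualize"
    if w.latency_budget_ms < 50 and w.access_frequency_per_day > 1000:
        # Hot, latency-sensitive data may justify the cost of replication.
        return "replicate"
    # Default: minimize movement, per the stated assumption.
    return "virtualize"

hot = placement_decision(Workload(10, False, 5000))       # replication candidate
sovereign = placement_decision(Workload(10, True, 5000))  # must stay in place
```

The design choice worth noting is the rule order: regulatory constraints veto replication before any cost or latency reasoning applies.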

AD02: Knowledge catalog(s) organization and relationships

Issue or problem statement: Organizations may need to support multiple catalogs depending on various requirements, including, for example, hybrid multicloud ecosystems where catalogs need to be virtually connected. Further, catalog structures may be based on project, line-of-business (LOB), and corporate considerations. There may also be a need for experimental/sandbox and development instances within an organization.

Assumptions: Catalog instantiation should be implemented in a manner that supports the organizational needs without being overly complex to manage and traverse.

Motivation: The catalog choices will impact the organization's ability to leverage data across corporate ecosystems and, potentially, business partner ecosystems.

AD03: Data asset and relationship metadata capture and enrichment

Issue or problem statement: Data assets are being created and consumed at an ever-increasing rate. Organizations can no longer depend on manual or loosely automated processes to capture and catalog data assets and their related metadata.

Assumptions: Automation is key to capturing and enriching the metadata created for the various data assets in a timely manner.

Motivation: Without automation, the organization will not be able to maintain a current and usable catalog of data assets, which in turn will stifle the organization's ability to leverage its data assets and its progress toward becoming a data-driven organization.
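As a minimal sketch of what automated metadata capture (AD03) means in practice, the following profiles a small tabular asset and derives column-level technical metadata without manual entry. The sample data and the crude type categories are illustrative assumptions only:

```python
import csv
import io

def infer_type(values):
    """Crudely classify a column's values as integer, decimal, or string."""
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "decimal"
    except ValueError:
        return "string"

def capture_metadata(csv_text: str) -> dict:
    # Automated capture: scan the asset once and emit per-column metadata
    # that a catalog could ingest, with no manual curation step.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        col: {"type": infer_type([r[col] for r in rows]),
              "non_null_count": sum(1 for r in rows if r[col] != "")}
        for col in rows[0]
    }

sample = "id,amount,city\n1,10.5,Austin\n2,20.0,Boston\n"
meta = capture_metadata(sample)
```

A real implementation would add classification (e.g., detecting personal data) and push the result into a catalog; the point here is that the metadata is derived from the asset itself, not typed in by hand.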

AD04: Ensuring the appropriate method of transformation and curation based on the workload at hand, accounting for non-functional requirements

Issue or problem statement: Organizations will require various types of implementations (for example, real-time, near-real-time (streaming), and batch (micro/mini/large)) for small, medium, large, and extremely large workloads that need transformation and curation processing.

Assumptions: Regardless of the implementation path, the transformation and curation of the data should remain consistent so that the dependent data science, analytics, and reporting functions are accurate.

Motivation: Selecting the appropriate method of data transformation and curation will ensure that the organization can meet its objectives in various use cases, including trustworthy AI, Customer 360, and insights development.
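The AD04 assumption, that transformation logic stays consistent across implementation paths, can be sketched by defining the curation logic once and reusing it from both a batch path and a simulated streaming path. The record fields and function names are illustrative only:

```python
def transform(record: dict) -> dict:
    # Single source of truth for the curation logic.
    return {
        "customer_id": record["customer_id"].strip().upper(),
        "amount_usd": round(float(record["amount"]), 2),
    }

def run_batch(records):
    # Batch path: transform a whole collection at once.
    return [transform(r) for r in records]

def run_stream(records):
    # Streaming path: transform one record at a time as it "arrives".
    for r in records:
        yield transform(r)

raw = [{"customer_id": " c1 ", "amount": "10.456"},
       {"customer_id": "c2", "amount": "3"}]

# Both paths produce identical curated records because they share transform().
assert run_batch(raw) == list(run_stream(raw))
```

Keeping one transformation definition is what makes downstream data science, analytics, and reporting agree regardless of whether data arrived in real time or in a batch.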

Resources

What is a Data Fabric Architecture

Read about the six core capabilities of a data fabric architecture in this blog post.

IBM Data and Analytics Strategy Field Guide

Businesses need to move rapidly. Data and the related analytics are key to differentiation, but traditional approaches often cannot keep pace.

Infuse IBM Artificial Intelligence Field Guide

Artificial Intelligence (AI) is a powerful set of software engineering and data techniques that enable you to make sense of data.

IBM Data Storage Placement in Hybrid Multicloud

Determine which architecture patterns for data repositories to use across hybrid multicloud environments.