DataStage on Cloud Pak for Data

Version: 5.0.3    Premium   IBM

Description

You can use the DataStage services, DataStage Enterprise and DataStage Enterprise Plus, to design and run data flows that move and transform data. Compose your data flows with speed and accuracy using an intuitive graphical design interface that lets you connect to a wide range of data sources, integrate and transform data, and deliver it to your target system in batch or real time.

Both services provide hundreds of ready-to-use, built-in business operations for your data flows. The high-performance parallel runtime underneath DataStage lets you scale to meet the needs of your data volumes and data complexity.

DataStage Enterprise Plus includes all the capabilities of DataStage Enterprise and adds features for data quality. These features include:

  • Cleansing data by identifying potential anomalies and metadata discrepancies.
  • Identifying duplicates by using data matching and probabilistic matching of data entities between two data sets.
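
To make the idea of probabilistic matching between two data sets concrete, the following is a minimal sketch in Python. It is not DataStage's implementation or API. It assumes a Fellegi-Sunter style scoring scheme, and the field names, the m and u probabilities, the threshold, and the function names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not taken from DataStage):
#   m = P(field agrees | the two records describe the same entity)
#   u = P(field agrees | the two records describe different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "zip": (0.98, 0.02),
}


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(value).lower().split())


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum the log2 likelihood ratios over all compared fields.

    Agreement on a field adds log2(m / u); disagreement adds
    log2((1 - m) / (1 - u)), which is negative when m > u.
    """
    total = 0.0
    for field, (m, u) in probs.items():
        if normalize(rec_a[field]) == normalize(rec_b[field]):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def find_matches(left, right, threshold=5.0):
    """Compare every record in left with every record in right.

    Returns (left_index, right_index, weight) for each pair whose
    weight is at or above the (hypothetical) threshold.
    """
    matches = []
    for i, rec_a in enumerate(left):
        for j, rec_b in enumerate(right):
            weight = match_weight(rec_a, rec_b)
            if weight >= threshold:
                matches.append((i, j, weight))
    return matches


if __name__ == "__main__":
    left = [{"name": "Ann Lee", "city": "Austin", "zip": "73301"}]
    right = [
        {"name": "ann  lee", "city": "AUSTIN", "zip": "73301"},
        {"name": "Bob Ray", "city": "Austin", "zip": "10001"},
    ]
    for i, j, weight in find_matches(left, right):
        print(f"left[{i}] matches right[{j}] with weight {weight:.2f}")
```

The sketch compares every pair of records, which is only practical for small inputs. Matching at scale typically groups records into blocks first so that only likely candidates are compared.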

If you have the ELT Pushdown Express offering, usage is limited to compiling all DataStage flows to SQL and running them in SQL pushdown mode.

Compatible data sources

See Supported data sources for a list of data source services that are compatible.

Integrated services

Table 1. Related services. The following related services are often used with this service and provide complementary features, but they are not required.
Service | Capability
IBM Knowledge Catalog | Create catalogs of curated assets with this secure enterprise catalog management platform that is supported by a data governance framework.
Watson Studio | Prepare, analyze, and model data in a collaborative environment with tools for data scientists, developers, and domain experts.
Orchestration Pipelines | Use Orchestration Pipelines and create end-to-end flows of machine learning pipelines to create models and customize various functions.