DataStage on Cloud Pak for Data

Version: 5.0.3 Premium IBM

Thumbnail depiction of the interface of this service. No code is necessary to build a DataStage flow with the canvas.

Description

You can use the DataStage services, DataStage Enterprise and DataStage Enterprise Plus, to design and run data flows that move and transform data. Compose your data flows with speed and accuracy using an intuitive graphical design interface that lets you connect to a wide range of data sources, integrate and transform data, and deliver it to your target system in batch or real time.

Both services provide hundreds of ready-to-use, built-in business operations for your data flows. The high-performance parallel runtime underneath DataStage lets you scale to meet the demands of your data volumes and data complexity.

DataStage Enterprise Plus includes all the capabilities of DataStage Enterprise and adds features for data quality. These features include:

Cleansing data by identifying potential anomalies and metadata discrepancies.
Identifying duplicates by using data matching and probabilistic matching of data entities between two data sets.
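
To illustrate the general idea behind probabilistic matching between two data sets, the following is a minimal Python sketch. It is not how DataStage implements matching. The field names, the agreement probabilities, the threshold, and the functions `match_score` and `find_matches` are all hypothetical, chosen only for illustration. Each field contributes a log-likelihood weight depending on whether the two records agree on it, and pairs whose total weight reaches a threshold are treated as likely duplicates.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not taken from DataStage):
#   m = probability that the field agrees when two records are the same entity
#   u = probability that the field agrees when two records are different entities
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "zip": (0.85, 0.05),
}


def match_score(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 likelihood-ratio weights over the compared fields."""
    score = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a.get(field) == rec_b.get(field):
            score += math.log2(m / u)              # agreement adds weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return score


def find_matches(set_a, set_b, threshold=5.0):
    """Return (index_a, index_b, score) for pairs scoring at or above the threshold."""
    matches = []
    for i, a in enumerate(set_a):
        for j, b in enumerate(set_b):
            score = match_score(a, b)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    customers = [{"name": "ann lee", "city": "austin", "zip": "73301"}]
    prospects = [
        {"name": "ann lee", "city": "austin", "zip": "73301"},
        {"name": "bob ray", "city": "austin", "zip": "10001"},
    ]
    for i, j, score in find_matches(customers, prospects):
        print(f"customers[{i}] likely matches prospects[{j}] (score {score:.2f})")
```

In this sketch, a shared city alone is weak evidence because unrelated records often agree on it, while a shared name carries much more weight. This weighting is what distinguishes probabilistic matching from exact-key comparison.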

If you have the ELT Pushdown Express offering, usage is limited to compiling all DataStage flows in SQL and running them in SQL pushdown mode.

Quick links

Install: Install the service
Upgrade: Upgrade the service
Use: Work with the service
Known issues: View limitations
Administer: Manage and maintain the service
Troubleshoot: Find solutions to problems
Develop: Write code and build applications

Compatible data sources

See Supported data sources for a list of data source services that are compatible.

Integrated services

Table 1. Related services. The following related services are often used with this service and provide complementary features, but they are not required.
Service	Capability
IBM Knowledge Catalog	Create catalogs of curated assets with this secure enterprise catalog management platform that is supported by a data governance framework.
Watson Studio	Prepare, analyze, and model data in a collaborative environment with tools for data scientists, developers, and domain experts.
Orchestration Pipelines	Create end-to-end flows of machine learning pipelines with Orchestration Pipelines to build models and customize various functions.