Creating a DataStage flow

DataStage® flows are the design-time assets that contain data integration logic.

You can create an empty DataStage flow and add connectors and stages to it, or you can import an existing DataStage flow from an ISX file.

The basic building blocks of a flow are:
  • Data sources that read data
  • Stages that transform the data
  • Data targets that write data
  • Links that connect the sources, stages, and targets
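
One way to picture these building blocks is as a small directed graph. The following Python sketch is only a conceptual model, not a DataStage API: the Node and Flow classes, the kind values, and the node names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical kinds that mirror the building blocks listed above.
VALID_KINDS = ("source", "stage", "target")


@dataclass
class Node:
    name: str
    kind: str  # one of VALID_KINDS


@dataclass
class Flow:
    nodes: dict = field(default_factory=dict)   # node name -> Node
    links: list = field(default_factory=list)   # (from_name, to_name) pairs

    def add_node(self, name, kind):
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = Node(name, kind)

    def add_link(self, src, dst):
        # A link can only connect nodes that are already part of the flow.
        for name in (src, dst):
            if name not in self.nodes:
                raise KeyError(f"node is not in the flow: {name}")
        self.links.append((src, dst))


# Example: a source feeds a stage, and the stage feeds a target.
flow = Flow()
flow.add_node("orders_db", "source")
flow.add_node("sort_orders", "stage")
flow.add_node("orders_out", "target")
flow.add_link("orders_db", "sort_orders")
flow.add_link("sort_orders", "orders_out")
```

In this model, the links are what give the flow its direction: data moves from the source, through the stage, to the target.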

Figure: Palette and canvas in IBM DataStage

DataStage flows and their associated objects are organized in projects. To start, open an existing project or create a new project.

Creating a DataStage flow by individually adding connectors and stages

To create a DataStage flow by individually adding connectors and stages, complete the following steps.

  1. Open an existing project or create a project.
  2. Click Add to project + and select DataStage flow. Alternatively, click New DataStage flow + in the DataStage flows section of the page.
  3. Add a name and optional description for the new flow on the New tab of the New DataStage flow page.
  4. Drag connectors or stages from the palette onto the canvas as nodes and arrange them as you like. Connect these nodes on the canvas by clicking the arrow icon on a node and dragging it to the node you want to connect to.

    This action creates a link between the nodes.

    Note: The connections that you add to the flow must be created already in the project that you are working in. For more information, see Adding connections to projects.
  5. Double-click a node to open its Details card, where you can specify configurations and settings for the node.
  6. Click Run when you are done setting up the flow.

    The flow is automatically saved, compiled, and run. You can view logs for both the compilation and the job run.

After the flow is compiled into a job, you can re-run the job, set a schedule, monitor the job, and update the environment that you want to run it in. For more information about running jobs, see Creating, scheduling, and running jobs. For more information about updating the DataStage environment where you want your jobs to run, see Creating instance and environments in DataStage.

Editing a DataStage flow

You can use the following actions to edit a DataStage flow.

  • Drag a stage or connector from the palette and drop it onto a link between two nodes that are already on the canvas. The new node is automatically linked to the node on either side of it, and the columns in the DataStage flow are automatically propagated. Click Run again to see the results.
  • Manually detach and reattach links from nodes on the canvas by hovering your pointer over them and clicking the end points of the links.
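
Conceptually, dropping a node onto a link replaces that one link with two links that pass through the new node. The following Python sketch shows the idea under simplifying assumptions: a flow is treated as a plain list of (from, to) name pairs, the drop_on_link function and the node names are hypothetical, and column propagation is not modeled.

```python
def drop_on_link(links, link, new_node):
    """Return a new list of links in which `link` is split by `new_node`."""
    if link not in links:
        raise ValueError(f"link is not in the flow: {link}")
    src, dst = link
    result = []
    for existing in links:
        if existing == link:
            # The original link becomes two links through the new node.
            result.append((src, new_node))
            result.append((new_node, dst))
        else:
            result.append(existing)
    return result


# Example: drop a stage onto the link between a source and a target.
links = [("orders_db", "orders_out")]
links = drop_on_link(links, ("orders_db", "orders_out"), "sort_orders")
```

After the call, the source and target are no longer linked directly. Each is linked to the new stage instead.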

Writing and reading persistent data

When a stage writes data, use the persistent storage that is mounted at /px-storage so that all parallel processes that run on the conductor or compute pods can access the data. Paths that are local to individual pods, such as /tmp, are not recommended.
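
If you generate or review output paths outside of DataStage, a small check can catch paths that are not under the shared mount. The following Python sketch is an illustration only: the is_on_persistent_storage helper is hypothetical and is not part of DataStage, and it inspects only the path string without touching the file system.

```python
from pathlib import PurePosixPath

# The mount point named in this topic.
PERSISTENT_ROOT = PurePosixPath("/px-storage")


def is_on_persistent_storage(path: str) -> bool:
    """Return True if `path` is /px-storage or is located under it."""
    candidate = PurePosixPath(path)
    # Reject relative paths and paths that could climb out by using "..".
    if not candidate.is_absolute() or ".." in candidate.parts:
        return False
    return candidate == PERSISTENT_ROOT or PERSISTENT_ROOT in candidate.parents


# Example paths (hypothetical file names).
shared_ok = is_on_persistent_storage("/px-storage/out/orders.csv")
pod_local = is_on_persistent_storage("/tmp/orders.csv")
```

The check compares whole path components, so a similarly named directory such as /px-storage-old is not mistaken for the shared mount.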

Creating a new DataStage component

You can collect a set of stages and connectors to reuse in DataStage flows and jobs by creating a DataStage component of type Subflow.

  1. Open an existing project or create a project.
  2. Click Add to project + and select DataStage component from the available asset types.
  3. Select Subflow as the DataStage component type.

You can manage all your DataStage components from the Assets tab.