InterProcess Stages

An InterProcess (IPC) stage is a passive stage that provides a communication channel between IBM® InfoSphere® DataStage® processes running simultaneously in the same job. It allows you to design jobs that take advantage of symmetric multiprocessing (SMP) systems, where the processes can run in parallel on separate processors. To understand the benefits of using IPC stages, you need to know how InfoSphere DataStage jobs actually run as processes. See IBM InfoSphere DataStage Jobs and Processes for information.

The output link connecting the IPC stage to the stage reading data can be opened as soon as the input link connected to the stage writing data has been opened.

You can use InterProcess stages to join passive stages together. For example, you could use one to speed up data transfer between two data sources:

Figure 1. Example job
Shows a job with an InterProcess stage

In this example, the job runs as two processes: one handles the communication from the Sequential File stage to the IPC stage, and the other handles the communication from the IPC stage to the ODBC stage. As soon as the Sequential File stage has opened its output link, the IPC stage can start passing data to the ODBC stage. If the job is running on a multiprocessor system, the two processes can run simultaneously on separate processors, so the transfer can be much faster.
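The two-process arrangement described above resembles a producer and a consumer sharing a bounded buffer. The following Python sketch is only an analogy and is not InfoSphere DataStage code: threads stand in for the two job processes, a standard-library queue.Queue stands in for the IPC stage, and the names writer, reader, run_pipeline, and buffer_size are hypothetical. It illustrates that the reading side can start consuming rows as soon as the channel exists, without waiting for the writing side to finish.

```python
import queue
import threading

# Hypothetical end-of-data marker; not part of any DataStage interface.
_END = object()


def writer(rows, channel):
    """Stands in for the process that feeds the IPC stage."""
    for row in rows:
        channel.put(row)  # blocks while the buffer is full
    channel.put(_END)     # tell the reader that no more rows will arrive


def reader(channel, sink):
    """Stands in for the process that drains the IPC stage."""
    while True:
        row = channel.get()  # blocks while the buffer is empty
        if row is _END:
            break
        sink.append(row)


def run_pipeline(rows, buffer_size=4):
    """Run writer and reader concurrently over a bounded channel."""
    channel = queue.Queue(maxsize=buffer_size)
    sink = []
    consumer = threading.Thread(target=reader, args=(channel, sink))
    producer = threading.Thread(target=writer, args=(rows, channel))
    # The reader is started first: it simply waits until rows appear.
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return sink


if __name__ == "__main__":
    print(run_pipeline(["row1", "row2", "row3"]))
```

Because the buffer is bounded, a fast writer pauses when the reader falls behind, so neither side needs to hold the whole data set in memory. Rows arrive in the order they were written.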

You can also use the IPC stage to explicitly specify that connected active stages should run as separate processes. This is advantageous for performance on multiprocessor systems. You can also specify this behavior implicitly by turning interprocess row buffering on, either for the whole project via the Administrator client, or individually for a job in its Job Properties dialog box.

Figure 2. Example job
Shows a job with multiple InterProcess stages