Partition parallelism

Imagine the same simple pipeline-parallel job, but now handling very large quantities of data.

In this scenario, you can make the best use of parallel processing by partitioning the data into a number of separate sets, with each partition handled by a separate instance of the job stages.
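Here is a minimal Python sketch of the partitioning step. The function names `round_robin_partition` and `hash_partition` are hypothetical. The two methods are only common ways of splitting rows into separate sets, and this is not the product's own implementation.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

Row = TypeVar("Row")


def round_robin_partition(rows: Sequence[Row], n: int) -> List[List[Row]]:
    """Deal rows out in turn: row 0 to partition 0, row 1 to partition 1, ..."""
    if n < 1:
        raise ValueError("n must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions


def hash_partition(
    rows: Sequence[Row], n: int, key: Callable[[Row], Hashable]
) -> List[List[Row]]:
    """Send every row with the same key value to the same partition."""
    if n < 1:
        raise ValueError("n must be at least 1")
    partitions: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        partitions[hash(key(row)) % n].append(row)
    return partitions


if __name__ == "__main__":
    print(round_robin_partition(list(range(7)), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
    orders = [(1, 10), (2, 20), (1, 30), (3, 40)]  # hypothetical (customer_id, amount)
    print(hash_partition(orders, 2, key=lambda r: r[0]))
```

Round-robin yields evenly sized partitions. Hash partitioning keeps rows with the same key together, which helps when a later stage groups or joins by that key, but it can produce uneven partitions. Python's built-in `hash` for strings is randomized between interpreter runs, so a real system would use a stable hash function.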

With partition parallelism, the same job effectively runs simultaneously on several processors, each handling a separate subset of the total data.
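The parallel run described above can be sketched as follows: one instance of the same stages per partition, all started together. The stage functions `drop_empty` and `add_total` and the row layout are hypothetical. Python threads stand in for separate processors here. Because of CPython's global interpreter lock, CPU-bound work would normally use `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, int]


def drop_empty(row: Row) -> bool:
    """Hypothetical filter stage: discard rows with a zero quantity."""
    return row["qty"] > 0


def add_total(row: Row) -> Row:
    """Hypothetical transform stage: add a line total without mutating the input."""
    return {**row, "total": row["qty"] * row["price"]}


def run_stages(partition: List[Row]) -> List[Row]:
    """One instance of the job's stages, applied to a single partition."""
    return [add_total(row) for row in partition if drop_empty(row)]


def run_partitioned(partitions: List[List[Row]]) -> List[List[Row]]:
    """Run a separate instance of the stages on every partition concurrently."""
    # One worker per partition; threads stand in for separate processors here.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        # map() returns results in input order: result i belongs to partition i.
        return list(pool.map(run_stages, partitions))


if __name__ == "__main__":
    rows = [{"id": i, "qty": i % 3, "price": 5} for i in range(6)]
    partitions = [rows[0::2], rows[1::2]]  # a simple two-way split for the example
    for i, result in enumerate(run_partitioned(partitions)):
        print(i, [(r["id"], r["total"]) for r in result])
    # prints:
    # 0 [(2, 10), (4, 5)]
    # 1 [(1, 5), (5, 10)]
```

Each worker sees only its own subset of rows, so the stages need no coordination while they run.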

At the end of the job, the data partitions can be collected back together and written to a single data target.
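A minimal sketch of the collection step, assuming each partition's output is a Python list of tuples. `collect_ordered`, `collect_sorted`, and `write_single_target` are hypothetical names. They show two common ways to gather partitions: plain concatenation, and a sort-merge that requires each partition to be sorted already. A CSV text stream stands in for the single target.

```python
import csv
import heapq
import io
from itertools import chain
from typing import Any, Callable, Iterable, List, Sequence, TextIO


def collect_ordered(partitions: Iterable[List[Any]]) -> List[Any]:
    """Concatenate partitions in order: all of partition 0, then partition 1, ..."""
    return list(chain.from_iterable(partitions))


def collect_sorted(partitions: Iterable[List[Any]], key: Callable[[Any], Any]) -> List[Any]:
    """Merge partitions that are each already sorted by `key` into one sorted list."""
    return list(heapq.merge(*partitions, key=key))


def write_single_target(rows: Iterable[Sequence[Any]], out: TextIO) -> None:
    """Write the collected rows to one destination as CSV."""
    csv.writer(out).writerows(rows)


if __name__ == "__main__":
    partitions = [[(1, "a"), (4, "d")], [(2, "b"), (3, "c")]]
    print(collect_ordered(partitions))  # [(1, 'a'), (4, 'd'), (2, 'b'), (3, 'c')]
    buf = io.StringIO()
    write_single_target(collect_sorted(partitions, key=lambda r: r[0]), buf)
    print(buf.getvalue(), end="")  # rows 1..4 in key order, one per line
```

Concatenation is cheapest but keeps whatever order the partitions happen to have. A sort-merge restores a global order in a single pass, provided each partition was sorted on the same key before collection.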

Figure: The same sample job using partition parallelism