Range partitioning

Using the range partitioner with the psort operator.

The range partitioner guarantees that all records with the same key field values are in the same partition. It also creates partitions of approximately equal size, so that all processing nodes do roughly the same amount of work during the sort. To do so, the range partitioner must determine distinct partition boundaries and assign each record to the correct partition.

To use a range partitioner, you first sample the input data set to determine the distribution of records based on the sorting keys. From this sample, the range partitioner determines the partition boundaries of the data set. The range partitioner then repartitions the entire input data set into approximately equal-sized partitions, so that records with similar sorting key values are in the same partition.
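The three steps above (sample, determine boundaries, repartition) can be sketched in Python. This is only a minimal illustration of the general technique, not the range partitioner's actual implementation. The function names, the quantile-based boundary choice, and the `sample_size` and `seed` parameters are all assumptions made for this example.

```python
import bisect
import random

def compute_boundaries(sample_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys as quantiles of the sorted sample.
    (Assumed strategy for illustration only.)"""
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]

def assign_partition(key, boundaries):
    """Return the index of the partition whose key range contains `key`.
    Equal keys always map to the same index."""
    return bisect.bisect_right(boundaries, key)

def range_partition(records, key, num_partitions, sample_size=1000, seed=0):
    """Sample the keys, derive boundaries, then repartition every record."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    keys = [key(r) for r in records]
    sample = rng.sample(keys, min(sample_size, len(keys)))
    boundaries = compute_boundaries(sample, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[assign_partition(key(record), boundaries)].append(record)
    return boundaries, partitions
```

Because a record's partition depends only on its key and the boundary list, records with identical keys always land in the same partition. How evenly the partitions are sized depends on how well the sample reflects the real key distribution.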

The following example shows a data set with two partitions:
[Figure: a data set with two partitions, and the effect that range partitioning and sorting have on the data set]
This figure shows range partitioning for an input data set with four partitions and an output data set with four:
[Figure: a data set with four partitions, and the effect that range partitioning and sorting have on the data set]

In this case, the range partitioner calculated the correct partition size and boundaries, then assigned each record to the correct partition based on its sorting key fields. See The range partitioner for more information on range partitioning.
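As a small worked illustration of why range partitioning pairs well with sorting, the Python sketch below assigns records to partitions using fixed boundaries and then sorts each partition on its own. The boundary values and the records are hypothetical, and the code is not the psort operator itself. It only shows the general property that, once records are range partitioned, sorting each partition independently and reading the partitions in order gives a fully sorted result.

```python
import bisect

# Hypothetical boundaries: they split the key space into four ranges.
boundaries = [10, 20, 30]
# Hypothetical single-field records (the sorting key is the value itself).
records = [27, 3, 15, 31, 10, 8, 22, 19, 40, 3]

# Assign each record to the partition whose key range contains it.
partitions = [[] for _ in range(len(boundaries) + 1)]
for value in records:
    partitions[bisect.bisect_right(boundaries, value)].append(value)

# Each partition is sorted independently, as separate nodes would do.
sorted_partitions = [sorted(p) for p in partitions]

# Reading the partitions in order gives a totally sorted sequence.
combined = [value for p in sorted_partitions for value in p]
assert combined == sorted(records)
```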