Partitioned reads
The JDBC Connector stage can be configured to run on multiple processing nodes and read data from the data source. In this scenario each of the processing nodes for the stage retrieves a set (partition) of records from the data source. The partitions are then combined to produce the entire result set for the output link of the stage.
To enable partitioned reads set the Enable partitioned reads property to Yes.
When the stage is configured for partitioned reads, it runs the statement that is specified in the Select statement property on each processing node. You can utilize special placeholders in the statements to ensure that each of the processing nodes retrieves a distinct partition of records from the data source.
- [[node-number]] - replaced at run time with the index of the current processing node. The indexes are zero-based. The placeholder is replaced with the value 0 on the first processing node, value 1 on the second processing node, value 2 on the third processing node, and so forth.
- [[node-number-base-one]] - replaced at run time with the index of the current processing node. The indexes are one-based. The placeholder is replaced with the value 1 on the first processing node, value 2 on the second processing node, value 3 on the third processing node, and so forth.
- [[node-count]] - the total number of processing nodes for the stage. By default this number is the number of nodes in the parallel configuration file. The location of the parallel configuration file is specified in the APT_CONFIG_FILE environment variable. The stage can be configured to run on a subset of nodes from the parallel configuration file, by defining node constraints on the Advanced page under the Stage page in the stage editor.
When partitioned reads are enabled and Read select statement from file property is set to Yes, the connector resolves any [[node-number]], [[node-number-base-one]], and [[node-count]] placeholders in the Select statement property value before accessing the file location specified through this property. Once it retrieves the SELECT statement text from the file, it again resolves any [[node-number]], [[node-number-base-one]], and [[node-count]] placeholders in the statement text before executing the statement.
The placeholder support also applies to the Before SQL (node) and After SQL (node) properties. When partitioned reads are enabled, any [[node-number]], [[node-number-base-one]] and [[node-count]] placeholders in the Before SQL (node) and After SQL (node) statements are resolved on each processing node before running the statements. When the Read Before SQL (node) statement from file or Read After SQL statement (node) from file properties are set to Yes, any [[node-number]], [[node-number-base-one]], and [[node-count]] placeholders in the file names that are specified in Before SQL (node) and After SQL (node) properties are resolved before opening the files. Also, any [[node-number]], [[node-number-base-one]], and [[node-count]] placeholders in the Before SQL (node) and After SQL (node) statements that are retrieved from the files are resolved on each processing node before running the statements on that node.
Ensure that the records returned by statements on individual processing nodes have matching field definitions. For example, if the statement on one of the processing nodes returns records with the columns C1 INTEGER, C2 VARCHAR(20), and C3 DATE, ensure that the statements on all the remaining processing nodes return records with the same column definitions.
Example
SELECT C1, C2, C3 FROM TABLE1 WHERE MOD(C1, [[node-count]]) = [[node-number]]
The
connector runs the following statements on each of the processing
nodes:
Node | Statement |
---|---|
1 | SELECT C1, C2, C3 FROM TABLE1 WHERE MOD(C1, 4) = 0 |
2 | SELECT C1, C2, C3 FROM TABLE1 WHERE MOD(C1, 4) = 1 |
3 | SELECT C1, C2, C3 FROM TABLE1 WHERE MOD(C1, 4) = 2 |
4 | SELECT C1, C2, C3 FROM TABLE1 WHERE MOD(C1, 4) = 3 |
C1 | C2 | C3 |
---|---|---|
1 | Value one | 2013-01-01 |
2 | Value two | 2013-01-02 |
3 | Value three | 2013-01-03 |
4 | Value four | 2013-01-04 |
5 | Value five | 2013-01-05 |
6 | Value six | 2013-01-06 |
7 | Value seven | 2013-01-07 |
8 | Value eight | 2013-01-08 |
9 | Value nine | 2013-01-09 |
10 | Value ten | 2013-01-10 |
C1 | C2 | C3 |
---|---|---|
4 | Value four | 2013-01-04 |
8 | Value eight | 2013-01-08 |
C1 | C2 | C3 |
---|---|---|
1 | Value one | 2013-01-01 |
5 | Value five | 2013-01-05 |
9 | Value nine | 2013-01-09 |
C1 | C2 | C3 |
---|---|---|
2 | Value two | 2013-01-02 |
6 | Value six | 2013-01-06 |
10 | Value ten | 2013-01-10 |
C1 | C2 | C3 |
---|---|---|
3 | Value three | 2013-01-03 |
7 | Value seven | 2013-01-07 |