LOADDEST
The LOADDEST clause specifies the destination towards which a set of data is to be transferred. The LOADDEST clause is taken into account only if a data migration is performed or if a LOADFILE clause is also specified; otherwise, the LOADDEST clause is ignored. Five types of environments are supported: Db2®, NoSQL, Hadoop, Object Storage and PostgreSQL environments.
When specifying a destination with the LOADDEST clause, various parameters associated with this destination generally need to be configured. These parameters must be set within the 'db2hpu.dest' configuration file. For more information, see Optim High Performance Unload configuration for Big Data, Db2, Object Storage or PostgreSQL destinations.
Specifying a LOADDEST clause is mandatory when an upload command has to be prepared for a data file in the JSON or XML output format.
DB2 option
Specifies that the destination is a Db2 environment. The supported Db2 databases are standard databases, remote databases cataloged locally, and Db2 Warehouse databases. No specific keyword needs to be specified for a standard Db2 database; remote Db2 databases cataloged locally and Db2 Warehouse databases are specified with the REMOTE and WAREHOUSE keywords, respectively.
- Standard databases
- There is no specific keyword to be added; this indicates that the destination is a standard Db2 database. This is the default destination, which is considered if no explicit LOADDEST clause is specified. The Db2 destination is consistent with the following output formats: DEL, DELIMITED, ASC, IXF, DSNTIAUL and XML. When a Db2 Load command is written into a file specified through the LOADFILE clause, such a file contains:
- an optional Db2 connection step, if the WITH STANDARD AUTH option has been specified in the LOADDEST clause
- the Db2 Load command itself
- an optional Db2 disconnection step, if the WITH STANDARD AUTH option has been specified in the LOADDEST clause
- Remote databases cataloged locally
- The REMOTE keyword must be specified. It indicates that the destination is a remote Db2 environment cataloged locally. The remote Db2 destination is consistent with the following output formats: DEL, DELIMITED, ASC and IXF.
- Db2 Warehouse databases
- The WAREHOUSE keyword must be specified. It indicates that the destination is a Db2 Warehouse environment. The WAREHOUSE option is only consistent with the DEL and DELIMITED output formats.
- WITH STANDARD AUTH option:
- For an unload task targeting a standard Db2 destination, a remote Db2 destination, or a Db2 Warehouse destination that relies on the CLPPlus tool, the WITH STANDARD AUTH option affects the content of the file specified with the LOADFILE clause, as illustrated in the sketch below.
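For instance, a control file sketch asking for the preparation of an upload command towards a standard Db2 destination could look like the following one. It is only a sketch: the database, table and file names are illustrative assumptions, and the clause ordering shown is illustrative.
GLOBAL CONNECT TO SAMPLE;
UNLOAD TABLESPACE
SELECT * FROM MYSCHEMA.MYTABLE;
OUTFILE("/tmp/mytable.del")
FORMAT DEL
LOADFILE("/tmp/mytable_load")
LOADDEST(DB2 WITH STANDARD AUTH)
Because the WITH STANDARD AUTH option is present, the file created through the LOADFILE clause contains the Db2 connection and disconnection steps described above, around the Db2 Load command itself.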
NOSQL_DB option
Specifies that the destination is a NoSQL environment. The NOSQL_DB option is only consistent with the JSON output format. The supported NoSQL destinations are the following ones:
- Cloudant: http://www.cloudant.com
- CouchDB: http://www.couchdb.apache.org
- MongoDB: http://www.mongodb.org
If an upload command has to be generated, the tool involved depends on the destination:
- Cloudant and CouchDB: the tool involved is cURL (it can be downloaded from its official website http://curl.haxx.se/download.html)
- MongoDB: the tool involved is mongoimport (it can be downloaded from its official website within a package of MongoDB binaries https://www.mongodb.org/downloads)
When an upload command is written into a file specified through the LOADFILE clause, such a file contains:
- an optional authentication step, if a WITH STANDARD/KERBEROS AUTH option has been specified in the LOADDEST clause
- the upload command itself
When an upload command has to be prepared for a NoSQL destination, its generation is based on various parameters whose values must be set within the 'db2hpu.dest' configuration file, in its section for the NoSQL destination.
- "alias" option
- This option is optional. Its purpose is to support several sections relative to a given destination within the configuration file for destinations; the alias is the means to distinguish them. It must correspond to the alias set in a section, configured in the 'db2hpu.dest' file, relative to the destination type considered. Its value is case-sensitive. When this option is specified, Optim High Performance Unload searches the 'db2hpu.dest' configuration file for a section corresponding to the chosen destination and containing an 'alias' parameter set to the alias considered.
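As an illustration of the alias mechanism, the 'db2hpu.dest' configuration file could contain two sections for the same NoSQL destination type, distinguished by their 'alias' parameters. The section name and the remaining parameters (elided here) are placeholders; only the layout matters:
[MongoDB]
alias=production
...
[MongoDB]
alias=archive
...
A LOADDEST clause referring to the alias 'archive' then makes Optim High Performance Unload select the second section when generating the upload command.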
HADOOP option
Specifies that the destination is a Hadoop environment. The HADOOP option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
Three Hadoop destinations are supported through explicit keywords: HBase, Hive and HDFS, chosen with the HBASE, HIVE and HDFS keywords, respectively. Any other existing Hadoop destination can also be chosen by specifying a MapReduce program of your choice, associated with the destination to be considered; this is done with the MAPREDUCE keyword. All the information about these destinations is available on their official websites:
- Apache HBase: http://hbase.apache.org
- Apache Hive: http://hive.apache.org/downloads.html
- Apache Pig: http://pig.apache.org/releases.html
If an upload command has to be generated, the tool involved depends on the destination keyword:
- HBASE: the tool involved is hadoop or pig
- HIVE: the tool involved is beeline (compatible with the Hive 0.14 version and greater)
- HDFS: the tool involved is hdfs
- MAPREDUCE: the tool involved is hadoop
When an upload command is written into a file specified through the LOADFILE clause, such a file contains:
- an optional authentication step, if a WITH KERBEROS AUTH option has been specified in the LOADDEST clause,
- if the LOADDEST clause is specified with a destination keyword other than HDFS, a step copying the data file to be uploaded to a temporary file located on the associated HDFS file system,
- the upload command itself,
- if the LOADDEST clause is specified with the HBASE or MAPREDUCE destination keyword, a step removing the temporary file created in the previous copying step (this step does not exist when the HIVE destination keyword is specified, because the temporary file is automatically removed by the upload command executed in this case).
When an upload command has to be prepared for a Hadoop destination, its generation is based on various parameters whose values must be set within the 'db2hpu.dest' configuration file, in its section for the Hadoop destination.
When considering the HBase destination, if an upload command has to be generated, it is mandatory to specify an INTO TABLE(S) clause with its WITH COLUMNS option. In order to upload data to an HBase table, each column of the source table must be associated with a column name and a column family defined within the target HBase table, a column family being a group of columns of an HBase table. These associations are specified through the WITH COLUMNS option, which must contain the specification of a default column family. If needed, it can also contain a list of optional associations, each relating a column of the source table to a column (and its column family) within the target HBase table. Any column of the source table not explicitly associated with a column family is associated with a column of the same name under the default column family. A source column name specified at this level must be part of the task involved for the associated table. The example below illustrates these rules.
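As an illustration (all names here are hypothetical): suppose a source table with the columns EMPNO, LASTNAME and SALARY, a WITH COLUMNS option defining cf_data as the default column family, and one explicit association relating EMPNO to the column ID of the column family cf_key. The data is then uploaded as follows, using the HBase family:qualifier notation:
- EMPNO (explicitly associated) is uploaded to cf_key:ID
- LASTNAME (no explicit association) is uploaded to cf_data:LASTNAME
- SALARY (no explicit association) is uploaded to cf_data:SALARY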
When considering the Hive destination, if the output format involved is DELIMITED, pay attention to the formats chosen for the DATE and TIMESTAMP data types: the DATE_C option must be used for formatting DATE columns, and the TIMESTAMP_G option must be used for formatting TIMESTAMP columns.
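A minimal sketch of the format-related clauses for such a case follows; how the DATE_C and TIMESTAMP_G options attach to the FORMAT clause is an assumption made for this illustration, so refer to the FORMAT clause description for the exact grammar:
FORMAT DELIMITED
DATE DATE_C
TIMESTAMP TIMESTAMP_G
LOADDEST(HADOOP HIVE)
If the TIMESTAMP_A format were used instead of TIMESTAMP_G, the TIMEDELIM clause would have to be set to the ':' value and the first value of the TIMESTAMPDELIM clause to the space character, as explained in the constraints at the end of this page.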
- "alias" option
- This option is optional. Its purpose is to support several sections relative to a given destination within the configuration file for destinations; the alias is the means to distinguish them. It must correspond to the alias set in a section, configured in the 'db2hpu.dest' file, relative to the destination type considered. Its value is case-sensitive. When this option is specified, Optim High Performance Unload searches the 'db2hpu.dest' configuration file for a section corresponding to the chosen destination and containing an 'alias' parameter set to the alias considered.
OBJECT_STORAGE option
Specifies that the destination is an Object Storage environment. The supported destinations are Amazon EC2, Amazon S3, OpenStack Swift, Microsoft Azure and a remote file system, specified with the AWS_EC2, AWS_S3, SWIFT, AZURE and FILESYSTEM keywords, respectively.
- AWS_EC2 option
- The AWS_EC2 option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
- AWS_S3 option
- The AWS_S3 option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
- SWIFT option
- The SWIFT option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
- AZURE option
- The AZURE option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
- FILESYSTEM option
- Specifies that the destination is a file system environment where files can be uploaded with the scp command. The FILESYSTEM option is consistent with the following output formats: DEL, DELIMITED, JSON and XML.
- "alias" option
- This option is optional. Its purpose is to support several sections relative to a given destination within the configuration file for destinations; the alias is the means to distinguish them. It must correspond to the alias set in a section, configured in the 'db2hpu.dest' file, relative to the destination type considered. Its value is case-sensitive. When this option is specified, Optim High Performance Unload searches the 'db2hpu.dest' configuration file for a section corresponding to the chosen destination and containing an 'alias' parameter set to the alias considered.
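For instance, the clauses preparing an upload command towards an Amazon S3 destination could be sketched as follows (the file names are illustrative assumptions; the S3 connection parameters come from the matching section of the 'db2hpu.dest' file):
OUTFILE("/tmp/mytable.del")
FORMAT DEL
LOADFILE("/tmp/mytable_upload_s3")
LOADDEST(OBJECT_STORAGE AWS_S3)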
POSTGRESQL option
Specifies that the destination is a PostgreSQL environment. The POSTGRESQL option is consistent with the DEL and DELIMITED output formats. All the information about PostgreSQL is available at https://www.postgresql.org/
If an upload command specified through a LOADFILE clause has to be generated, or if an automatic migration is performed, the psql PostgreSQL interactive terminal must be installed through a complete PostgreSQL package. The way to install it depends on the platform used: https://www.postgresql.org/download/.
When an upload command is written into a file specified through the LOADFILE clause, such a file contains:
- an optional authentication step, if a WITH STANDARD AUTH option has been specified in the LOADDEST clause
- the upload command itself
When an upload command has to be prepared for a PostgreSQL destination, its generation is based on various parameters whose values must be set within the 'db2hpu.dest' configuration file, in its section for the PostgreSQL destination considered.
- "alias" option
- This option is optional. Its purpose is to support several sections relative to a given destination within the configuration file for destinations; the alias is the means to distinguish them. It must correspond to the alias set in a section, configured in the 'db2hpu.dest' file, relative to the destination type considered. Its value is case-sensitive. When this option is specified, Optim High Performance Unload searches the 'db2hpu.dest' configuration file for a section corresponding to the chosen destination and containing an 'alias' parameter set to the alias considered.
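For instance, the clauses preparing an upload command towards a PostgreSQL destination could be sketched as follows (the file names are illustrative assumptions; the psql-based upload command is built from the parameters set in the corresponding 'db2hpu.dest' section):
OUTFILE("/tmp/mytable.del")
FORMAT DEL
LOADFILE("/tmp/mytable_upload_pg")
LOADDEST(POSTGRESQL WITH STANDARD AUTH)
Because the WITH STANDARD AUTH option is present, the generated file starts with an authentication step before the upload command itself.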
WITH STANDARD/KERBEROS AUTH option for the NOSQL_DB, HADOOP, SWIFT and POSTGRESQL destinations
The WITH STANDARD/KERBEROS AUTH option specifies the authentication method to be applied for an upload of data towards a given destination. The method is chosen through one of the following keywords:
- the STANDARD keyword: it refers to a standard method, based on the usage of a traditional user and password combination
- the KERBEROS keyword: it refers to a Kerberos method, based on the usage of a Kerberos principal
When performing an automatic migration to a NoSQL or Hadoop destination, it is mandatory, for security reasons, to specify a standard or Kerberos authentication method. When migrating with a standard authentication method to a NoSQL destination, creating appropriate credentials for the destination considered is mandatory too.
When performing an automatic migration to a Swift or PostgreSQL destination, it is mandatory, for security reasons, to specify a standard authentication method. Creating appropriate credentials for the destination considered is mandatory too.
When generating an upload command for a NoSQL, Hadoop, Swift or PostgreSQL destination, if no authentication method is referenced, the upload command is generated without any preliminary authentication step.
Constraints
- UTF-8 encoding: Optim High Performance Unload must use UTF-8 encoding when generating files to be uploaded to NoSQL, Hadoop or Db2 Warehouse destinations. NoSQL and Hadoop destinations do not support any encoding other than UTF-8; for a Db2 Warehouse destination, the underlying Db2 database is implicitly created with the UTF-8 encoding. Consequently, to avoid problems, if an encoding specification different from UTF-8 is made in a control file for one of these destinations, it is ignored and internally forced to UTF-8.
- When considering a Hive destination, most of the date and timestamp formats are inconsistent with this destination's date and timestamp data types. The formats for these data types existing in Optim High Performance Unload and supported by the Hive destination are the following ones:
- The only date data type format supported is DATE_C.
- The timestamp data type formats supported are TIMESTAMP_A and TIMESTAMP_G. If the TIMESTAMP_A format is considered, its time and timestamp separators must be changed in order to be consistent with a Hive destination: the time separator must be specified with the TIMEDELIM clause set to the ':' value, and the timestamp separator must be specified with the TIMESTAMPDELIM clause set to the space character for its first value, its second value being ignored.
In order to change such a format, the output format for the output file must be set to DELIMITED.
- When considering a Db2 Warehouse destination with the usage of the CLPPlus tool, most of the date, time and timestamp formats are inconsistent with this tool. The formats for these data types existing in Optim High Performance Unload and supported by this tool are the following ones:
- The only date data type format supported is DATE_C.
- The time data type formats supported are TIME_A and TIME_B. The default separator for these formats is the dot character; the colon character is also supported as a separator for time values by Db2 Warehouse, and its usage can be specified with the TIMEDELIM clause set to the ':' value.
- The timestamp data type formats supported are TIMESTAMP_A, TIMESTAMP_B and TIMESTAMP_G.
In order to change such a format, the output format for the output file must be set to DELIMITED.
- When considering a PostgreSQL destination, most of the date, time and timestamp formats are inconsistent with this destination's date, time and timestamp data types. The formats for these data types existing in Optim High Performance Unload and supported by the PostgreSQL destination are the following ones:
- The only date data type format supported is DATE_C.
- The only time data type format supported is TIME_F.
- The only timestamp data type format supported is TIMESTAMP_G.
- Naming convention applied to files containing upload commands generated on Windows platforms: on these platforms, the name of a file to be executed must end with a ".bat" suffix for the execution to succeed. When generating an upload command file on a Windows platform, if the name specified through the LOADFILE clause does not end with such a ".bat" suffix, the suffix is automatically added to the name used when creating the corresponding upload command file. For instance, a name specified as "upload" results in a file named "upload.bat".
- Limitations relative to the usage of a user-defined MapReduce program: to use a MapReduce program of your own for the generation of an upload command, specify a LOADDEST clause with its MAPREDUCE keyword, and set the associated 'command' parameter within the db2hpu.dest configuration file, in its section referring to a MapReduce destination. The string to be specified for this parameter must follow a specific pattern. It must:
- start with the name of the MapReduce program considered, which must be packaged as a jar file
- continue with an optional part containing potential options to be given to the MapReduce program
- expect after it the specification of the data file to be uploaded, which is added automatically when the associated upload command is generated
For instance, one can specify a clause like the following one:
LOADDEST(HADOOP MAPREDUCE)
with an associated destination section in the db2hpu.dest configuration file like the following one:
[MapReduce]
command=/tmp/myMR.jar --input
hdfspath=/tmp/
where:
- the MapReduce program is called myMR.jar
- the --input option is an option of the MapReduce program expecting a subsequent specification of the data file to be handled
In such a case, for an output file specified with:
OUTFILE("outfile")
the generated upload command looks like the following one:
hadoop jar /tmp/myMR.jar --input /tmp/outfile