Importing data with the Teradata Connector for Hadoop

You can use the Teradata Connector for Hadoop to import data from the Teradata database to Db2 Big SQL.

Before you begin

Before you can use the Teradata Connector for Hadoop with Sqoop, you must download the connector from the Teradata website and install it on the cluster.

About this task

When the Teradata Connector is installed on the Db2 Big SQL cluster, you can use Sqoop commands to import data from a Teradata database into the distributed file system or a Hive table.

Procedure

  1. Use the Sqoop command tdimport.
    sqoop tdimport <teradata import options> <sqoop options>
    The <teradata import options> are options that are available in the Teradata Connector but not in Sqoop. When an equivalent Sqoop option exists, use the <sqoop options> form instead.
  2. Configure the Teradata and Sqoop parameters.
    Teradata-specific options that are not available in Sqoop must be specified before the Sqoop options, and each of these options must begin with -D. The options are described in the readme file that is included with the downloaded package. A minimal sketch that combines a Teradata-specific option with standard Sqoop options follows the table.

    This table shows <sqoop options> that can be used with the tdimport command. Each Sqoop option must begin with --.

    Table 1. Sqoop options for the tdimport command
    Sqoop option Description
    --connect <jdbc-uri> Specify the JDBC connection string
    --help Print usage instructions
    --password <password> Set the authentication password
    --password-file <file> Specify a local or HDFS file that contains the password
    --password-alias <alias> Specify the alias for the password in the Hadoop credential store
    --username <username> Set the authentication user name
    --as-avrodatafile Import data to Avro data files (see the examples in this topic)
    --as-sequencefile Import data to sequence files
    --as-textfile Import data as plain text (default)
    --columns <col,col,col...> Specify the columns to import from a table
    --num-mappers <n> Use 'n' map tasks to import in parallel
    --query <statement> Import the results of a SQL 'statement'
    --split-by <column-name> Specify the column of the table to use to split work units
    --table <table-name> Specify the table to read
    --target-dir <dir> Specify the HDFS plain table destination
    --where <where clause> Specify the WHERE clause to use during import
    --input-enclosed-by <char> Set a required field encloser
    --input-escaped-by <char> Set the input escape character
    --input-fields-terminated-by <char> Set the input field separator
    --input-lines-terminated-by <char> Set the input end-of-line char
    --hive-table <table-name> Specify the target Hive table name
    --map-column-hive <columnname=type,columnname=type,...> Specify the target Hive table column names and data types. Required only when you create the target table.
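    For example, a minimal tdimport command that combines a Teradata-specific -D option with standard Sqoop options might look like the following sketch. The connection details, table, and target directory are placeholder values; see the readme file for the full list of -D options.

    # Sketch only: the -D Teradata option comes first, followed by the -- Sqoop options
    # (tdserver, GOSALES, COUNTRY, and /tmp/data/country_text are placeholder values)
    /usr/hdp/current/sqoop-client/bin/sqoop tdimport \
      -Dteradata.db.input.method=split.by.partition \
      --connect jdbc:teradata://tdserver/database=GOSALES \
      --username dbuser \
      --password dbpw \
      --table COUNTRY \
      --target-dir /tmp/data/country_text \
      --split-by SALESCOUNTRYCODE \
      --num-mappers 4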

Example

Importing data with the Teradata Connector into a Hive table
Create a new Hive table with sequence file format and import by using five map tasks.
/usr/hdp/current/sqoop-client/bin/sqoop tdimport \
	--connect jdbc:teradata://tdserver/database=GOSALES \
	--username dbuser \
	--password dbpw \
	--as-sequencefile \
	--hive-table country \
	--map-column-hive "SALESCOUNTRYCODE=int, COUNTRY=string, ISOTHREELETTERCODE=string, ISOTWOLETTERCODE=string, ISOTHREEDIGITCODE=string, CURRENCYNAME=string, EUROINUSESINCE=string" \
	--table COUNTRY \
	--split-by SALESCOUNTRYCODE \
	--num-mappers 5
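If you want to verify the import, you can query the new table from the Hive command line. This is only a quick check and assumes that the hive client is available on the cluster node.
# Count the rows that were imported into the new Hive table
hive -e "SELECT COUNT(*) FROM country"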
Import and append data into an existing partitioned Hive table in text file format. The Teradata option teradata.db.input.method is entered before the Sqoop options.
/usr/hdp/current/sqoop-client/bin/sqoop tdimport \
	-Dteradata.db.input.method=split.by.partition \
	--connect jdbc:teradata://tdserver/database=GOSALES \
	--username dbuser \
	--password dbpw \
	--as-textfile \
	--hive-table salestarget \
	--table salestarget \
	--columns "SALESSTAFFCODE, SALESYEAR" \
	--split-by SALESYEAR
Import and append data into an existing Hive table in text file format with fields delimited by '|'. The Teradata option to use hcatalog is entered before the Sqoop options.
/usr/hdp/current/sqoop-client/bin/sqoop tdimport \
	-Dteradata.db.input.job.type=hcat \
	--connect jdbc:teradata://tdserver/database=GOSALES \
	--username dbuser \
	--password dbpw \
	--as-textfile \
	--hive-table country \
	--table country \
	--split-by SALESCOUNTRYCODE \
	--input-fields-terminated-by "|"
Importing data with the Teradata Connector in the Avro format
To import data from a Teradata database to HDFS, you can specify the file format as Avro by using the --as-avrodatafile option. When you use --as-avrodatafile, you must also specify the Teradata property tdch.output.hdfs.avro.schema.file, which identifies the Avro schema file. For example:
/usr/hdp/current/sqoop-client/bin/sqoop tdimport \
    -Dtdch.output.hdfs.avro.schema.file=/tmp/data/country.avsc \
    --connect jdbc:teradata://tdserver/database=SALES \
    --username tduser \
    --password tdpw \
    --as-avrodatafile \
    --target-dir /tmp/data/country_avro \
    --table COUNTRY \
    --split-by SALESCOUNTRYCODE \
    --num-mappers 1
The Avro schema file follows this format. It includes one field for each imported column, and the field names must match the column names in the Teradata table.
{  "type": "record",  "name": "country", 
    "fields" : [    {"name": "SALESCOUNTRYCODE", "type": "int"},   
    ...  ]}
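For reference, a complete schema for the COUNTRY table that is used in the examples in this topic might look like the following sketch. The field names come from the --map-column-hive examples; the Avro types are assumptions based on those Hive mappings, so adjust them to match your own table definition.
{
  "type": "record",
  "name": "country",
  "fields": [
    {"name": "SALESCOUNTRYCODE", "type": "int"},
    {"name": "COUNTRY", "type": "string"},
    {"name": "ISOTHREELETTERCODE", "type": "string"},
    {"name": "ISOTWOLETTERCODE", "type": "string"},
    {"name": "ISOTHREEDIGITCODE", "type": "string"},
    {"name": "CURRENCYNAME", "type": "string"},
    {"name": "EUROINUSESINCE", "type": "string"}
  ]
}
If a column is nullable in Teradata, you can declare the corresponding field type as a union such as ["null", "string"].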
Create a new Hive table and import by using a local password file.
The password file must contain only the password, with no trailing newline or other extra characters.
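For example, you can create the password file with printf, which does not add a trailing newline, and then restrict access to the file owner. The path is only an illustration and must match the value that you pass to --password-file.

# Write only the password, without a trailing newline (illustrative path)
printf 'dbpw' > /home/user1/gosalesl.txt
# Allow only the file owner to read the password file
chmod 400 /home/user1/gosalesl.txt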

$SQOOP_HOME/bin/sqoop tdimport \
  --connect jdbc:teradata://tdserver/database=GOSALES \
  --username dbuser \
  --password-file file:///home/user1/gosalesl.txt \
  --hive-table country \
  --map-column-hive "SALESCOUNTRYCODE=int, COUNTRY=string, ISOTHREELETTERCODE=string, ISOTWOLETTERCODE=string, ISOTHREEDIGITCODE=string, CURRENCYNAME=string, EUROINUSESINCE=string" \
  --table COUNTRY \
  --split-by SALESCOUNTRYCODE \
  --num-mappers 1
Create a new Hive table and import by using a password alias.
The password alias should be created using the hadoop credential command.

hadoop credential create testit -v somepw \
  -provider jceks://file/tmp/alias.jceks
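
You can optionally confirm that the alias was stored by listing the credentials in the provider:

hadoop credential list -provider jceks://file/tmp/alias.jceks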

$SQOOP_HOME/bin/sqoop tdimport \
  -Dhadoop.security.credential.provider.path=jceks://file/tmp/alias.jceks \
  --connect jdbc:teradata://tdserver/database=GOSALES \
  --username dbuser \
  --password-alias testit \
  --hive-table country \
  --map-column-hive "SALESCOUNTRYCODE=int, COUNTRY=string, ISOTHREELETTERCODE=string, ISOTWOLETTERCODE=string, ISOTHREEDIGITCODE=string, CURRENCYNAME=string, EUROINUSESINCE=string" \
  --table COUNTRY \
  --split-by SALESCOUNTRYCODE \
  --num-mappers 1