Using S3 Object Storage with Big SQL tables
Big SQL tables can be created in a location that is specified as an S3a Object Storage URI. Such tables store their data in the Object Storage service, which is typically remote to the Big SQL cluster and managed separately, for example by IBM®'s Cloud Object Storage - S3 API service, which is available through IBM Bluemix®.
About this task
Simple Storage Service (S3) is an object store interface protocol that was developed by Amazon. S3a is the Hadoop component that implements the S3 protocol and provides an interface for Hadoop services, such as Big SQL, to consume S3-hosted data.
Before you can use an S3a uniform resource identifier (URI) as the location of a Hive or Big SQL table, you must add the connection information for the S3a Object Storage service to the core-site.xml file. You can edit the HDFS service core-site.xml file from the Ambari web console to add the properties in Table 1 for the S3a file system configuration; a sample fragment follows the notes below. After you add these properties, restart the HDFS, MapReduce, YARN, Hive, and Big SQL services. Obtain the authentication and connection property values from your Object Storage service provider.
| Property name | Property value |
|---|---|
| fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem |
| fs.s3a.endpoint | The endpoint from which the S3 service is provided. IBM's Cloud Object Storage – S3 API is available from many regions; for example, the San Jose region's endpoint is s3-api.sjc-us-geo.objectstorage.service.networklayer.com. |
| fs.s3a.secret.key | The secret key that is required by the S3 endpoint. |
| fs.s3a.access.key | The access key that is required by the S3 endpoint. |
- The S3a component supports only one endpoint definition.
- Adding these fs.s3a.* properties to the core-site.xml file makes data that is stored in the Object Storage server accessible to anyone with access to the Hadoop cluster. The data does not have an owner, group, or permissions set. Big SQL can use GRANT statements to restrict access to a table, but the data remains accessible to everyone through Hadoop shell commands. Note that Big SQL impersonation is not applicable to these tables. If you prefer not to expose the credentials in core-site.xml, see Enabling Big SQL to use a credential keystore file.
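As an illustration only, the resulting core-site.xml entries might look like the following fragment. The endpoint is the San Jose example from Table 1, and the key values are placeholders that you must replace with the credentials from your own Object Storage service:
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-api.sjc-us-geo.objectstorage.service.networklayer.com</value>
</property>
<!-- Placeholder credentials; obtain the real values from your Object Storage service -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>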
Accessing S3a buckets and files through Hadoop
About this task
By using an S3 client, such as the AWS CLI, you can create a bucket and add files in the S3 Object Storage service.
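As a minimal sketch (the local file path is illustrative), the following commands create the s3atables bucket and upload a country.csv file into it, using the Bluemix San Jose endpoint that appears in the listings below:
# aws s3 mb s3://s3atables --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
# aws s3 cp ./country.csv s3://s3atables/country/country.csv --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
The following listing shows the contents of the s3atables bucket (the staff files correspond to the partitioned staff table that is created and loaded later in this topic):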
# aws s3 ls s3atables --recursive --endpoint-url=https://s3-api.sjc-us-geo.objectstorage.softlayer.net
2016-07-27 14:51:16 922 country/country.csv
2016-07-27 18:11:22 367 staff/job=Clerk/l1469234296227-6-r-00000
2016-07-27 18:03:29 286 staff/job=Mgr/l1469234296227-6-r-00003
2016-07-27 18:13:50 362 staff/job=Sales/l1469234296227-6-r-00003
The same bucket and files are also visible to Hadoop through the S3a file system:
# hdfs dfs -ls s3a://s3atables/
Found 1 items
drwxrwxrwx - 0 2016-07-27 14:39 s3a://s3atables/country
# hdfs dfs -ls s3a://s3atables/country
Found 1 items
-rw-rw-rw- 1 922 2016-07-27 14:51 s3a://s3atables/country/country.csv
A recursive listing shows the partition directories and data files of the staff table:
# hdfs dfs -ls -R s3a://s3atables/staff
drwxrwxrwx - 0 2016-07-27 18:10 s3a://s3atables/staff/job=Clerk
-rw-rw-rw- 1 367 2016-07-27 18:11 s3a://s3atables/staff/job=Clerk/l1469234296227-6-r-00000
drwxrwxrwx - 0 2016-07-27 17:57 s3a://s3atables/staff/job=Mgr
-rw-rw-rw- 1 286 2016-07-27 18:03 s3a://s3atables/staff/job=Mgr/l1469234296227-6-r-00003
drwxrwxrwx - 0 2016-07-27 18:08 s3a://s3atables/staff/job=Sales
-rw-rw-rw- 1 362 2016-07-27 18:13 s3a://s3atables/staff/job=Sales/l1469234296227-6-r-00003
Creating a Big SQL table with S3a Object Storage location
About this task
You specify the S3a URI in the LOCATION clause of the CREATE HADOOP TABLE statement. The following example creates an external table over the country.csv file that is stored in the s3atables bucket:
CREATE EXTERNAL HADOOP TABLE country (
SALESCOUNTRYCODE INT,
COUNTRY VARCHAR(40),
ISOTHREELETTERCODE VARCHAR(3),
ISOTWOLETTERCODE VARCHAR(2),
ISOTHREEDIGITCODE VARCHAR(3),
CURRENCYNAME VARCHAR(50),
EUROINUSESINCE TIMESTAMP
) LOCATION 's3a://s3atables/country';
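After it is defined, the table can be queried like any other Big SQL table. The following query is an illustrative example (not part of the original listing) that reads the country data directly from Object Storage:
SELECT salescountrycode, country, currencyname
FROM country
FETCH FIRST 5 ROWS ONLY;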
The next example creates a partitioned table; each value of the JOB column maps to a job= subdirectory under s3a://s3atables/staff, as shown in the earlier Hadoop listing:
CREATE HADOOP TABLE staff (
ID SMALLINT,
NAME VARCHAR(9),
DEPT SMALLINT,
YEARS SMALLINT,
SALARY DECIMAL(7,2),
COMM DECIMAL(7,2)
) PARTITIONED BY (JOB VARCHAR(5))
LOCATION 's3a://s3atables/staff';
You can populate the table with the LOAD HADOOP statement. In this example, the source file is itself read from an S3a location, the data-files bucket:
LOAD HADOOP USING FILE URL 's3a://data-files/staff.csv'
WITH SOURCE PROPERTIES (
'field.indexes'='1,2,3,5,6,7,4',
'replace.with.null'='NULL',
'replace.string.with.null'='NULL')
INTO TABLE staff;
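As a follow-up sketch (this query is illustrative, not from the original example), a predicate on the partitioning column limits the scan to the matching job= directory in Object Storage:
SELECT name, salary
FROM staff
WHERE job = 'Mgr';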