Automatic Hive catalog syncing to the Big SQL catalog
Tables that are created, altered, or dropped by Hive clients can have the associated catalog changes automatically applied to Big SQL.
Enabling automatic syncing
Automatic syncing of the Big SQL catalog and the Hive metastore is enabled by default. If you have disabled automatic syncing (see Disabling automatic syncing), you can enable this feature again by selecting the Enable Metadata Sync service actions menu item.
db2 "CALL SYSPROC.ADMIN_TASK_ADD ('Synchronise MetaData Changes from Hive',
NULL, NULL, NULL, NULL, 'SYSHADOOP', 'HCAT_AUTOSYNC_OBJECTS', NULL, NULL, NULL)
db2 "CALL SYSHADOOP.HCAT_SYNC_OBJECTS('.*', '.*', 'a', 'REPLACE', 'CONTINUE', 'TRANSFER OWNERSHIP TO HIVEOWNER')"
Running Metadata Sync
You can invoke the HCAT_SYNC_OBJECTS stored procedure for all current schemas and tables by selecting the Run Metadata Sync service actions menu item. With impersonation enabled, this action simply reports that impersonation is turned on. Users are responsible for synchronizing their own tables and schemas. The bigsql user does not necessarily have permission to query a table in the HDFS, and invoking this action as bigsql for all user schemas and tables might result in permissions errors.
Monitoring the synchronization
The bigsql.log file receives log information directly from the HCAT_AUTOSYNC_OBJECTS process.
Automatic syncing uses a scheduled DB2 administrative task scheduler (ATS) sync procedure. The
description of the sync procedure is Synchronise MetaData Changes from Hive
. You can monitor
this procedure like any other ATS scheduled procedures. For information about ATS and how to monitor
such procedures, see Administrative task scheduler.
By default, event files are placed in the HDFS under /user/bigsql/sync. The file pattern for those files is fixed, and only files that match this pattern are processed. The general file pattern is DDL-timestamp-UUID.json. For example: DDL-[0-9]{17}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.json. If files are left in this directory for some time, it is good practice to investigate. If you disable automatic syncing, files collect in this directory but are not processed. You must determine whether or not such files should be retained. Alternatively, you can manually remove the hive.metastore.event.listeners entry and restart Hive. In this case, files do not accumulate in the sync directory.
Disabling automatic syncing
db2 "CALL SYSPROC.ADMIN_TASK_REMOVE ('Synchronise MetaData Changes from Hive',NULL)"
Synchronization errors
In the event of a synchronization failure, the associated event-file is moved from the events-directory (by default: /user/bigsql/sync) to an "errors" sub-directory (by default: /user/bigsql/sync/errors). It is good practice to monitor this directory for any synchronization errors and address any issues where necessary.
When any issues have been addressed, and where synchronization for the associated object(s) is still required, the relevant event-files can then be moved back into the events-directory (by default: /user/bigsql/sync) where automatic synchronization will take place.