Hive Metastore overview

Hive Metastore (HMS) is a service that stores the metadata used by Presto (Java) and other services in a backend relational database management system (RDBMS) or in Hadoop Distributed File System (HDFS).

watsonx.data Developer edition

watsonx.data on Red Hat® OpenShift®

watsonx.data SaaS on AWS

When you create a table, schema information such as column names and data types is stored in the metastore relational database. The metastore lets users see the data files in HDFS or object storage as if they were stored in tables.

The metastore acts as a bridge between the schema of a table and the data files that are stored in object storage. HMS holds the definitions, schema, and other metadata for each table, and maps the data files and directories to the table representation that the user sees. HMS therefore serves as the storage location for schema and table definitions. HMS is a metastore server that connects to object storage, where the data is stored, and keeps the related metadata in PostgreSQL.
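The mapping described above can be pictured with a small in-memory model. This is only an illustrative sketch: the names `Column`, `TableEntry`, and `ToyMetastore` are hypothetical and are not part of HMS, and real table formats such as Iceberg track data files through their own metadata rather than by prefix matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class TableEntry:
    """Hypothetical stand-in for the metadata a metastore keeps per table."""
    schema_name: str
    table_name: str
    columns: list[Column]
    location: str  # storage prefix under which the data files live


class ToyMetastore:
    """Toy model: maps (schema, table) to a definition and a storage location."""

    def __init__(self) -> None:
        self._tables: dict[tuple[str, str], TableEntry] = {}

    def create_table(self, entry: TableEntry) -> None:
        # Only metadata is recorded here; the data files stay in storage.
        self._tables[(entry.schema_name, entry.table_name)] = entry

    def get_table(self, schema_name: str, table_name: str) -> TableEntry:
        return self._tables[(schema_name, table_name)]

    def data_files(self, schema_name: str, table_name: str,
                   object_keys: list[str]) -> list[str]:
        # Resolve a table to the stored objects under its location prefix.
        prefix = self.get_table(schema_name, table_name).location.rstrip("/") + "/"
        return [key for key in object_keys if key.startswith(prefix)]
```

In this sketch a query engine would first ask the metastore for the table definition and location, and only then read the matching files from storage.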

Any database with a JDBC driver can be used as the metastore database. Presto (Java) makes requests to HMS through the Thrift protocol, and the Presto (Java) instance reads data from and writes data to HMS. In watsonx.data, a PostgreSQL database is used. HMS supports the following five backend databases:
  • Derby
  • MySQL
  • MS SQL Server
  • Oracle
  • PostgreSQL
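As a rough illustration of what "a database with a JDBC driver" means in practice, the sketch below builds the two JDBC connection properties that Apache Hive reads for its backend database (`javax.jdo.option.ConnectionURL` and `javax.jdo.option.ConnectionDriverName`). The helper name, the default database name `metastore`, and the host values are assumptions for illustration. The driver class names and URL shapes are the commonly documented ones and should be checked against the driver version in use. In watsonx.data the PostgreSQL backend is already configured for you, so this is not a setup step.

```python
# Commonly documented JDBC driver classes and URL templates (verify per driver version).
JDBC_TEMPLATES = {
    "derby": ("org.apache.derby.jdbc.EmbeddedDriver",
              "jdbc:derby:;databaseName={database};create=true"),
    "mysql": ("com.mysql.cj.jdbc.Driver",
              "jdbc:mysql://{host}:{port}/{database}"),
    "mssql": ("com.microsoft.sqlserver.jdbc.SQLServerDriver",
              "jdbc:sqlserver://{host}:{port};databaseName={database}"),
    "oracle": ("oracle.jdbc.OracleDriver",
               "jdbc:oracle:thin:@//{host}:{port}/{database}"),
    "postgresql": ("org.postgresql.Driver",
                   "jdbc:postgresql://{host}:{port}/{database}"),
}

# Conventional default ports; Derby in embedded mode does not use one.
DEFAULT_PORTS = {"mysql": 3306, "mssql": 1433, "oracle": 1521, "postgresql": 5432}


def metastore_jdbc_properties(backend, host="localhost", port=None,
                              database="metastore"):
    """Return the Hive JDBC properties for a backend (hypothetical helper)."""
    if backend not in JDBC_TEMPLATES:
        raise ValueError(f"unsupported backend: {backend!r}")
    driver, template = JDBC_TEMPLATES[backend]
    if port is None:
        port = DEFAULT_PORTS.get(backend)
    # Templates ignore placeholders they do not contain (for example, Derby).
    url = template.format(host=host, port=port, database=database)
    return {
        "javax.jdo.option.ConnectionURL": url,
        "javax.jdo.option.ConnectionDriverName": driver,
    }
```

For example, `metastore_jdbc_properties("postgresql", host="db.example.com")` yields a PostgreSQL URL on port 5432 for a database named `metastore`.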
Currently, HMS in watsonx.data supports the Iceberg table format.
HMS supports the following three deployment modes. In watsonx.data, the remote mode is used.
  • Embedded Metastore - Derby with a single session.
  • Local Metastore - MySQL with multiple sessions, accessible locally.
  • Remote Metastore - the metastore runs in its own separate JVM and is accessible through Thrift network APIs.
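In remote mode, clients locate the metastore through a Thrift URI. In Apache Hive this is the `hive.metastore.uris` property, a comma-separated list of `thrift://host:port` entries. The sketch below parses such a value into host and port pairs. The function name is hypothetical, and falling back to port 9083 (the port conventionally used by HMS) is an assumption; the endpoint that your watsonx.data deployment exposes may differ.

```python
from urllib.parse import urlsplit

DEFAULT_HMS_PORT = 9083  # conventional HMS Thrift port; assumed fallback


def parse_metastore_uris(value):
    """Split a comma-separated list of thrift:// URIs into (host, port) pairs."""
    endpoints = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate trailing commas and blank entries
        parts = urlsplit(raw)
        if parts.scheme != "thrift" or not parts.hostname:
            raise ValueError(f"not a Thrift metastore URI: {raw!r}")
        endpoints.append((parts.hostname, parts.port or DEFAULT_HMS_PORT))
    return endpoints
```

A client would typically try the returned endpoints in order until one accepts a connection.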

For more information about exposing Hive metastore ports, see Exposing Hive metastore ports.

For more information about managing access, see Managing access to the Hive Metastore.

For more information about accessing HMS from outside of the OpenShift Container Platform cluster, see Accessing Hive Metastore (HMS) using NodePort.