Configuring access to NoSQL data sources through the NoSQL wrapper

To configure the federated server to access NoSQL data sources, you must provide the federated server with information about the data sources and objects that you want to access.

NoSQL (also known as "not only SQL") database systems are distributed, non-relational databases designed for large-scale data storage and massively parallel data processing across a large number of commodity servers. NoSQL databases are becoming increasingly popular because of their advantages in processing big data. However, each NoSQL system has its own API and typically does not support standards such as SQL, ODBC, and JDBC. In addition, the data that a NoSQL system returns is often schema-less, so programs cannot automatically determine the data types from the returned data. As a result, query results are difficult for conventional relational systems to handle.

The federated server NoSQL wrapper is designed to access different kinds of NoSQL data sources. The wrapper can use either a RESTful API or the native Java API of the NoSQL database to retrieve data. Depending on the type of NoSQL database, the wrapper translates an SQL query into the corresponding Java API or RESTful API calls. Data in NoSQL databases is schema-less and comes in various formats, such as JSON, Avro, and Parquet.
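As a rough illustration of translating a query into a RESTful call, the following minimal sketch turns a simple projection with equality predicates into a request URL. It is not the wrapper's actual implementation. The function name build_rest_query, the endpoint http://nosql.example.com/api, and the fields query parameter are all hypothetical and are chosen only for this example.

```python
from urllib.parse import urlencode


def build_rest_query(base_url, collection, columns, equals):
    """Build a GET URL for a hypothetical REST endpoint.

    columns: column names from the SELECT list.
    equals:  column -> value pairs from simple equality predicates.
    The "fields" parameter name is an assumption, not a real product API.
    """
    params = {"fields": ",".join(columns)}
    params.update(equals)
    return f"{base_url}/{collection}?{urlencode(params)}"


# Roughly corresponds to: SELECT id, name FROM customers WHERE city = 'London'
url = build_rest_query(
    "http://nosql.example.com/api",  # hypothetical endpoint
    "customers",
    ["id", "name"],
    {"city": "London"},
)
print(url)
```

A real translation would also need to handle predicates that the data source cannot evaluate, which would then have to be applied after the data is retrieved.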

Because the primary goal of the NoSQL wrapper is to transform a NoSQL result set into an RDBMS result set, different parsers are implemented for the different data formats. After the data is parsed, the wrapper takes the parser output as input and transforms it into an RDBMS result set. During the transformation, the wrapper must first obtain the schema of the queried columns. NoSQL database schemas differ from RDBMS schemas in two ways: some NoSQL databases have no schema at all, and some have a schema that is bundled with the data. For databases that have no schema, the wrapper provides a mechanism that lets users predefine a flexible schema before data is retrieved. For databases that have a built-in schema, the wrapper automatically retrieves the schema and stores it in a catalog table for later use.
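The idea of applying a predefined schema to schema-less documents can be sketched as follows. This is only an illustration under stated assumptions, not the wrapper's own mechanism: the schema layout (column name, dotted path, converter) and the helper names lookup and to_rows are hypothetical, and JSON is used as the example format.

```python
import json

# Hypothetical user-defined schema: (column name, dotted path, type converter)
SCHEMA = [
    ("id", "id", int),
    ("name", "name", str),
    ("city", "address.city", str),
]


def lookup(doc, path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_rows(raw_docs, schema):
    """Parse JSON documents and map each one to a fixed-width tuple (one relational row)."""
    rows = []
    for raw in raw_docs:
        doc = json.loads(raw)
        row = []
        for _name, path, convert in schema:
            value = lookup(doc, path)
            # Missing fields become NULL (None); present values are cast to the column type
            row.append(None if value is None else convert(value))
        rows.append(tuple(row))
    return rows


docs = [
    '{"id": "1", "name": "Ada", "address": {"city": "London"}}',
    '{"id": 2, "name": "Lin"}',
]
rows = to_rows(docs, SCHEMA)
print(rows)
```

In this sketch, documents with different shapes still produce rows with the same columns: a field that is absent from a document becomes a null value, and a value stored with a different type (the string "1" above) is converted to the declared column type.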

In principle, the federated server NoSQL wrapper can access any kind of NoSQL database. It is especially well suited to databases that store schema-less or semi-structured data. The ability to let users define a flexible schema for their data is a significant advantage over other techniques.

The NoSQL databases that the NoSQL wrapper supports are categorized by the type of data that they store. Currently, the federated server NoSQL wrapper supports document stores and Parquet files.

Based on the data source type, NoSQL data sources are currently divided into two types: document store data sources and HDFS Parquet files.