Preparing to use HDFS
Processing jobs can store data in a Hadoop Distributed File System (HDFS) data lake.
About this task
Restriction: The Developer Edition does not support HDFS data storage.
HDFS is required if you enable the processing job to ingest raw events, typically so that you can reuse your business data later.
You can install IBM® Business Automation Insights with no HDFS storage and enable it later, as described in Configuring HDFS long-term storage.
New in 18.0.2 You can enable HDFS at initial configuration and disable it later, as described in Advanced updates.
- Supported HDFS versions
- IBM Business Automation Insights supports HDFS 2.7.x, 2.8.x, and 2.9.x.
- Storage bucket
- IBM Business Automation Insights requires a dedicated storage bucket for processing jobs to store data in HDFS.
- Permissions
- Processing jobs access HDFS as a user named bai. However, when Kerberos is enabled for
HDFS, processing jobs access HDFS as the Kerberos principal. Therefore, depending on
your use case, make sure that the following prerequisites are met.
- A bai user or Kerberos user name exists on your Hadoop Distributed File System (HDFS).
- A /user/bai or /user/<kerberos_user_name> directory exists.
- The user has write access to that directory.
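The prerequisites above can be set up with standard HDFS shell commands. The following sketch assumes that you run the commands as the HDFS superuser and that the processing jobs access HDFS as the bai user; if Kerberos is enabled, substitute the name of your Kerberos principal for bai.

```shell
# Create the home directory for the bai user (or your Kerberos principal).
hdfs dfs -mkdir -p /user/bai

# Give the bai user ownership of the directory, which grants write access.
hdfs dfs -chown bai:bai /user/bai

# Verify that the directory exists with the expected owner and permissions.
hdfs dfs -ls /user
```

The group name (bai in the -chown command) is an assumption; adjust it to match the groups defined on your Hadoop distribution.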