Analyzing Apache Hadoop data (Execution Engine for Apache Hadoop)

You can build and train models on a Hadoop cluster. If you have data in a Hive or HDFS storage system on a Hadoop cluster, you can work with that data directly on the Hadoop cluster.
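As a rough illustration of working with Hive and HDFS data in place, the sketch below reads both from a Spark session running on the cluster. It is a minimal sketch using generic PySpark calls, not the Execution Engine for Apache Hadoop API; the helper names (`hdfs_uri`, `hive_select`, `load_samples`) and any table or path you pass in are hypothetical, and it assumes PySpark with Hive support is available where the code runs.

```python
from typing import Optional


def hdfs_uri(path: str, namenode: Optional[str] = None) -> str:
    """Build an hdfs:// URI (hypothetical helper).

    Without a NameNode address, the URI has an empty authority, so the
    cluster's default file system is used.
    """
    clean = "/" + path.lstrip("/")
    return f"hdfs://{namenode}{clean}" if namenode else f"hdfs://{clean}"


def hive_select(table: str, limit: int = 100) -> str:
    """Build a simple preview query for a Hive table (hypothetical helper)."""
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


def load_samples(table: str, path: str):
    """Read a Hive table preview and a CSV file on HDFS as Spark DataFrames."""
    # Imported here so the helpers above work without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hadoop-data-sketch")  # assumed application name
        .enableHiveSupport()            # needed to query Hive tables
        .getOrCreate()
    )
    hive_df = spark.sql(hive_select(table))
    hdfs_df = spark.read.csv(hdfs_uri(path), header=True, inferSchema=True)
    return hive_df, hdfs_df
```

Because the session runs on the Hadoop cluster, the data is read where it is stored rather than copied to the Cloud Pak for Data cluster first.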

Service: The Execution Engine for Apache Hadoop service is not available by default. An administrator must install this service on the IBM Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.

In a Watson Studio analytics project, you can find Hadoop environment definitions on the Environments page. See Hadoop environments.

You can use Hadoop environments in these ways:

This diagram shows how data scientists working in an analytics project on a Cloud Pak for Data cluster can run a notebook that trains a model on a Hadoop cluster, using data stored on that Hadoop cluster.

Figure: Hadoop architecture

Outside of Cloud Pak for Data, you can manage models and data on Hadoop clusters in these ways:

