Hadoop Storage Tiering mode without native HDFS federation
This topic shows how to architect and configure a Hadoop Storage Tiering solution, and includes a suite of test cases that were executed against this configuration.
The Hadoop Storage Tiering architecture consists of a native HDFS cluster (the local cluster), shown on the left-hand side, and an IBM Storage Scale HDFS Transparency cluster (the remote cluster), shown on the right-hand side. Jobs running on the native HDFS cluster can access data from either the native HDFS or the IBM Storage Scale HDFS Transparency cluster, depending on the input or output data path or on the metadata path. For example, a Hive job accesses data according to the Hive metadata path.
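The path-based selection described above can be illustrated with a minimal sketch. It assumes that each cluster is identified by the NameNode authority (host and port) in the path URI, and that a path without an authority resolves against the default file system of the local cluster (`fs.defaultFS` in Hadoop). The host names, the port 8020, and the function name `cluster_for_path` are hypothetical and are not taken from the tested configuration.

```python
from urllib.parse import urlparse

# Hypothetical NameNode endpoints; replace with the values from your own clusters.
LOCAL_NAMENODE = "local-nn.example.com:8020"    # native HDFS cluster
REMOTE_NAMENODE = "remote-nn.example.com:8020"  # HDFS Transparency cluster


def cluster_for_path(path, default_fs="hdfs://" + LOCAL_NAMENODE):
    """Return 'local', 'remote', or 'unknown' for a job input or output path."""
    authority = urlparse(path).netloc
    if not authority:
        # A path with no authority falls back to the default file system,
        # which this sketch assumes is the local native HDFS cluster.
        authority = urlparse(default_fs).netloc
    if authority == LOCAL_NAMENODE:
        return "local"
    if authority == REMOTE_NAMENODE:
        return "remote"
    return "unknown"


if __name__ == "__main__":
    for p in ("/user/hive/warehouse/t1",
              "hdfs://remote-nn.example.com:8020/user/hive/warehouse/t2"):
        print(p, "->", cluster_for_path(p))
```

In this sketch, a job on the local cluster reaches the remote cluster only when the path names the remote NameNode explicitly. Unqualified paths stay on the native HDFS.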
Note: A Hadoop cluster deployed on the IBM Storage Scale HDFS Transparency cluster side is not required for the Hadoop Storage Tiering with IBM Storage Scale solution. It is shown to illustrate that a Hadoop cluster can access data via HDFS or POSIX from the IBM Storage Scale file system.
The configuration described in this documentation was set up without the HDP components on the remote cluster.
The following software versions were used for testing:
Cluster | Stack | Version |
---|---|---|
HDP cluster | Ambari | 2.6.1.0 |
HDP cluster | HDP | 2.6.4.0 |
HDP cluster | HDP-Utils | 1.1.0.22 |
IBM Storage Scale & HDFS Transparency cluster | IBM Storage Scale | 5.0.0 |
IBM Storage Scale & HDFS Transparency cluster | HDFS Transparency | 2.7.3-2 |
IBM Storage Scale & HDFS Transparency cluster | IBM Storage Scale Ambari management pack | 2.4.2.4 |