Summary of changes

This topic summarizes changes to the IBM Spectrum® Scale Big Data and Analytics (BDA) support documentation.

For information about IBM Spectrum Scale changes, see the IBM Spectrum Scale Summary of changes.

For information about BDA feature support, see the List of stabilized, deprecated, and discontinued features section under the Summary of changes.

For information about the resolved IBM Spectrum Scale APARs, see IBM Spectrum Scale APARs Resolved.

Summary of changes as updated, June 2021

Changes in HDFS Transparency 3.1.1-5 in IBM Spectrum Scale 5.1.1.1
  • Fixed the handling of the file listing. Therefore, the java.nio.file.NoSuchFileException warning messages will no longer occur.
  • Fixed the handling of the getBlockLocation RPC on files that do not exist. This issue prevented the YARN ResourceManager from starting after configuring the node labels directory.
  • From HDFS Transparency 3.1.1-5, the gpfs_tls_configuration.py script automates the configuration of Transport Layer Security (TLS) on the CES HDFS Transparency cluster.
Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.3.1 in IBM Spectrum Scale 5.1.1.1
  • From Toolkit version 1.0.3.1, creating multiple CES HDFS clusters using the IBM Spectrum Scale installation toolkit during the same deployment run is supported.

Summary of changes as updated, May 2021

Changes in Cloudera Data Platform Private Cloud Base
  • From CDP Private Cloud Base 7.1.6, Impala is certified on IBM Spectrum Scale 5.1.1 on x86_64.

Summary of changes as updated, April 2021

Changes in Cloudera Data Platform Private Cloud Base
  • CDP Private Cloud Base 7.1.6 is certified with IBM Spectrum Scale 5.1.1.0. This CDP Private Cloud Base version supports Transport Layer Security (TLS) and HDFS encryption.
Changes in HDFS Transparency 3.1.1-4
  • Fixed the mmhdfs command to recognize short hostname configurations for NameNodes and DataNodes. Therefore, the The node is not a namenode or datanode error message no longer occurs.
  • The IBM Spectrum Scale file systems are now explicitly checked in the mount and unmount callbacks during the HDFS Transparency startup and shutdown processes, so unrelated IBM Spectrum Scale file systems no longer affect HDFS Transparency. HDFS Transparency starts only if the relevant mount point is properly mounted and stops if the relevant mount point is unmounted, based on the HDFS Transparency status checking in the IBM Spectrum Scale event callback process.
  • HDFS Transparency NameNode log now contains the HDFS Transparency full version information and the gpfs.encryption.enable value.
  • Added general security fixes, including the fix for CVE-2020-4851 documented in IBM® Support.
  • Added a new custom JSON file method for the Kerberos script. For more information, see Configuring Kerberos using the Kerberos script provided with IBM Spectrum Scale.
Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.3.0

Summary of changes as updated, March 2021

Changes in IBM Spectrum Scale CES HDFS Transparency
  • IBM Spectrum Scale CES HDFS Transparency now supports both the NameNode HA and non-HA options. Also, DataNode can now have Hadoop services colocated within the same node. For more information, see Alternative architectures.
Changes in Mpack version 2.7.0.9
  • The Ambari maintenance mode for clusters is now supported by the IBM Spectrum Scale service for gpfs.storage.type set to shared or remote environments. Earlier, when the user performed a Start all or Stop all operation from the Ambari GUI, the IBM Spectrum Scale service or its components would start or stop even when they were set to maintenance mode. For more information and limitations, see the Ambari maintenance mode support for IBM Spectrum Scale service section.
  • The Mpack upgrade process no longer re-initializes the following HDFS parameters to the Mpack’s recommended settings (an hdfs-site.xml sketch follows at the end of this list):
    • dfs.client.read.shortcircuit
    • dfs.datanode.hdfs-blocks-metadata.enabled
    • dfs.ls.limit
    • dfs.datanode.handler.count
    • dfs.namenode.handler.count
    • dfs.datanode.max.transfer.threads
    • dfs.replication
    • dfs.namenode.shared.edits.dir

    Earlier, any updates to these parameters by the end user were overwritten. Now that this issue is fixed, any customized hdfs-site.xml configuration is not changed during the upgrade process.

  • In addition to the Check Integration Status option in the Ambari service, you can now view the Mpack version and build information in version.txt in the Mpack tar.gz package.
  • The hover message for the GPFS Quorum Nodes text field in the IBM Spectrum Scale service GUI has been updated. The hostnames entered for the quorum nodes must be IBM Spectrum Scale Admin network hostnames.
  • The Mpack uninstaller script cleans up the stale IBM Spectrum Scale Ambari link that is no longer required. Therefore, the Ambari server restart no longer fails because of the Mpack dependencies.
  • The Mpack install, upgrade, and uninstall scripts now support sudo root permission.
  • The anonymous UID verification is performed only if hadoop.security.authentication is not set to Kerberos.
  • The IBM Spectrum Scale service can now monitor the status of the configured file system mount points (gpfs.mnt.dir).
    In earlier releases of the Mpack, the IBM Spectrum Scale service could monitor only the status of the IBM Spectrum Scale runtime daemon.
    If any of the configured file systems is not mounted on an IBM Spectrum Scale node, the status of the GPFS_NODE component for that node now appears as down in the Ambari GUI.
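
For illustration, a customized entry in hdfs-site.xml that is now preserved across the Mpack upgrade could look like the following sketch (the parameter name is from the list above; the value 128 is only a hypothetical customization):

    <property>
    <name>dfs.datanode.handler.count</name>
    <value>128</value>
    </property>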

Summary of changes as updated, January 2021

Changes in Cloudera Data Platform Private Cloud Base

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale is supported on Power®. For more information, see Support Matrix.

Changes in HDFS Transparency 3.1.0-7
  • Fixed the NullPointerException error message appearing in the NameNode logs.
  • Fixed the JMX output to correctly report "open" operations when the gpfs.ranger.enabled parameter is set to scale.

Documentation update

Configuration options for using multiple threads to list a directory and load the metadata of its children are provided for HDFS Transparency 3.1.1-3 and 3.1.0-6. For more information, see the list option.

Summary of changes as updated, December 2020

Changes in HDFS Transparency 3.1.1-3
  • HDFS Transparency implements a performance enhancement using a fine-grained file system locking mechanism. After HDFS Transparency 3.1.1-3 is installed, ensure that the gpfs.ranger.enabled field is set to scale in /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml (an example entry follows this list). For more information, see Setting configuration options in CES HDFS.
  • The create Hadoop users and groups script and the create Kerberos principals and keytabs script in IBM Spectrum Scale now reside in the /usr/lpp/mmfs/hadoop/scripts directory.
  • Requires Python 3.6 or later.
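
The required gpfs-site.xml entry has the same form as the one shown under the October 2020 changes later in this topic:

    <property>
    <name>gpfs.ranger.enabled</name>
    <value>scale</value>
    <final>false</final>
    </property>
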
Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.2-1
  • Fixed a toolkit installation failure caused by nodes that are not part of the CES HDFS cluster and do not have Java installed or JAVA_HOME set.
  • The following proxyuser configurations were added into core-site.xml by the installation toolkit to configure a CES HDFS cluster (an equivalent core-site.xml sketch follows the list):
    hadoop.proxyuser.livy.hosts=*
    hadoop.proxyuser.livy.groups=*
    hadoop.proxyuser.hive.hosts=*
    hadoop.proxyuser.hive.groups=*
    hadoop.proxyuser.oozie.hosts=*
    hadoop.proxyuser.oozie.groups=*
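
In core-site.xml property form, these settings correspond to entries like the following sketch (only the livy entries are shown; the hive and oozie entries follow the same pattern):

    <property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
    </property>
    <property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
    </property>
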
Changes in IBM Spectrum Scale Cloudera Custom Service Descriptor (CDP CSD) 1.0.0-0
  • Integrates IBM Spectrum Scale service into CDP Private Cloud Base Cloudera Manager.

Summary of changes as updated, November 2020

Changes in HDFS Transparency 3.1.1-2
  • Supports CDP Private Cloud Base. For more information, see Support Matrix.
  • Includes Hadoop sample scripts to create users and groups in IBM Spectrum Scale and set up the Kerberos principals and keytabs. Requires Python 3.6 or later.
  • Summary operations (for example, du and count) in HDFS Transparency can now be run multi-threaded based on the number of files and subdirectories. This improves performance when operating on a path that contains many files and subdirectories; the improvement depends on the system environment. For more information, see Functional limitations.
Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.2-0
  • Added support to deploy CES HDFS in SLES 15 and Ubuntu 20.04 on x86_64 platforms.
  • Package renamed from bda_integration-<version>.noarch.rpm to gpfs.bda-integration-<version>.noarch.rpm.
  • Requires Python 3.6 or later.
Changes in IBM Spectrum Scale Cloudera Custom Service Descriptor (CDP CSD) 1.0.0-0 EA
  • Integrates IBM Spectrum Scale service into CDP Private Cloud Base Cloudera Manager.

Summary of changes as updated, October 2020

Changes in HDFS Transparency 3.1.0-6
  • HDFS Transparency now implements a performance enhancement using a fine-grained file system locking mechanism instead of the Apache Hadoop global file system locking mechanism. From HDFS Transparency 3.1.0-6, set gpfs.ranger.enabled to scale from the HDP Ambari GUI under the IBM Spectrum Scale service config panel. If you are not using Ambari, set gpfs.ranger.enabled in /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml as follows:
    <property>
    <name>gpfs.ranger.enabled</name>
    <value>scale</value>
    <final>false</final>
    </property>
    Note: The scale option replaces the original true/false values.
  • Summary operations (for example, du and count) in HDFS Transparency can now be run multi-threaded based on the number of files and subdirectories. This improves performance when operating on a path that contains many files and subdirectories; the improvement depends on the system environment. For more information, see Functional limitations.

Summary of changes as updated, August 2020

Changes in Mpack version 2.7.0.8

For Mpack 2.7.0.7 and earlier, a restart of the IBM Spectrum Scale service would overwrite the IBM Spectrum Scale customized configuration if the gpfs.storage.type parameter was set to shared.

From Mpack 2.7.0.8, if the gpfs.storage.type parameter is set to shared or shared,shared, the IBM Spectrum Scale service does not set the IBM Spectrum Scale tunables that are seen under the IBM Spectrum Scale service back to the IBM Spectrum Scale cluster or file system. For more information, see Support Matrix.

Summary of changes as updated, July 2020

Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.1.1
  • Supports rolling upgrade of HDFS Transparency through installation toolkit.
    Note: If the SMB protocol is enabled, all protocols must be offline for some time because SMB does not support rolling upgrade.
  • Requires IBM Spectrum Scale 5.0.5.1 and HDFS Transparency 3.1.1-1. For more information, see CES HDFS Support matrix.
  • From IBM Spectrum Scale 5.0.5.1, only one CES-IP is needed for one HDFS cluster during installation toolkit deployment.
Changes in HDFS Transparency 3.1.0-5
  • When gpfs.replica.enforced is set to gpfs, the client replica setting is not honored. The WARN namenode.GPFSFs (GPFSFs.java:setReplication(123)) - Set replication operation invalid when gpfs.replica.enforced is set to gpfs message is now logged at debug level because it can occur many times in the NameNode log.
  • Fixed a NameNode hang that occurred when running MapReduce jobs, caused by a lock synchronization issue.
  • From IBM Spectrum Scale 5.0.5, gpfs.snap --hadoop can collect the HDFS Transparency logs from user-configured directories.
  • From HDFS Transparency 3.1.0-5, the default value for dfs.replication is 3 and gpfs.replica.enforced is gpfs, so IBM Spectrum Scale file system replication is used instead of Hadoop HDFS replication. Increasing the dfs.replication value to 3 also helps the HDFS client tolerate DataNode failures (a configuration sketch follows this list).
    Note: You need at least three DataNodes when you set dfs.replication to 3.
  • Changed permission mode for editlog files to 640.
  • For two file systems, HDFS Transparency ensures that the NameNodes and DataNodes are stopped before the second file system mount point is unmounted.
    Note: The local directory path for the second file system mount is not removed. Ensure that this local directory path is empty before starting the NameNode.
  • HDFS Transparency does not manage the storage, so the Apache Hadoop block function calls used for native HDFS would report false metric information. HDFS Transparency therefore does not execute the Apache Hadoop block function calls.
  • Delete operations in HDFS Transparency can now be run multi-threaded based on the number of files and subdirectories. This improves performance when deleting a path that contains many files and subdirectories; the improvement depends on the system environment. For more information, see Functional limitations.
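
For reference, these defaults correspond to configuration entries like the following sketch (dfs.replication belongs in hdfs-site.xml and gpfs.replica.enforced in gpfs-site.xml):

    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>
    <property>
    <name>gpfs.replica.enforced</name>
    <value>gpfs</value>
    </property>
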
Changes in Mpack version 2.7.0.7
  • Supports HDP upgrade with Mpack 2.7.0.7 without unintegrating HDFS Transparency. For more information, see Support Matrix and the upgrade procedures in Upgrading HDP overview.
  • The Mpack 2.7.0.7 supports Ambari version 2.7.4 or later. For more information, see Support Matrix.
  • The installation and upgrade scripts now support complex KDC password when Kerberos is enabled.
  • You can now upgrade from older Mpacks (versions 2.7.0.x) to Mpack 2.7.0.7 with Kerberos enabled without executing the workaround Upgrade failures from Mpack 2.7.0.3 or earlier to Mpack 2.7.0.4 - 2.7.0.6.
  • The upgrade postEU process is now simplified and can automatically accept the user agreement license.
  • The upgrade postEU option now requests the user inputs only once during the upgrade process.
  • During the Mpack install or upgrade process, the backup directory created by the Mpack installer now includes a date timestamp added to the directory name.
  • The Check Integration Status UI action in Spectrum Scale service now shows the unique Mpack build ID.
  • Previously, if you enabled Kerberos after integrating the IBM Spectrum Scale service, ZKFC failed to start because the hdfs_jaas.conf file was missing. The zkfc fails to start when Kerberos is enabled workaround is no longer required.
  • Ambari now supports rolling restart for NameNodes and DataNodes.
  • Configuration changes take effect after you restart the NameNodes and DataNodes; restarting all the HDFS Transparency nodes is not required.
  • If SSL is enabled, the upgrade script asks for the hostname instead of the IP address.
  • The true/false inputs requested by the upgrade script are no longer case-sensitive.
  • When the deployment type was set to gpfs.storage.type=shared, a local GPFS cluster would be created even if bi-directional passwordless SSH was not set up properly between the GPFS Master and the ESS contact node. This issue is now fixed: the deployment fails in such scenarios and an error message is displayed.
  • If you are using IBM Spectrum Scale 4.2.3.2, the Ambari service would hang because mmchconfig prompted for ENTER for the LogFileSize parameter. From Mpack 2.7.0.7, the LogFileSize configuration cannot be modified from Ambari; the LogFileSize parameter can be configured only through the command line using the mmchconfig command.

Summary of changes as updated, May 2020

Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.1.0
  • Supports offline upgrade of HDFS Transparency.
  • Requires IBM Spectrum Scale 5.0.5 and HDFS Transparency 3.1.1-1. For more information, see CES HDFS Support matrix.

Changes in HDFS Transparency 3.1.1-1

  • A check is performed when you run the mmhdfs config upload command to ensure that the ces_group_name is consistent with the HDFS Transparency dfs.nameservices values.
  • From IBM Spectrum Scale 5.0.5, gpfs.snap --hadoop can now collect the HDFS Transparency logs from user-configured directories.
  • From HDFS Transparency 3.1.1-1, the default value for dfs.replication is 3 and gpfs.replica.enforced is gpfs, so IBM Spectrum Scale file system replication is used instead of Hadoop HDFS replication. Increasing the dfs.replication value to 3 also helps the HDFS client tolerate DataNode failures.
    Note: You need at least three DataNodes when you set dfs.replication to 3.
  • Fixed a NameNode hang that occurred when running MapReduce jobs, caused by a lock synchronization issue.
CES HDFS changes
  • From IBM Spectrum Scale 5.0.5, HDFS Transparency version 3.1.1-1 and Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) version 1.0.1.0, HDFS Transparency and Toolkit for HDFS packages are signed with a GPG (GNU Privacy Guard) key and can be deployed by the IBM Spectrum Scale installation toolkit.
    For more information, go to IBM Spectrum Scale Knowledge Center and see the following topics:
    • Installation toolkit changes subsection under the Summary of changes topic.
    • Limitations of the installation toolkit topic under the Installing > Installing IBM Spectrum Scale on Linux nodes and deploying protocols > Installing IBM Spectrum Scale on Linux nodes with the installation toolkit.

Summary of changes as updated, March 2020

Changes in IBM Spectrum Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) 1.0.0.1
  • Supports deployment on ESS.
  • Supports remote mount file system only for CES HDFS protocol.
  • Requires IBM Spectrum Scale 5.0.4.3 and HDFS Transparency 3.1.1-0. For more information, see CES HDFS Support matrix.

Summary of changes as updated, January 2020

Changes in HDFS Transparency 3.1.1-0
  • Integrates with CES protocol and IBM Spectrum Scale installation toolkit.
  • Supports the open-source Apache Hadoop distribution and Red Hat Enterprise Linux® operating systems.
Changes in HDFS Transparency 3.1.0-4
  • Exports a commented NODE_HDFS_MAP_GPFS line into the hadoop-env.sh file for mmhadoopctl multi-network usage.
  • Fixed data replication with AFM DR disk usage due to shrinkfit.
  • Fixed an issue so that a job does not fail if one DataNode fails when using gpfs.replica.enforced=gpfs and dfs.replication > 1 with gpfs.storage.type in shared mode.
  • Changed to log warning messages for outdated clusterinfo and diskinfo files.
  • Fixed a file deletion issue on the second file system when trash is enabled in a two file system configuration.
  • Uses the community-defined default port numbers for dfs.datanode.address, dfs.datanode.ipc.address, and dfs.datanode.http.address to reduce port conflicts with ephemeral ports (a sketch with the assumed defaults follows this list).
  • Fixed the hadoop df output, which earlier was not consistent with the POSIX df output when two file systems are configured.
  • Fixed dfs -du, which earlier displayed a wrong free space value.
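
Assuming the Apache Hadoop 3.x community defaults, the resulting hdfs-site.xml entries would look like the following sketch (verify the port values for your release):

    <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:9866</value>
    </property>
    <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:9867</value>
    </property>
    <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
    </property>
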
Changes in Mpack version 2.7.0.6
  • Supports HDP 3.1.5.

Summary of changes as updated, November 2019

Changes in Mpack version 2.7.0.5
  • The Mpack installation script SpectrumScaleMPackInstaller.py no longer asks for the KDC credentials, even when the HDP Hadoop cluster is Kerberos-enabled. The KDC credentials need to be set up only before executing the IBM Spectrum Scale service action Unintegrate Transparency.
  • If you are deploying the IBM Spectrum Scale service in a shared storage configuration (gpfs.storage.type=shared), the Mpack checks for consistency of the UID and GID of the anonymous user only on the local GPFS nodes. The Mpack does not perform this check on the ESS nodes.
  • If you are deploying the IBM Spectrum Scale service with two file system support with gpfs.storage.type=shared,shared or gpfs.storage.type=remote,remote, the Block Replication in HDFS (dfs.replication) will default to 1.
  • From Mpack 2.7.0.5, the issue that required all nodes managed by Ambari to be set as GPFS nodes during deployment is fixed. For example, if you set some nodes as Hadoop client nodes and some nodes as GPFS nodes for the HDFS Transparency NameNode and DataNodes, the deployment will succeed.
  • In Mpack 2.7.0.4, if the gpfs.storage.type was set to shared, stopping the Scale service from Ambari would report a failure in the UI even if the operation had succeeded internally. This issue has been fixed in Mpack 2.7.0.5.
  • IBM Spectrum Scale Ambari deployment can now support gpfs.storage.type=shared,shared mode. For more information, see Configuring multiple file system mount point access.

Summary of changes as updated, October 2019

IBM Erasure Code Edition (ECE) is supported as shared storage mode for Hadoop with HDFS Transparency 3.1.0-3 and IBM Spectrum Scale 5.0.3.

Summary of changes as updated, September 2019

Changes in HDFS Transparency 3.1.0-3
  • Validates the open file limit when starting Transparency.
  • mmhadoopctl supports dual-network configuration when NODE_HDFS_MAP_GPFS is set in /var/mmfs/hadoop/etc/hadoop/hadoop-env.sh. For more details, see the mmhadoopctl supports dual network section.
Changes in Mpack version 2.7.0.4
  • For FPO clusters, the restripeOnDiskFailure value is set to no, regardless of the originally set value, while the GPFS Master components are stopping. Once the GPFS Master stop completes, the restripeOnDiskFailure value is set back to its original value.
  • The IBM Spectrum Scale service now does a graceful shutdown and no longer force-unmounts the GPFS file system via mmumount -f.
  • Fixed an intermittent failure of one of the HDFS Transparency NameNodes at startup due to a timing issue when both NameNode HA and Kerberos are enabled.
  • The HDFS parameter dfs.replication is set to the mmlsfs -r value (default number of data replicas) of the GPFS file system when gpfs.storage.type=shared, instead of the Hadoop replication value of 3.
  • The Mpack installer (*.bin) file can now accept the license silently when the --accept-licence option is specified.

Summary of changes as updated, May 2019

Changes in HDFS Transparency 3.1.0-2
  • Fixed an issue where a MapReduce task failed after running for one hour when Ranger is enabled.
  • Fixed an issue where Hadoop permission settings did not work properly in a Kerberized environment.
Documentation updates
  • Updated the Migrating IOP to HDP for BI 4.2.5 and HDP 2.6 information.

Summary of changes as updated, March 2019

Changes in Mpack version 2.7.0.3
  • Supports dual-network configuration.
  • Fixed an issue to look only at the first line in the shared_gpfs_node.cfg file to get the host name for shared storage, so the deployment of a shared file system does not hang.
  • Removed the gpfs_base_version and gpfs_transparency_version fields from the IBM Spectrum Scale service configuration GUI. This removes the restart all required state after IBM Spectrum Scale is deployed.
  • The Mpack can now find the correct installed HDP version when multiple HDP versions are present.
  • The IBM Spectrum Scale service is now able to handle hyphenated file system names, so the service can start properly during file system mount.
  • Fixed the IBM Spectrum Scale entry in system_action_definitions.xml so that the IBM Spectrum Scale </actionDefinition> ending tag is not on the same line as the </actionDefinitions> tag. Previously, there was a potential install issue when a new service was added after the IBM Spectrum Scale service, because the new service was added between the IBM Spectrum Scale entry and the </actionDefinition></actionDefinitions> line.
HDFS Transparency 3.1.0-1
  • Fixed Hadoop du to calculate all files under all subdirectories for the user, even when the files have not been accessed.
  • Supports ViewFS in HDP 3.1 with Mpack 2.7.0.3.

Summary of changes as updated, February 2019

Changes in Mpack version 2.7.0.2
  • Supports HDP 3.1.
  • SLES 12 SP3 is supported for new installs on x86_64 only.
  • Upgrades HDFS Transparency on all nodes in the IBM Spectrum Scale cluster instead of only on the NameNode and DataNodes.

Summary of changes as updated, December 2018

Changes in Mpack version 2.7.0.1
  • Supports HDP 3.0.1.
  • Supports preserving Kerberos token delegation during NameNode failover.
  • IBM Spectrum Scale service Stop All/Start All service actions now follow the best practices for IBM Spectrum Scale stop/start, as per the Restarting a large IBM Spectrum Scale cluster topic in the IBM Spectrum Scale: Administration Guide.
  • The HDFS Block Replication parameter, dfs.replication, is automatically set to match the actual value of the IBM Spectrum Scale Default number of data replicas parameter, defaultDataReplicas, when adding the IBM Spectrum Scale service for remote mount storage deployment model.
HDFS Transparency 3.1.0-0
  • Supports preserving Kerberos token delegation during NameNode failover.
  • Fixed CWE/SANS security exposures in HDFS Transparency.
  • Supports Hadoop 3.1.1.

Summary of changes as updated, October 2018

Changes in Mpack version 2.4.2.7
  • Supports preserving Kerberos token delegation during NameNode failover.
  • IBM Spectrum Scale service Stop All/Start All service actions now follow the best practices for IBM Spectrum Scale stop/start, as per the Restarting a large IBM Spectrum Scale cluster topic in the IBM Spectrum Scale: Administration Guide.
HDFS Transparency 2.7.3-4
  • Supports preserving Kerberos token delegation during NameNode failover.
  • Supports native HDFS encryption.
  • Fixed CWE/SANS security exposures in HDFS Transparency.

Summary of changes as updated, August 2018

Changes in Mpack version 2.7.0.0
  • Supports HDP 3.0.
Changes in HDFS Transparency version 3.0.0-0
  • Supports HDP 3.0 and Mpack 2.7.0.0.
  • Supports Apache Hadoop 3.0.x.
  • Supports native HDFS encryption.
  • Changed the IBM Spectrum Scale configuration location from /usr/lpp/mmfs/hadoop/etc/ to /var/mmfs/hadoop/etc/ and the default log location for open source Apache from /usr/lpp/mmfs/hadoop/logs to /var/log/transparency.
New documentation sections
  • Hadoop Scale Storage Architecture
  • Hadoop Performance tuning guide
  • Hortonworks Data Platform 3.X for HDP 3.0
  • Open Source Apache Hadoop

Summary of changes as updated, July 2018

Changes in Mpack version 2.4.2.6
  • HDP 2.6.5 is supported.
  • Mpack installation resumes from the point of failure when the installation is re-run.
  • The Collect Snap Data action in the IBM Spectrum Scale service in the Ambari GUI can capture the Ambari agents' logs into a tar package under the /var/log/ambari.gpfs.snap* directory.
  • Use cases where the Ambari server and the GPFS Master are colocated on the same host but are configured with multiple IP addresses are handled within the IBM Spectrum Scale service installation.
  • On starting IBM Spectrum Scale from Ambari, if a new kernel version is detected on the IBM Spectrum Scale node, the GPFS portability layer is automatically rebuilt on that node.
  • On deploying the IBM Spectrum Scale service, the Ambari server restart is not required. However, the Ambari server restart is still required when executing the Service Action > Integrate Transparency or Unintegrate Transparency from the Ambari UI.

Summary of changes as updated, May 2018

Changes in HDFS Transparency 2.7.3-3
  • Non-root password-less login of contact nodes for remote mount is supported.
  • When Ranger is enabled, UIDs greater than 8388607 are supported.
  • Hadoop storage tiering is supported.
Changes in Mpack version 2.4.2.5
  • HDP 2.6.5 is supported.

Summary of changes as updated, February 2018

Changes in HDFS Transparency 2.7.3-2
  • Snapshot from a remote mounted file system is supported.
  • IBM Spectrum Scale fileset-based snapshot is supported.
  • HDFS Transparency and IBM Spectrum Scale Protocol SMB can coexist without the SMB ACL controlling the ACL for files or directories.
  • HDFS Transparency rolling upgrade is supported.
  • Zero shuffle for IBM ESS is supported.
  • Manual update of file system configurations when root password-less access is not available for remote cluster is supported.
Changes in Mpack version 2.4.2.4
  • HDP 2.6.4 is supported.
  • IBM Spectrum Scale admin mode central is supported.
  • The /etc/redhat-release file workaround for CentOS deployment is removed.

Summary of changes as updated, January 2018

Changes in Mpack version 2.4.2.3
  • HDP 2.6.3 is supported.

Summary of changes as updated, December 2017

Changes in Mpack version 2.4.2.2
  • The Mpack version 2.4.2.2 does not support migration from IOP to HDP 2.6.2. For migration, use the Mpack version 2.4.2.1.
  • From IBM Spectrum Scale Mpack version 2.4.2.2, the following new configuration parameters have been added to the Ambari management GUI:
    • gpfs.workerThreads defaults to 512. For IBM Spectrum Scale version 4.2.0.3 and later, the gpfs.workerThreads field takes effect and the gpfs.worker1Threads field is ignored. For versions earlier than 4.2.0.3, the gpfs.worker1Threads field takes effect and the gpfs.workerThreads field is ignored.
    • NSD threads per disk defaults to 8.
    • Verify if the disks are already formatted as NSDs defaults to yes.

  • Default values of the following parameters have changed. The new values are as follows:
    • gpfs.supergroup defaults to hdfs,root instead of hadoop,root.
    • gpfs.syncBuffsPerIteration defaults to 100. Earlier it was 1.
    • Percentage of Pagepool for Prefetch defaults to 60. Earlier it was 20.
    • gpfs.maxStatCache defaults to 512. Earlier it was 100000.

  • The default maximum log file size for IBM Spectrum Scale has been increased to 16 MB from 4 MB.

Summary of changes as updated, October 2017

Changes in Mpack version 2.4.2.1 and HDFS Transparency 2.7.3-1
  • The GPFS Ambari integration package is now called the IBM Spectrum Scale Ambari management pack (in short, management pack or MPack).
  • Mpack 2.4.2.1 is the last supported version for BI 4.2.5.
  • IBM Spectrum Scale Ambari management pack version 2.4.2.1 with HDFS Transparency version 2.7.3-1 supports BI 4.2/BI 4.2.5 IOP migration to HDP 2.6.2.
  • The remote mount configuration in Ambari is supported. (For HDP only)
  • Support for two IBM Spectrum Scale file systems/deployment models under one Hadoop cluster/Ambari management. (For HDP only)

    This allows you to have a combination of IBM Spectrum Scale deployment models under one Hadoop cluster. For example, one file system with the shared-nothing storage (FPO) deployment model along with one file system with the shared storage (ESS) deployment model under a single Hadoop cluster.

  • Metadata operation performance improvements for Ranger enabled configuration.
  • Introduction of short-circuit write support for improved performance where the HDFS client and Hadoop DataNodes are running on the same node.