IBM Watson Machine Learning Accelerator interim fixes

Interim fixes for IBM Watson® Machine Learning Accelerator.

The following interim fixes are available for IBM Watson Machine Learning Accelerator:
  • Interim fix 536919: includes fixes for IBM Watson Machine Learning Accelerator 1.2.1 and the IBM Spectrum Conductor upgrade from version 2.3.0 or 2.4.0 to 2.4.1.
  • Interim fix 527174: includes fixes for IBM Watson Machine Learning Accelerator 1.2.1 and the IBM Spectrum Conductor upgrade from version 2.3.0 to 2.4.0.
  • Interim fix 526695: includes fixes for IBM Watson Machine Learning Accelerator 1.2.1 and IBM Spectrum Conductor 2.3.0.
  • Interim fix 531936: includes the elastic distributed inference package, which must be applied after interim fix 526695.

Interim fix 536919

Interim fix 536919 includes the following updates:
  • Support for IBM Spectrum Conductor 2.4.1 with IBM Watson Machine Learning Accelerator 1.2.1, including upgrading IBM Spectrum Conductor from version 2.3.0 or 2.4.0 to version 2.4.1
  • Support for hyperparameter search plugins (RFE 137803), which includes enhancements to the deep learning API, including:
    • POST: deeplearning/v1/hypersearch/algorithm/install
    • GET: deeplearning/v1/hypersearch/algorithm[&type=BUILT-IN|USER_PLUGIN]
    • GET: deeplearning/v1/hypersearch/algorithm/{plugin_algorithm_name}
    • DELETE: deeplearning/v1/hypersearch/algorithm/{plugin_algorithm_name}
  • Addition of a user-defined hyperparameter experiment (RFE 137804)
  • Fixed an issue where hyperparameter optimization jobs remained in RUNNING state. Previously, the hyperparameter optimization token expired after 8 hours, after which tasks in RUNNING state were no longer updated. With this fix, the number of hours until expiry can be changed, and job tasks no longer remain in RUNNING state when a job is stopped.
  • Fixed performance issues with deep learning insight metrics in the cluster management console.
  • Fixed issues with obtaining the best search experiment result when loss value is negative.
  • Fixed issues involving job failure and the ps.conf in the temporary working directory.
  • Fixed issues with creating a hyperparameter optimization task so that it uses the values provided by the experiment when running a training job.
  • Enhancement to the deep learning API to include a new worker_logger parameter for elastic distributed training jobs when initializing the FabricModel class. The new parameter handles the callback from a job's test metric.
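
As a rough illustration of the hyperparameter search plugin endpoints listed above, the following Python sketch builds the request URLs without sending them. The base URL is a placeholder and the helper names are hypothetical. Authentication and the request body for the install call are not covered here, and passing the optional type filter as a standard query string is an assumption; consult the deep learning API reference for the exact request format.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL (assumption): replace with the REST endpoint of your cluster.
BASE_URL = "https://dl-rest.example.com/"

# Path taken from the endpoint list in this interim fix.
ALGORITHM_PATH = "deeplearning/v1/hypersearch/algorithm"
VALID_TYPES = ("BUILT-IN", "USER_PLUGIN")


def install_url(base=BASE_URL):
    """URL for the POST call that installs a plugin algorithm."""
    return f"{base.rstrip('/')}/{ALGORITHM_PATH}/install"


def list_url(algorithm_type=None, base=BASE_URL):
    """URL for the GET call that lists algorithms, optionally filtered by type."""
    url = f"{base.rstrip('/')}/{ALGORITHM_PATH}"
    if algorithm_type is None:
        return url
    if algorithm_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}")
    # Assumption: the filter is sent as an ordinary query parameter.
    return f"{url}?{urlencode({'type': algorithm_type})}"


def algorithm_url(plugin_algorithm_name, base=BASE_URL):
    """URL used by both the GET and DELETE calls for a single plugin algorithm."""
    name = quote(plugin_algorithm_name, safe="")
    return f"{base.rstrip('/')}/{ALGORITHM_PATH}/{name}"


if __name__ == "__main__":
    print("POST  ", install_url())
    print("GET   ", list_url("USER_PLUGIN"))
    print("GET   ", algorithm_url("my_plugin"))
    print("DELETE", algorithm_url("my_plugin"))
```

The plugin name is percent-encoded so that names containing spaces or slashes still produce a single path segment.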

To apply these fixes for IBM Watson Machine Learning Accelerator, download this interim fix from IBM Fix Central and refer to the readme for installation instructions.

Interim fix 527174

Interim fix 527174 includes the following updates:
  • Fixed an issue with finding the best hyperparameters during model creation.
  • Fixed an issue with creating datasets from CSV files for split data.
  • Updated the default Python version from version 2 to version 3.
  • Upgraded IBM Spectrum Conductor™ from version 2.3.0 to version 2.4.0.
  • Security fix to enable IBM Spectrum Conductor 2.4.0 to work with IBM Spectrum Conductor Deep Learning Impact 1.2.3.

To apply this fix and upgrade from IBM Spectrum Conductor 2.3.0 to version 2.4.0, download this interim fix from IBM Fix Central and refer to the readme for installation instructions.

Note: Interim fix 527174 cannot be applied to the evaluation version of IBM Watson Machine Learning Accelerator 1.2.1.

After applying this fix, note the following key changes:
  • After upgrading to IBM Spectrum Conductor version 2.4.0, the following new instance group templates are available: dli-sig-template-2.3.3, wmla-ig-template-2.3.3, and wmla-ig-edt-template-2.3.3.
  • Existing instance groups that were created with the old templates (dli-sig-template-2.2.0, wmla-ig-template-2.3.1, or wmla-ig-edt-template-2.3.1) are still available; however, the templates themselves are no longer available.
  • Existing instance groups that did not explicitly set a Python version (using the PYTHON_VERSION environment variable) before this fix was applied are updated to use Python 3.6.
  • For any instance groups where the Python version was explicitly set, the Python version remains unchanged after applying this fix.
  • For any new Spark instance groups, the default Python version is set to 3.6. To change the Python version of a new Spark instance group, you must configure and set the PYTHON_VERSION environment variable.
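
The Python version rules above can be summarized in a short sketch. This is only an illustration of the stated behavior, not product code: the function name is hypothetical, the instance group environment is modeled as a plain dictionary, and the example value for PYTHON_VERSION is a placeholder because the accepted value format is not given here.

```python
# Default applied by the interim fix when no version is set explicitly.
DEFAULT_PYTHON_VERSION = "3.6"


def effective_python_version(instance_group_env):
    """Return the Python version an instance group uses after the fix.

    instance_group_env is a dict of the instance group's environment
    variables (a simplified, hypothetical model).
    """
    explicit = instance_group_env.get("PYTHON_VERSION")
    if explicit:
        # An explicitly set version is left unchanged by the fix.
        return explicit
    # No explicit setting: existing and new instance groups use the default.
    return DEFAULT_PYTHON_VERSION


if __name__ == "__main__":
    print(effective_python_version({}))
    # "custom-version" is a placeholder, not a documented value.
    print(effective_python_version({"PYTHON_VERSION": "custom-version"}))
```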

Interim fix 526695

Interim fix 526695 includes the following updates:
  • Fixed the functionality to find the best hyperparameters on model creation.
  • Fixed the CSV files dataset creation issue for split data.
  • Improved the accuracy of elastic distributed training.
  • Enhanced the train function in elastic distributed training to include the following options: validation_freq, checkpoint_freq and effective_batch_size.
  • Updated the default Python version from version 2 to version 3.

To apply these fixes for IBM Watson Machine Learning Accelerator, download this interim fix from IBM Fix Central and refer to the readme for installation instructions.

Note:
  • Interim fix 526695 does not include the latest version of IBM Spectrum Conductor. If you want to upgrade to the latest version of IBM Spectrum Conductor, you must install interim fix 527174 instead.
  • If you are uninstalling this interim fix, any Spark instance groups, datasets, or models that are created after this interim fix is applied are still available in your environment after the fix is uninstalled.
  • Interim fix 526695 can be applied to the evaluation version of IBM Watson Machine Learning Accelerator 1.2.1.

Interim fix 531936

Obtain and install the elastic distributed inference package to have inference available as a service. To learn more about elastic distributed inference, see Elastic distributed inference. Inference is available in IBM Watson Machine Learning Accelerator either as a service using elastic distributed inference, or as a one-time test. If elastic distributed inference is not installed, only the default test functionality is available.

Elastic distributed inference is available for IBM Watson Machine Learning Accelerator 1.2.1. It must be applied to an IBM Watson Machine Learning Accelerator 1.2.1 environment that has IBM Spectrum Conductor Deep Learning Impact 1.2.3 installed with interim fix 526695 applied. To start using elastic distributed inference, get it from IBM Fix Central.

To install elastic distributed inference, see: Installing the elastic distributed inference package.

Note: When installing the elastic distributed inference package, note the following:
  • You must first install the elastic distributed inference package on the master host before installing it on any compute hosts.
  • During installation, you must set EDI_SHARED_FS to the same value as DLI_SHARED_FS for the test functionality to be available in the cluster management console.
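
Before running the installer, the shared file system requirement above could be checked with a small script like the following. This is a minimal sketch that assumes EDI_SHARED_FS and DLI_SHARED_FS are exported as environment variables in the installing shell; the function name is hypothetical, and the comparison is a strict string match.

```python
import os


def check_shared_fs(env=None):
    """Return None if the settings are consistent, otherwise an error message."""
    env = os.environ if env is None else env
    edi = env.get("EDI_SHARED_FS")
    dli = env.get("DLI_SHARED_FS")
    if not edi or not dli:
        return "EDI_SHARED_FS and DLI_SHARED_FS must both be set"
    if edi != dli:
        # The test functionality in the console needs identical values.
        return f"EDI_SHARED_FS ({edi}) differs from DLI_SHARED_FS ({dli})"
    return None


if __name__ == "__main__":
    problem = check_shared_fs()
    print(problem or "Shared file system settings match")
```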

To uninstall elastic distributed inference, see: Uninstalling the elastic distributed inference package.