Getting started with IBM Distributed Accelerated ML library

The pai4sk conda package includes the IBM® accelerated Machine Learning library, whose main component is the Snap ML API. Snap ML is a library for training generalized linear models. It is being developed at IBM with the vision of removing training time as a bottleneck for machine learning applications. The pai4sk package is available on x86 and ppc64le.

Snap ML supports many classical machine learning models and scales gracefully to data sets with billions of examples or features. Training can be performed on a single machine or distributed across a cluster of machines. Snap ML also offers GPU acceleration and supports sparse data structures. The library is exposed through a Python API that is compatible with sklearn and can be integrated seamlessly into existing Python applications. The following APIs are supported:

- LogisticRegression
- LinearRegression
- SupportVectorMachine
- SnapBoost: WML CE comes with a boosting algorithm that can be used to construct an ensemble of decision trees. It can be used for both classification and regression tasks. In contrast to other boosting frameworks, SnapBoost does not use a fixed maximal tree depth at each boosting iteration. Instead, the tree depth is sampled at each boosting iteration according to a discrete uniform distribution.
- DecisionTreeClassifier (single GPU)
- RandomForestClassifier (multiple GPUs, single node)
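
The depth sampling described for SnapBoost can be pictured with a small sketch. This is not SnapBoost's implementation: the function name and the `min_depth` and `max_depth` bounds are hypothetical, and the only thing taken from the description above is that one tree depth is drawn per boosting iteration from a discrete uniform distribution.

```python
import random


def sample_tree_depths(n_iterations, min_depth, max_depth, seed=None):
    """Draw one maximal tree depth per boosting iteration.

    Each depth is sampled from a discrete uniform distribution over
    [min_depth, max_depth], both inclusive. All names are illustrative
    and are not part of the pai4sk API.
    """
    if min_depth < 1 or max_depth < min_depth:
        raise ValueError("need 1 <= min_depth <= max_depth")
    rng = random.Random(seed)
    # randint includes both endpoints, so every depth is equally likely.
    return [rng.randint(min_depth, max_depth) for _ in range(n_iterations)]


if __name__ == "__main__":
    # Ten boosting iterations, each with its own sampled depth limit.
    print(sample_tree_depths(n_iterations=10, min_depth=1, max_depth=6, seed=0))
```

Mixing shallow and deep trees in this way means the ensemble is not tied to a single depth setting, in contrast to the fixed maximal depth used by other boosting frameworks.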

Snap ML uses a proprietary data format named snap for efficient data loading in both single-node and multi-node training. APIs are provided to load and store data sets in snap format.

Because pai4sk is built on scikit-learn version 0.22.1, it can be used as a replacement for scikit-learn. Some of the APIs are accelerated by making use of Snap ML and cuML. Accelerated algorithms from cuML can be used through pai4sk only if the cuML package is installed on the Power® architecture. The module automatically falls back to the original scikit-learn behavior when Snap ML or cuML does not provide the necessary support.
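
The fallback behavior mentioned in the paragraph above can be pictured as a simple try-then-fall-back dispatch. The sketch below is only an illustration of that pattern and says nothing about how pai4sk implements it internally. Every name in it is hypothetical, and the use of `NotImplementedError` to signal an unsupported configuration is an assumption made for the example.

```python
def call_with_fallback(accelerated, reference, *args, **kwargs):
    """Try the accelerated implementation first; fall back if unsupported.

    Assumption for this sketch: the accelerated function raises
    NotImplementedError when it cannot handle the requested options.
    Returns a (result, backend_name) tuple.
    """
    try:
        return accelerated(*args, **kwargs), "accelerated"
    except NotImplementedError:
        return reference(*args, **kwargs), "reference"


# Hypothetical stand-ins for an accelerated and a plain implementation.
def fast_mean(values, weights=None):
    if weights is not None:
        raise NotImplementedError("weights are not supported here")
    return sum(values) / len(values)


def plain_mean(values, weights=None):
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Supported case: handled by the accelerated stand-in.
    print(call_with_fallback(fast_mean, plain_mean, [1.0, 2.0, 3.0]))
    # Unsupported option: silently handled by the reference stand-in.
    print(call_with_fallback(fast_mean, plain_mean, [1.0, 2.0, 3.0], weights=[1, 1, 2]))
```

The caller sees the same interface in both cases, which is what allows an accelerated package to stand in for the original library.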

Notes:

With snaprun or mpirun, even when running on multiple nodes in WML Accelerator, the usual MPI usage restrictions and guidelines apply. For example:
- To run as root user, pass --allow-run-as-root to mpirun.
- In a distributed environment, a host can have more than one network interface. In that case, Open MPI by default uses every interface that is up to communicate with a host. To avoid problems, you can tell MPI to use specific interfaces. For example:
```
snaprun --mpiarg "--mca btl_tcp_if_include ib0" ...
```
```
snaprun --mpiarg "--mca btl_tcp_if_exclude lo,enp1s0f2" ...
```
- Spectrum MPI requires the use of ptrace. By default, Ubuntu does not allow this. To turn on this capability, run the following command on the Ubuntu host system:
```
sudo bash -c "echo '0' > /proc/sys/kernel/yama/ptrace_scope"
```
More details are available on the Spectrum MPI website.
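
Before launching a job, it can be useful to confirm that the ptrace setting described above is in effect. The following is a minimal sketch that reads the same kernel file that the command writes to. The function name is hypothetical, and treating a missing file as "unknown" is a choice made for this example.

```python
from pathlib import Path

# Same file that the echo command above writes to.
PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def ptrace_allowed(path=PTRACE_SCOPE):
    """Report whether unrestricted ptrace is enabled.

    Returns True if the file contains 0, False for any other value,
    and None if the file cannot be read (for example, on a system
    without the Yama security module).
    """
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value == "0"


if __name__ == "__main__":
    state = ptrace_allowed()
    if state is None:
        print("ptrace_scope is not available on this system")
    elif state:
        print("ptrace is allowed (ptrace_scope is 0)")
    else:
        print("ptrace is restricted; see the command above")
```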
If you are using mpirun instead of snaprun, consider the following recommendations:
- On a single system without InfiniBand set up, use the --pami_noib option of mpirun.
- On multiple systems without InfiniBand set up, use -mca btl tcp,self instead of the -tcp option of mpirun.
On IBM Power machines, it is recommended to run similarity search applications in distributed mode using MPI when they use the CPU functions.
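
When snaprun is started from a script, the interface-selection arguments shown in the notes above can be assembled programmatically. The helper below is a hypothetical sketch: only the `--mpiarg` flag and the `btl_tcp_if_include` and `btl_tcp_if_exclude` parameters come from the examples above, and the remaining snaprun arguments (elided as `...` in those examples) are deliberately left out.

```python
import shlex


def interface_mpiarg(include=None, exclude=None):
    """Build the snaprun arguments that restrict MPI's TCP interfaces.

    Exactly one of include or exclude must be a non-empty list of
    interface names, because the two Open MPI parameters are
    mutually exclusive.
    """
    if bool(include) == bool(exclude):
        raise ValueError("pass exactly one of include or exclude")
    name = "btl_tcp_if_include" if include else "btl_tcp_if_exclude"
    interfaces = ",".join(include or exclude)
    # The whole MCA setting is passed to snaprun as a single argument.
    return ["--mpiarg", f"--mca {name} {interfaces}"]


if __name__ == "__main__":
    argv = ["snaprun"] + interface_mpiarg(exclude=["lo", "enp1s0f2"])
    # shlex.join shows how the command would be quoted in a shell.
    print(shlex.join(argv))
```

Keeping the MCA setting as one list element avoids shell-quoting mistakes if the list is later passed to `subprocess.run` together with the rest of the command.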

In WML CE without WML Accelerator, pai4sk can use up to two GPUs on a single node. To run pai4sk applications in a distributed way, use snaprun to start the application. The snaprun tool does the following:

- Determines the necessary arguments to pass to MPI based on the current environment and version of MPI.
- Tests connections to the hosts, including the correct setup of ssh keys.
- Verifies that pai4sk is installed across the hosts.
- Detects the hardware configuration of the hosts, including GPU count, and generates a valid topology.
- Generates the necessary rankfile, providing options to specify more specific topology details.
- Constructs, displays, and executes the mpirun command needed to distribute jobs to each node.
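
To make the rankfile step more concrete, the sketch below turns a detected host-to-GPU-count mapping into rankfile text. It is an illustration only, since how snaprun builds its rankfile is not described here. The function name, the one-rank-per-GPU layout, and the slot values are assumptions. The `rank N=host slot=K` line format follows Open MPI's rankfile syntax.

```python
def build_rankfile(gpus_per_host):
    """Return rankfile text with one MPI rank per detected GPU.

    gpus_per_host maps a host name to its GPU count. Ranks are numbered
    consecutively across hosts in the mapping's iteration order. The slot
    values are placeholders chosen for this sketch.
    """
    lines = []
    rank = 0
    for host, gpu_count in gpus_per_host.items():
        if gpu_count < 0:
            raise ValueError(f"negative GPU count for {host}")
        for slot in range(gpu_count):
            lines.append(f"rank {rank}={host} slot={slot}")
            rank += 1
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical cluster: two GPUs on the first host, one on the second.
    print(build_rankfile({"hostA": 2, "hostB": 1}), end="")
```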

Run snaprun -h to get the usage details of this tool.

Example programs for each of the APIs mentioned above are provided as part of the conda package. To find out how to run the sample programs, refer to the README files under $CONDA_PREFIX/pai4sk/local-examples/ and $CONDA_PREFIX/pai4sk/mpi-examples/.

Sample Jupyter notebooks are provided in this GitHub repository.

Note: SnapBoost is a technology preview.