Getting started with IBM Distributed Accelerated ML library
The pai4sk conda package includes the IBM® accelerated Machine Learning library, whose main component is the Snap ML APIs. Snap ML is a library for training generalized linear models. It is being developed at IBM with the vision of removing training time as a bottleneck for machine learning applications. The pai4sk package is available on x86 and ppc64le.
Snap ML supports many classical machine learning models and scales gracefully to data sets with billions of examples or features. Snap ML training can be performed on a single machine or distributed across a cluster of machines. It also offers GPU acceleration and supports sparse data structures. The library is exposed through a Python API compatible with sklearn and can be integrated seamlessly into existing Python applications. The following APIs are supported:
- LogisticRegression
- LinearRegression
- SupportVectorMachine
- SnapBoost - WML CE comes with a boosting algorithm that can be used to construct an ensemble of decision trees. It can be used for both classification and regression tasks. In contrast to other boosting frameworks, SnapBoost does not use a fixed maximal tree depth at each boosting iteration. Instead, the tree depth is sampled at each boosting iteration according to a discrete uniform distribution.
- DecisionTreeClassifier (Single GPU)
- RandomForestClassifier (MultiGPU, Single Node)
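The depth sampling described for SnapBoost in the list above can be sketched in a few lines of plain Python. This is only an illustration of drawing one tree depth per boosting iteration from a discrete uniform distribution; the function name sample_tree_depths and the parameters min_depth, max_depth, and seed are hypothetical and are not part of the pai4sk API.

```python
import random


def sample_tree_depths(n_rounds, min_depth, max_depth, seed=None):
    """Draw one tree depth per boosting round, uniformly from [min_depth, max_depth]."""
    if min_depth < 1 or max_depth < min_depth:
        raise ValueError("need 1 <= min_depth <= max_depth")
    rng = random.Random(seed)
    # randint is inclusive on both ends, so every depth in the range is equally likely.
    return [rng.randint(min_depth, max_depth) for _ in range(n_rounds)]


if __name__ == "__main__":
    # Each boosting round would fit a tree limited to the depth sampled for it.
    for round_index, depth in enumerate(sample_tree_depths(5, 1, 6, seed=0)):
        print(f"round {round_index}: fit a tree with max depth {depth}")
```

Mixing shallow and deep trees in this way is what distinguishes the approach from frameworks that fix a single maximal depth for every iteration.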
Snap ML uses a proprietary data format named snap for efficient data loading in both single-node and multi-node training. APIs are provided to load and store data sets in snap format.
Because pai4sk is built on scikit-learn version 0.22.1, it can be used as a replacement for scikit-learn. Some of the APIs are accelerated by Snap ML and cuML. Accelerated algorithms from cuML can be used through pai4sk only if the cuML package is installed on the Power® architecture. The module automatically falls back to the original scikit-learn behavior when Snap ML or cuML does not provide the necessary support. The following APIs are accelerated in this way:
- Ridge
- Lasso
- LogisticRegression
- SupportVectorMachine LinearSVC
- Clustering KMeans (depends on cuML)
- Clustering DBSCAN (depends on cuML)
- Decomposition PCA (depends on cuML)
- Decomposition TruncatedSVD (depends on cuML)
- Dataset load_svmlight_file
- Metrics log_loss
- Metrics accuracy_score
- Metrics hinge_loss
- Metrics mean_squared_error
- Metrics Similarity Search
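The automatic fallback described before the list above follows a common dispatch pattern: try the accelerated path, and use the reference implementation when the accelerated one declines. The sketch below illustrates that pattern only. It does not show how pai4sk is implemented, and every name in it (NotAccelerated, call_with_fallback, and the two stand-in metric functions) is hypothetical.

```python
class NotAccelerated(Exception):
    """Raised by a (hypothetical) accelerated backend that cannot handle a request."""


def call_with_fallback(accelerated, reference, *args, **kwargs):
    """Try the accelerated implementation first; use the reference one if it declines.

    Returns (result, backend_name) so the caller can see which path ran.
    """
    try:
        return accelerated(*args, **kwargs), "accelerated"
    except NotAccelerated:
        return reference(*args, **kwargs), "reference"


def accelerated_mean_squared_error(y_true, y_pred, sample_weight=None):
    # Stand-in for an accelerated kernel that only supports the unweighted case.
    if sample_weight is not None:
        raise NotAccelerated("sample_weight is not supported by this backend")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def reference_mean_squared_error(y_true, y_pred, sample_weight=None):
    # Stand-in for the plain implementation that supports every option.
    weights = sample_weight if sample_weight is not None else [1.0] * len(y_true)
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)


if __name__ == "__main__":
    y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
    print(call_with_fallback(accelerated_mean_squared_error,
                             reference_mean_squared_error, y_true, y_pred))
    print(call_with_fallback(accelerated_mean_squared_error,
                             reference_mean_squared_error, y_true, y_pred,
                             sample_weight=[1.0, 1.0, 2.0]))
```

Returning the backend name alongside the result is a design choice of this sketch; it makes it easy to confirm in a test or a log whether a given call was actually accelerated.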
- With snaprun or mpirun, even when running on multiple nodes in WML Accelerator, the usual MPI usage restrictions and guidelines apply. For example:
  - To run as root user, pass --allow-run-as-root to mpirun.
  - In a distributed environment, a host can have more than one network interface. In such a scenario, by default, Open MPI uses any and all interfaces that are up to communicate with a host. To avoid problems in such cases, you can tell MPI to use specific interfaces. For example:
    snaprun --mpiarg "--mca btl_tcp_if_include ib0" ...
    snaprun --mpiarg "--mca btl_tcp_if_exclude lo,enp1s0f2" ...
  - Spectrum MPI requires the usage of ptrace. By default, Ubuntu does not allow this. To turn on this capability, run the following command on the Ubuntu host system:
    sudo bash -c "echo '0' > /proc/sys/kernel/yama/ptrace_scope"
    More details are available on the Spectrum MPI website.
- If you are using mpirun instead of snaprun, consider the following recommendations:
  - On a single system without an InfiniBand setup, use the --pami_noib option of mpirun.
  - On multiple systems without an InfiniBand setup, use -mca btl tcp,self instead of the -tcp option of mpirun.
- On IBM Power machines, it is recommended to run the similarity search applications in distributed mode using MPI when using the CPU functions.
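The MPI guidelines in the list above lend themselves to small helpers that assemble the option strings, which can reduce quoting mistakes in launch scripts. The following is a minimal sketch: the helper names interface_mpiarg and mpirun_no_infiniband_flags are hypothetical, only the flags quoted in the list are used, and the rest of each command line is left out just as it is in the examples above.

```python
import shlex


def interface_mpiarg(include=None, exclude=None):
    """Build the snaprun --mpiarg option that restricts the TCP interfaces MPI may use."""
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    if include is not None:
        mca = f"--mca btl_tcp_if_include {','.join(include)}"
    else:
        mca = f"--mca btl_tcp_if_exclude {','.join(exclude)}"
    # snaprun receives the whole MCA setting as a single --mpiarg value.
    return ["--mpiarg", mca]


def mpirun_no_infiniband_flags(num_hosts):
    """Return the mpirun flags recommended above when no InfiniBand is set up."""
    if num_hosts < 1:
        raise ValueError("num_hosts must be at least 1")
    if num_hosts == 1:
        return ["--pami_noib"]
    return ["-mca", "btl", "tcp,self"]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) quotes the multi-word --mpiarg value for the shell.
    print(shlex.join(["snaprun"] + interface_mpiarg(include=["ib0"])), "...")
    print(shlex.join(["snaprun"] + interface_mpiarg(exclude=["lo", "enp1s0f2"])), "...")
    print(shlex.join(["mpirun"] + mpirun_no_infiniband_flags(num_hosts=2)), "...")
```

Building the arguments as lists and joining them only for display keeps each option intact when the list is later handed to a process launcher.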
Without WML Accelerator, in WML CE, pai4sk can use up to two GPUs on a single node.
To run pai4sk applications in a distributed way, use snaprun to start the application. The snaprun tool does the following:
- Determines the necessary arguments to pass to MPI based on the current environment and version of MPI.
- Tests connections to the hosts, including the correct setup of ssh keys.
- Verifies that pai4sk is installed across the hosts.
- Detects the hardware configuration of the hosts, including GPU count, and generates a valid topology.
- Generates the necessary rankfile, providing options to specify more specific topology details.
- Constructs, displays, and executes the mpirun command needed to distribute jobs to each node.
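The last two steps in the list above, generating a rankfile and constructing the mpirun command, can be illustrated with a small sketch. It is not the snaprun implementation. It assumes one MPI rank per GPU and a simple per-host slot counter, writes lines in the Open MPI rankfile syntax (rank N=host slot=S), and uses hypothetical host names, a hypothetical application command, and hypothetical function names.

```python
def build_rankfile(gpus_per_host):
    """Return rankfile text with one MPI rank per GPU.

    gpus_per_host maps a host name to its GPU count. This input is hypothetical;
    a real tool would detect it. Slot numbers are simply counted up per host.
    """
    lines = []
    rank = 0
    for host, gpu_count in gpus_per_host.items():
        for slot in range(gpu_count):
            lines.append(f"rank {rank}={host} slot={slot}")
            rank += 1
    return "".join(line + "\n" for line in lines)


def build_mpirun_command(rankfile_path, total_ranks, app_argv):
    """Assemble an mpirun argument list that uses the given rankfile."""
    return ["mpirun", "-np", str(total_ranks), "--rankfile", rankfile_path] + list(app_argv)


if __name__ == "__main__":
    topology = {"hostA": 2, "hostB": 2}  # hypothetical hosts with two GPUs each
    print(build_rankfile(topology), end="")
    # Display the command only; this sketch does not execute it.
    print(" ".join(build_mpirun_command("rankfile.txt", sum(topology.values()),
                                        ["python", "train.py"])))
```

The sketch only prints the command so that it can be inspected before anything is launched on the cluster.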
Run snaprun -h to get the usage details of this tool.
Example programs for each of the APIs mentioned above are provided as part of the conda package.
To find out how to run the sample programs, refer to the READMEs under $CONDA_PREFIX/pai4sk/local-examples/ and $CONDA_PREFIX/pai4sk/mpi-examples/.
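To locate those READMEs from a Python session, a short helper along the following lines may be convenient. It is a sketch under assumptions: the two directory names come from the paths above, while the function name find_example_readmes and the README* file name pattern are hypothetical, because the exact file names are not stated here.

```python
import os
from pathlib import Path

# Example directories named in the text above, relative to the conda prefix.
EXAMPLE_DIRS = ("pai4sk/local-examples", "pai4sk/mpi-examples")


def find_example_readmes(conda_prefix=None):
    """Return README files found under the two example directories, sorted."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate the conda environment first")
    readmes = []
    for relative in EXAMPLE_DIRS:
        base = Path(prefix) / relative
        if base.is_dir():
            # The README naming is an assumption; this matches README, README.md, and so on.
            readmes.extend(p for p in base.rglob("README*") if p.is_file())
    return sorted(readmes)


if __name__ == "__main__":
    try:
        for path in find_example_readmes():
            print(path)
    except RuntimeError as error:
        print(error)
```

Directories that do not exist are skipped, so the helper returns an empty list rather than failing when only one of the two example sets is installed.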
Sample Jupyter notebooks are provided in this GitHub repository.