Prerequisites for installing IBM Visual Insights
Before you can install either IBM® Visual Insights stand-alone or IBM Visual Insights with IBM Cloud® Private, you must configure Red Hat Enterprise Linux® (RHEL), enable the Fedora Extra Packages for Enterprise Linux (EPEL) repository, and install NVIDIA CUDA drivers.
See Planning for IBM Visual Insights to ensure that your environment meets all software and hardware requirements.
- Red Hat Enterprise Linux operating system and repository setup
- Ubuntu operating system and repository setup
- NVIDIA Components: IBM POWER9 specific udev rules (Red Hat only)
- Remove previously installed CUDA and NVIDIA drivers
- Install the GPU driver (RHEL)
- Install the GPU driver (Ubuntu)
- Verify the GPU driver
- Install Docker and nvidia-docker2 (RHEL)
- Install Docker and nvidia-docker2 (Ubuntu)
Red Hat Enterprise Linux operating system and repository setup
- Enable the common, optional, and extra repo channels.
IBM POWER8®:
sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-le-rpms
IBM POWER9™:
sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-9-rpms
x86:
sudo subscription-manager repos --enable=rhel-7-server-optional-rpms
sudo subscription-manager repos --enable=rhel-7-server-extras-rpms
sudo subscription-manager repos --enable=rhel-7-server-rpms
- Install packages needed for the installation:
sudo yum -y install wget nano bzip2
- Enable the Fedora Project Extra Packages for Enterprise Linux (EPEL) repository:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ihv epel-release-latest-7.noarch.rpm
- Load the latest kernel or do a full update:
- Load the latest kernel:
- For x86:
sudo yum install kernel-devel
sudo yum update kernel kernel-devel kernel-tools kernel-tools-libs
sudo reboot
- For POWER:
sudo yum install kernel-devel
sudo yum update kernel kernel-devel kernel-tools kernel-tools-libs kernel-bootwrapper
sudo reboot
- Do a full update:
sudo yum install kernel-devel
sudo yum update
sudo reboot
- Install Docker and configure it so that IBM Visual Insights containers can use NVIDIA GPUs. For instructions, see Install Docker and nvidia-docker2 (RHEL).
Note: docker-1.13.1-108.git4ef4b30.el7 has a known issue with NVIDIA GPUs. Either install docker-1.13.1-104.git4ef4b30.el7 explicitly or use a newer version of RHEL Docker. Ensure that docker-1.13.1-108.git4ef4b30.el7 is NOT installed.
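The repo channel names in the first step differ only by a platform-specific prefix, so they can be generated instead of typed. The following is a minimal sketch, not part of the product: the function names and the platform labels (power8, power9, x86) are hypothetical, and the commands are printed for review rather than run.

```shell
# Hypothetical helper: print the three RHEL 7 repo IDs for one platform.
# The labels power8, power9, and x86 are illustrative, not official names.
rhel7_repo_ids() {
  case "$1" in
    power8) prefix="rhel-7-for-power-le" ;;
    power9) prefix="rhel-7-for-power-9" ;;
    x86)    prefix="rhel-7-server" ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "${prefix}-optional-rpms" "${prefix}-extras-rpms" "${prefix}-rpms"
}

# Print (do not execute) the matching subscription-manager commands.
print_enable_commands() {
  ids=$(rhel7_repo_ids "$1") || return 1
  for id in $ids; do
    echo "sudo subscription-manager repos --enable=${id}"
  done
}
```

For example, print_enable_commands power9 lists the three commands shown for IBM POWER9 above, which can then be reviewed and run manually.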
Ubuntu operating system and repository setup
- Install packages needed for the installation:
sudo apt-get install -y wget nano apt-transport-https ca-certificates curl software-properties-common
- Ensure the kernel headers are installed and match the running kernel. Compare the outputs of:
dpkg -l | grep -E 'linux-headers|kernel-package|kernel-headers'
and
uname -r
Ensure that the linux-headers package version exactly matches the version of the running kernel. If they are not identical, bring them in sync as appropriate:
- Install missing packages.
- Update down-level packages.
- Reboot the system if the packages are newer than the active kernel.
- Alternatively, do a full update:
sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot
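The header comparison described above can be scripted. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes the headers package is named linux-headers-<release>, where <release> is the output of uname -r, which is the usual Ubuntu convention.

```shell
# Hypothetical helper: succeed if the dpkg listing on stdin contains an
# installed ("ii") linux-headers package for the given kernel release.
headers_match_kernel() {
  release="$1"
  grep -E '^ii[[:space:]]+linux-headers-' \
    | awk '{ sub(/:.*$/, "", $2); print $2 }' \
    | grep -Fxq "linux-headers-${release}"
}

# Usage on a real system:
#   dpkg -l | headers_match_kernel "$(uname -r)" \
#     && echo "headers match running kernel" \
#     || echo "headers missing or out of sync"
```

The awk step strips an optional ":arch" suffix from the package name so that the exact-match comparison in the last grep is not thrown off.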
NVIDIA Components: IBM POWER9 specific udev rules (Red Hat only)
- Copy the /lib/udev/rules.d/40-redhat.rules file to the directory for user-overridden rules:
sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
- Edit the /etc/udev/rules.d/40-redhat.rules file:
sudo nano /etc/udev/rules.d/40-redhat.rules
- Comment out the entire "Memory hotadd request" section and save the change:
# Memory hotadd request
#SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"
#PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
#ENV{.state}="online"
#PROGRAM="/bin/systemd-detect-virt", RESULT=="none", ENV{.state}="online_movable"
#ATTR{state}=="offline", ATTR{state}="$env{.state}"
#LABEL="memory_hotplug_end"
- Optionally, delete the first line of the file, because the file was copied to a directory where it cannot be overwritten:
# do not edit this file, it will be overwritten on update
- Restart the system for the changes to take effect:
sudo reboot
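As an alternative to editing the file by hand, the "Memory hotadd request" section can be commented out with sed. The following is a sketch, not a tested procedure for every RHEL release: the function name is hypothetical, and it assumes the section starts with the line "# Memory hotadd request" and ends with the LABEL="memory_hotplug_end" line, as described above.

```shell
# Hypothetical filter: read udev rules on stdin and prefix every
# not-yet-commented line of the "Memory hotadd request" section with "#".
# The end pattern also matches an already commented LABEL line, so running
# the filter a second time leaves the text unchanged.
comment_out_memory_hotadd() {
  sed '/^# Memory hotadd request/,/^#\{0,1\}LABEL="memory_hotplug_end"/{ /^#/!s/^/#/; }'
}

# Usage sketch (review the diff before installing the result):
#   comment_out_memory_hotadd < /etc/udev/rules.d/40-redhat.rules > /tmp/40-redhat.rules
#   diff /etc/udev/rules.d/40-redhat.rules /tmp/40-redhat.rules
#   sudo cp /tmp/40-redhat.rules /etc/udev/rules.d/40-redhat.rules
```

Writing to a temporary file first keeps the original rules intact until the diff has been checked.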
Remove previously installed CUDA and NVIDIA drivers
Before installing the updated GPU driver, uninstall any previously installed CUDA and NVIDIA drivers. Follow these steps:
- Remove all CUDA Toolkit and GPU driver packages.
You can display installed CUDA and driver packages by running these commands:
rpm -qa | egrep 'cuda.*(9-2|10-0|10-1)'
rpm -qa | egrep '(cuda|nvidia).*(396|410|418)\.'
Verify the list and remove with yum remove.
- Remove any CUDA Toolkit and GPU driver repository packages.
These should have been included in step 1, but you can confirm with this command:
rpm -qa | egrep '(cuda|nvidia).*repo'
Use yum remove to remove any that remain.
- Clean the yum repository:
sudo yum clean all
- Remove cuDNN and NCCL:
sudo rm -rf /usr/local/cuda /usr/local/cuda-9.2 /usr/local/cuda-10.0 /usr/local/cuda-10.1
- Reboot the system to unload the GPU driver:
sudo shutdown -r now
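The three package queries in this procedure can be combined into one filter that produces a single list to review before running yum remove. This is a sketch: the function name is hypothetical, and the patterns are the same ones used in the steps above.

```shell
# Hypothetical filter: read a package list (for example from "rpm -qa") on
# stdin and print the CUDA Toolkit, GPU driver, and repository packages
# matched by the three patterns used in the steps above.
list_old_gpu_packages() {
  grep -E 'cuda.*(9-2|10-0|10-1)|(cuda|nvidia).*(396|410|418)\.|(cuda|nvidia).*repo' \
    | sort -u
}

# Usage on a real system (inspect the list before removing anything):
#   rpm -qa | list_old_gpu_packages
```

The filter only prints names. Removal stays a manual step so that the list can be verified first.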
Install the GPU driver (RHEL)
- Download the NVIDIA GPU driver:
- Go to NVIDIA Driver Download.
- Select Product Type: Tesla.
- Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
- Select Product: Tesla P100 or Tesla V100.
- Select Operating System, click Show all Operating Systems, then choose the appropriate value:
- Linux POWER LE RHEL 7 for Power®
- Linux 64-bit RHEL7 for x86
- Select CUDA Toolkit: 10.2.
- Click SEARCH to go to the download link.
- Click Download to download the driver.
Important: An rpm file should be downloaded. If a different type of file is downloaded, verify that you chose the correct options and try again.
- Install CUDA and the GPU driver.
Note: For AC922 systems: OS and system firmware updates are required before you install the latest GPU driver.
sudo rpm -ivh nvidia-driver-local-repo-rhel7-440.*.rpm
sudo yum install nvidia-driver-latest-dkms
- Set nvidia-persistenced to start at boot (required for ppc64le, recommended for x86):
sudo systemctl enable nvidia-persistenced
- Restart to activate the driver.
- Verify the setup:
- IBM Power:
docker run --rm nvidia/cuda-ppc64le nvidia-smi
- x86_64:
docker run --rm nvidia/cuda nvidia-smi
Install the GPU driver (Ubuntu)
Many of the deep learning packages require the GPU driver packages from NVIDIA.
Install the GPU driver by following these steps:
- Download the NVIDIA GPU driver.
- Go to NVIDIA Driver Download.
- Select Product Type: Tesla.
- Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
- Select Product: Tesla P100 or Tesla V100.
- Select Operating System, click Show all Operating Systems, then choose the correct value:
- Linux POWER LE Ubuntu 18.04 for POWER
- Linux 64-bit Ubuntu 18.04 for x86
- Select CUDA Toolkit: 10.2.
- Click SEARCH to go to the download link.
- Click Download to download the driver.
Important: A deb file should be downloaded. If a different type of file is downloaded, verify that you chose the correct options and try again.
- The driver file name is NVIDIA-Linux-ppc64le-440.87.01.run. Give this file execute permission and execute it on the Linux image where the GPU driver is to be installed.
When the file is executed, you are asked two questions. It is recommended that you answer Yes to both questions. If the driver fails to install, check the /var/log/nvidia-installer.log file for relevant error messages.
- Install the GPU driver repository and cuda-drivers:
sudo dpkg -i nvidia-driver-local-repo-ubuntu1804-440.*.deb
sudo apt-key add /var/nvidia-driver-local-repo-440.*/*.pub
sudo apt-get update
sudo apt-get install cuda-drivers
- Set nvidia-persistenced to start at boot:
sudo systemctl enable nvidia-persistenced
- Reboot the system.
Verify the GPU driver
Verify that the CUDA drivers are installed by running the /usr/bin/nvidia-smi application.
# nvidia-smi
Fri Mar 15 12:23:50 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.29 Driver Version: 418.29 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... On | 00000002:01:00.0 Off | 0 |
| N/A 50C P0 109W / 300W | 2618MiB / 16280MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... On | 00000003:01:00.0 Off | 0 |
| N/A 34C P0 34W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-SXM2... On | 0000000A:01:00.0 Off | 0 |
| N/A 48C P0 44W / 300W | 5007MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-SXM2... On | 0000000B:01:00.0 Off | 0 |
| N/A 36C P0 33W / 300W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 114476 C /opt/miniconda2/bin/python 2608MiB |
| 2 114497 C /opt/miniconda2/bin/python 958MiB |
| 2 114519 C /opt/miniconda2/bin/python 958MiB |
| 2 116655 C /opt/miniconda2/bin/python 2121MiB |
| 2 116656 C /opt/miniconda2/bin/python 958MiB |
+-----------------------------------------------------------------------------+
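When the check is scripted, the driver version can be pulled out of the banner line of the output shown above. This is a minimal sketch: the function name is hypothetical, and it assumes the banner contains the text "Driver Version: <number>" as in the sample.

```shell
# Hypothetical filter: read nvidia-smi output on stdin and print the first
# driver version found in a "Driver Version: X.Y" banner line.
driver_version_from_smi() {
  sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a real system:
#   nvidia-smi | driver_version_from_smi
```

An empty result means that no banner line was found, which usually indicates that nvidia-smi failed or the driver is not loaded.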
Install Docker and nvidia-docker2 (RHEL)
Follow these steps to install Docker on RHEL. For full details, refer to https://github.com/NVIDIA/nvidia-docker#rhel-docker.
- Install Docker:
sudo yum install docker
Note: docker-1.13.1-108.git4ef4b30.el7 has a known issue with NVIDIA GPUs. Either install docker-1.13.1-104.git4ef4b30.el7 explicitly or use a newer version of RHEL Docker. Ensure that docker-1.13.1-108.git4ef4b30.el7 is NOT installed.
- Reboot the system.
- Add the package repositories:
- On x86:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker
- On IBM Power:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/docker/$distribution/docker.repo | sudo tee /etc/yum.repos.d/docker.repo
sudo yum install -y nvidia-container-runtime-hook
sudo systemctl restart docker
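Because one specific Docker build must be avoided, it can help to check the installed package name before continuing. The following is a sketch: the function name is hypothetical, and it assumes the package string has the form printed by rpm -q docker, that is, name-version-release.arch.

```shell
# Hypothetical check: succeed if the given package string is the Docker build
# with the known GPU issue (docker-1.13.1-108.git4ef4b30.el7).
is_known_bad_docker() {
  case "$1" in
    docker-1.13.1-108.git4ef4b30.el7*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a real system:
#   is_known_bad_docker "$(rpm -q docker)" \
#     && echo "Install docker-1.13.1-104.git4ef4b30.el7 or a newer build"
```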
Install Docker and nvidia-docker2 (Ubuntu)
Use these steps to install Docker and nvidia-docker2.
- For Ubuntu platforms, a Docker runtime must be installed. If there is no Docker runtime installed yet, install Docker-CE on Ubuntu.
- IBM Power:
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=ppc64el] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce=18.06.1~ce~3-0~ubuntu
- x86_64:
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce
- Install nvidia-docker2:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service
- For each userid that will run docker, add the userid to the docker group:
sudo usermod -a -G docker <userid>
Users must log out and log back in to pick up this group change.
- Verify the setup.
- IBM Power:
nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi
- x86_64:
nvidia-docker run --rm nvidia/cuda nvidia-smi
The nvidia-docker run command must be used with docker-ce (in other words, an Ubuntu host) to leverage the GPUs from within a container.
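The two Docker-CE repository lines in the first step differ only in the arch value, which can be derived from the machine type. The following is a sketch: the function name is hypothetical, and it assumes that uname -m reports ppc64le on IBM Power (little endian) and x86_64 on x86, which map to the Debian architecture names ppc64el and amd64.

```shell
# Hypothetical helper: print the Docker-CE apt repository line for the
# machine type given as "$1" (as reported by "uname -m").
docker_apt_repo_line() {
  case "$1" in
    ppc64le) arch="ppc64el" ;;
    x86_64)  arch="amd64" ;;
    *) echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
  echo "deb [arch=${arch}] https://download.docker.com/linux/ubuntu bionic stable"
}

# Usage on a real system:
#   sudo add-apt-repository "$(docker_apt_repo_line "$(uname -m)")"
```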