Add a notebook to register it to IBM® Spectrum Conductor and make it available for
selection when you create a new instance group.
Before you begin
- You must be a cluster administrator or have the Notebook Management Configure permission.
- You must create the notebook package containing the scripts and binaries required for the
notebook to run. See Creating notebook packages.
- cURL 7.28.0 or higher must be installed on all hosts that run the notebook.
- To enable Docker support for your notebook, so that the notebook services run
within Docker containers, you must Dockerize the notebook. See Adding Dockerized notebooks.
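The cURL prerequisite above can be checked on each host with a short script. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the first line of the `curl --version` output begins with `curl` followed by a three-part version number, which is the usual format.

```python
import re
import subprocess

# Minimum cURL version stated in the prerequisites.
MIN_CURL = (7, 28, 0)


def parse_curl_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from the start of `curl --version` output."""
    match = re.match(r"curl (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized curl --version output")
    return tuple(int(part) for part in match.groups())


def curl_is_new_enough(version_output: str) -> bool:
    """Return True if the reported version is 7.28.0 or higher."""
    return parse_curl_version(version_output) >= MIN_CURL


def read_local_curl_version() -> str:
    """Run `curl --version` on this host and return its text output."""
    result = subprocess.run(
        ["curl", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a host where cURL is installed, `curl_is_new_enough(read_local_curl_version())` reports whether the host meets the prerequisite. Versions are compared as integer tuples so that, for example, 7.9.0 is not mistaken for a later release than 7.28.0.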
About this task
Adding a notebook to IBM Spectrum Conductor registers the notebook to the
cluster. This task is not required for the built-in notebooks that are installed with IBM Spectrum
Conductor, unless you removed them and want to re-add them, or you want to add an updated
version. The built-in notebooks are typically available when you create an instance group. To add
a built-in notebook that you previously deleted or to add an updated version, download the
notebook package (and the metadata.yml file if it exists for the notebook) from IBM Fix Central
and follow the instructions in the accompanying readme file.
Procedure
-
From the cluster management console, click
.
-
Click Add.
-
Complete the following fields in the Deployment Settings tab. If a
metadata.yml file exists for the notebook, you can automatically fill in all of
the required fields and some of the optional fields by dragging and dropping the
metadata.yml file and the notebook package into the Add
Notebook dialog.
You can also manually enter the values as necessary. The notebook requires a name, a version, a
package, a start command, a stop command, and a job monitor command. All other fields are
optional:
- Name: Enter a name for the notebook. The notebook name must not exceed 64
characters and can contain any of the following characters:
a-z A-Z 0-9
- Version: Enter the version of the notebook. The notebook version must not
exceed 12 characters, must start with a number, and can contain any of the following characters:
0-9 .
(cannot contain only periods).
- Package: Upload the package that contains the scripts and files that are
required for the notebook to run. The package name must be unique within a consumer, must not exceed
1024 characters, and can contain any of the following characters:
0-9 A-Z a-z . _ -
- Run notebook in a Docker container (optional): Select the check box to
configure your notebook services to run inside a Docker container, instead of on the host. Running
Dockerized notebook services inside a container can simplify the library dependencies for each type
of notebook. As a result, you can use Docker’s flexibility and portability to run notebook
services on any host, in any environment. If you select this option, the configuration settings for
the notebook change. To add a Dockerized notebook, see Adding Dockerized notebooks.
- Enable monitoring for the notebook (optional): Select the check box to
enable monitoring for the notebook. Monitoring provides the number of cores that are used, the
amount of memory used for the notebook in MB, and the number of executors.
Note: Monitoring works best for
notebooks that launch in one browser window. It is not supported for Jupyter notebooks, where each
notebook opens in a separate browser window.
- Enable collaboration for the notebook (optional): Select the check box to
enable collaboration for the notebook service. Collaboration allows multiple assigned notebook
collaborators to create, edit, and delete notebook files at the same time, and to view the
changes that other collaborators make to notebook files. For more
information, see Notebook collaboration.
- Supports SSL (optional): Select the check box to indicate that SSL is supported for the
notebook, if SSL is configured for the cluster.
- Supports user impersonation (optional): Select the check box to indicate that the notebook
supports running notebook services and their Spark workload as the notebook owner OS user.
For notebooks enabled with Kerberos user authentication and user impersonation, you must also
specify your principal and the location of your keytab file, which you can do through environment
variables for your notebook. Refer to Adding environment variables to notebooks for details on
how to configure this.
- Anaconda required (optional): Select the check box to make an Anaconda distribution and
environment mandatory for the notebook. Instance groups that use this notebook must
specify an Anaconda distribution and
environment.
- Prestart command (optional): Specify the command to prestart the
notebook. The path to the script in the command must be relative to the notebook deployment
directory. The prestart script is called when you start the notebook service to preconfigure
the notebook running environment. This preconfiguration includes retrieving an
available port on the host on which the notebook web service can start.
- Start command: Specify the command to start the notebook. The path to the
script in the command must be relative to the notebook deployment directory.
- Stop command: Specify the command to stop the notebook. The path to the
script in the command must be relative to the notebook deployment directory. The stop script is
called to stop the notebook server when you stop the notebook service.
- Job Monitor command: Specify the command to monitor the notebook
activity. The path to the script in the command must be relative to the notebook deployment
directory. The job monitor script is started automatically
by the resource orchestrator when you start the notebook service. This script monitors the notebook
process and retrieves the notebook web service port.
- Base port (optional): Specify the base port from which to find an
available port for the notebook service instance. For Zeppelin notebooks, the default value is 8380.
For Jupyter notebooks, the default value is 8888.
- Longest update interval for job monitor (optional): Enter the interval
(in seconds) within which the resource orchestrator expects to receive the activity state from the
job monitor.
- Job control wait period (optional): Enter the interval (in seconds) that
the resource orchestrator waits before it ends an activity.
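The naming rules for the Name, Version, and Package fields can be checked before you open the dialog. The following is a minimal sketch of those rules as regular expressions. It is not how the cluster management console validates input: the function name is hypothetical, the hyphen is assumed to be an allowed package-name character, and the requirement that a package name is unique within a consumer cannot be checked locally.

```python
import re

# Name: up to 64 characters from a-z, A-Z, 0-9.
NAME_RE = re.compile(r"[A-Za-z0-9]{1,64}")
# Version: starts with a digit, then up to 11 more digits or periods (12 in total).
VERSION_RE = re.compile(r"[0-9][0-9.]{0,11}")
# Package name: up to 1024 characters. Assumption: the hyphen is allowed.
PACKAGE_RE = re.compile(r"[0-9A-Za-z._-]{1,1024}")


def check_notebook_fields(name: str, version: str, package_name: str) -> list:
    """Return a list of problems; an empty list means all three fields pass."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("name: use 1 to 64 characters from a-z, A-Z, 0-9")
    if not VERSION_RE.fullmatch(version):
        problems.append(
            "version: use 1 to 12 characters from 0-9 and '.', starting with a digit"
        )
    if not PACKAGE_RE.fullmatch(package_name):
        problems.append(
            "package: use 1 to 1024 characters from 0-9, A-Z, a-z, '.', '_', '-'"
        )
    return problems
```

For example, `check_notebook_fields("Jupyter", "5.4.0", "Jupyter-5.4.0.tar.gz")` returns an empty list; the package file name here is only an illustration. Because a version must start with a digit, a version that contains only periods is rejected without a separate check.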
-
Set any environment variables for use by your scripts in the Environment
Variables tab.
Note:
- For Jupyter 5.4.0 notebooks (or
later), kernel culling environment variables are configured by default. For more information about
kernel culling, see Kernel culling for Jupyter notebooks.
- To deploy the notebook only with specific Spark versions, set the
supported_spark_versions environment variable to a comma-separated list of
Spark versions. For example, to deploy a notebook with Spark versions 2.1.0 and 1.5.2, specify
supported_spark_versions as 2.1.0,1.5.2.
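If you generate the supported_spark_versions value from a list, or want to inspect it in one of your own scripts, the comma-separated format is simple to build and split. The following is a minimal sketch with hypothetical helper names. Ignoring surrounding whitespace and empty entries is an assumption made here for convenience, not documented product behavior.

```python
def format_supported_spark_versions(versions) -> str:
    """Join Spark versions into the comma-separated form used by the variable."""
    cleaned = [version.strip() for version in versions if version.strip()]
    return ",".join(cleaned)


def parse_supported_spark_versions(value: str) -> list:
    """Split a supported_spark_versions value back into a list of versions."""
    return [version.strip() for version in value.split(",") if version.strip()]
```

With the versions from the example above, `format_supported_spark_versions(["2.1.0", "1.5.2"])` returns `"2.1.0,1.5.2"`, which is the value to enter in the Environment Variables tab.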
-
Click Add.
Results
The notebook is added to the cluster.
What to do next
Create an instance group and enable
the notebook that you added. See Creating instance groups.
To enable an existing instance group
to use this updated notebook package, see Updating instance groups to use an updated Spark version or notebook package.