What's new and changed in IBM Spectrum Conductor 2.3.0

IBM® Spectrum Conductor 2.3.0 includes new features and enhancements to existing functionality.

Product rebranding

New for 2.3.0, the product name has changed from IBM Spectrum Conductor with Spark to IBM Spectrum Conductor.

Apache Spark 2.3.1

Apache Spark version 2.3.1 is now publicly available and is supported by IBM Spectrum Conductor 2.3.0. The following Apache Spark versions are built in to IBM Spectrum Conductor 2.3.0:
  • Apache Spark 2.3.1
  • Apache Spark 2.2.0
  • Apache Spark 2.1.1
  • Apache Spark 1.6.1

For more information, see the Spark 2.3.1 release notes at Spark Release 2.3.1.

For a list of updated or newer Spark versions (which you can download from IBM Fix Central and add to your cluster), see Supported Spark, Anaconda, and notebook versions.

Anaconda management

IBM Spectrum Conductor now offers improved integration of Anaconda with Spark workloads and Jupyter notebooks, including enterprise security. You can manage an existing Anaconda distribution or create a new one to deploy to your cluster. You can also share a deployed Anaconda distribution across multiple Spark instance groups to save disk space. See Anaconda.

The Anaconda 5.1.0 Python 3 distribution is built in to IBM Spectrum Conductor 2.3.0.

Installation, upgrading, and configuration

You can now install IBM Spectrum Conductor as non-root
For production clusters, log in as root or use sudo to obtain root permissions. For evaluation clusters, you can now install as any user; that user becomes the cluster administrator.
egoinstallfixes command
You can now install and manage IBM Spectrum Conductor fixes by using the new egoinstallfixes command. This command offers options to complete these tasks:
  • Install fixes, which includes checking the system, backing up the current files, and then installing the specified fix packages on top of your existing IBM Spectrum Conductor cluster.
  • Roll back the most recent fix (by fix package name or by build number), returning your host to its state before the fix was installed.
  • Check fix packages without installing them. This preliminary checking includes determining whether your existing cluster is compatible with the fix you want to install, verifying that the user account has appropriate permissions, and listing the files that the fix will overwrite or add.
For command syntax and option details, see Installing interim fixes from IBM Fix Central on Linux hosts.
New EGO configuration to allow LIM to start MELIM and PIM processes as a cluster administrator
A new EGO_ELIM_RUNAS_CLUSTER_ADMIN parameter in the ego.conf configuration file allows the LIM daemon to start MELIM and PIM processes as a cluster administrator user. When this parameter is enabled, these processes can also run as the administrative user. The setting is disabled by default. To enable it, set the parameter to Y; see EGO_ELIM_RUNAS_CLUSTER_ADMIN in the ego.conf reference.
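Based on the description above, enabling this behavior amounts to a one-line change in ego.conf. The following fragment is illustrative only; the rest of your ego.conf file is unchanged:

```
# ego.conf (illustrative fragment)
# Allow the LIM daemon to start MELIM and PIM processes as the
# cluster administrator user. Disabled by default; set to Y to enable.
EGO_ELIM_RUNAS_CLUSTER_ADMIN=Y
```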
Installing or upgrading with other products
IBM Spectrum Conductor 2.3.0 supports:
  • IBM Spectrum Symphony 7.2.1
  • IBM Spectrum LSF® Standard Edition for Symphony 10.1.0.2

Prepaid licensing and monitoring

IBM Spectrum Conductor now supports prepaid usage licensing for compute workloads that extend to the cloud. Prepaid usage licensing enables you to prepay for hours used by a host's cores (core hours) and consume those hours however you want. To audit prepaid consumption when your cluster bursts to the cloud, you must configure IBM Spectrum Conductor to upload usage metrics through an external load index monitor (ELIM) to a metering service in IBM Cloud Private. For more information on downloading and installing prepaid usage monitoring, see Installing prepaid usage monitoring for cloud.

Security

SSL support extended to include the local RS
In addition to SSL communication between the primary repository server (RS) and the RS client, IBM Spectrum Conductor now supports SSL between the local RS and the primary RS, and between the local RS and the RS client. If you use multiple repository servers, the RS client connects to the primary RS first, and the system then redirects the RS client to the local RS. In this case, you edit the egosc service profile (for example, RSmyregion.xml) for this configuration. The steps are similar to setting up SSL support between the RS and the RS client. For details on this setup, see Enabling SSL communication between RS (or the local RS) and the RS client.

Spark instance group enhancements

GPU monitoring enhancements
With GPU monitoring enabled, you can now see the GPU memory and GPU utilization in use, and the total available, for all Spark instance groups. For Spark applications, you can see the same metrics per application, as well as the current total, average, maximum, and minimum values across all GPU devices that the applications use.
GPU computing with R
You can now write a Spark application that uses a Resilient Distributed Dataset (RDD) API in R, in addition to Python and Scala, to create a GPU RDD whose tasks run on GPU resources in your cluster (see Submitting a Spark application with GPU RDD). An R GPU RDD API example is also provided. For more information, see GPU RDD sample and API examples.
Configuring additional parameters and environment variables
If necessary, you can set additional parameters and environment variables when you configure a Spark instance group. Invalid parameters and variables are ignored.
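For example, additional parameters of this kind might look like the following fragment. The parameters shown are standard Apache Spark and Spark worker settings; the values are illustrative only, and which parameters you set depends on your workload:

```
# Illustrative additional Spark parameters for a Spark instance group
spark.executor.memory=4g
spark.executor.cores=2

# Illustrative environment variable for Spark workers
SPARK_WORKER_MEMORY=8g
```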

Notebook enhancements

Jupyter notebook language kernels
Inside the Jupyter notebook, you can see three new kernel options: Spark Python (Spark Cluster Mode), Spark R (Spark Cluster Mode), and Spark Scala (Spark Cluster Mode). These kernels have a Spark context that is ready to use and that runs outside of the Jupyter notebook service. Select one of these kernels if you want to use Spark when you submit jobs from the notebook. With these options, all the monitoring features for Spark applications are available, because the applications run as Spark cluster applications instead of Spark client mode applications. For more information, see Jupyter notebooks.
Kernel culling for Jupyter notebooks
Kernel culling is now configured by default for Jupyter 5.4.0 notebooks (or later) from the Notebook Management page in the cluster management console, to reclaim kernel resources. With kernel culling, you can specify timeout environment variables for the Jupyter kernel so that idle kernels are shut down and their resources reclaimed within a specified time frame. For more information, see Kernel culling for Jupyter notebooks.
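The underlying mechanism corresponds to the standard kernel-culling settings introduced in the Jupyter 5.x notebook server, sketched below for orientation. IBM Spectrum Conductor exposes this behavior through environment variables, whose names and defaults may differ from this plain-Jupyter fragment:

```
# jupyter_notebook_config.py (illustrative fragment, plain Jupyter 5.x)
c.MappingKernelManager.cull_idle_timeout = 3600  # cull kernels idle longer than 1 hour
c.MappingKernelManager.cull_interval = 300       # check for idle kernels every 5 minutes
```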
Jupyter 5.4.0
IBM Spectrum Conductor 2.3.0 includes a Jupyter 5.4.0 notebook package that works on Linux and Linux on POWER. Jupyter 5.4.0 supports all Spark versions built in to IBM Spectrum Conductor 2.3.0 except Spark 1.6.1.

Reporting

Explorer Reports
With Explorer reports, IBM Spectrum Conductor additionally aggregates Elasticsearch data so that it can be consumed by the Spark Charge Back report, and then writes the aggregated data back into Elasticsearch. For more information, see Explorer reports.
Data loader enhancements
The new Spark resource usage aggregation data loader, sparkresusageaggloader, aggregates resource usage (cores, slots, and memory, for both GPU and CPU) by application ID, user name, and top consumer for each Spark instance group, over one-hour intervals per application. This data loader is disabled by default.
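As an illustration only, the kind of per-application aggregation described above can be sketched as follows. The record fields and keys below are assumptions for the sketch, not the loader's real schema or implementation:

```python
from collections import defaultdict

# Hypothetical usage samples collected within one hour; field names
# (app_id, user, cores, mem_mb) are illustrative assumptions.
samples = [
    {"app_id": "app-001", "user": "alice", "cores": 2, "mem_mb": 4096},
    {"app_id": "app-001", "user": "alice", "cores": 4, "mem_mb": 8192},
    {"app_id": "app-002", "user": "bob", "cores": 1, "mem_mb": 2048},
]

def aggregate_usage(samples):
    """Sum core and memory usage per (application ID, user name) key."""
    totals = defaultdict(lambda: {"cores": 0, "mem_mb": 0})
    for s in samples:
        key = (s["app_id"], s["user"])
        totals[key]["cores"] += s["cores"]
        totals[key]["mem_mb"] += s["mem_mb"]
    return dict(totals)

result = aggregate_usage(samples)
print(result[("app-001", "alice")])  # {'cores': 6, 'mem_mb': 12288}
```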

RESTful API references now available

You can now view RESTful application programming interface (API) documentation that describes how to configure and maintain IBM Spectrum Conductor through RESTful APIs. For more information, see RESTful API references.

Tutorial

The tutorial has been updated for IBM Spectrum Conductor 2.3.0. It is short, easy to follow, and provides step-by-step instructions for creating and managing a Spark instance group, and it now also covers Anaconda management.