Available now: IBM Spectrum Conductor version 2.3.0

Technical Blog Post


Abstract

Available now: IBM Spectrum Conductor version 2.3.0

Body

We are happy to announce the release of IBM Spectrum Conductor 2.3.0, which is available as of August 24, 2018. New for 2.3.0, the product name has changed from IBM Spectrum Conductor with Spark to IBM Spectrum Conductor. Our development team has been working hard all year to deliver new features and enhancements in this release!

Check out our trial version!

 

Here’s a snapshot of what version 2.3.0 offers:

Apache Spark 2.3.1

Apache Spark version 2.3.1 is now publicly available and is supported by IBM Spectrum Conductor 2.3.0. The following Apache Spark versions are now built in to IBM Spectrum Conductor 2.3.0:

  • Apache Spark 2.3.1
  • Apache Spark 2.2.0
  • Apache Spark 2.1.1
  • Apache Spark 1.6.1

 

Anaconda management

IBM Spectrum Conductor now offers tighter integration of Anaconda with Spark workloads and Jupyter notebooks, including enterprise security. You can manage an existing Anaconda distribution or create a new one to deploy to your cluster. You can also share a deployed Anaconda distribution across multiple Spark instance groups to save disk space.

 

Installation, upgrading, and configuration

 

  • You can now install IBM Spectrum Conductor as a non-root user.
  • You can now install and manage IBM Spectrum Conductor fixes by using the new egoinstallfixes command.
  • A new EGO_ELIM_RUNAS_CLUSTER_ADMIN parameter in the ego.conf configuration file allows the LIM daemon to start the MELIM and PIM processes as a cluster administrator user. If enabled, all processes can also run as this administrative user.
  • IBM Spectrum Conductor 2.3.0 supports:
    • IBM Spectrum Symphony 7.2.1
    • IBM Spectrum LSF® Standard Edition for Symphony 10.1.0.2

 

Security

In addition to SSL communication between the master repository server (RS) and the RS client, IBM Spectrum Conductor now supports SSL between the local RS and the master RS, and between the local RS and the RS client.

 

Spark instance group enhancements

 

GPU monitoring enhancements

With GPU monitoring enabled, you can now see the GPU memory and GPU utilization in use, and the total available, across all Spark instance groups. For Spark applications, you can see the GPU memory and utilization used by the applications and the total available to them. You can also see the current values (total, average, maximum, and minimum) across all GPU devices used by the applications.

 

GPU computing with R

You can now write a Spark application that uses the Resilient Distributed Dataset (RDD) API in R, in addition to Python and Scala, to create a new GPU RDD whose tasks run on GPU resources in your cluster. A sample and API examples are available.
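For illustration, here is a minimal PySpark sketch of the RDD pattern such an application follows. It uses only standard PySpark and NumPy calls; the multiply_partition function is a hypothetical placeholder for GPU work, not the IBM Spectrum Conductor GPU RDD API itself.

    # Minimal sketch, assuming standard PySpark and NumPy. In a GPU RDD, the
    # per-partition work below would instead be scheduled onto GPU slots.
    from pyspark import SparkConf, SparkContext
    import numpy as np

    conf = SparkConf().setAppName("gpu-rdd-sketch")
    sc = SparkContext(conf=conf)

    def multiply_partition(rows):
        # Hypothetical placeholder for GPU work on one partition of data.
        data = np.array(list(rows), dtype=np.float32)
        yield from (data * 2.0).tolist()

    rdd = sc.parallelize(range(1000), numSlices=4)
    result = rdd.mapPartitions(multiply_partition).collect()
    print(result[:10])

The same pattern applies in Scala and R; the product sample shows the GPU-specific API calls.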

 

Configuring additional parameters and environment variables

If needed, you can set additional parameters and environment variables when you configure a Spark instance group. If a parameter or variable is invalid, it is ignored.
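For context, the sketch below sets a few standard Spark parameters and an executor environment variable programmatically with PySpark. It is illustrative only: in IBM Spectrum Conductor you would typically enter these same names in the Spark instance group configuration rather than in application code, and the specific names and values shown are assumptions.

    # Minimal sketch of standard Spark parameter and environment variable names,
    # shown with PySpark only to illustrate their format; values are examples.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("parameter-format-example")
            .set("spark.executor.memory", "4g")               # Spark parameter
            .set("spark.executor.cores", "2")                 # Spark parameter
            .set("spark.executorEnv.MY_DATA_DIR", "/tmp"))    # executor environment variable

    sc = SparkContext(conf=conf)
    print(sc.getConf().get("spark.executor.memory"))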

 

Notebook enhancements

 

Jupyter notebook language kernels

The Jupyter notebook now offers three new kernel options: Spark Python (Spark Cluster Mode), Spark R (Spark Cluster Mode), and Spark Scala (Spark Cluster Mode). These kernels provide a Spark context that is ready to use and that runs outside of the Jupyter notebook service. Select one of these kernels to use Spark when you submit jobs from the notebook. Because the jobs run as Spark cluster-mode applications instead of Spark client-mode applications, all of the Spark application monitoring features are available.
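As a quick illustration, a cell in the Spark Python (Spark Cluster Mode) kernel can use the preconfigured Spark context directly; the variable name sc is the usual PySpark convention and is assumed here.

    # Minimal sketch of a notebook cell, assuming the cluster-mode kernel
    # exposes a ready-to-use SparkContext as sc (the usual PySpark convention).
    rdd = sc.parallelize(range(100))
    print(rdd.map(lambda x: x * x).sum())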

 

Kernel culling for Jupyter notebooks

Kernel culling is now configured by default for Jupyter 5.4.0 (or later) notebooks from the Notebook Management page in the cluster management console, so that kernel resources can be reclaimed. With kernel culling, you can specify timeout environment variables for the Jupyter kernel so that an idle kernel is shut down and its resources are reclaimed within a specific time frame.
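For context, kernel culling in Jupyter 5.x is driven by the notebook server's kernel manager settings; a minimal jupyter_notebook_config.py sketch is shown below. In IBM Spectrum Conductor these values are surfaced through the Notebook Management page and the kernel timeout environment variables rather than edited by hand, so the snippet and its values are illustrative assumptions.

    # Minimal sketch of the standard Jupyter (notebook 5.1+) culling settings
    # in jupyter_notebook_config.py; the values shown are example assumptions.
    c = get_config()
    c.MappingKernelManager.cull_idle_timeout = 3600   # shut down kernels idle for 1 hour
    c.MappingKernelManager.cull_interval = 300        # check for idle kernels every 5 minutes
    c.MappingKernelManager.cull_connected = True      # cull even if a client is still connected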

 

Jupyter 5.4.0

IBM Spectrum Conductor 2.3.0 includes a Jupyter 5.4.0 notebook package for Linux and Linux on POWER. Jupyter 5.4.0 supports all Spark versions built in to IBM Spectrum Conductor 2.3.0 except Spark 1.6.1.

 

Reporting

 

Explorer Reports

With Explorer reports, IBM Spectrum Conductor now also aggregates Elasticsearch data so that it can be consumed by the Spark Charge Back report, and then writes the aggregated data back into Elasticsearch.

 

Data loader enhancements

The new Spark resource usage aggregation data loader, sparkresusageaggloader, aggregates CPU and GPU resource usage for cores, slots, and memory by application ID, user name, and top consumer for each Spark instance group, in one-hour intervals per application. This data loader is disabled by default.

 

Tutorial

The tutorial has been updated for IBM Spectrum Conductor 2.3.0. It is short, easy to follow, and provides step-by-step instructions for creating and managing a Spark instance group, and it now covers Anaconda management.

 

Read more...

To learn more, check out these resources:

 

Comments, questions… post them in our forum, or join our Slack channel; we would love to hear from you! Follow this blog series for more technical articles and other news about IBM Spectrum Conductor!

 

 

 

 

 

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS4H63","label":"IBM Spectrum Conductor"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16163377