Known issues for Analytics Engine powered by Apache Spark

The following known issues and limitations apply to Analytics Engine powered by Apache Spark.

Spark 3.4-R43 notebook kernel does not support FIPS 140-2 compliant encryption

Applies to: 5.0.3

The Spark 3.4-R43 notebook kernel does not support FIPS (Federal Information Processing Standard) 140-2 compliant encryption.

Error while upgrading all Spark instances at the same time

Applies to: 5.0.0
Fixed in: 5.0.1

Occasionally, when you upgrade all of the service instances at the same time, the command fails with the following error:

http: ContentLength=157 with Body length 0

When this happens, you must upgrade each service instance individually by using the following command:

cpd-cli service-instance upgrade --instance-name <cpd-instance-name> \
    --service-type=<cpd-service-type> \
    --profile=${CPD_PROFILE_NAME}

Spark history server fails to consider the cores and memory configuration

Applies to: 5.0.1

The deployed Spark history server ignores the cores and memory configuration that you provide when you start it. Instead, it uses the history server configuration properties that are set as the instance default configuration.
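Because the per-start values are ignored, the effective cores and memory come from the instance default configuration. The following Python sketch shows a history server start request for context; the endpoint path, payload fields, and placeholder values are assumptions based on the v4 Analytics Engine API, so verify them against the API reference for your release:

# Sketch only: start the Spark history server through the instance API.
# The endpoint path and payload fields below are assumptions; verify them
# against the Analytics Engine API reference for your release.
import requests

CPD_URL = "https://<cpd-route>"   # your Cloud Pak for Data route
INSTANCE_ID = "<instance-id>"     # Analytics Engine instance ID
TOKEN = "<bearer-token>"          # platform access token

response = requests.post(
    f"{CPD_URL}/v4/analytics_engines/{INSTANCE_ID}/spark_history_server",
    headers={"Authorization": f"Bearer {TOKEN}"},
    # Because of this issue, these values are ignored and the instance
    # default configuration is used instead.
    json={"cores": "1", "memory": "4G"},
)
print(response.status_code, response.text)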

Delay in starting Spark application

Applies to: 5.0.1

When observability agents such as Instana or Dynatrace run in the Kubernetes cluster, the agent processes running within the container might collect metrics, traces, and logs from both the application containers and the pod environment. This additional processing can occasionally delay the startup of Spark applications, particularly the Spark driver process. To resolve this issue, allocate one additional CPU core to the Spark driver process.
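For example, if the driver normally runs with the default single core, request two cores when you submit the job. The following is a minimal Python sketch of a job submission that raises the driver core count; the endpoint shape, the application path, and the placeholder values are assumptions based on the v4 Analytics Engine Spark jobs API, so verify them against the API reference for your release:

# Sketch only: submit a Spark application with one extra driver core.
import requests

CPD_URL = "https://<cpd-route>"   # your Cloud Pak for Data route
INSTANCE_ID = "<instance-id>"     # Analytics Engine instance ID
TOKEN = "<bearer-token>"          # platform access token

payload = {
    "application_details": {
        "application": "/myapp/wordcount.py",   # hypothetical application path
        "conf": {
            # One core more than the default of 1, to absorb the overhead
            # added by the observability agent.
            "spark.driver.cores": "2",
        },
    }
}

response = requests.post(
    f"{CPD_URL}/v4/analytics_engines/{INSTANCE_ID}/spark_applications",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
print(response.status_code, response.text)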

Downtime for upgrade from previous Cloud Pak for Data Version to 5.0.x

Applies to: 5.0.0 and later

Upgrading Analytics Engine powered by Apache Spark from a previous version of Cloud Pak for Data to Version 5.0.x causes downtime during the upgrade process.

Unable to connect to Spark server when you upgrade from 4.8.5 to 5.0

Applies to: 5.0.0
Fixed in: 5.0.1

When you upgrade Analytics Engine powered by Apache Spark from Version 4.8.5 to Version 5.0 with HTTP proxy settings enabled, the worker node and the master node continuously restart with the error '503 service unavailable for spark'. You cannot submit Spark jobs while HTTP proxy settings are enabled.

Job or kernel fails with PVC not found for tethered namespaces

Applies to: 5.0.0 and later

When a Spark instance is created in a non-dataplane tethered namespace, Spark jobs and kernels fail with a message similar to the following:

{
  'type': 'server_error',
  'code': 'cluster_creation_error',
  'message': 'Could not complete the request. Reason - FailedScheduling. Detailed error - 0/9 nodes are available: 9 persistentvolumeclaim "volumes-home-vol-pvc" not found., From - ibm-cpd-scheduler'
}

Workaround: Create the service instance by using a dataplane tethered namespace to resolve this issue.

Timeout message when submitting a Spark job

Applies to: Spark applications using V2 or V3 APIs only.

The Spark service expects you to create a SparkContext or SparkSession at the beginning of your Spark application code. When you submit the Spark job through the REST API, the API returns the Spark application ID as soon as the SparkContext is successfully created.

However, if you don't create a SparkContext or SparkSession:

  • At the beginning of the Spark application
  • At all in the Spark application, or if your application is plain Python, Scala, or R

the REST API waits for your application to complete, which can lead to a REST API timeout. The reason is that the Spark service expects the Spark application to have started, which is not the case for a plain Python, Scala, or R application. The application is still listed in the Jobs UI even though the REST API timed out.
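To avoid the timeout, create the session at the very top of your application. The following is a minimal PySpark sketch; the application name and the logic after the session is created are illustrative placeholders:

from pyspark.sql import SparkSession

# Create the SparkSession first so that the Spark service sees the
# application as started and the REST API can return the application ID
# without waiting for the whole job to finish.
spark = SparkSession.builder.appName("my-app").getOrCreate()

# ... the rest of the application logic runs after the session exists ...
df = spark.range(100)
print(df.count())

spark.stop()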