spark-defaults.conf

The spark-defaults.conf configuration file supports Spark on EGO in Platform ASC, setting up the default environment for all Spark jobs submitted on the local host.

This properties file serves as the default settings file, which is used by the spark-submit script to launch applications in a cluster. The spark-submit script loads the values specified in spark-defaults.conf and passes them on to your application.
Note: If you define environment variables in spark-env.sh, those values override the corresponding property values that you set in spark-defaults.conf.

Location

This file is located at $SPARK_HOME/conf/, where $SPARK_HOME is the location of your Apache Spark installation. For example:
SPARK_HOME=/opt/spark

Properties

When setting properties, take note of the deployment mode. A few properties can be set only in the cluster deployment mode.

Table 1. Property details
| Property | Description | Default | Deployment mode |
|---|---|---|---|
| spark.master | Specifies the deployment mode, which determines whether the Spark Driver runs on the client side or in the EGO cluster. Valid values are ego-client (runs the Spark Driver on the client side) and ego-cluster (runs the Spark Driver in the EGO cluster). | No default | Not applicable |
| spark.ego.app.name | Specifies the application name, which forms part of the EGO client name. Define this property only when the Spark Driver runs as a service, to distinguish it from other Spark Drivers. Otherwise, a Spark Driver whose client name is already registered with EGO is rejected. | An auto-generated UUID | Client and Cluster |
| spark.ego.consumer | Specifies the consumer used to request resources from EGO. | SampleApplications/EclipseSamples | Client and Cluster |
| spark.ego.uname | Specifies the user name used to log on to EGO. | Guest | Client and Cluster |
| spark.ego.passwd | Specifies the password used to authenticate the user name specified in spark.ego.uname. | Guest | Client and Cluster |
| spark.ego.executor.plan | Specifies the resource group used to start Spark executors. | ComputeHosts | Client and Cluster |
| spark.ego.executor.resreq | Specifies the resource requirements, expressed as a string, that are used to request specific resources from EGO to start Spark executors. | No default | Client and Cluster |
| spark.ego.enable.standby | Enables standby services for Spark executors. In standby mode, a Spark executor that no longer occupies slots does not exit until the executor idle timeout expires. | true | Client and Cluster |
| spark.ego.executor.idle.timeout | When standby services are enabled, specifies the duration (in seconds) that an executor stays alive when it has no workload. | 600 | Client and Cluster |
| spark.ego.executor.slots.max | Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this property only after evaluating the Spark executor memory and the memory usage per task. | No default | Client and Cluster |
| spark.ego.client.timeout | Specifies the duration (in seconds) that a client stays registered to EGO even when no workload is submitted. | 900 | Client and Cluster |
| spark.ego.driver.plan | Specifies the resource group used to start the Spark Driver. | ManagementHosts | Cluster |
| spark.ego.driver.resreq | Specifies the resource requirements that are used to request specific resources from EGO to start the Spark Driver. | No default | Cluster |
| spark.driver.extraClassPath | Specifies additional entries to add to the Spark Driver's classpath. Define this property when you need an extra classpath, for example when configuring GPFS. | No default | Client and Cluster |
| spark.executor.extraClassPath | Specifies additional entries to add to the Spark executor's classpath. Define this property when you need an extra classpath, for example when configuring GPFS. | No default | Client and Cluster |
| spark.driver.extraLibraryPath | Specifies additional entries to add to the Spark Driver's library path. Define this property when you need an extra library path, for example when configuring GPFS. | No default | Client and Cluster |
| spark.executor.extraLibraryPath | Specifies additional entries to add to the Spark executor's library path. Define this property when you need an extra library path, for example when configuring GPFS. | No default | Client and Cluster |
| spark.shuffle.service.enabled | Enables the Spark shuffle service (SPARKSS). | No default | Client and Cluster |
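
As a rough illustration of how the executor properties above fit together, the following fragment sizes spark.ego.executor.slots.max from the executor memory. The numbers are assumptions, not recommendations: the sketch supposes 8 GB executors (set through the standard Spark property spark.executor.memory, which is not listed in Table 1) and roughly 2 GB of memory per task, which leaves room for about four concurrent tasks.

```properties
# Assumed executor heap size (standard Spark property, not EGO-specific)
spark.executor.memory 8g
# Assumed usage of about 2 GB per task: 8 GB / 2 GB = 4 concurrent tasks
spark.ego.executor.slots.max 4
# Keep idle executors in standby (this is the documented default)
spark.ego.enable.standby true
# Illustrative value: release idle executors after 300 seconds
# instead of the default 600 seconds
spark.ego.executor.idle.timeout 300
```

In practice, not all executor memory is available to tasks, so a lower slot count than this simple division suggests may be safer.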

Example

# Use EGO by default
spark.master ego-client
# Enable the external shuffle service
spark.shuffle.service.enabled true
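
For comparison, a cluster-mode configuration might look like the following sketch. It uses only properties described in Table 1. The consumer and resource groups shown are the documented defaults, written out for clarity, and the application name my-spark-service is a hypothetical placeholder.

```properties
# Run the Spark Driver inside the EGO cluster
spark.master ego-cluster
# Hypothetical name; set this only when the Spark Driver runs as a service
spark.ego.app.name my-spark-service
# Documented defaults, stated explicitly
spark.ego.consumer SampleApplications/EclipseSamples
# Resource group for the Spark Driver (applies in cluster mode only)
spark.ego.driver.plan ManagementHosts
# Resource group for Spark executors
spark.ego.executor.plan ComputeHosts
```

Comments are placed on their own lines because text that follows a value on the same line would be read as part of the value.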