spark-defaults.conf
The spark-defaults.conf configuration file supports Spark on EGO in Platform ASC. It sets the default configuration for all Spark jobs submitted from the local host.
Location
SPARK_HOME=/opt/spark
Properties
When setting properties, note the deployment mode: some properties apply only in the cluster deployment mode, as shown in the Deployment mode column.
Property | Description | Default | Deployment mode |
---|---|---|---|
spark.master | Specifies the deployment mode, which determines whether the Spark Driver runs on the client side or in the EGO cluster (for example, ego-client). | No default | Not applicable |
spark.ego.app.name | Specifies the application name, which forms part of the EGO client name. Define this property only when the Spark Driver runs as a service, to distinguish it from other Spark drivers. If it is set in other cases, drivers can end up with the same client name, and EGO rejects a Spark Driver whose client name is already registered. | An auto-generated UUID | Client and Cluster |
spark.ego.consumer | Specifies the consumer used to request resources from EGO. | SampleApplications/EclipseSamples | Client and Cluster |
spark.ego.uname | Specifies the user name to log on to EGO. | Guest | Client and Cluster |
spark.ego.passwd | Specifies the password used to authenticate the user name specified in spark.ego.uname. | Guest | Client and Cluster |
spark.ego.executor.plan | Specifies the resource group used to start Spark executors. | ComputeHosts | Client and Cluster |
spark.ego.executor.resreq | Specifies the resource requirement string that is sent to EGO when requesting specific resources to start Spark executors. | No default | Client and Cluster |
spark.ego.enable.standby | Enables standby mode for Spark executor services: a Spark executor that no longer occupies any slots does not exit until the executor idle timeout (spark.ego.executor.idle.timeout) expires. | true | Client and Cluster |
spark.ego.executor.idle.timeout | When standby services are enabled, specifies the duration (in seconds) that an executor stays alive when there is no workload on it. | 600 | Client and Cluster |
spark.ego.executor.slots.max | Specifies the maximum number of tasks that can run concurrently in one Spark executor. To prevent the Spark executor process from running out of memory, define this property only after evaluating the Spark executor memory and the memory usage per task. | No default | Client and Cluster |
spark.ego.client.timeout | Specifies the duration (in seconds) that a client stays registered with EGO even when no workload is submitted. | 900 | Client and Cluster |
spark.ego.driver.plan | Specifies the resource group used to start the Spark Driver. | ManagementHosts | Cluster |
spark.ego.driver.resreq | Specifies the resource requirement string that is sent to EGO when requesting specific resources to start the Spark Driver. | No default | Cluster |
spark.driver.extraClassPath | Specifies extra entries to add to the Spark Driver's classpath. Define this property when, for example, configuring GPFS requires an extra classpath. | No default | Client and Cluster |
spark.executor.extraClassPath | Specifies extra entries to add to the Spark Executor's classpath. Define this property when, for example, configuring GPFS requires an extra classpath. | No default | Client and Cluster |
spark.driver.extraLibraryPath | Specifies extra entries to add to the Spark Driver's library path. Define this property when, for example, configuring GPFS requires an extra library path. | No default | Client and Cluster |
spark.executor.extraLibraryPath | Specifies extra entries to add to the Spark Executor's library path. Define this property when, for example, configuring GPFS requires an extra library path. | No default | Client and Cluster |
spark.shuffle.service.enabled | Enables the Spark shuffle service (SPARKSS). | No default | Client and Cluster |
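The Deployment mode column can be checked mechanically before jobs are submitted. The following Python sketch, which is not part of Platform ASC, flags properties that the table lists as Cluster-only when spark.master selects client mode. It assumes that ego-client denotes the client deployment mode (ego-client is the only spark.master value shown on this page), and the function name cluster_only_in_client_mode is hypothetical.

```python
# Hypothetical helper (not part of Platform ASC): report properties that the
# table marks as "Cluster"-only when spark.master selects client mode.
# Assumption: "ego-client" denotes the client deployment mode.
CLUSTER_ONLY = {"spark.ego.driver.plan", "spark.ego.driver.resreq"}


def cluster_only_in_client_mode(props: dict[str, str]) -> list[str]:
    """Return the cluster-only property names set while spark.master is ego-client."""
    if props.get("spark.master") != "ego-client":
        return []  # not client mode (or unset): nothing to report
    return sorted(name for name in props if name in CLUSTER_ONLY)


if __name__ == "__main__":
    props = {
        "spark.master": "ego-client",
        "spark.ego.driver.plan": "ManagementHosts",
        "spark.ego.executor.plan": "ComputeHosts",
    }
    print(cluster_only_in_client_mode(props))  # ['spark.ego.driver.plan']
```

A set lookup keeps the check independent of the order in which properties appear in the file; extend CLUSTER_ONLY if more properties are documented as Cluster-only.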
Example
```
# Use EGO by default
spark.master ego-client
# Enable the external shuffle service
spark.shuffle.service.enabled true
```
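To illustrate the file format, the following Python sketch reads the example above into a dictionary. It handles only the whitespace-separated key value lines and # comments shown here; Apache Spark itself loads this file with java.util.Properties, which also accepts other forms such as key=value. The function name parse_spark_defaults is hypothetical.

```python
# Minimal sketch: parse the whitespace-separated "key value" lines used in
# spark-defaults.conf. Simplified subset: skips blank lines and lines that
# start with "#"; does not handle "=" or ":" separators, "!" comments, or
# line continuations.
EXAMPLE = """\
# Use EGO by default
spark.master ego-client
# Enable the external shuffle service
spark.shuffle.service.enabled true
"""


def parse_spark_defaults(text: str) -> dict[str, str]:
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first run of whitespace: the key, then the rest as the value.
        parts = line.split(None, 1)
        # A later line with the same key overrides an earlier one.
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


if __name__ == "__main__":
    for key, value in parse_spark_defaults(EXAMPLE).items():
        print(f"{key} = {value}")
```

Running the script prints one key = value pair per property: spark.master = ego-client and spark.shuffle.service.enabled = true.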