Updating the Apache Spark configuration files
Complete this task to copy the Apache Spark configuration files to your new configuration directory and update them.
About this task
The configuration directory contains the following Apache Spark configuration files:
- spark-env.sh
- A shell script that is sourced by most of the other scripts in the Apache Spark installation. You can use it to configure environment variables that set or alter the default values for various Apache Spark configuration settings. For sample contents of this file, see Sample Apache Spark configuration files.
- spark-defaults.conf
- A configuration file that sets default values for the Apache Spark runtime components. You can override these default values on the command line when you interact with Spark using shell scripts. For sample contents of this file, see Sample Apache Spark configuration files.
- log4j.properties
- Contains the default configuration for log4j, the logging package that Apache Spark uses.
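As a brief illustration of the kind of settings spark-env.sh can carry, the fragment below exports a few common Spark environment variables. The paths and values shown are examples only, not defaults shipped with the product; substitute values appropriate for your installation.

```shell
# Illustrative spark-env.sh fragment. All paths and sizes below are
# example values, not product defaults -- adjust them for your system.
export SPARK_CONF_DIR=/u/sparkusr/spark/conf    # hypothetical configuration directory
export SPARK_LOG_DIR=/u/sparkusr/spark/logs     # hypothetical log directory
export SPARK_WORKER_MEMORY=2g                   # memory available to worker processes
export SPARK_WORKER_DIR=/u/sparkusr/spark/work  # scratch space for worker processes
```

Settings in spark-defaults.conf can likewise be overridden per invocation on the command line, for example with `--conf spark.executor.memory=2g` when starting a Spark shell.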
You can find templates of these configuration files in the $SPARK_HOME/conf directory. Note that the spark-defaults.conf and log4j.properties files are ASCII files. If you have set _BPXK_AUTOCVT=ON as specified in Setting up a user ID for use with IBM z/OS Platform for Apache Spark, you can edit them without any explicit conversion.
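One way to populate a new configuration directory is to copy each template from $SPARK_HOME/conf, dropping the .template suffix. The sketch below assumes hypothetical SPARK_HOME and target-directory paths (it also creates placeholder template files purely so the example is self-contained); on a real system the templates are already present in $SPARK_HOME/conf.

```shell
# Sketch of copying the Spark template files into a new configuration
# directory. SPARK_HOME and SPARK_CONF_DIR defaults here are assumptions.
SPARK_HOME=${SPARK_HOME:-$(mktemp -d)}          # hypothetical install location
SPARK_CONF_DIR=${SPARK_CONF_DIR:-$(mktemp -d)}  # hypothetical new config directory
export _BPXK_AUTOCVT=ON                         # allow editing tagged ASCII files directly

# Setup for illustration only: a real installation already ships these templates.
mkdir -p "$SPARK_HOME/conf"
touch "$SPARK_HOME/conf/spark-env.sh.template" \
      "$SPARK_HOME/conf/spark-defaults.conf.template" \
      "$SPARK_HOME/conf/log4j.properties.template"

# Copy each template, removing the .template suffix.
for f in "$SPARK_HOME"/conf/*.template; do
    cp "$f" "$SPARK_CONF_DIR/$(basename "$f" .template)"
done
```

After copying, edit each file in the new directory to suit your environment.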
If you plan to use the built-in Apache Spark support for Apache Hive, consider updating the Hive configuration files (hive-site.xml, core-site.xml, and hdfs-site.xml) to point to separate working directories. For example, the Spark SQL Command Line Interface (./bin/spark-sql) tool runs the Hive metastore service in local mode and, by default, writes all temporary files into the /tmp directory.
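As one hedged example of redirecting those working directories, a hive-site.xml fragment like the following moves the Hive scratch and warehouse locations out of /tmp. The property names are standard Hive configuration properties; the directory paths are hypothetical placeholders for your own working directories.

```xml
<!-- Illustrative hive-site.xml fragment; the paths are example values only. -->
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/u/sparkusr/hive/scratch</value>
    <description>Scratch space for temporary Hive files (instead of /tmp)</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/u/sparkusr/hive/warehouse</value>
    <description>Default location for managed Hive tables</description>
  </property>
</configuration>
```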
For more information about Apache Spark support for Apache Hive, see Spark SQL programming guide.