Use this stored procedure to launch a Spark application and to specify whether it runs
synchronously or asynchronously.
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.SPARK_SUBMIT(out submission_id varchar(1024), in parameters varchar(32672), in configuration varchar(32672) default null)
Parameter descriptions
- submission_id
- The submission ID of the Spark application that was started in the cluster.
- The submission ID is required to retrieve the status of a Spark application or to cancel a Spark
application.
- Data type: VARCHAR(1024)
- parameters
- Mandatory.
- The parameters can be passed as a JavaScript Object Notation (JSON) object, or in
pipe format as a pipe-separated list of key=value pairs. If you use a JSON object, the keys that
start with spark must be part of a child object that is called
sparkProperties, as shown in the Example section.
- The following list shows the keys and values:
- appResource=<JAR/py/R file name>
- Optional.
- The name of the JAR, py, or R file that contains the Spark application code.
- The name is relative to the $HOME/spark/apps directory of the user. You can
upload the file to the server by using the IBM® Db2® Warehouse API
endpoint /home.
- For Scala or Java applications: If you omit the appResource value,
the main class must be contained in another JAR file in the class path that is assembled by the Spark
cluster manager, for example, in a JAR file that is specified through sparkJars.
This class path must contain all libraries that are required by the Spark application, and the main
class must exist in one of the libraries in the class path.
The following libraries are included in the class path by the Spark cluster manager:
- All libraries in the globallibs directory. The administrator can upload
libraries to this directory by using the IBM Db2 Warehouse API endpoint
/global.
- All libraries in the spark/defaultlibs subdirectory of the
home directory of the user.
- All libraries that are specified by appResource and
sparkJars that are located relative to the
$HOME/spark/apps directory.
- mainClass=<full class name>
- Mandatory.
- The full name of the class that contains the main() function that starts the Spark application.
The class must exist in the class path that is assembled by the Spark cluster manager.
- appArgs=<list of arguments>
- Optional.
- A list of arguments for the Spark application as a JavaScript Object Notation (JSON) array or,
if the pipe format is used, as a comma-separated list of values.
- clientSparkVersion=<Spark version>
- Optional.
- The minimum version of Spark that the Spark application expects.
- sparkAppName=<Spark application name>
- Optional.
- A name for the Spark application.
- sparkJars=<list of JAR files>
- Optional.
- A comma-separated list of additional JAR files that are to be used by an application that is
written in Java or Scala.
- The names of the JAR files are relative paths, that is, the paths are relative to the
$HOME/spark/apps directory of the user. You can upload the files to the server
by using the IBM Db2 Warehouse API endpoint
/home.
- sparkSubmitPyFiles=<list of Python files>
- Optional.
- A comma-separated list of additional PY, ZIP, or EGG files that are to be used by an application
that is written in Python.
- The names of the files are relative paths, that is, the paths are relative to the
$HOME/spark/apps directory of the user. You can upload the files to the server
by using the IBM Db2 Warehouse API endpoint
/home.
- Data type: VARCHAR(32672)
- configuration
- Optional.
- A comma-separated list of configuration options as key=value pairs that control the execution
of the application.
- The following list shows the keys and values:
- format = json/pipe/auto
- Optional.
- The input format of the submission parameters: a JSON object (json) or
a pipe-separated list of key=value pairs (pipe).
- Usually, this option is not needed because the procedure detects the input format
automatically.
- Default: auto
- mode = sync/async
- Optional.
- Synchronous or asynchronous execution of the Spark application.
- In a synchronous execution, the procedure waits until the application completes. In an
asynchronous execution, the procedure returns as soon as the Spark application is submitted to the
cluster.
- Default: sync
- retries = <integer>
- Optional.
- Only in synchronous mode.
- The maximum number of consecutive failures for application status calls before the procedure
stops.
- Default: 3
- timeout = <integer>
- Optional.
- Only in synchronous mode.
- The overall timeout for the status polling process before the procedure stops.
- Default: 1 day
- timeinterval = <integer>
- Optional.
- Only in synchronous mode.
- The time interval between two consecutive status requests that the procedure sends to the
cluster.
- Default: 5 seconds
- Data type: VARCHAR(32672)
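The two accepted formats of the parameters argument can be sketched with small helper functions that build equivalent pipe and JSON strings. This is an illustration only; the helper names are hypothetical and not part of the IDAX API. Note how keys that start with spark are nested under the sparkProperties child object in the JSON form:

```python
import json

def build_pipe_params(**kwargs):
    """Build a pipe-separated key=value parameter string (hypothetical helper)."""
    return " | ".join(f"{key}={value}" for key, value in kwargs.items())

def build_json_params(spark_properties=None, **kwargs):
    """Build the JSON parameter string (hypothetical helper); keys that start
    with 'spark' belong in the nested 'sparkProperties' object."""
    params = dict(kwargs)
    if spark_properties:
        params["sparkProperties"] = spark_properties
    return json.dumps(params)

# Pipe format: appArgs is a comma-separated list of values.
pipe = build_pipe_params(appResource="myapp.jar",
                         mainClass="com.example.app.Main",
                         appArgs="100,200")

# JSON format: appArgs is a JSON array; spark* keys go into sparkProperties.
js = build_json_params(appResource="myapp.jar",
                       mainClass="com.example.app.Main",
                       appArgs=[100, 200],
                       spark_properties={"sparkJars": "utility.jar",
                                         "sparkAppName": "My First Spark App"})

print(pipe)
print(js)
```

Either string can then be passed as the second argument of the CALL IDAX.SPARK_SUBMIT statement.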
Returned information
The procedure does not return any SQL results. The submission ID is returned in the
submission_id out parameter.
If the application submission fails, the application stops with an error, or the
application status polling fails, you receive an error message with a CDFAA message identifier. To see the
complete error message, use the VALUES IDAX.LAST_MESSAGE statement.
Example
Pipe Format:
CALL IDAX.SPARK_SUBMIT(?, 'appResource=idax_examples.jar | mainClass=com.ibm.idax.spark.examples.ReadWriteExampleKMeans');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100', 'mode=async, timeout=3600');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100,200 |
sparkJars=utility.jar | sparkAppName=My First Spark App', 'mode=async');
JSON Format:
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "idax_examples.jar", "mainClass" : "com.ibm.idax.spark.examples.ReadWriteExampleKMeans" }');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100 ] }');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100 ] }', 'mode=async, timeout=3600');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100, 200 ],
"sparkProperties" : { "sparkJars" : "utility.jar", "sparkAppName" : "My First Spark App" } }', 'mode=async');
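In synchronous mode, the interplay of the timeout, timeinterval, and retries configuration options can be sketched as a polling loop. The following is a simplified illustration of the behavior described above, not the actual implementation; poll_status is a hypothetical callback that returns the application state or raises an exception when a status call fails:

```python
import time

def wait_for_completion(poll_status, timeout=86400, timeinterval=5, retries=3):
    """Poll poll_status() until the application completes, the overall timeout
    elapses, or `retries` consecutive status calls fail (illustrative sketch)."""
    deadline = time.monotonic() + timeout
    consecutive_failures = 0
    while time.monotonic() < deadline:
        try:
            state = poll_status()
            consecutive_failures = 0
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                raise RuntimeError("status polling failed %d times in a row"
                                   % consecutive_failures)
        else:
            if state == "FINISHED":
                return state
        time.sleep(timeinterval)  # wait timeinterval seconds between polls
    raise TimeoutError("application did not complete within the timeout")

# Simulated status sequence: two polls still running, then finished.
states = iter(["RUNNING", "RUNNING", "FINISHED"])
result = wait_for_completion(lambda: next(states), timeout=10, timeinterval=0)
print(result)  # FINISHED
```

In asynchronous mode no such loop runs: the procedure returns immediately after the submission, and you use the submission ID to check the application status yourself.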