Use this stored procedure to launch a Spark application and to specify whether it runs
synchronously or asynchronously.
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.SPARK_SUBMIT(out submission_id varchar(1024), in parameters varchar(32672), in configuration varchar(32672) default null)
Parameter descriptions
- submission_id
- The submission ID of the Spark application that was started in the cluster.
- The submission ID is required to retrieve the status of a Spark application or to cancel a Spark
application.
- Data type: VARCHAR(1024)
- parameters
- Mandatory.
- The parameters can be passed as a JavaScript Object Notation (JSON) object, or in
pipe format as a pipe-separated list of key=value pairs. If you use a JSON object, the keys that
start with spark must be part of a child object that is called
sparkProperties, as shown in the Example section.
- The following list shows the keys and values:
- appResource=<JAR/py/R file name>
- Optional.
- The name of the JAR, py, or R file that contains the Spark application code.
- The name is relative to the $HOME/spark/apps directory of the user. You can
upload the file to the server by using the IBM® Db2® Warehouse API
endpoint /home.
- For Scala or Java applications: If you omit the appResource value,
the main class must be contained in another JAR file in the class path that is assembled by the Spark
cluster manager, for example, in a JAR file that is specified through sparkJars.
This class path must contain all libraries that are required by the Spark application, and the main
class must exist in one of the libraries in the class path.
The following libraries are included in the class path by the Spark cluster manager:
- All libraries in the globallibs directory. The administrator can upload
libraries to this directory by using the IBM Db2 Warehouse API endpoint
/global.
- All libraries in the spark/defaultlibs subdirectory of the
home directory of the user.
- All libraries that are specified by appResource and
sparkJars that are located relative to the
$HOME/spark/apps directory.
- mainClass=<full class name>
- Mandatory.
- The full name of the class that contains the main() function that starts the Spark application.
The class must exist in the class path that is assembled by the Spark cluster manager.
- appArgs=<list of arguments>
- Optional.
- A list of arguments for the Spark application as a JavaScript Object Notation (JSON) array or,
if the pipe format is used, as a comma-separated list of values.
- clientSparkVersion=<Spark version>
- Optional.
- The minimum version of Spark that the Spark application expects.
- sparkAppName=<Spark application name>
- Optional.
- A name for the Spark application.
- sparkJars=<list of JAR files>
- Optional.
- A comma-separated list of additional JAR files that are to be used by an application that is
written in Java or Scala.
- The names of the JAR files are relative paths, that is, the paths are relative to the
$HOME/spark/apps directory of the user. You can upload the files to the server
by using the IBM Db2 Warehouse API endpoint
/home.
- sparkSubmitPyFiles=<list of Python files>
- Optional.
- A comma-separated list of additional PY, ZIP, or EGG files that are to be used by an application
that is written in Python.
- The names of the files are relative paths, that is, the paths are relative to the
$HOME/spark/apps directory of the user. You can upload the files to the server
by using the IBM Db2 Warehouse API endpoint
/home.
- Data type: VARCHAR(32672)
- configuration
- Optional.
- A comma-separated list of configuration options as key=value pairs that control the execution
of the application.
- The following list shows the keys and values:
- format = json/pipe/auto
- Optional.
- The input format of the submission parameters: a JSON object (json) or
a pipe-separated list of key=value pairs (pipe).
- Usually, this option is not needed because the procedure detects the input format
automatically.
- Default: auto
- mode = sync/async
- Optional.
- Synchronous or asynchronous execution of the Spark application.
- In a synchronous execution, the procedure waits until the application completes. In an
asynchronous execution, the procedure returns as soon as the Spark application is submitted to the
cluster.
- Default: sync
- retries = <integer>
- Optional.
- Only in synchronous mode.
- The maximum number of consecutive failures for application status calls before the procedure
stops.
- Default: 3
- timeout = <integer>
- Optional.
- Only in synchronous mode.
- The overall timeout for the status polling process before the procedure stops.
- Default: 1 day
- timeinterval = <integer>
- Optional.
- Only in synchronous mode.
- The time interval between two consecutive status requests that the procedure sends to the
cluster.
- Default: 5 seconds
- Data type: VARCHAR(32672)
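The two accepted formats of the parameters argument can be sketched with small helper functions that build equivalent pipe and JSON strings. This is an illustration only; the helper names are hypothetical and not part of the IDAX API. Note how keys that start with spark are nested under the sparkProperties child object in the JSON form:

```python
import json

def build_pipe_params(**kwargs):
    """Build a pipe-separated key=value parameter string (hypothetical helper)."""
    return " | ".join(f"{key}={value}" for key, value in kwargs.items())

def build_json_params(spark_properties=None, **kwargs):
    """Build the JSON parameter string (hypothetical helper); keys that start
    with 'spark' belong in the nested 'sparkProperties' object."""
    params = dict(kwargs)
    if spark_properties:
        params["sparkProperties"] = spark_properties
    return json.dumps(params)

# Pipe format: appArgs is a comma-separated list of values.
pipe = build_pipe_params(appResource="myapp.jar",
                         mainClass="com.example.app.Main",
                         appArgs="100,200")

# JSON format: appArgs is a JSON array; spark* keys go into sparkProperties.
js = build_json_params(appResource="myapp.jar",
                       mainClass="com.example.app.Main",
                       appArgs=[100, 200],
                       spark_properties={"sparkJars": "utility.jar",
                                         "sparkAppName": "My First Spark App"})

print(pipe)
print(js)
```

Either string can then be passed as the second argument of the CALL IDAX.SPARK_SUBMIT statement.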
Returned information
The procedure does not return any SQL results. The submission ID is returned in the
submission_id out parameter.
If the application submission fails, the application stops with an error, or the
application status polling fails, you receive an error message with a CDFAA message identifier. To see the
complete error message, use the VALUES IDAX.LAST_MESSAGE statement.
Example
Pipe Format:
CALL IDAX.SPARK_SUBMIT(?, 'appResource=idax_examples.jar | mainClass=com.ibm.idax.spark.examples.ReadWriteExampleKMeans');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100', 'mode=async, timeout=3600');
CALL IDAX.SPARK_SUBMIT(?, 'appResource=myapp.jar | mainClass=com.example.app.Main | appArgs=100,200 |
sparkJars=utility.jar | sparkAppName=My First Spark App', 'mode=async');
JSON Format:
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "idax_examples.jar", "mainClass" : "com.ibm.idax.spark.examples.ReadWriteExampleKMeans" }');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100 ] }');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100 ] }', 'mode=async, timeout=3600');
CALL IDAX.SPARK_SUBMIT(?, '{ "appResource" : "myapp.jar" , "mainClass" : "com.example.app.Main" , "appArgs" : [ 100, 200 ],
"sparkProperties" : { "sparkJars" : "utility.jar", "sparkAppName" : "My First Spark App" } }', 'mode=async');
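In synchronous mode, the interplay of the timeout, timeinterval, and retries configuration options can be sketched as a polling loop. The following is a simplified illustration of the behavior described above, not the actual implementation; poll_status is a hypothetical callback that returns the application state or raises an exception when a status call fails:

```python
import time

def wait_for_completion(poll_status, timeout=86400, timeinterval=5, retries=3):
    """Poll poll_status() until the application completes, the overall timeout
    elapses, or `retries` consecutive status calls fail (illustrative sketch)."""
    deadline = time.monotonic() + timeout
    consecutive_failures = 0
    while time.monotonic() < deadline:
        try:
            state = poll_status()
            consecutive_failures = 0
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                raise RuntimeError("status polling failed %d times in a row"
                                   % consecutive_failures)
        else:
            if state == "FINISHED":
                return state
        time.sleep(timeinterval)  # wait timeinterval seconds between polls
    raise TimeoutError("application did not complete within the timeout")

# Simulated status sequence: two polls still running, then finished.
states = iter(["RUNNING", "RUNNING", "FINISHED"])
result = wait_for_completion(lambda: next(states), timeout=10, timeinterval=0)
print(result)  # FINISHED
```

In asynchronous mode no such loop runs: the procedure returns immediately after the submission, and you use the submission ID to check the application status yourself.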