Apache Livy is a service that enables you to work with Spark applications by using a REST
API or a programmatic API. When you deploy the Db2® Warehouse container, a Livy server is
automatically installed and configured for you.
About this task
After the Livy server is installed, you can launch Spark applications from a client system
against the Spark cluster that runs alongside Db2 Warehouse, inside the same container.
The Livy server version that comes with the latest Db2 Warehouse container is Livy
0.6.0-incubating.
Procedure
- Ensure that port 8998 is open, as described in IBM Db2 Warehouse prerequisites (Linux on x86 hardware).
- Start the server by using the following command:
docker exec -it Db2wh livy-server start
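To verify that the server responds, you can, for example, query the Livy REST API for the list of sessions (the host name and the credentials are placeholders for your environment):
curl --user 'user:password' http://host:8998/sessions
A response such as {"from":0,"total":0,"sessions":[]} indicates that the server is up.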
- Before you launch your Spark application, consider the following restrictions:
- The maximum number of parallel Spark jobs for each installation of Db2 Warehouse is as follows:
- RAM < 120 GB: 3 parallel Spark jobs
- RAM ≥ 120 GB: 5 parallel Spark jobs
- After you open an interactive session or submit a batch job through Livy, wait 30 seconds before
you open another interactive session or submit the next batch job.
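For example, a minimal sketch of submitting two batch jobs in sequence, where the application archives myapp.jar and otherapp.jar, the host, and the credentials are hypothetical placeholders:
# Submit the first batch job
curl --user 'user:password' -H 'Content-Type: application/json' -d '{"file": "myapp.jar"}' http://host:8998/batches
# Wait at least 30 seconds before the next submission
sleep 30
# Submit the next batch job
curl --user 'user:password' -H 'Content-Type: application/json' -d '{"file": "otherapp.jar"}' http://host:8998/batches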
- To upload dependencies, use the spark-submit.sh script, as described in spark-submit.sh script.
For batch jobs and interactive sessions that are executed by using Livy, you can use relative
paths or absolute paths to reference your dependencies.
If you use a relative path, and if there are several files with the same name, the order of
precedence is as follows:
- Files in the apps directory
- Files in the defaultlibs directory
- Files in the globallibs directory
The absolute paths are as follows:
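For example, the following request submits a batch job whose application archive is referenced by a relative path, which is resolved in the order of precedence above (the file name, class name, host, and credentials are placeholders):
curl --user 'user:password' -H 'Content-Type: application/json' -d '{"file": "myapp.jar", "className": "com.example.MyApp"}' http://host:8998/batches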
- Launch your Spark applications by using the REST API or the programmatic API.
Ensure that you use the absolute path to reference your application, as described in Step 4.
You can also execute Spark applications interactively through Jupyter notebooks that are
configured for Livy with Sparkmagic.
- When you use the REST API, do the following steps:
- Provide the credentials to authenticate the user through HTTP basic authentication.
For example, when you use cURL, add --user 'user:password' to the cURL arguments, as shown in the example after these steps.
- Follow the descriptions on the REST API website.
- When you use a programmatic API, do the following steps:
- Add the credentials as a prefix to the URL that is passed to the HttpClient, for example,
http://user:password@host:8998.
- Follow the descriptions on the Using the Programmatic API website.
- When you use a Jupyter notebook with Sparkmagic, do the following steps:
- To install and configure Sparkmagic, follow the steps that are described in the ibmdbanalytics repository on GitHub; a minimal installation sketch follows these steps.
- To learn more about Sparkmagic, visit the jupyter-incubator repository on GitHub.
- Showcases for Db2 Warehouse that use Jupyter notebooks through Livy are available in the dashdb_analytic_tools repository on GitHub.
- View the video tutorials on the IBM Developer YouTube channel.
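The following sketch shows the REST flow with HTTP basic authentication, as mentioned above (the host, the credentials, and the Scala snippet are placeholders; session ID 0 assumes that this is the first session):
# Open an interactive Scala session
curl --user 'user:password' -H 'Content-Type: application/json' -d '{"kind": "spark"}' http://host:8998/sessions
# After the session state changes to "idle", run a statement in session 0
curl --user 'user:password' -H 'Content-Type: application/json' -d '{"code": "sc.parallelize(1 to 100).sum()"}' http://host:8998/sessions/0/statements
# Retrieve the statement result
curl --user 'user:password' http://host:8998/sessions/0/statements/0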
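As a minimal installation sketch for Sparkmagic, assuming a Python environment with pip (the GitHub repositories above describe the complete procedure):
pip install sparkmagic
# Point Sparkmagic at the Livy endpoint by editing ~/.sparkmagic/config.json
# so that the kernel connection URLs reference http://host:8998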
Results
You can now monitor your Spark applications through the web user interface by entering the
following URL: http://host:8998.
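Alternatively, you can, for example, query the state of a specific batch job through the REST API (batch ID 0 and the credentials are placeholders):
curl --user 'user:password' http://host:8998/batches/0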