Launching a Spark application through an Apache Livy server

Apache Livy is a service that enables you to work with Spark applications by using a REST API or a programmatic API. When you deploy the Db2® Warehouse image container, a Livy server is automatically installed and configured for you.

About this task

After the Livy server is installed, you can launch Spark applications from a client system on the Spark cluster that runs alongside Db2 Warehouse in the same container.

The Livy server version that comes with the latest Db2 Warehouse container is Livy 0.6.0-incubating.

Procedure

  1. Ensure that port 8998 is open as described in IBM Db2 Warehouse prerequisites (Linux on x86 hardware).
  2. Start the server by using the following command:
    docker exec -it Db2wh livy-server start 
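    To verify that the server is running, you can list the active Livy sessions. The following check is a minimal sketch; host is a placeholder for your Db2 Warehouse host, and user and password are the credentials that are described in step 5:

      # List the active Livy sessions; an HTTP 200 response with a
      # "sessions" array confirms that the server is up
      curl --user 'user:password' http://host:8998/sessions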
    
  3. Before you launch your Spark application, consider the following restrictions:
    • The maximum number of parallel Spark jobs for each installation of Db2 Warehouse is as follows:
      • RAM < 120 GB: 3 parallel Spark jobs
      • RAM ≥ 120 GB: 5 parallel Spark jobs
    • After you open an interactive session or submit a batch job through Livy, wait 30 seconds before you open another interactive session or submit the next batch job.
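    For example, the 30-second pacing between submissions can be enforced in a shell script. This is a sketch only; host, the credentials, and the application file names are placeholders:

      # Submit the first batch job through the Livy REST API
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"file": "/mnt/blumeta0/home/username/spark/apps/app1.py"}' \
        http://host:8998/batches

      # Wait 30 seconds, per the restriction above, before the next submission
      sleep 30

      # Submit the second batch job
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"file": "/mnt/blumeta0/home/username/spark/apps/app2.py"}' \
        http://host:8998/batches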
  4. To upload dependencies, use the spark-submit.sh script as described in spark-submit.sh script.
    For batch jobs and interactive sessions that are executed by using Livy, you can use relative paths or absolute paths to reference your dependencies.

    If you use a relative path, and if there are several files with the same name, the order of precedence is as follows:
    1. Files in the apps directory
    2. Files in the defaultlibs directory
    3. Files in the globallibs directory
    The absolute paths are as follows:
    • For the apps directory:
      /mnt/blumeta0/home/username/spark/apps
    • For the defaultlibs directory:
      /mnt/blumeta0/home/username/spark/defaultlibs
    • For the globallibs directory:
      /mnt/blumeta0/global
    Note: Python dependencies that are installed as described in Installing Python packages on IBM Db2 Warehouse do not need to be referenced in the call to Livy.
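    For illustration, a batch submission can reference a dependency either by a relative name, which is resolved in the precedence order above, or by an absolute path. The following sketch assumes a placeholder application myapp.py and a placeholder dependency mylib.jar:

      # Relative reference: mylib.jar is resolved against the apps,
      # defaultlibs, and globallibs directories, in that order
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"file": "/mnt/blumeta0/home/username/spark/apps/myapp.py", "jars": ["mylib.jar"]}' \
        http://host:8998/batches

      # Absolute reference: the same dependency pinned to the defaultlibs directory
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"file": "/mnt/blumeta0/home/username/spark/apps/myapp.py", "jars": ["/mnt/blumeta0/home/username/spark/defaultlibs/mylib.jar"]}' \
        http://host:8998/batches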
  5. Launch your Spark applications by using the REST API or the programmatic API.
    Ensure that you use the absolute path to reference your application, as described in Step 4. You can also run Spark applications interactively through Jupyter notebooks that are configured for Livy with Sparkmagic.
    • When you use the REST API, do the following steps:
      1. Provide the credentials to authenticate the user through HTTP basic authentication.

        For example, when you use cURL, add --user 'user:password' to the cURL arguments. A complete cURL example follows this list.

      2. Follow the descriptions on the REST API website.
    • When you use a programmatic API, do the following steps:
      1. Add the credentials as a prefix to the URL that is passed to the HttpClient, for example, http://user:password@host:8998.
      2. Follow the descriptions on the Using the Programmatic API website.
    • When you use a Jupyter notebook with Sparkmagic, do the following steps:
      1. To install and configure Sparkmagic, follow the steps that are described in the ibmdbanalytics repository on GitHub.
      2. To learn more about Sparkmagic, visit the jupyter-incubator repository on GitHub.
      3. Showcases for Db2 Warehouse that use Jupyter notebooks through Livy are available in the dashdb_analytic_tools repository on GitHub.
      4. View the video tutorials on the IBM Developer YouTube channel.
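    The following is the complete cURL example that is referenced in the REST API steps above. It opens an interactive PySpark session and runs one statement; host, user, and password are placeholders, and the session ID in the second call must be replaced with the ID that the first call returns:

      # Open an interactive PySpark session; --user supplies the
      # HTTP basic authentication credentials
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"kind": "pyspark"}' \
        http://host:8998/sessions

      # After the session reaches the idle state, run a statement in it
      # (replace 0 with the session ID from the previous response)
      curl --user 'user:password' -H 'Content-Type: application/json' \
        -d '{"code": "spark.range(100).count()"}' \
        http://host:8998/sessions/0/statements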

Results

You can now monitor your Spark applications through the web user interface at the following URL:
http://host:8998