Using a Jupyter notebook with Db2 Big SQL

In Cloud Pak for Data, you can use Python Jupyter notebooks to work with the Db2® family of products, including Db2 Big SQL.

For example, you can create a Jupyter notebook that runs SQL statements to analyze data.

Important: To use Jupyter notebooks, Watson Studio must be installed.

To create a Jupyter notebook, complete the following steps:

  1. Log in to IBM® Software Hub and select Projects > All projects, and then click New project.
  2. Select Create an empty project.
  3. After defining the project details and creating your project, select Add to project, and then choose the asset type Notebook.
  4. Specify a name for your notebook, select Python 3.7 as the language, and click Create.
  5. Paste the following two lines of code into the first cell of the notebook:
    !pip install ipython-sql
    !pip install ibm_db
    To install the Python packages that are required to work with Db2 Big SQL, run the cell by clicking the Run button in the toolbar.
  6. Create a new cell by clicking + in the toolbar. Paste the following Python code snippet into the new cell, and then replace the placeholders with appropriate values:
    import ibm_db
    conn = ibm_db.connect("DATABASE=bigsql;" +\
                          "HOSTNAME=<Big-SQL-head-pod-fqdn-host-name>;" \
                          "PORT=50000;" \
                          "PROTOCOL=TCPIP;" \
                          "UID=<username>;" \
                          "PWD=<password>",'','')
    stmt = "CREATE HADOOP TABLE testschema.testtab (key int, value varchar(100), ts timestamp)"
    # LOCATION ‘s3a://<bucket>/testschema.testtab’
    # If you want to create the table in an object store, uncomment the previous
    # line, substituting appropriate values for the placeholders. The bucket
    # placeholder must be replaced with the name of an existing bucket that can be
    # accessed with the object store credentials that were provided when Db2 Big SQL was
    # provisioned.
    ibm_db.exec_immediate(conn, stmt)
    This code illustrates the use of the Python ibm_db package, which enables you to work with Db2 Big SQL from any Python application, including a notebook. This type of Python code is required when you want to run CREATE TABLE (HADOOP) statements.
    Notes:

    The <username> and <password> are for an IBM Cloud Pak® for Data user that was granted access to Db2 Big SQL.

    If Watson Studio is deployed on a different IBM Software Hub cluster than Db2 Big SQL, then you can instead set the HOSTNAME to the publicly exposed or load balancer node for the cluster, and the PORT to the externally exposed NodePort. Ask an IBM Software Hub administrator to provide you with the information that was used to set up a connection to Db2 Big SQL.

  7. Run the cell by clicking the Run button next to the cell. The result should resemble the following example output:
    Figure 1. Output from the CREATE TABLE (HADOOP) statement
    Screen capture showing the CREATE TABLE (HADOOP) statement output
  8. If you are not planning to use CREATE TABLE (HADOOP) statements, you can use the more concise syntax of Jupyter magic functions. A %sql prefix in this syntax enables you to intersperse SQL statements alongside Python code in your notebooks. Create a new cell by clicking + in the toolbar. Paste the following Python code snippet into the new cell, and then replace the placeholders with appropriate values:
    import ibm_db
    %load_ext sql
    %sql db2+ibm_db://<user>:<password>@<host>:<port>/bigsql
    %sql insert into testschema.testtab values (647, 'a sample value', current timestamp)
    %sql select value from testschema.testtab where key = 647
  9. Run the cell by clicking the Run button next to the cell. The result should resemble the following example output:
    Figure 2. Output from using more concise Jupyter syntax
    Screen capture showing the output of more concise Jupyter syntax