Import a JDBC driver by command line for Version 1.1.3

If a DSX user plans to write notebook or RStudio code that uses a relational database beyond the data source options already available in DSX Local, then the DSX administrator must first import a JDBC driver for it by completing the following steps:

  1. In the /wdp/utils/ directory of the master node of the DSX cluster, enter the following command to import the driver JAR file:
    moveJarClasspath.sh -add jarfile
    
    where jarfile represents the fully qualified path to the JAR file that contains the JDBC driver, for example:
    moveJarClasspath.sh -add /wdp/tmp/postgresql-42.1.1.jar
    

To remove a JAR file from DSX cluster: in the /wdp/utils directory of the master node, enter the following command:

moveJarClasspath.sh -remove jarfile-name

where jarfile-name represents the name of the JAR file, for example:

moveJarClasspath.sh -remove postgresql-42.1.1.jar

Update the CLASSPATH to the JDBC driver in the notebooks for V1.1.3

DSX users can now include the imported database driver to their CLASSPATH. Notebook example:

import os
os.environ["CLASSPATH"] = os.environ["CLASSPATH"]+":/usr/local/spark/jars/postgresql-42.1.1.jar"
os.environ["CLASSPATH"]

Python 3.5 example:

import os
os.environ["CLASSPATH"] = os.environ["CLASSPATH"]+":/user-home/_global_/spark/jars/postgresql-42.1.1.jar"
os.environ["CLASSPATH"]

Troubleshooting tip: If the CLASSPATH does not work, DSX users should restart their kernel (for Jupyter), interpreter (for Zeppelin), or session (for RStudio) and try setting the CLASSPATH again.

Example connections

The following interfaces can connect to a relational database beyond what is supported in the Connections page.

Scala Python R
Spark APIs, java.sql.{Connnection,DriverManager} Spark APIs, jaydebeapi Spark APIs

The following examples connect to a relational database beyond what is supported in the Connections page.

Scala syntax for V1.1.3

java.sql.{Connection,DriverManager}

import java.sql.{Connection,DriverManager}
val url = "jdbc:postgresql://9.87.654.321:5432/postgres"
val driver = "org.postgresql.Driver"
val username = "postgres"
val password = "password"
var connection:Connection = _
try {
   Class.forName(driver)
   connection = DriverManager.getConnection(url, username, password)
   val statement = connection.createStatement
   val rs = statement.executeQuery("SELECT * FROM myschema.cars")
   while (rs.next) {
      val make = rs.getString("make")
      println("make = %s".format(make))
   }
} catch {
   case e: Exception => e.printStackTrace
}
connection.close

Python syntax for V1.1.3

import jaydebeapi, sys
import pandas as pd
df1 = None

#Enter the values for you database connection
driverClass = "org.postgresql.Driver"
URL = "jdbc:postgresql://<host-name>:27422/compose"
user = "admin"
password = "password"
schema = "public"
table = "cars"

if (sys.version_info >= (3,0)):
   conn = jaydebeapi.connect(driverClass, URL, [user, password])
else:
   conn = jaydebeapi.connect(driverClass, [URL, user, password])

query = 'select * from ' + (schema + '.' if (len(schema.strip()) != 0) else '') + table
df1 = pd.read_sql(query, con=conn)
df1.head()

R syntax for v1.1.3

url <- "jdbc:postgresql://9.87.654.321:5432/postgres"
user <- "postgres"
password <- "password"
dbtable <- "myschema.cars"

spark<-sparkR.session();
df <- read.jdbc(url, dbtable, user=user, password=password)
collect(df)