Import a JDBC driver by command line for Version 1.1.3
If a DSX user plans to write notebook or RStudio code that uses a relational database beyond the data source options already available in DSX Local, then the DSX administrator must first import a JDBC driver for it by completing the following steps:
- In the
/wdp/utils/
directory of the master node of the DSX cluster, enter the following command to import the driver JAR file:
wheremoveJarClasspath.sh -add jarfile
jarfile
represents the fully qualified path to the JAR file that contains the JDBC driver, for example:moveJarClasspath.sh -add /wdp/tmp/postgresql-42.1.1.jar
To remove a JAR file from DSX cluster: in the /wdp/utils directory of the master node, enter the following command:
moveJarClasspath.sh -remove jarfile-name
where jarfile-name
represents the name of the JAR file, for example:
moveJarClasspath.sh -remove postgresql-42.1.1.jar
Update the CLASSPATH to the JDBC driver in the notebooks for V1.1.3
DSX users can now include the imported database driver to their CLASSPATH. Notebook example:
import os
os.environ["CLASSPATH"] = os.environ["CLASSPATH"]+":/usr/local/spark/jars/postgresql-42.1.1.jar"
os.environ["CLASSPATH"]
Python 3.5 example:
import os
os.environ["CLASSPATH"] = os.environ["CLASSPATH"]+":/user-home/_global_/spark/jars/postgresql-42.1.1.jar"
os.environ["CLASSPATH"]
Troubleshooting tip: If the CLASSPATH does not work, DSX users should restart their kernel (for Jupyter), interpreter (for Zeppelin), or session (for RStudio) and try setting the CLASSPATH again.
Example connections
The following interfaces can connect to a relational database beyond what is supported in the Connections page.
Scala | Python | R |
---|---|---|
Spark APIs, java.sql.{Connnection,DriverManager} | Spark APIs, jaydebeapi | Spark APIs |
The following examples connect to a relational database beyond what is supported in the Connections page.
Scala syntax for V1.1.3
java.sql.{Connection,DriverManager}
import java.sql.{Connection,DriverManager}
val url = "jdbc:postgresql://9.87.654.321:5432/postgres"
val driver = "org.postgresql.Driver"
val username = "postgres"
val password = "password"
var connection:Connection = _
try {
Class.forName(driver)
connection = DriverManager.getConnection(url, username, password)
val statement = connection.createStatement
val rs = statement.executeQuery("SELECT * FROM myschema.cars")
while (rs.next) {
val make = rs.getString("make")
println("make = %s".format(make))
}
} catch {
case e: Exception => e.printStackTrace
}
connection.close
Python syntax for V1.1.3
import jaydebeapi, sys
import pandas as pd
df1 = None
#Enter the values for you database connection
driverClass = "org.postgresql.Driver"
URL = "jdbc:postgresql://<host-name>:27422/compose"
user = "admin"
password = "password"
schema = "public"
table = "cars"
if (sys.version_info >= (3,0)):
conn = jaydebeapi.connect(driverClass, URL, [user, password])
else:
conn = jaydebeapi.connect(driverClass, [URL, user, password])
query = 'select * from ' + (schema + '.' if (len(schema.strip()) != 0) else '') + table
df1 = pd.read_sql(query, con=conn)
df1.head()
R syntax for v1.1.3
url <- "jdbc:postgresql://9.87.654.321:5432/postgres"
user <- "postgres"
password <- "password"
dbtable <- "myschema.cars"
spark<-sparkR.session();
df <- read.jdbc(url, dbtable, user=user, password=password)
collect(df)