RStudio overview
R is a popular statistical analysis and machine-learning package that enables data management and includes tests, models, analyses, and graphics. RStudio, which is included in Watson Studio Local, provides an IDE for working with R.
An RStudio instance created in Cloud Pak for Data is allocated 1 CPU and 1 GB of RAM by default. The maximum CPU and RAM available to an active RStudio session is constrained only by the Kubernetes resource limits that the administrator defines for the dsx cluster namespace and by the physical resource capacity of the worker nodes.
Tasks related to setting up and working with RStudio:
- Install a package to connect RStudio to relational data sources
- Change the Spark version
- Transfer files to and from your user project folder
- Using sample scripts
Install a package to connect RStudio to relational data sources
The steps you follow to connect RStudio to relational data sources depend on whether the cluster has Internet access.
If the cluster has Internet access, the administrator must complete the following steps:
- In the Tools shell, install the database driver in the /user-home/ directory. Example:
pwd
/user-home/1003/DSX_Projects/project-nb-test/rstudio
cd /user-home/1003/
wget https://jdbc.postgresql.org/download/postgresql-42.2.0.jar
- Configure Java on the pod:
R CMD javareconf
- Return to the RStudio script and install the RJDBC package and its dependencies:
install.packages("RJDBC", dep=TRUE)
PostgreSQL example:
library(RJDBC)
# Connection details
driverClassName <- "org.postgresql.Driver"
driverPath <- "/user-home/1003/postgresql-42.2.0.jar"
url <- "jdbc:postgresql://9.876.543.21:27422/compose"
databaseUsername <- "admin"
databasePassword <- "ABCDEFGHIJKLMNOP"
databaseSchema <- "public"
databaseTable <- "cars"
# Load the JDBC driver and open the connection
drv <- JDBC(driverClassName, driverPath)
conn <- dbConnect(drv, url, databaseUsername, databasePassword)
# dbListTables(conn)
# Read the table into a data frame
data <- dbReadTable(conn, databaseTable)
# To qualify the table name with its schema:
# data <- dbReadTable(conn, paste(databaseSchema, '.', databaseTable, sep=''))
data
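When you finish working with the data, close the connection with the standard DBI call:
dbDisconnect(conn)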
Change the Spark version
To change the version of Spark for RStudio, follow the set of steps appropriate for your library:
- Sparklyr library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the spark_connect() function (see the sketch at the end of this section).
- SparkR library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the $SPARK_HOME environment variable to specify the Spark 2.2.1 installation location in RStudio:
Sys.setenv("SPARK_HOME"="/usr/local/spark-2.2.1-bin-hadoop2.7")
# import SparkR
library(SparkR, lib.loc = "/usr/local/spark-2.2.1-bin-hadoop2.7/R/lib")
# initialize the Spark session
sc = sparkR.session(master="spark://spark-master221-svc:7077", appName="dsxlRstudioSpark221")
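After the session starts, you can confirm which Spark version it is using:
sparkR.version()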
See SparkR (R on Spark) for more information.
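For the sparklyr path, the following is a minimal sketch of the spark_connect() call. The master URL and installation path are assumptions carried over from the SparkR example above, and the app name is illustrative; adjust all three for your cluster:
library(sparklyr)
# Connect to the Spark 2.2.1 service (master URL, spark_home, and app_name
# are assumptions; substitute the values that apply to your cluster)
sc <- spark_connect(master = "spark://spark-master221-svc:7077",
                    spark_home = "/usr/local/spark-2.2.1-bin-hadoop2.7",
                    app_name = "dsxlRstudioSparklyr221")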
Transfer files to and from your user project folder
Using the File Explorer in RStudio, a Watson Studio Local user can transfer files between their project folder and a local disk outside of the cluster:
- RStudio files
- To download an RStudio file, select the file, click more, and click Export to save the file to your local disk.
- To upload an RStudio file, click Upload and select the file to upload.
- Jupyter files
- To download a Jupyter file, click ..., type ~/../jupyter, select the file, click more, and click Export to save the file to your local disk.
- To upload a Jupyter file, click Upload and select the file to upload.
Using sample scripts
Sample scripts are provided as examples of how to work with Spark. An included readme.txt file provides an overview of the samples. To run any of these sample scripts, you must first add your own R code to connect to your Spark cluster.
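The following is a minimal sketch of the kind of connection code you might add at the top of a sample script, reusing the SparkR session call shown earlier (the master URL, installation path, and app name are assumptions; use the values for your cluster):
# Point SparkR at your Spark installation and start a session
Sys.setenv("SPARK_HOME"="/usr/local/spark-2.2.1-bin-hadoop2.7")
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
sc <- sparkR.session(master="spark://spark-master221-svc:7077", appName="sparkSampleScript")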
Learn more
Read and write data to and from IBM Cloud object storage in RStudio