RStudio overview
R is a popular statistical analysis and machine-learning package that enables data management and includes tests, models, analyses, and graphics. RStudio, which is included in Watson Studio Local, provides an IDE for working with R.
An RStudio instance created in Cloud Pak for Data is allocated 1 CPU and 1 GB of RAM by default. The maximum CPU and RAM available to an active RStudio session is constrained only by the Kubernetes resource limits that the administrator defines for the dsx cluster namespace and by the physical resource capacity of the worker nodes.
Tasks related to setting up and working with RStudio:
- Install a package to connect RStudio to relational data sources
- Change the Spark version
- Transfer files to and from your user project folder
- Using sample scripts
Install a package to connect RStudio to relational data sources
The steps you follow to connect RStudio to relational data sources depend on whether the cluster has Internet access.
If the cluster has Internet access, the administrator must complete the following steps:
- In the Tools shell, install the database driver in the /user-home/ directory. Example:
pwd
/user-home/1003/DSX_Projects/project-nb-test/rstudio
cd /user-home/1003/
wget https://jdbc.postgresql.org/download/postgresql-42.2.0.jar
- Configure Java on the pod:
R CMD javareconf
- Return to the RStudio script and install the RJDBC package and its dependencies:
install.packages("RJDBC", dep=TRUE)
PostgreSQL example:
library(RJDBC)
# Connection details
driverClassName <- "org.postgresql.Driver"
driverPath <- "/user-home/1003/postgresql-42.2.0.jar"
url <- "jdbc:postgresql://9.876.543.21:27422/compose"
databaseUsername <- "admin"
databasePassword <- "ABCDEFGHIJKLMNOP"
databaseSchema <- "public"
databaseTable <- "cars"
# Load the JDBC driver and open the connection
drv <- JDBC(driverClassName, driverPath)
conn <- dbConnect(drv, url, databaseUsername, databasePassword)
# dbListTables(conn)
# Read the table into a data frame
data <- dbReadTable(conn, databaseTable)
# To qualify the table name with its schema:
# data <- dbReadTable(conn, paste(databaseSchema, '.', databaseTable, sep=''))
data
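When you finish working with the data, close the connection with the standard DBI call:
dbDisconnect(conn)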
Change the Spark version
To change the version of Spark for RStudio, follow the set of steps appropriate for your library:
- Sparklyr library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the spark_connect() function (see the sketch at the end of this section).
- SparkR library
- To change the default Spark 2.0.2 service to a Spark 2.2.1 service, use the $SPARK_HOME environment variable to specify the Spark 2.2.1 installation location in RStudio:
Sys.setenv("SPARK_HOME"="/usr/local/spark-2.2.1-bin-hadoop2.7")
# import SparkR
library(SparkR, lib.loc = "/usr/local/spark-2.2.1-bin-hadoop2.7/R/lib")
# initialize the Spark session
sc = sparkR.session(master="spark://spark-master221-svc:7077", appName="dsxlRstudioSpark221")
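After the session starts, you can confirm which Spark version it is using:
sparkR.version()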
See SparkR (R on Spark) for more information.
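For the sparklyr path, the following is a minimal sketch of the spark_connect() call. The master URL and installation path are assumptions carried over from the SparkR example above, and the app name is illustrative; adjust all three for your cluster:
library(sparklyr)
# Connect to the Spark 2.2.1 service (master URL, spark_home, and app_name
# are assumptions; substitute the values that apply to your cluster)
sc <- spark_connect(master = "spark://spark-master221-svc:7077",
                    spark_home = "/usr/local/spark-2.2.1-bin-hadoop2.7",
                    app_name = "dsxlRstudioSparklyr221")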
Transfer files to and from your user project folder
Using the File Explorer in RStudio, a Watson Studio Local user can transfer files between their project folder and a local disk outside of the cluster:
- RStudio files
- To download an RStudio file, select the file, click more, and click Export to save the file to your local disk.
- To upload an RStudio file, click Upload and select the file to upload.
- Jupyter files
- To download a Jupyter file, click ..., type ~/../jupyter, select the file, click more, and click Export to save the file to your local disk.
- To upload a Jupyter file, click Upload and select the file to upload.
Using sample scripts
Sample scripts are provided as examples of how to work with Spark. An included readme.txt file provides an overview of the samples. To run any of these sample scripts, you must first add your own R code to connect to your Spark cluster.
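The following is a minimal sketch of the kind of connection code you might add at the top of a sample script, reusing the SparkR session call shown earlier (the master URL, installation path, and app name are assumptions; use the values for your cluster):
# Point SparkR at your Spark installation and start a session
Sys.setenv("SPARK_HOME"="/usr/local/spark-2.2.1-bin-hadoop2.7")
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
sc <- sparkR.session(master="spark://spark-master221-svc:7077", appName="sparkSampleScript")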
Learn more
Read and write data to and from IBM Cloud object storage in RStudio