Tutorial: Getting started with Data Science Experience on IBM Integrated Analytics System

This tutorial shows how to get started with analyzing data using IBM Data Science Experience (DSX) Local on Integrated Analytics System.

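This tutorial shows you how to carry out the following tasks:

  • Log in to Integrated Analytics System
  • Log in to DSX for the first time
  • Create your own user account in DSX
  • Run a sample analytics notebook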

Time required

5 minutes

Scenario

You are a data scientist who has been given access to Integrated Analytics System, and you want to learn how to use the integrated Data Science Experience to run analytics on data.

Difficulty

Beginner

Audience

Data scientists or anyone interested in exploring DSX on Integrated Analytics System.

Prerequisites

You need a user account on Integrated Analytics System. If you don't have one, ask your Integrated Analytics System administrator to create one for you. You also need the link (URL) to the web console login page of your Integrated Analytics System.

Log in to Integrated Analytics System

There are two ways to launch DSX on Integrated Analytics System (IAS):

Procedure

  • Launch DSX from the IAS web console:
    1. Click or enter the URL for the Integrated Analytics System web console login page. When the login page appears, enter your user ID and password.
      A successful login will bring you to the Integrated Analytics System web console home page.
      Remember: This is the web console home page for user accounts. Administrators use the admin login for administration tasks.
    2. Click on the web console menu in the upper-left corner and then click the DEVELOP ANALYTICS entry.
      The Develop analytics and machine learning applications page opens.
    3. Click the Launch DSX button.
      The Data Science Experience login page opens.
  • Launch DSX from your web browser:
    1. In your browser address field, enter the URL of IAS, but specify the port as 8444.
      For example:
      https://9.1.2.3:8444
    2. Log in to DSX.
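
The second option amounts to keeping the IAS host and changing the port to 8444. A minimal Python sketch of that rewrite follows; the function name dsx_url and the sample console path are hypothetical, and the sketch assumes DSX is served over https at the root path, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit

DSX_PORT = 8444  # port given in this tutorial for direct DSX access


def dsx_url(console_url, port=DSX_PORT):
    """Build the DSX address from an IAS web console URL (hypothetical helper)."""
    parts = urlsplit(console_url)
    host = parts.hostname
    if host is None:
        raise ValueError("expected a full URL such as https://9.1.2.3")
    if ":" in host:
        # IPv6 literals must be bracketed when a port is appended
        host = "[" + host + "]"
    # Keep the scheme and host, replace the port, drop path and query
    return urlunsplit((parts.scheme, "{}:{}".format(host, port), "", "", ""))


print(dsx_url("https://9.1.2.3/console/login"))
```

Any port already present in the console URL is discarded, because only the host name is reused.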

Logging in to DSX for the first time

DSX on Integrated Analytics System requires separate credentials (user ID and password) from your Integrated Analytics System login.

Procedure

  • If you have been assigned a DSX user ID and password by your administrator, enter them now to launch DSX.
  • If you do not yet have your own DSX user ID, you can attempt to log in using the default DSX user ID and password, which are admin and password respectively. If you find that the password for the admin ID has been changed, contact your system administrator for assistance.

Results

After a successful login, you will see the DSX Community page.

Create your own user account in DSX

It is best to keep your work separate from the work of others on the system, so use the IBM Data Platform Manager to create a personal DSX account. If you already have your own DSX user ID, you can skip this task.

Procedure

  1. At the top of the DSX page, click the IBM Data Science Experience Local drop-down, and then select IBM Data Platform Manager.
    The Dashboard page appears.
  2. Click the menu in the upper-left corner and select User Management.
  3. On the User Management page, click Add users.
  4. In the Add users window, enter the requested information and then click Add.
    Restriction: The Username value needs to be unique in the DSX instance, so if you receive an error, try a different name.
    Once you click the Add button, you will get a message that the user ID has been created.
    Important: Copy down the temporary password provided in the message, since you will need it later. (You can edit the DSX user account later to change the password if you wish.)
  5. You need to log out of the default DSX administrator account so you can log in to your new DSX account. Click on the round A in the upper-right corner of the page and select Sign Out in the drop-down list.
    [Screenshot: the Sign Out selection in the DSX account menu]
    The DSX login page opens again. Enter your new user ID and the temporary password.

Results

A successful login will bring up the DSX Community page again, but this time you are logged in with your own account.

Run a sample analytics notebook

Procedure

  1. Download a sample notebook from the DSX Community page:
    1. On the Community page, click on the example notebook labeled Use Spark for Python to load data and run....
    2. In the upper-right corner of the page, click the Download icon.
      The Download window opens.
    3. Save the notebook file to a location of your choice.
  2. Create a project:
    1. Click the main web console drop-down menu at the upper-left of the page, and select My Projects. This brings you to the My Projects page.
      Since you don't have a project yet, the page shows the message:
      You currently have no projects. Let's get going.
    2. Click Create Project.
      The New Project page opens.
    3. Provide a name for your project, such as Project 1, then click Create.
  3. Add a sample notebook to the project:
    1. The Overview tab for your new project is displayed. Click add notebooks on the right side of the Notebooks section of the page.
      Tip: When you click add notebooks, use the appropriate browser gesture to open the link in a new browser tab. This allows you to have multiple notebooks open at the same time.
    2. On the Create Notebook page, click the From File tab.
    3. Enter a name for the notebook in the Name field. You can use the full name Use Spark for Python to load data and run SQL queries or a short name such as Notebook1.
    4. Click Browse. In the file dialog, locate and select the notebook (.ipynb) file that you downloaded.
    5. Click Create Notebook at the bottom right of the Create Notebook page.
      This should open the notebook and launch the Python2 runtime service.
  4. Run the sample notebook: click the Run Cell, Select Below button repeatedly to execute each cell in turn.
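
If the upload in step 3 fails, it can help to confirm that the downloaded file really is a notebook. An .ipynb file is a JSON document, so a quick check is possible outside DSX. The following is a minimal sketch, assuming Python 3 on your workstation; the function name summarize_notebook is hypothetical and not part of DSX.

```python
import json
from collections import Counter


def summarize_notebook(path):
    """Return the format version and per-type cell counts of an .ipynb file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    if "nbformat" not in nb:
        raise ValueError("{} does not look like a Jupyter notebook".format(path))
    # nbformat 4 keeps cells at the top level;
    # older nbformat 3 files nest them inside "worksheets".
    cells = nb.get("cells")
    if cells is None:
        cells = [c for ws in nb.get("worksheets", []) for c in ws.get("cells", [])]
    counts = Counter(c.get("cell_type") for c in cells)
    return {"nbformat": nb["nbformat"], "cells": dict(counts)}
```

Call summarize_notebook with the path of the file you saved. A result that lists code and markdown cells indicates that the download is intact.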

Results

The notebook demonstrates how to use Python to interact with the Spark service built into DSX. This self-contained example uses a simple data set of facts about cars to show basic analytic operations in Spark: how to create a Spark DataFrame, apply Spark functions to the DataFrame, and run SQL expressions against it. Results of operations are shown in tabular form, for example:
[Screenshot: example sample notebook cell output]