March 19, 2018 By IBM Blog 5 min read

A predictive Machine Learning model from Build to Retrain

This post is an excerpt from our solution tutorial, which walks you through building a predictive machine learning model, deploying it as an API for use in applications, testing the model, and retraining it with feedback data. All of this happens in an integrated, unified self-service experience on IBM Cloud.


In this post, the famous Iris flower data set is used for creating a machine learning model to classify species of flowers.

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.

For a deep dive into the differences between supervised and unsupervised learning, check out “Supervised vs. Unsupervised Learning: What’s the Difference?”
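To make the idea of learning from correctly identified observations concrete, here is a minimal sketch in Python. It is not part of the tutorial: it uses a hand-rolled 1-nearest-neighbor rule rather than anything from IBM Cloud, the function name predict_species is hypothetical, and the six labeled rows are petal measurements (length and width in cm) taken from the public Iris data set.

```python
import math

# Labeled training observations: (petal_length, petal_width) -> species.
# A few rows from the public Iris data set, used purely for illustration.
TRAINING_SET = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.1, 1.9), "virginica"),
]


def predict_species(petal_length, petal_width):
    """Return the label of the closest training observation (1-nearest neighbor)."""
    query = (petal_length, petal_width)
    _, label = min(TRAINING_SET, key=lambda item: math.dist(item[0], query))
    return label


print(predict_species(1.5, 0.3))  # setosa
print(predict_species(4.4, 1.3))  # versicolor
print(predict_species(5.8, 2.2))  # virginica
```

The labels in TRAINING_SET are what make this supervised: the rule can only answer with a species it has already seen attached to a measurement.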

Import data to a project

A project is how you organize your resources to achieve a particular goal within Watson Data Platform. Your project resources can include data, collaborators, and analytic tools like Jupyter notebooks and machine learning models.

You can create a project to add data and open a data asset in the data refiner for cleansing and shaping your data.

Create a project:

  1. Go to the IBM® Cloud catalog and select Data Science Experience under the Data & Analytics section. Create the service. Click on the Get Started button to launch the Data Science Experience dashboard.


  2. Create a new project (Projects > All Projects > New Project). Add a name, say iris_project, and an optional description for the project.

  3. Leave the Restrict who can be a collaborator checkbox unchecked as there’s no confidential data.

  4. Under Define Storage, click Add and choose an existing object storage service or create a new one (Select Lite plan > Create). Hit Refresh to see the created service.

  5. Under Define compute engine, click Add and choose an existing Spark service or create a new one.

  6. Click Create. Your new project opens and you can start adding resources to it.

Import data:

As mentioned earlier, you will be using the Iris data set. It was used in R.A. Fisher’s classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found in the UCI Machine Learning Repository. This small data set is often used for testing machine learning algorithms and visualizations. The aim is to classify Iris flowers into one of three species (Setosa, Versicolor, or Virginica) from measurements of the length and width of their sepals and petals. The data set contains 3 classes of 50 instances each, where each class refers to a type of Iris plant.


Courtesy: DataCamp

Download iris_initial.csv, which consists of 40 instances of each class. You will use the remaining 10 instances of each class to retrain your model.
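The 40/10 split per class can be sketched in Python as follows. This is only an illustration under assumptions: the post does not say how the 40 rows per class were chosen, so taking the first 40 of each class is a guess, the function name split_per_class is hypothetical, and the rows below are placeholders that carry a species label but no real measurements.

```python
from collections import defaultdict


def split_per_class(rows, label_key, n_initial):
    """Split rows into (initial, feedback).

    The first n_initial rows seen for each class go to `initial`;
    every later row of that class goes to `feedback`.
    """
    seen = defaultdict(int)
    initial, feedback = [], []
    for row in rows:
        label = row[label_key]
        if seen[label] < n_initial:
            initial.append(row)
        else:
            feedback.append(row)
        seen[label] += 1
    return initial, feedback


# Placeholder rows standing in for the 150-row data set (3 classes x 50 rows).
species = ["setosa", "versicolor", "virginica"]
rows = [{"id": i, "species": name} for name in species for i in range(50)]

initial, feedback = split_per_class(rows, "species", n_initial=40)
print(len(initial), len(feedback))  # 120 30
```

Keeping the split per class means both the initial file and the rows reserved for retraining contain all three species in equal proportion.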

  1. Under Assets in your project, click the Find and Add Data icon.

  2. Under Load, click browse and upload the downloaded iris_initial.csv.

  3. Once added, you should see iris_initial.csv in the Data assets section of the project. Click the name to see the contents of the data set.

Build a machine learning model

  1. Back in the Assets overview, under Models, click New model. In the dialog, add iris-model as the name and an optional description.

  2. Under the Machine Learning Service section, click Associate a Machine Learning service instance to bind a machine learning service (Lite plan) to your project. Click Reload.

  3. Select Model builder as your model type and Manual to manually create a model. Click Create.

For the automatic method, you rely on automatic data preparation (ADP) completely. For the manual method, in addition to some functions that are handled by the ADP transformer, you can add and configure your own estimators, which are the algorithms used in the analysis.

  4. On the next page, select iris_initial.csv as your data set and click Next.

  5. On the Select a technique page, based on the data set added, Label columns and feature columns are pre-populated. Select species (String) as your Label Col and petal_length (Decimal) and petal_width (Decimal) as your Feature columns.

  6. Choose Multiclass Classification as your suggested technique.

  7. For Validation Split, configure the following settings:

     • Train: 50%

     • Test: 25%

     • Holdout: 25%

  8. Click Add Estimators and select Decision Tree Classifier, then click Add.

You can evaluate multiple estimators in one go. For example, you can add Decision Tree Classifier and Random Forest Classifier as estimators to train your model and choose the best fit based on the evaluation output.

  9. Click Next to train the model. Once you see the status as Trained & Evaluated, click Save.

  10. Click Overview to check the details of the model.
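The validation split and the comparison of several estimators from the steps above can be sketched locally in plain Python. Treat this as an illustration under assumptions, not as what the model builder does internally. The two candidates are deliberately simple stand-ins (a majority-class baseline and a nearest-centroid rule), not the Decision Tree or Random Forest classifiers. Using the test part to pick a candidate and the holdout part for a final check is an assumed workflow. All function names are hypothetical, and the twelve rows are petal measurements from the public Iris data set.

```python
import math
import random
from collections import Counter, defaultdict

# (petal_length, petal_width), species: a few rows from the public Iris data set.
ROWS = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"),
    ((1.5, 0.2), "setosa"), ((1.7, 0.4), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"), ((4.0, 1.3), "versicolor"),
    ((6.0, 2.5), "virginica"), ((5.1, 1.9), "virginica"),
    ((5.9, 2.1), "virginica"), ((5.6, 1.8), "virginica"),
]


def three_way_split(rows, train=0.5, test=0.25, seed=42):
    """Shuffle rows and cut them into train / test / holdout parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def fit_majority(train_rows):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: most_common


def fit_nearest_centroid(train_rows):
    """Predict the class whose mean feature vector is closest to the input."""
    groups = defaultdict(list)
    for features, label in train_rows:
        groups[label].append(features)
    centroids = {label: tuple(sum(col) / len(col) for col in zip(*points))
                 for label, points in groups.items()}
    return lambda features: min(
        centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(features) == label for features, label in rows) / len(rows)


train_rows, test_rows, holdout_rows = three_way_split(ROWS)

# Train every candidate on the same training part, then compare on the test part.
candidates = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
models = {name: fit(train_rows) for name, fit in candidates.items()}
test_scores = {name: accuracy(model, test_rows) for name, model in models.items()}

best = max(test_scores, key=test_scores.get)
holdout_score = accuracy(models[best], holdout_rows)
print(test_scores, best, holdout_score)
```

With only twelve rows the scores are very coarse; the point is the shape of the workflow, in which the holdout rows play no part in choosing between candidates.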

Your journey doesn’t end here. By following the remaining steps in the solution tutorial, you will deploy your model as an API, test it, and retrain it by creating a feedback data connection.
