Quick start: Evaluate and track a prompt template

Take this tutorial to learn how to evaluate and track a prompt template. You can evaluate prompt templates in projects or deployment spaces to measure the performance of foundation model tasks and understand how your model generates responses. Then, you can track the prompt template in an AI use case to capture and share facts about the asset to help you meet governance and compliance goals.

Required services
watsonx.ai
watsonx.governance

Your basic workflow includes these tasks:

  1. Open a project that contains the prompt template to evaluate. Projects are where you can collaborate with others to work with assets.
  2. Evaluate a prompt template using test data.
  3. Review the results on the AI Factsheet.
  4. Track the evaluated prompt template in an AI use case.
  5. Deploy and test your evaluated prompt template.

Read about prompt templates

With watsonx.governance, you can evaluate prompt templates in projects to measure how effectively your foundation models generate responses for the following task types:

  • Classification
  • Summarization
  • Generation
  • Question answering
  • Entity extraction

Read more about evaluating prompt templates in projects

Read more about evaluating prompt templates in deployment spaces

Watch a video about evaluating and tracking a prompt template

Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.


Try a tutorial about evaluating and tracking a prompt template

In this tutorial, you will complete these tasks:

  • Task 1: Create a model inventory and AI use case
  • Task 2: Create a project
  • Task 3: Evaluate the sample prompt template
  • Task 4: Start tracking the prompt template
  • Task 5: Create a new project for validation
  • Task 6: Validate the prompt template
  • Task 7: Deploy the prompt template

Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.



Task 1: Create a model inventory and AI use case

A model inventory is for storing and reviewing AI use cases. AI use cases collect governance facts for AI assets that your organization tracks. You can view all the AI use cases in an inventory.

Task 1a: Create a model inventory

Follow these steps to create a model inventory:

  1. From the Navigation menu, choose AI governance > AI use cases.

  2. Manage your inventories:

    • If you have an existing inventory, then you can skip to Task 1b: Create an AI use case and use that inventory.
    • If you don't have any inventories, then click Manage inventories.
    1. Click New inventory.

    2. For the name, copy and paste the following text:

      Golden Bank Insurance Inventory
      
    3. For the description, copy and paste the following text:

      Model inventory for insurance related processing
      
    4. Clear the Add collaborators after creation option. You can restrict access at the inventory and AI use case level.

    5. Click Create.

  3. Close the Manage inventories page.

Check your progress

The following image shows the model inventory. You are now ready to create an AI use case.

Model inventory

Task 1b: Create an AI use case

This tutorial uses OpenPages to create and manage AI use cases. If you are not using OpenPages, then refer to Setting up an AI use case for the steps to create a use case without OpenPages.

An AI use case is a defined business problem that you can solve with the help of AI. Usually, use cases are defined before any AI assets are developed. Follow these steps to create an AI use case with OpenPages:

  1. Click New AI use case.

  2. For the Name, copy and paste the following text:

    Insurance claims processing AI use case
    
  3. For the Owner field, select your user name.

  4. For the Description, type Use case for evaluating the prompt templates for insurance claims processing for Golden Bank.

  5. For the Primary Business Entity, click Add.

    1. Select Catalogs.

    2. Click Done.

  6. Click Save.

Check your progress

The following image shows the AI use case. You are now ready to track the prompt template.

AI use case




Task 2: Create a project

You need a project to store the prompt template and the evaluation. Follow these steps to create a project based on a sample:

  1. Download the getting-started-with-watsonx-governance.zip file.

  2. From the Navigation menu, choose Projects > All projects.

  3. On the Projects page, click New project.

  4. Select Local file.

  5. Upload the previously downloaded ZIP file.

  6. On the Create a project page, copy and paste the project name and add an optional description for the project.

    Getting started with watsonx governance
    
  7. Click Create.

  8. Click View new project to verify that the project and assets were created successfully.

  9. Click the Assets tab to view the project's assets.

Check your progress

The following image shows the project Assets tab. You are now ready to evaluate the sample prompt template in the project.

Sample project assets




Task 3: Evaluate the sample prompt template

The sample project contains a few prompt templates and CSV files used as test data. Follow these steps to download the test data and evaluate one of the sample prompt templates:

  1. Download the test data from the sample project. You need to provide a local file for the test data during evaluation.

    1. Click the Assets tab.
    2. For the Insurance claim summarization test data.csv file, click the Overflow menu, and choose Download.
    3. Save the CSV file locally.
  2. Click Insurance claim summarization to open the prompt template in Prompt Lab, and then click Edit.

  3. Click the Prompt variables icon.

    Note: To run evaluations, you must create at least one prompt variable.
  4. Scroll to the Try section. Notice the {input} variable in the Input field. You must include the prompt variable as input for testing your prompt.

  5. Click the Evaluate icon.

  6. If prompted, click Associate a service instance to select the service to use for the evaluation.

    1. Select the appropriate service.
    2. Click Associate.
  7. Expand the Generative AI Quality section to see a list of dimensions. The available metrics depend on the task type of the prompt. For example, summarization has different metrics than classification.

  8. Click Next.

  9. Select the test data (a sketch of the expected CSV layout follows these steps):

    1. Click Browse.
    2. Select the Insurance claim summarization test data.csv file that you previously downloaded.
    3. Click Open.
    4. For the Input column, select Insurance_Claim.
    5. For the Reference output column, select Summary.
    6. Click Next.
  10. Click Evaluate. When the evaluation completes, you see the test results on the Evaluate tab.

  11. Click the AI Factsheet tab.

    1. View the information on each of the sections on the tab.
    2. Click Evaluation > Develop > Test to see the test results again.
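
If you want to inspect the test data before you upload it, the following minimal sketch shows the layout that the evaluation expects. It assumes the column names from this tutorial (Insurance_Claim as the Input column and Summary as the Reference output column) and uses the pandas library; adjust the file path and column names to match the file you downloaded.

```python
# A minimal sketch, assuming the test CSV from Task 3, step 1 was saved to the
# current directory with the column names used in this tutorial. Verify both
# against your downloaded file before relying on this check.
import pandas as pd

TEST_DATA_PATH = "Insurance claim summarization test data.csv"
INPUT_COLUMN = "Insurance_Claim"   # selected as the Input column; fills the {input} prompt variable
REFERENCE_COLUMN = "Summary"       # selected as the Reference output column

test_data = pd.read_csv(TEST_DATA_PATH)

# Confirm that both columns the evaluation wizard asks for are present.
missing = [col for col in (INPUT_COLUMN, REFERENCE_COLUMN) if col not in test_data.columns]
if missing:
    raise ValueError(f"Test data is missing expected columns: {missing}")

print(f"{len(test_data)} rows of test data")
print(test_data[[INPUT_COLUMN, REFERENCE_COLUMN]].head())
```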

Check your progress

The following image shows the results of the evaluation. Now you can start tracking the prompt template in an AI use case.

Prompt template evaluation test results




Task 4: Start tracking the prompt template

You can track your prompt template in an AI use case to report the development and test process to your peers. Follow these steps to start tracking the prompt template:

  1. From the Navigation menu, choose Projects > View all projects.
  2. Select the Getting started with watsonx governance project.
  3. Click the Assets tab.
  4. From the Overflow menu for the Insurance claim summarization prompt template, select View AI Factsheet. Every AI asset has an AI factsheet, which includes detailed information about how the asset was built, its evaluation results across the AI lifecycle, and additional attachments.
  5. On the AI Factsheet tab, click the Governance page.
  6. Click Track an AI use case.
  7. Select the Insurance claims processing AI use case.
  8. Click Next.
  9. Select an approach. An approach is one facet of the solution to the business problem represented by the AI use case. For example, you might create approaches to track several prompt templates in a use case.
  10. Click Next.
  11. If you are using OpenPages, you are prompted to define an asset record. Select New asset record, and click Next.
  12. For the model version, select Experimental.
  13. Accept the default value for the version number.
  14. Click Next.
  15. Review the information, and then click Track asset.
  16. When model tracking successfully begins, click the View details icon to open the AI use case.
  17. Click the Lifecycle tab to see the prompt template in the Develop phase.

Check your progress

The following image shows the Lifecycle tab in the AI use case with the prompt template in the Develop phase. You are now ready to continue to the Validate phase.

The Lifecycle tab in the AI use case




Task 5: Create a new project for validation

Typically, the prompt engineer evaluates the prompt with test data, and a validation engineer validates the prompt. The validation engineer has access to validation data that the prompt engineer might not have, so the validation happens in a separate project. Follow these steps to export the development project and import it as a new validation project, which moves the asset into the validation phase of the AI lifecycle:

  1. From the Navigation menu, choose Projects > View all projects.

  2. Select the Getting started with watsonx governance project.

  3. Click the Import/Export icon, and then select Export project.

  4. Check the box to select all assets.

  5. Name and export the project:

    1. For the file name, copy and paste the following text:

      validation project.zip
      
    2. Click Save.

    3. Click Export.

  6. When the project export completes, click Back to project.

  7. From the Navigation menu, choose Projects > View all projects.

  8. Click New project.

  9. Select Local file.

    1. Click Browse.

    2. Select the validation project.zip, and click Open.

    3. For the project name, copy and paste the following text:

      Validation project
      
    4. Click Create.

  10. When the project is created, click View new project.

Check your progress

The following image shows the validation project Assets tab. You are now ready to evaluate the sample prompt template in the validation project.

Validation project assets




Task 6: Validate the prompt template

Now you are ready to evaluate the prompt template in this validation project by using the same evaluation process as before. Use the same test data set for the evaluation, and select the same Input and Reference output columns as before. Follow these steps to validate the prompt template:

  1. Click the Assets tab in the Validation project.
  2. Repeat the steps in Task 3 to evaluate the Insurance claim summarization prompt template.
  3. Click the AI Factsheet tab when the evaluation is complete.
  4. View both sets of test results:
    1. Click Evaluation > Develop > Test.
    2. Click Evaluation > Validate > Test.

Check your progress

The following image shows the validation test results. You are now ready to promote the prompt template to a deployment space, and then deploy the prompt template.

Prompt template evaluation test results




Task 7: Deploy the prompt template

Task 7a: Promote the prompt template to a deployment space

You promote the prompt template to a deployment space in preparation for deploying it. Follow these steps to promote the prompt template:

  1. Click Validation project in the projects navigation trail.
  2. From the Overflow menu overflow menu for the Insurance claim summarization prompt template, select Promote to space.
  3. For the Target space, select Create a new deployment space.
    1. For the Space name, copy and paste the following text:

      Insurance claims deployment space
      
    2. For the Deployment stage, select Production.

      Important: You must select Production for the Deployment stage if you wish to move the deployment from the Evaluate stage to the Operate stage.
    3. Select your machine learning service from the list.

    4. Click Create.

    5. Click Close.

  4. Select the Insurance claims deployment space from the list.
  5. Check the option to Go to the space after promoting the prompt template.
  6. Click Promote.

Check your progress

The following image shows the prompt template in the deployment space. You are now ready to create a deployment.

Prompt template in deployment space

Task 7b: Deploy the prompt template

Now you can create an online deployment of the prompt template from inside the deployment space. Follow these steps to create a deployment:

  1. From the Insurance claim summarization asset page in the deployment space, select New deployment.

  2. For the deployment name, copy and paste the following text:

    Insurance claims summarization deployment
    
  3. Click Create.

Check your progress

The following image shows the deployed prompt template.

Deployed prompt template

Task 7c: View the deployed prompt template

Follow these steps to view the deployed prompt template in its current phase of the lifecycle:

  1. View the deployment when it is ready. The API reference tab provides information for you to use the prompt template deployment in your application. (A sketch of such a call follows these steps.)
  2. Click the Test tab. The Test tab allows you to submit an instruction and Input to test the deployment.
  3. Click Generate. Close the results window.
  4. Click the AI Factsheet tab. The AI Factsheet shows that the prompt template is now in the operate phase.
  5. Scroll down to the bottom of the AI Factsheet page, and click the arrow for more details.
  6. Select the Evaluation > Operate > Deployment 1 page.
  7. Click the View details icon View details at the top of the factsheet to open the AI use case.
  8. Click the Lifecycle tab.
  9. Click the Insurance claim summarization prompt template in the Operate phase. When you are done, click Cancel.
  10. Click the Insurance claims summarization deployment prompt template deployment in the Operate phase.
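
The API reference tab mentioned in step 1 describes how to call this deployment from an application. The following minimal sketch shows one way to do that with the Python requests library. The payload field names (parameters, prompt_variables, input) and the deployment URL handling are assumptions for illustration only; copy the exact endpoint URL and request format from the API reference tab, and supply your own IBM Cloud API key.

```python
# A minimal sketch, not the authoritative API reference for your deployment.
# Confirm the endpoint URL and payload fields on the deployment's API reference tab.
import os
import requests

API_KEY = os.environ["IBM_CLOUD_APIKEY"]        # your IBM Cloud API key
DEPLOYMENT_URL = os.environ["DEPLOYMENT_URL"]   # text generation URL copied from the API reference tab

# Exchange the API key for a bearer token (IBM Cloud IAM).
token_response = requests.post(
    "https://iam.cloud.ibm.com/identity/token",
    data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": API_KEY},
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Send one insurance claim to the deployed prompt template. The variable name
# matches the {input} prompt variable used throughout this tutorial (assumed payload shape).
payload = {"parameters": {"prompt_variables": {"input": "The insured reported hail damage to the roof on May 3..."}}}
response = requests.post(
    DEPLOYMENT_URL,
    headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
    json=payload,
)
response.raise_for_status()
print(response.json())
```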

Check your progress

The following image shows the prompt template in the Operate phase of the lifecycle.

Prompt template in the Operate phase




Next steps

Try one of the other tutorials:

Additional resources

Parent topic: Quick start tutorials