What's new and changed in watsonx.governance

IBM watsonx.governance updates can include new features and fixes. Releases are listed in reverse chronological order so that the latest release is at the beginning of the topic.

watsonx.ai and watsonx.governance Version 2.0.0

A new version of watsonx.governance was released in June 2024 with watsonx.ai and watsonx.governance 2.0.0.

Version 2.0.0 is installed on IBM Cloud Pak for Data 5.0.0 with the service operand version 2.0.0.

This release includes the following changes:

New features
This release of watsonx.governance includes the following features:
Assess use cases for EU AI Act applicability
By using the new EU AI Act applicability assessment, you can complete a simple questionnaire to assess your AI use cases and determine whether they are within the scope of the EU AI Act. The assessment can also help you to identify the risk category that your use cases align to: prohibited, high, limited, or minimal.
For more information, see Applicability Assessment in Solution components in Governance Console.
Create detached deployments for governing prompts for externally hosted large language models (LLMs)
A detached prompt template is a new asset type for evaluating a prompt template for an LLM that is hosted by a third-party provider, such as Google Vertex AI, Azure OpenAI, or AWS Bedrock. Inferencing to generate the prompt template output runs on the remote model, but you can evaluate that output by using watsonx.governance metrics. You can also track the detached deployment and detached prompt template in an AI use case as part of your governance solution.
For more information, see the watsonx.governance documentation for detached deployments and detached prompt templates.
New metrics for evaluating prompt templates
When you evaluate prompt templates in your watsonx.governance deployment spaces or projects, you can now run generative AI quality evaluations that use the following new metrics to measure how well your model performs retrieval-augmented generation (RAG) tasks:
  • Faithfulness
  • Answer relevance
  • Unsuccessful requests
Results from these new evaluations are captured in factsheets in AI use cases.

For more information, see Generative AI quality evaluations.
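As a rough illustration of what these three metrics capture (not the scoring that watsonx.governance actually uses), the following Python sketch approximates each one with simple token overlap and refusal counting; every function name and refusal marker here is hypothetical.

```python
# Illustrative only: simple token-overlap proxies for the RAG quality metrics
# named above. watsonx.governance computes these metrics with its own scoring;
# this sketch just shows what each metric is trying to measure.

def _tokens(text: str) -> set[str]:
    return set(text.lower().split())

def faithfulness(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that appear in the retrieved contexts."""
    answer_tokens = _tokens(answer)
    context_tokens = set().union(*(_tokens(c) for c in contexts))
    return len(answer_tokens & context_tokens) / max(len(answer_tokens), 1)

def answer_relevance(answer: str, question: str) -> float:
    """Fraction of question tokens that the answer addresses."""
    question_tokens = _tokens(question)
    return len(question_tokens & _tokens(answer)) / max(len(question_tokens), 1)

def unsuccessful_requests(answers: list[str]) -> float:
    """Share of responses that look like refusals or empty generations."""
    refusal_markers = ("i don't know", "i cannot", "no answer")
    failed = sum(
        1 for a in answers
        if not a.strip() or any(m in a.lower() for m in refusal_markers)
    )
    return failed / max(len(answers), 1)
```

In the product, these metrics are computed for you during a generative AI quality evaluation, and the results flow into the factsheet for the AI use case.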

Updates
The following updates were introduced in this release:
Calculate RAG metrics with Python SDK
You can now use the Watson OpenScale Python SDK to calculate metrics that evaluate how well your LLM performs RAG tasks, as shown in the example after this list. These metrics include:
  • Content analysis
  • Keywords inclusion
  • Question robustness
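The following is a minimal sketch of how such a calculation might be wired up with the SDK on Cloud Pak for Data. The APIClient and CloudPakForDataAuthenticator classes come from the ibm-watson-openscale and ibm-cloud-sdk-core packages; the metric configuration keys and the llm_metrics.compute_metrics call are assumptions in this sketch, so confirm the exact names and signatures against the SDK reference for your release.

```python
# Minimal sketch only. The client setup uses classes from ibm-watson-openscale
# and ibm-cloud-sdk-core; the metric configuration keys and the
# llm_metrics.compute_metrics call below are assumptions, not confirmed API.
import pandas as pd
from ibm_cloud_sdk_core.authenticators import CloudPakForDataAuthenticator
from ibm_watson_openscale import APIClient

authenticator = CloudPakForDataAuthenticator(
    url="https://<cpd-hostname>",   # Cloud Pak for Data route
    username="<username>",
    password="<password>",
    disable_ssl_verification=True,
)
client = APIClient(
    authenticator=authenticator,
    service_url="https://<cpd-hostname>",
)

# One row per question/answer pair produced by the RAG application.
records = pd.DataFrame([
    {"question": "What is drift v2?",
     "context": "Drift v2 measures changes in model output and metadata.",
     "generated_text": "Drift v2 tracks changes in model output over time."},
])

# Assumed configuration naming the RAG metric groups listed above.
config = {
    "metrics_configuration": {
        "content_analysis": {},
        "keywords_inclusion": {},
        "question_robustness": {},
    }
}

result = client.llm_metrics.compute_metrics(config, records)  # assumed signature
print(result)
```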
Generate drift v2 evaluation metrics for additional data types
When you enable drift v2 evaluations, you can now generate the prediction drift, output drift, and input metadata drift metrics to measure the performance of unstructured text and unstructured image models. You can also generate the prediction drift and input metadata drift metrics for structured models.
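For intuition only, the sketch below shows the kind of shift an output drift metric looks for: a change between the baseline and production distributions of model predictions. It is not the drift v2 algorithm that watsonx.governance implements, and the example labels and numbers are hypothetical.

```python
# Rough illustration of what an output-drift style metric measures: how much
# the distribution of model predictions in production has shifted away from
# the baseline. This is not the drift v2 algorithm used by watsonx.governance.
from collections import Counter

def prediction_distribution(predictions: list[str]) -> dict[str, float]:
    counts = Counter(predictions)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def output_drift(baseline: list[str], production: list[str]) -> float:
    """Total variation distance between baseline and production predictions."""
    base = prediction_distribution(baseline)
    prod = prediction_distribution(production)
    labels = set(base) | set(prod)
    return 0.5 * sum(abs(base.get(l, 0.0) - prod.get(l, 0.0)) for l in labels)

# Example: a text-classification model whose positive rate rose in production.
baseline_preds = ["positive"] * 60 + ["negative"] * 40
production_preds = ["positive"] * 80 + ["negative"] * 20
print(f"output drift: {output_drift(baseline_preds, production_preds):.2f}")  # 0.20
```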