May 26, 2020 By Emma Tucker
Rohan Vaidyanathan

Artificial intelligence has penetrated every industry in some form. From powering recommendation engines for consumer products to helping lenders extend credit more efficiently, AI has become an imperative that no C-level executive can afford to delay. Even amid the COVID-19 pandemic of the last few months, we have seen encouraging uses of AI for tracking the spread of the disease and accelerating vaccine discovery.

As businesses scale their use of AI to innovate and become more efficient, they also have to manage the risks that come with it. Governance is a mandatory part of operations when dealing with sensitive customer data or operating in a regulated industry. However, as AI becomes more prevalent, new gaps emerge in governing the lifecycle of data and of the models trained on that data. At the same time, governance processes should not impede the iterative experimentation that data scientists rely on to build and operate AI applications.

Governing for control

In the data management space, governance processes help organizations comply with enterprise and industry regulations. They also protect sensitive customer data, whose loss can cause both financial and reputational damage to a brand. The same concern extends to AI models, which may behave in ways that are unfair or even harmful to consumers.

While best practices for governing data have matured over the years, we need similar best practices for models. Models are harder to govern because they are frequently retrained; as a result, there are many versions of each model and of the data sets on which they were trained. The provenance of data and models, along with the metadata of any glue code and pipelines, has to be traced and documented for audits. In addition, to provide complete transparency into a model's behavior, it is important to document the techniques used to train it, its hyperparameters and the metrics from its testing phases. Before a model is pushed into production, it has to be validated by an independent group that evaluates the risks to the business. Once in production, it has to be continuously monitored for fairness, quality and drift, and it must provide easy-to-use explanations of its predictions.
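To make "traced and documented for audits" concrete, here is a minimal sketch of a per-version audit record in Python. It assumes a hypothetical `ModelVersionRecord` layout of our own choosing (it is not the schema of Watson Knowledge Catalog, Watson OpenScale or any other product) and identifies the exact training data snapshot by its SHA-256 digest. All field values below are illustrative placeholders, not real results.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def fingerprint(data: bytes) -> str:
    """SHA-256 digest identifying an exact snapshot of the training data."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical record layout for illustration; not any product's schema.
@dataclass(frozen=True)
class ModelVersionRecord:
    model_name: str
    version: int
    training_data_sha256: str     # provenance of the training data
    pipeline_commit: str          # revision of the glue code / pipeline
    technique: str                # how the model was trained
    hyperparameters: dict
    test_metrics: dict
    validated_by: Optional[str] = None  # filled in by the independent validation group
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # sort_keys gives a stable serialization that diffs cleanly between versions
        return json.dumps(asdict(self), sort_keys=True, indent=2)


record = ModelVersionRecord(
    model_name="credit-risk",     # illustrative values only
    version=3,
    training_data_sha256=fingerprint(b"age,income,label\n34,52000,1\n"),
    pipeline_commit="a1b2c3d",
    technique="gradient boosted trees",
    hyperparameters={"max_depth": 4, "learning_rate": 0.1},
    test_metrics={"auc": 0.87},
)
print(record.to_json())
```

Because every retrained version gets its own immutable record, an auditor can tie any production model back to the exact data, code revision and settings that produced it.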

A side effect of these requirements is that data scientists and model operations teams now have to produce extensive documentation for every model. According to Brandon Purcell, Principal Analyst at Forrester, explainable AI isn't just about explaining each output and how it was determined. It also requires explaining how the AI model was built, what data was used, whether that data can be trusted, whether it was biased and whether it complied with policies and regulations, as well as ensuring the model is in production only for its intended use.

One financial services company we spoke to writes a document of roughly 40 pages for every model it pushes into production. To be effective, collecting this information and producing the documentation has to be automated. Beyond documentation, policies and rules must be actively enforced so that models exhibiting biased behavior do not reach production and do not lead to unfavorable outcomes. Even before models are developed, these policies and rules should prevent the use of bad data, both by running data quality checks and by actively blocking that data from being used to build models.
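The two enforcement points described above, a data quality check before training and a fairness check before deployment, might look like the following minimal sketch. The thresholds are assumptions: at most 5% missing values per column, and a disparate impact ratio (favorable-outcome rate of the unprivileged group divided by that of the privileged group) of at least 0.8, a common "four-fifths" convention. The function names are hypothetical and not part of any IBM API.

```python
from typing import Dict, List, Optional, Sequence


def data_quality_violations(columns: Dict[str, Sequence[Optional[float]]],
                            max_missing: float = 0.05) -> List[str]:
    """Run before training: flag columns with too many missing (None) values."""
    problems = []
    for name, values in columns.items():
        missing = sum(v is None for v in values) / len(values) if values else 1.0
        if missing > max_missing:
            problems.append(f"{name}: {missing:.0%} missing values")
    return problems


def favorable_rate(predictions: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Share of favorable (1) predictions within one group."""
    outcomes = [p for p, g in zip(predictions, groups) if g == group]
    if not outcomes:
        raise ValueError(f"no predictions for group {group!r}")
    return sum(outcomes) / len(outcomes)


def disparate_impact(predictions: Sequence[int], groups: Sequence[str],
                     unprivileged: str, privileged: str) -> float:
    """P(favorable | unprivileged) / P(favorable | privileged)."""
    unpriv = favorable_rate(predictions, groups, unprivileged)
    priv = favorable_rate(predictions, groups, privileged)
    if priv == 0:
        return 1.0 if unpriv == 0 else float("inf")
    return unpriv / priv


def fairness_violations(predictions: Sequence[int], groups: Sequence[str],
                        unprivileged: str, privileged: str,
                        min_ratio: float = 0.8) -> List[str]:
    """Run before deployment: block models whose disparate impact is below min_ratio."""
    ratio = disparate_impact(predictions, groups, unprivileged, privileged)
    return [f"disparate impact {ratio:.2f} < {min_ratio}"] if ratio < min_ratio else []


# Illustrative inputs only.
columns = {"income": [52000.0, None, 61000.0, 48000.0], "age": [34.0, 41.0, 29.0, 55.0]}
print(data_quality_violations(columns))  # → ['income: 25% missing values']

preds = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(fairness_violations(preds, groups, unprivileged="A", privileged="B"))
# → ['disparate impact 0.33 < 0.8']
```

An empty list means the check passed; a non-empty list doubles as an automatically generated explanation of why training or deployment was blocked.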

Automatically tracking models and their associated data and metadata across the lifecycle is therefore imperative, both to mitigate business risks from unwanted model behavior and to be prepared for compliance audits.

Governing for efficiency and outcomes

While the value of data governance for compliance and risk management is clear, in the AI lifecycle it can also improve process efficiency.

Data science practitioners experiment with a large number of models before converging on the few that will drive the required business outcomes. Along the way, they access, combine and transform large amounts of data, and during hyperparameter optimization they try a wide range of training parameters. Being able to reproduce this data preparation and these experiments is key to successful deployment, to collaboration and to further enhancing the models. Team composition is another factor: high market demand for data scientists causes significant churn, and new or less experienced team members may need to take over existing models to deploy or extend them. Without detailed information on the data, the transformations applied to it and the model training experiments, they cannot work efficiently.
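As one illustration of what "reproducible experiments" can mean in practice, here is a minimal sketch of a seeded random hyperparameter search that logs the seed, the search space, a fingerprint of the transformed training data and every trial. The `toy_objective` function is a hypothetical stand-in for training and validating a real model; with the same seed and data, a colleague rerunning the search gets an identical log.

```python
import hashlib
import json
import random
from typing import Callable, Dict, List, Tuple


def data_fingerprint(rows: List[dict]) -> str:
    """Stable SHA-256 of the (already transformed) training rows."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def random_search(objective: Callable[[Dict[str, float]], float],
                  space: Dict[str, Tuple[float, float]], n_trials: int,
                  seed: int, rows: List[dict]) -> dict:
    """Seeded random search that logs everything needed to reproduce it."""
    rng = random.Random(seed)  # private RNG: sampled params depend only on `seed`
    trials = []
    for i in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        trials.append({"trial": i, "params": params, "score": objective(params)})
    return {
        "seed": seed,
        "search_space": space,
        "data_sha256": data_fingerprint(rows),
        "trials": trials,
    }


def toy_objective(params: Dict[str, float]) -> float:
    # Hypothetical stand-in for "train a model and return its validation score".
    return -(params["learning_rate"] - 0.1) ** 2


rows = [{"age": 34, "income": 52000, "label": 1}, {"age": 41, "income": 61000, "label": 0}]
space = {"learning_rate": (0.01, 0.3), "l2": (0.0, 1.0)}
run_a = random_search(toy_objective, space, n_trials=5, seed=42, rows=rows)
run_b = random_search(toy_objective, space, n_trials=5, seed=42, rows=rows)
print(run_a == run_b)  # same seed and same data give an identical experiment log
```

Storing the returned dictionary alongside the model is what lets a new team member rerun or extend the search instead of starting from scratch.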

Fortunately, metadata captured through the governance process can also be used for knowledge management that makes data science teams more efficient. Imagine comparing the outcomes of several models, then tracing each one back to its data, modeling technique and experiments in order to reproduce a similar process for more models. This not only helps a data science team manager stay sane but also codifies and operationalizes best practices through tools.
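A minimal sketch of that compare-then-trace workflow, assuming each training run has already been captured as a hypothetical `RunRecord` (our own illustrative structure, not a product schema). The AUC values are made-up placeholders used only to show the mechanics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical governance metadata captured for each training run."""
    run_id: str
    technique: str
    dataset_version: str
    transformations: Tuple[str, ...]
    hyperparameters: Tuple[Tuple[str, object], ...]  # (name, value) pairs
    auc: float


def best_run(runs: List[RunRecord]) -> RunRecord:
    """Highest-scoring run across all experiments."""
    return max(runs, key=lambda r: r.auc)


def best_by_technique(runs: List[RunRecord]) -> Dict[str, RunRecord]:
    """Best run per modeling technique, for side-by-side comparison."""
    best: Dict[str, RunRecord] = {}
    for run in runs:
        current = best.get(run.technique)
        if current is None or run.auc > current.auc:
            best[run.technique] = run
    return best


def lineage(run: RunRecord) -> str:
    """Human-readable trace from a result back to its data and settings."""
    steps = " -> ".join(run.transformations)
    params = ", ".join(f"{k}={v}" for k, v in run.hyperparameters)
    return f"{run.run_id}: {run.dataset_version} -> {steps} -> {run.technique}({params})"


runs = [  # placeholder metadata, not real results
    RunRecord("run-1", "logistic_regression", "loans_v3", ("impute_median", "scale"), (("C", 1.0),), 0.81),
    RunRecord("run-2", "gradient_boosting", "loans_v3", ("impute_median",), (("max_depth", 4),), 0.86),
    RunRecord("run-3", "gradient_boosting", "loans_v2", ("drop_nulls",), (("max_depth", 6),), 0.83),
]
print(lineage(best_run(runs)))
# → run-2: loans_v3 -> impute_median -> gradient_boosting(max_depth=4)
```

The same records answer both questions from the paragraph above: which approach worked best, and exactly how to recreate it.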

IBM is innovating to enhance governance in the AI lifecycle

Over the last couple of years, IBM has been at the forefront of innovation in AI governance. IBM Cloud Pak for Data provides a full stack of components for every stage of the AI lifecycle. It comes with built-in governance tools such as Watson Knowledge Catalog, as well as purpose-built AI model risk management tools such as Watson OpenScale. IBM Research has been at the industry frontier through its work on AI fairness, explainability and standardized documentation of AI models. Throughout 2020, we will ship enhancements to the Cloud Pak for Data platform to further push the boundaries of AI governance.

Register for the upcoming webinar featuring Forrester and IBM, titled "AI Governance: Drive compliance, efficiency and outcomes."

Interested in test-driving the product? Sign up to receive updates and an invitation to the AI Governance beta program.

