Updated: 20 September 2024
Contributors: Jim Holdsworth, Matthew Kosinski
Data governance is the data management discipline that focuses on the quality, security and availability of an organization’s data. Data governance helps ensure data integrity and data security by defining and implementing policies, standards and procedures for data collection, ownership, storage, processing and use.
The goal of data governance is to maintain safe, high-quality data that is easily accessible for data discovery and business intelligence initiatives. Acting much like an air traffic control hub, the data governance function helps ensure that verified data flows through secured pipelines to trusted endpoints and users.
Artificial intelligence (AI), big data and digital transformation efforts are the primary drivers of data governance programs. As the volume of data increases from new data sources, such as Internet of Things (IoT) technologies, organizations need to reconsider their data management practices to scale their business intelligence (BI) efforts.
Data governance programs can help organizations protect and manage large amounts of data by improving data quality, reducing data silos, enforcing compliance and security policies and distributing data access appropriately.
Data governance is a subset of data management, which is the overarching practice of collecting, processing and using data securely and efficiently to support strategic decision-making and improve business outcomes.
While data management includes data governance, it also includes other areas of the data lifecycle, such as data processing, data storage and data security. Moreover, the various aspects of the data management process all influence one another.
Because these other areas of data management can impact data governance, various teams must work together to design and follow a data governance strategy.
For example, a data governance team might identify commonalities across disparate data sets. If they want to integrate that data, they’ll usually work with a data management team to define the data model and data architecture to facilitate those linkages.
Another example is data access, where a data governance team might set the policies concerning access to specific types of data, such as personally identifiable information (PII). Then, a data management team will provide that access directly or create the mechanism to provide that access, often through role-based access control (RBAC).
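As a minimal sketch of that division of labor, the Python below assumes a hypothetical policy in which the governance team maps each role to the data classifications it may read, and the data management side enforces that policy with a deny-by-default check. The role names, classification labels and the `can_access` function are illustrative assumptions, not any product's API.

```python
# Hypothetical governance policy: which data classifications each role may read.
# The governance team defines this table; data management enforces it.
ACCESS_POLICY: dict[str, set[str]] = {
    "analyst": {"public", "internal"},
    "hr_specialist": {"public", "internal", "pii"},
    "auditor": {"public", "internal", "pii", "financial"},
}


def can_access(role: str, classification: str) -> bool:
    """Return True if the role may read data with this classification.

    Roles missing from the policy get no access (deny by default).
    """
    return classification in ACCESS_POLICY.get(role, set())


print(can_access("analyst", "internal"))   # True
print(can_access("analyst", "pii"))        # False
print(can_access("contractor", "public"))  # False: role not in policy
```

Denying unknown roles by default means a gap in the policy table fails safe rather than exposing data.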
A data governance framework details an organization’s structures and processes for managing critical data assets. It defines data ownership and responsibilities and specifies how data should be handled to maintain data quality, security and compliance.
There is no one-size-fits-all framework, as frameworks are typically tailored roadmaps for a particular organization’s unique data systems, data sources, industry protocols and government regulations.
Data governance frameworks commonly address items such as:
Data governance programs typically define a specific goal or set of goals, such as enhancing data quality, supporting compliance or enabling data-driven decision-making. They also select metrics to measure progress toward these goals. Key governance metrics might include:
Reductions in data errors and redundancy.
Cost reductions from greater efficiency and faster time-to-market.
Data consistency and completeness.
The level of data literacy and process compliance of employees.
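Completeness and consistency can be expressed as simple ratios. The sketch below assumes hypothetical customer records, treats `None` as a missing value and measures consistency as the share of values that use an agreed representation (here, uppercase country codes). The field names and allowed values are illustrative only.

```python
from typing import Any

# Hypothetical customer records; None marks a missing value.
records: list[dict[str, Any]] = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "us"},
    {"id": 3, "email": "c@example.com", "country": "DE"},
    {"id": 4, "email": "d@example.com", "country": None},
]


def completeness(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows where `field` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def consistency(rows: list[dict[str, Any]], field: str, allowed: set[str]) -> float:
    """Share of non-missing values in `field` that use an allowed representation."""
    values = [r[field] for r in rows if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed) / len(values)


print(completeness(records, "email"))  # 0.75
# 2 of 3 non-missing country values match; lowercase "us" does not.
print(consistency(records, "country", {"US", "DE", "FR"}))
```

Tracking these ratios over time gives a program a concrete way to show progress toward its data quality goals.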
Governance programs also define the roles and responsibilities of all involved: steering committee, data owners, data stewards and stakeholders.
Governance frameworks set parameters around the data to be governed and the desired outcomes. This includes setting guidelines for data formats, data models, master data management (MDM), metadata, naming conventions and more.
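Standards such as naming conventions can be checked automatically. This sketch assumes a convention of lowercase snake_case column names; the pattern and the `naming_violations` helper are hypothetical, since each organization defines its own rules.

```python
import re

# Assumed convention for illustration: lowercase snake_case names that start
# with a letter, e.g. "customer_id". Real frameworks define their own rules.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def naming_violations(column_names: list[str]) -> list[str]:
    """Return the column names that do not follow the naming convention."""
    return [name for name in column_names if not SNAKE_CASE.fullmatch(name)]


columns = ["customer_id", "FirstName", "order_total", "2nd_address", "email__raw"]
print(naming_violations(columns))  # ['FirstName', '2nd_address', 'email__raw']
```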
Governance frameworks often map data flows and define how data will be collected, stored, moved and archived. They might also identify the hardware, software and services that will support governance efforts and the organization's broader data architecture.
Some governance frameworks might define data scopes, which are access parameters for specific data assets, such as master data, metadata and historical data. A data scope can help ensure that users and apps only have access to the data they need and no one has access to data they shouldn't.
Governance frameworks outline testing, auditing and record-keeping procedures to maintain the governance program's transparency and explainability.
Regular audits can help verify that users are complying with the data governance framework. Audits can also help identify ways that the governance program must evolve to account for new data, processes or technologies.
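Part of an audit can be automated by replaying access logs against the policy. The sketch below assumes a hypothetical log format and role-to-classification policy; it flags logged accesses that the policy would not have allowed.

```python
from dataclasses import dataclass

# Hypothetical policy: data classifications each role is allowed to read.
POLICY: dict[str, set[str]] = {
    "analyst": {"public", "internal"},
    "data_steward": {"public", "internal", "pii"},
}


@dataclass(frozen=True)
class AccessEvent:
    """One entry in a hypothetical data-access log."""
    user: str
    role: str
    dataset: str
    classification: str


def audit(events: list[AccessEvent]) -> list[AccessEvent]:
    """Return the logged accesses that the policy would not have allowed."""
    return [e for e in events if e.classification not in POLICY.get(e.role, set())]


log = [
    AccessEvent("ana", "analyst", "sales_2024", "internal"),
    AccessEvent("ana", "analyst", "customers", "pii"),
    AccessEvent("sam", "data_steward", "customers", "pii"),
]
for v in audit(log):
    # Prints: ana (analyst) read customers [pii]
    print(f"{v.user} ({v.role}) read {v.dataset} [{v.classification}]")
```

Flagged events can then be reviewed to decide whether the user broke policy or the policy itself needs to change.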
Finally, audits can also help organizations achieve—and prove—regulatory compliance.
Technology plays an important role in effective data governance. Enterprise data governance tools can vary from comprehensive platforms to specialized point solutions. Organizations choose different tools depending on their unique data architectures and governance frameworks.
Common capabilities of data governance solutions include:
Automated data discovery and classification.
Enforcement of data protection rules and role-based access controls.
Support for privacy and compliance requirements.
Automated metadata management, data cataloging and data lineage tracking.
Support for a business glossary.
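Automated classification often starts by sampling column values and matching them against known patterns. The sketch below is a deliberately simplified version, assuming regex patterns for email addresses and US Social Security numbers and a hypothetical 80% match threshold; real classifiers are considerably more robust.

```python
import re
from typing import Optional

# Simplified patterns for illustration only; production classifiers use
# more robust detection (validation rules, context, ML models and so on).
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> Optional[str]:
    """Label a column with a PII type if most non-empty sampled values match."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        if matches / len(non_empty) >= threshold:
            return label
    return None


table = {
    "contact": ["a@example.com", "b@example.org", "c@example.net"],
    "tax_id": ["123-45-6789", "987-65-4321", ""],
    "notes": ["called twice", "prefers email", "n/a"],
}
print({col: classify_column(vals) for col, vals in table.items()})
# {'contact': 'email', 'tax_id': 'us_ssn', 'notes': None}
```

The resulting labels can then drive the access rules and masking policies described elsewhere in this article.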
Data governance solutions can handle various data formats. Some offer visualization capabilities to enhance the understanding of complex data sets and relationships, making it easier to identify trends, outliers and areas that require attention.
Implementing a strong data governance framework can help organizations realize a wide variety of benefits:
Organizations cannot make effective business decisions if those decisions are based on flawed data. Data governance can help ensure data integrity, accuracy, completeness and consistency through the creation of a framework that supports robust data stewardship and a strong end-to-end data management process.
Trustworthy data helps organizations discover new opportunities, better understand their customers and workflows and optimize overall business performance.
A lack of data governance might lead to errors in performance metrics that steer an organization in the wrong direction, while data governance tools can help address inaccuracies before they influence business strategy.
For example, data lineage tools can help data owners trace data throughout its lifecycle, including any transformations the data undergoes during extract, transform, load (ETL) or extract, load, transform (ELT) processes. This enables organizations to identify and remedy the root causes of data errors.
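Lineage can be modeled as a graph in which each dataset records the transformation and inputs that produced it. The sketch below assumes a hypothetical ETL pipeline and walks upstream from a report to its raw sources, which is the path a data owner would follow when tracing an error back to its root cause.

```python
from dataclasses import dataclass, field


@dataclass
class LineageNode:
    """A dataset plus the transformation and inputs that produced it."""
    name: str
    transformation: str = "source"
    inputs: list["LineageNode"] = field(default_factory=list)


def trace_upstream(node: LineageNode, depth: int = 0) -> list[str]:
    """Walk back through a dataset's inputs, returning one indented line per step."""
    lines = ["  " * depth + f"{node.name} <- {node.transformation}"]
    for parent in node.inputs:
        lines.extend(trace_upstream(parent, depth + 1))
    return lines


# Hypothetical ETL pipeline: raw sources are cleaned, joined, then aggregated.
crm = LineageNode("crm_customers_raw")
orders = LineageNode("erp_orders_raw")
clean_customers = LineageNode("customers_clean", "dedupe + normalize emails", [crm])
joined = LineageNode("customer_orders", "join on customer_id", [clean_customers, orders])
report = LineageNode("revenue_by_region", "sum(order_total) group by region", [joined])

print("\n".join(trace_upstream(report)))
```

If the regional revenue figures look wrong, this trace shows every upstream step (the join, the email normalization and the two raw sources) that could have introduced the error.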
When data access is restricted across an organization, it can limit innovation, create dependencies on subject matter experts (SMEs) and slow business processes.
Data governance programs distribute data access appropriately, giving each department or individual access only to the data they need. This enables cross-functional teams to work together more closely and efficiently while keeping data safe.
A properly governed data system can provide a single source of truth across an entire organization. Decision-making can be improved when all parties are working with the same data sets.
Centralizing data definitions and metadata in a single data catalog can help reduce confusion and inefficiencies. This documentation, in turn, becomes the foundation for self-service solutions that enable consistent data and data access across the organization.
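A catalog entry typically ties a data asset to its location, owner, description and business glossary terms. The in-memory `DataCatalog` class below is a hypothetical minimal sketch of that idea, not a model of any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: where a data asset lives and what it means."""
    name: str
    location: str
    owner: str
    description: str
    glossary_terms: set[str] = field(default_factory=set)


class DataCatalog:
    """A minimal in-memory catalog; real catalogs persist and index metadata."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find_by_term(self, term: str) -> list[str]:
        """Return asset names tagged with a glossary term (case-insensitive)."""
        term = term.lower()
        return sorted(
            e.name for e in self._entries.values()
            if term in {t.lower() for t in e.glossary_terms}
        )


catalog = DataCatalog()
catalog.register(CatalogEntry("customers", "warehouse.sales.customers", "sales_ops",
                              "One row per active customer", {"Customer"}))
catalog.register(CatalogEntry("churn_scores", "lakehouse.ml.churn", "data_science",
                              "Monthly churn probability per customer", {"Customer", "Churn"}))
print(catalog.find_by_term("customer"))  # ['churn_scores', 'customers']
```

Because both assets share the glossary term "Customer", a self-service user searching in business language finds them without knowing where each one is stored.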
Data governance policies often include procedures that make it easier to comply with government regulations on sensitive data and privacy, such as the EU’s General Data Protection Regulation (GDPR) and the US Health Insurance Portability and Accountability Act (HIPAA), and with industry requirements such as the Payment Card Industry Data Security Standard (PCI DSS). Violations of these requirements might result in costly fines and public backlash.
Data governance tools help organizations set guardrails that can prevent data breaches, leaks and misuse. Governance frameworks help build data systems that are clear, explainable, fair and inclusive. In turn, these data systems safeguard privacy and security and maintain customer loyalty and trust.
In an IDC survey, only 45.3% of respondents said they had "rules, policies and processes to enforce their responsible AI principles" to protect against security breaches, liability concerns, exposed customer data and regulatory risk.1
Data governance involves understanding the origin, sensitivity and lifecycle of all the data that an organization uses. This is the foundation for any AI governance practice and is crucial in mitigating various enterprise risks.
Data governance helps organizations bring high-quality data to AI and ML initiatives while protecting that data and complying with relevant rules and regulations. For example, governance tools can help ensure that sensitive personal data is not fed to an AI when it shouldn't be.
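One common guardrail is to redact sensitive values before text reaches an AI model. The sketch below assumes simplified regex patterns for email addresses and North American-style phone numbers; it illustrates the idea and is not a complete PII detector.

```python
import re

# Simplified patterns for illustration; not a complete PII detector.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


prompt = "Summarize the complaint from jane.doe@example.com, callback 555-123-4567."
print(redact(prompt))
# Summarize the complaint from [EMAIL], callback [PHONE].
```

Placeholder tokens keep the prompt readable for the model while the underlying values stay inside governed systems.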
Having the right data is the foundation for advanced data analytics and data science initiatives. Carefully governed data enables valuable initiatives such as business intelligence reporting or more complex predictive machine learning (ML) projects.
For example, properly profiling data (reviewing and cleansing data to better understand how it is structured) can help teams make better sense of the relationships between different data sets and sources.
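A basic profile counts rows, missing values and distinct values and records which types appear in a column. The `profile_column` helper below is a hypothetical sketch; in this example it surfaces a structural issue, ZIP codes stored as both strings and integers, that would complicate joining data sets.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize a column: row count, nulls, distinct values, types seen, top value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "types": sorted({type(v).__name__ for v in non_null}),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical column where one source stored ZIP codes as integers.
zip_codes = ["10001", "10001", "94105", 94105, None]
print(profile_column(zip_codes))
# {'rows': 5, 'nulls': 1, 'distinct': 3, 'types': ['int', 'str'], 'most_common': ('10001', 2)}
```

Note that "94105" and 94105 count as two distinct values, which is exactly the kind of mismatch profiling is meant to reveal.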
Data governance initiatives can face many hurdles in implementation. Some of these challenges include:
Effective data governance programs generally require sponsorship at two levels: executives and individual contributors. Chief data officers (CDOs) and data stewards are critical in the communication and prioritization of data governance within an organization.
CDOs can provide oversight and enforce accountability across data teams to help ensure that data governance policies are adopted. Data stewards can help promote awareness of these policies to data producers and consumers to encourage compliance across the organization.
Without appropriate sponsorship, data users might be unaware of, or unconcerned with, governance policies. This can result in non-compliance, poor data integrity and compromised data security.
Without the correct tools and data architecture, organizations might struggle to deploy an effective data governance program.
For example, teams might discover redundant data across different functions. To enable effective governance, data architects need to develop appropriate data models and data architectures to merge and integrate data across storage systems.
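Before merging, teams often need to find where data sets overlap. The sketch below assumes two hypothetical departmental customer lists keyed by email and normalizes keys (trimming whitespace, lowercasing) before comparing them; real entity matching usually needs more sophisticated rules.

```python
def normalize(value: str) -> str:
    """Normalize a key for matching: trim whitespace and lowercase."""
    return value.strip().lower()


def find_overlap(a: list[dict], b: list[dict], key: str) -> set[str]:
    """Return normalized key values that appear in both data sets."""
    keys_a = {normalize(r[key]) for r in a if r.get(key)}
    keys_b = {normalize(r[key]) for r in b if r.get(key)}
    return keys_a & keys_b


# Hypothetical copies of customer data kept separately by two teams.
marketing = [{"email": "Ana@Example.com", "segment": "B2B"},
             {"email": "li@example.com", "segment": "B2C"}]
support = [{"email": " ana@example.com", "tickets": 3},
           {"email": "omar@example.com", "tickets": 1}]
print(find_overlap(marketing, support, "email"))  # {'ana@example.com'}
```

Without normalization, "Ana@Example.com" and " ana@example.com" would look like two different customers.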
Teams might also need to adopt a data catalog to create an inventory of data assets across an organization. Or if they already have one, they might need to create a process for metadata management, which helps ensure that the underlying data is relevant and up-to-date.
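Metadata management can include flagging catalog entries whose metadata has not been reviewed recently. The sketch below assumes a hypothetical record of last-review timestamps and a 180-day review window; both are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical catalog metadata: when each asset's metadata was last reviewed.
last_reviewed = {
    "customers": datetime(2024, 9, 1, tzinfo=timezone.utc),
    "orders": datetime(2024, 3, 15, tzinfo=timezone.utc),
    "web_events": datetime(2023, 12, 2, tzinfo=timezone.utc),
}


def stale_assets(reviewed: dict[str, datetime], now: datetime, max_age: timedelta) -> list[str]:
    """Return assets whose metadata has not been reviewed within max_age."""
    return sorted(name for name, ts in reviewed.items() if now - ts > max_age)


now = datetime(2024, 9, 20, tzinfo=timezone.utc)
print(stale_assets(last_reviewed, now, timedelta(days=180)))  # ['orders', 'web_events']
```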
Data governance, especially in hybrid and multicloud environments, often involves data stored in multiple formats across multiple providers and locations. Moreover, data might reside in different types of data stores, such as data lakes, data lakehouses and data warehouses.
This distribution of data can make it difficult to track and monitor data flows and data usage. Data governance requires a clear understanding of data sources, destinations, transformations, dependencies, ownership, access rights and responsibilities.
Shadow IT can throw an additional wrench into the process. In a TechTarget study, the second-most common data security challenge reported was that employees were signing up for cloud applications and services without IT approval.2
Enforcing data governance policies across multiple environments might require coordination among different stakeholders, such as data owners, data stewards, data consumers and data regulators.
The rise of self-service analytics and business intelligence presents data governance with new challenges.
Access requests from more users are coming in faster than before, but governance teams need to balance speed and accessibility with privacy and security concerns. Furthermore, streaming data systems and procedures must be finely tuned to avoid data leakage.
When providing the data that powers AI training and operations, many data storage and governance tools fall short.
After all, AI is inherently more complex than standard IT-driven processes and capabilities—raising the importance of active and informed data governance. A KPMG report highlights the AI governance gap as one of the top risks currently threatening businesses.3 For example, without appropriate guardrails in place, AI might inadvertently expose sensitive PII or corporate secrets.
To reduce AI risks and complexities, organizations can combine AI-optimized data storage capabilities with data governance programs devised with AI in mind.
Planning and creation of a data governance framework takes time and effort across multiple stakeholders and teams. Common practices that organizations use when implementing governance programs include:
Automating certain parts of the data governance process can help improve efficiency and reduce errors. Data governance and management tools can help automate routine tasks such as:
Strong data security and access controls are fundamental to any data governance framework. At the same time, organizations want data access to be as frictionless as possible for those with the authorization to see and use specific data sets. Without this easy access to self-service information, collaboration and new insights are hampered.
Many organizations struggle to manage their data due to a lack of visibility. A central data catalog can operate as the single source of truth, enabling data integration and governance initiatives.
According to a Gartner report, demand for data catalogs is rising as organizations struggle with finding, inventorying and analyzing distributed and diverse data assets. With a robust data catalog, organizations can more easily locate and classify information at scale, allowing for better enforcement of data governance policies.
Many organizations find it helpful to create a clear governance roadmap. Maturity models can provide this roadmap.
A data governance maturity model is a tool that helps organizations assess the current state of their data governance program, set goals and track progress over time.
Organizations can establish regular assessment and reporting mechanisms to monitor data and governance metrics over time. These assessments can help the organization identify issues and make improvements to governance processes.
Regularly reviewing the framework and adjusting it based on feedback, new regulations or changes in business strategy helps the framework stay relevant and effective.
Additionally, assessments can foster a culture that values data as a strategic asset, supporting effective business intelligence and data use across the organization.
1 IDC MarketScape: Worldwide AI Governance Platforms 2023 Vendor Assessment, IDC, 2023.
2 The Need for Data Compliance in Today’s Cloud Era, Enterprise Strategy Group by TechTarget, April 2023.
3 Top risk forecast, KPMG, 2024.