June 5, 2019 By Emma Tucker 4 min read

Here’s a common scenario: Alex is working on a data analytics project for her retail company to better understand the success of shoe sales versus jewelry sales and to predict future sales. Her company is split up by department, so she has to go to the shoe line-of-business and the jewelry line-of-business for the data she needs for her analysis. She submits a form to each, requesting data that would meet her business needs. She waits. And she waits. She meets with the team to clarify her request. She waits. And she waits.

Finally, they say she will have access to her data soon – the team is just masking the data for her. It ends up taking a few weeks to get her hands on the right data that she needs. Then it takes her another week to figure out what each column means and prepare the data for her project.

A few things that could have been sped up here: finding the right data, masking that data, then explaining the data.

A few things that could have been improved upon: validation that she has all the relevant, current data for her project and trust that the data is of high quality.

If Alex’s company used an enterprise data catalog, all of those pain points could potentially disappear.

What is a data catalog?

A data catalog organizes your company’s information assets so it’s easy for people like Alex to find what they’re looking for. Libraries use catalogs to help readers find all of the books available in each of their branches. Readers can search on genre, reviews, and popularity; learn more about the book they want to check out; read the librarian’s reviews of the book; and then find that book in one of the library’s branches.

A data catalog is similar. A data catalog lets data analysts find all the data available in each database or application maintained by their company. Business analysts can search on data type, reviews, and popularity; preview the data; see what others say about it; better understand its quality; and then download the data asset for their project and analyze it.

On top of that, data catalogs that are tightly integrated with a governance platform help your business comply with changing regulations and policies and help give your data citizens access to governed data. After data assets are classified, rules can be created that anonymize or restrict access to certain data, so personally identifying information does not end up in the wrong hands.
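To make the classify-then-protect idea concrete, here is a minimal Python sketch. It is an illustration only and not how any particular catalog product works: the `CLASSIFIERS` patterns, the `RULES` table, and the helper names are all hypothetical, and a truncated unsalted hash is used purely as a stand-in for a real anonymization method.

```python
import hashlib
import re

# Hypothetical classifiers: a column gets a class when every non-empty
# value matches the pattern. Real catalogs use far richer detection.
CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\-\s()]{7,}$"),
}


def classify_column(values):
    """Return the first class matching all non-empty values, else None."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return None
    for label, pattern in CLASSIFIERS.items():
        if all(pattern.match(str(v)) for v in non_empty):
            return label
    return None


def pseudonymize(value):
    """Replace a value with a short, repeatable token (illustrative only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


# Hypothetical rules: which transformation applies to which data class.
RULES = {
    "email": pseudonymize,
    "phone": lambda value: "REDACTED",
}


def apply_rules(rows):
    """Classify each column, then apply the matching rule to its values."""
    if not rows:
        return []
    classes = {col: classify_column([r[col] for r in rows]) for col in rows[0]}
    protected = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            rule = RULES.get(classes[col])
            # Leave unclassified columns and empty values untouched.
            new_row[col] = rule(value) if rule and value not in (None, "") else value
        protected.append(new_row)
    return protected


rows = [
    {"email": "alex@example.com", "phone": "555-0100", "amount": 59.99},
    {"email": "sam@example.com", "phone": "555-0199", "amount": 120.00},
]
protected = apply_rules(rows)
print(protected)
```

In this sketch the `amount` column passes through unchanged because no classifier matches it, while the `email` and `phone` columns are transformed without anyone masking them by hand.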

5 reasons to have an enterprise data catalog

Speed and self-service. Rather than submitting requests to an IT group for data that meets their business needs, analysts simply search the data catalog themselves. This frees up time for the IT group and means analysts don’t have to wait for a response. In short, it gives data citizens self-service access to data.


Meaningful context. When an analyst finds a data asset that would be useful to them, they can read a description, view business metadata and business term definitions, and read comments provided by others about the data. That way, the analyst can put each column in a data asset in the context of their business.

Improves trust and confidence in data. By previewing the data and profiling it, an analyst can very quickly see if certain fields have null or incorrect values. This makes cleansing the data even easier. The quality scores and social recommendations on the data asset also increase an analyst’s confidence in the data they use.
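As a rough illustration of what profiling a field for null or incorrect values can look like, here is a small Python sketch. The `profile_column` helper, the sample rows, and the validity check are assumptions made up for this example; they are not part of any catalog product.

```python
def profile_column(rows, column, is_valid=lambda value: True):
    """Count null and invalid values for one column in a list of dict rows."""
    nulls = 0
    invalid = 0
    for row in rows:
        value = row.get(column)
        if value in (None, ""):
            nulls += 1          # missing or empty value
        elif not is_valid(value):
            invalid += 1        # present, but fails the validity check
    return {"total": len(rows), "nulls": nulls, "invalid": invalid}


# Hypothetical sample of sales data with a few quality problems.
rows = [
    {"sku": "SH-001", "price": 59.99},
    {"sku": "SH-002", "price": None},    # null price
    {"sku": "JW-001", "price": -5.0},    # negative price is treated as invalid
    {"sku": "", "price": 120.0},         # empty SKU
]

price_profile = profile_column(rows, "price", is_valid=lambda v: v >= 0)
sku_profile = profile_column(rows, "sku")
print(price_profile)
print(sku_profile)
```

A summary like this shows at a glance which fields need cleansing before the analysis starts.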

Protects data while staying compliant. Instead of an IT professional masking each column, data rules run automatically based on the automatic classification of data, so companies do not have to worry about data getting into the wrong hands.

Why IBM Watson Knowledge Catalog?

It can sometimes feel like the Wild West out there in the data catalog market. But remember, a standalone data catalog that cannot integrate tightly with your enterprise governance platform could put poor-quality data in the hands of your data citizens.

IBM Watson Knowledge Catalog, a machine learning-powered data catalog, provides all of the key data catalog capabilities. It also integrates seamlessly with IBM’s data integration, quality and governance products and with other IBM Watson technologies, so analysts and data scientists can use their data in reports, analytics projects, and models.

With IBM Watson Knowledge Catalog, Alex would’ve found out that jewelry is way more profitable than shoes in the time it took to submit her request to the departments. She then would have had another month to predict buying trends in other lines-of-business. Her IT department would have had more time to finish their data projects since they would have been less distracted with data requests. Everybody wins with a data catalog.

Try a no-cost trial today on IBM Cloud.
