Experian’s Business Information Services (BIS) unit spent years building a valuable and unique database of corporate relationships to better serve customers. The unit built this database of corporate hierarchies by combining entity-matching technology with people who evaluated the matches and updated them based on research.

This process was time-consuming and limited the number of company hierarchies the team could evaluate and match. Because of the intense manual effort required, the team maintained only a subset of corporate hierarchies out of the full universe of companies available to it. And because so much human involvement was needed, the team was also limited in how often it could refresh the hierarchies.

The IBM Data Science Elite team had a simple mission: apply AI to learn what Experian has done over the years building corporate hierarchies and then apply that to the full universe of companies that they traditionally couldn’t evaluate. The goal was to increase the number of corporate hierarchies and increase the frequency of corporate hierarchy matching.

The results? AI and machine learning are now helping Experian build and maintain business families and corporate linkages, with a potential 500 percent increase in coverage and an 80 percent reduction in cost.

Our team began with a discovery workshop to understand the problem. We dug into the data and held an open discussion of the current process and what Experian would like it to be. The workshop brought together business experts from Experian, stakeholders who understood the value of a new solution, and a data scientist from the IBM Data Science Elite team who could help develop a new approach. After the workshop, we put together a plan to train new machine learning models using Experian’s existing, validated hierarchies. Each hierarchy is a company with thousands of sub-companies, matched using years of Experian expertise, intellectual property and software.

We then documented our approach, defined some agile sprints and moved to project kickoff.

Next we needed a platform – something with key open source data science and machine learning libraries that would meet Experian’s strict guidelines on encryption and key management. It would have to scale with the horsepower needed for such a complex problem. We especially needed to use GPUs, and we needed something that was quick to get started.

With that in mind, we used Watson Studio for modeling, Watson Machine Learning for model deployment and scoring, Object Storage for data storage and Key Protect for data encryption and security. All components were spun up on IBM Cloud for the project.

We started with the data. We uploaded several extracts that included Experian’s base data files. Those files contained corporate hierarchies and relevant features such as address, city, website and many others. This added up to millions of rows of data. The team performed several cycles of data sampling, understanding, preparation and definition, working very closely with business stakeholders.

In the next sprint, the team performed modeling, which included feature engineering, blocking and the evaluation of several machine learning techniques for binary classification, including logistic regression, neural networks and recurrent neural networks (RNNs).
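Blocking is the step that keeps entity matching tractable: instead of comparing every company with every other company, only records that share a cheap key are paired up for the classifier. The article does not say which keys the team used, so the following is a minimal sketch that assumes the normalized website domain as the blocking key. The records, field names and the functions `blocking_key` and `candidate_pairs` are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; the field names are illustrative, not Experian's schema.
records = [
    {"id": 1, "name": "Acme Holdings", "city": "Austin", "website": "acme.example"},
    {"id": 2, "name": "Acme Logistics", "city": "Austin", "website": "ACME.example "},
    {"id": 3, "name": "Borealis Foods", "city": "Denver", "website": "borealis.example"},
    {"id": 4, "name": "Acme Retail", "city": "Dallas", "website": "acme.example"},
]


def blocking_key(record):
    """Assumed blocking key: the website domain, trimmed and lowercased."""
    return record["website"].strip().lower()


def candidate_pairs(records):
    """Return sorted pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])

    pairs = []
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With millions of rows, the number of all possible pairs grows quadratically, so a step like this decides how many pairs the downstream model has to score. The trade-off is that a key that is too strict can hide true matches from the model.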

The team determined that an RNN used as a binary classifier achieved the best results, with 95 percent accuracy. Matching the hierarchies had previously taken years of application and manual work. The new RNN model found more matches than the existing process, with very good accuracy. In the final sprint, the team deployed, validated and scored additional hierarchies using the IBM Watson Machine Learning deployment service.
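An RNN reads sequences, so each candidate pair has to be turned into a fixed-length sequence of integers before it can be fed to a binary classifier. The article does not describe how the team encoded its inputs. As one possible approach, the sketch below assumes a character-level encoding of two company names joined by a separator. The vocabulary, the `max_len` value and the function `encode_pair` are all assumptions made for illustration.

```python
import string

# Assumed vocabulary: lowercase letters, digits, space and a "|" separator.
# Index 0 is reserved for padding; unknown characters share one extra index.
ALPHABET = string.ascii_lowercase + string.digits + " |"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1


def encode_pair(parent_name, child_name, max_len=32):
    """Encode two company names as one fixed-length list of integers."""
    text = f"{parent_name.lower()}|{child_name.lower()}"
    # Truncate long inputs, then right-pad short ones with zeros.
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in text[:max_len]]
    return indices + [0] * (max_len - len(indices))


encoded = encode_pair("Acme Holdings", "Acme Retail")
print(len(encoded), encoded[:5])
```

Sequences of this shape are what an embedding layer followed by a recurrent layer and a single sigmoid output would typically consume, with the label indicating whether the pair belongs to the same hierarchy.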

In a few months, with the goal of scaling AI to impact all corporate hierarchies in BIS, the team had validated an approach to a new, innovative AI system for corporate hierarchy matching. One aspect of the project was to estimate the computational needs and system design for a full entity matching system. We estimated that to launch a full entity matching system with the current data and a 4-way ensemble of RNNs, Experian would initially have to train hundreds of models. This would require access to a great amount of GPU processing, and we would need to build several components that would have to interact.
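The article does not say how the four RNNs in the ensemble would be combined. A common and simple choice is to average the match probabilities from each model and apply a decision threshold. The sketch below assumes that scheme; the function `ensemble_match`, the 0.5 threshold and the example probabilities are hypothetical.

```python
from statistics import mean


def ensemble_match(probabilities, threshold=0.5):
    """Average per-model match probabilities and apply a decision threshold."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    score = mean(probabilities)
    return score, score >= threshold


# Hypothetical outputs of four models for one candidate parent/child pair.
score, is_match = ensemble_match([0.75, 1.0, 0.5, 0.75])
print(score, is_match)
```

Averaging tends to smooth out the errors of any single model, which is one reason an ensemble multiplies the number of models that have to be trained and served.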

We sketched a workflow for the entity matching system, which we proposed to run on IBM Cloud.

This was just the start. In a short time, the team developed 16 notebooks for data preparation, blocking, modeling and predictions. The language of choice was Python, with a heavy reliance on libraries including Pandas, NumPy and Keras with a TensorFlow backend.

The work set the BIS team on a path to free itself from a manual process by using AI.

Schedule a one-on-one consultation with experts who have worked with thousands of clients to build winning data, analytics and AI strategies. Visit ibm.com/analytics.
