Take a second to think about all the ways data has changed in the last 20 years. In the hardware space, our mobile phones started out as large handhelds with pull-out antennas and limited processing power. Now they are advanced pieces of technology that process information 32,600 times faster than the computers we used to reach the moon. The transformation in our phones is analogous to the evolution of the modern data architecture for enterprises. As front-end consumer applications have evolved, the number of resources needed to collect, store, and analyze the information flowing from consumers has grown.

The average company has 110 SaaS applications, providing connections to an average of 400 data sources. To scale with this expansion, companies like IBM have proposed a new architectural approach known as “data fabric” that provides a unified platform to connect this growing number of applications. A data fabric can be thought of as what the name implies: a “fabric” that connects data from multiple locations, types, and sources to facilitate data growth and management. IBM delivers this flexible architecture through Cloud Pak® for Data, an enterprise insights platform that provides the flexibility for companies to scale across any infrastructure using the world’s leading open-source orchestrator, Red Hat® OpenShift®.

I will outline the data fabric architectural approach through the lens of a basic stock trading platform. Consumer-oriented trading platforms have gained traction over the past couple of years, enabling users to drive their own financial destiny. But to bring the individual investor this power, there must be a strong data architecture in place to connect the live price feeds and analytics to advanced backend systems. Data virtualization facilitates this movement by working behind the scenes to unify multiple disparate systems.

Data virtualization

Data virtualization integrates data sources across multiple locations (on-prem, cloud or hybrid) and returns a logical view without the need for data movement or replication. The real value of data virtualization is that it creates a centralized data platform without the cost of large-scale data movement. In terms of our stock trading platform, we have customer data, financial trading data and account data in separate storage locations.

Figure 1

As evidenced in Figure 1, financial data is located in a PostgreSQL cloud environment, while personal customer data resides on premises in MongoDB and Informix environments. Using our advanced virtualization engine, you can query each of these sources together and save half the cost of traditional information extraction methods.
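To make the idea of a single logical view more concrete, here is a minimal sketch in Python. It is only an analogy and not how Cloud Pak for Data works internally: two in-memory SQLite databases attached to one connection stand in for the separate PostgreSQL and on-premises sources, and the schema names (`trading`, `crm`), table names, columns and sample rows are all hypothetical. What it shows is one query joining data that lives in separate stores, without first copying either table into the other.

```python
import sqlite3


def build_demo_sources() -> sqlite3.Connection:
    """Create two separate in-memory databases behind one connection.

    Assumption: 'trading' stands in for the cloud trading store and
    'crm' for the on-premises customer store. All data is made up.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS trading")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute(
        "CREATE TABLE trading.trades "
        "(customer_id INTEGER, symbol TEXT, quantity INTEGER, price REAL)"
    )
    conn.execute(
        "CREATE TABLE crm.customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT)"
    )

    conn.executemany(
        "INSERT INTO trading.trades VALUES (?, ?, ?, ?)",
        [(1, "IBM", 10, 140.0), (1, "ACME", 5, 100.0), (2, "IBM", 2, 140.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    return conn


def exposure_by_customer(conn: sqlite3.Connection) -> list:
    """Join both sources in a single query and total each customer's trades."""
    query = """
        SELECT c.name, SUM(t.quantity * t.price) AS exposure
        FROM crm.customers AS c
        JOIN trading.trades AS t ON t.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, exposure in exposure_by_customer(build_demo_sources()):
        print(name, exposure)
```

In a real data fabric the virtualization engine plays the role of the shared connection, so the person writing the query does not need to know where each table physically lives.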

Data cataloging

Once this data is ingested, it needs a mechanism to curate, categorize and facilitate its sharing throughout an organization. For example, our stock trading platform may have multiple teams of data scientists focused on core customer initiatives such as UI optimization algorithms or understanding order flow. A data catalog, such as the IBM Watson® Knowledge Catalog, can facilitate the relationship between these roles and reduce the prep necessary to complete these tasks. Data catalogs bridge the gap between raw and usable data, allowing for the application of business context, data policies and data protection rules to your virtual data. For example, if, as the lead data steward at my trading platform, I wish to censor credit card numbers as they flow to different data projects, I can apply a data protection rule on credit card numbers as shown in Figure 2:

Figure 2

Credit card numbers are now censored throughout your environment, which improves trust in your company and strengthens your ability to meet different government regulations.

With this rule applied, data scientists who view customer information see redacted credit card numbers as shown in Figure 3:

Figure 3
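The effect of a rule like this can be approximated in a few lines of Python. This is a rough sketch of the redaction behavior only, not the Watson Knowledge Catalog rule engine or its API: the function name `redact_column`, the `credit_card` column name and the sample record are assumptions made for illustration.

```python
def redact_column(rows, column, mask_char="X"):
    """Return copies of the rows with every digit in `column` masked.

    Hypothetical helper: a real data protection rule is enforced by the
    catalog itself, before the data ever reaches the consumer.
    """
    redacted = []
    for row in rows:
        masked = dict(row)  # copy, so the source rows stay untouched
        masked[column] = "".join(
            mask_char if ch.isdigit() else ch for ch in str(row[column])
        )
        redacted.append(masked)
    return redacted


if __name__ == "__main__":
    customers = [{"name": "Ada", "credit_card": "4111-1111-1111-1111"}]
    print(redact_column(customers, "credit_card"))
```

Keeping separators such as dashes intact means the value still looks like a card number to downstream code, while the digits themselves are no longer recoverable.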

Now if this table is needed in a Python project, data scientists can export that same core data for analysis without seeing any confidential information, as shown in Figure 4:

Figure 4
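As a small illustration of why masked data is still useful, the sketch below loads a masked export and computes an average account balance without ever touching a real card number. The CSV layout, the column names and the values are invented for this example and are not taken from the demo, and the standard-library `csv` module stands in for whatever connector your project actually uses.

```python
import csv
import io
import statistics

# Hypothetical export: the credit_card column arrives already masked.
MASKED_EXPORT = """customer_id,credit_card,balance
1,XXXXXXXXXXXXXXXX,2500.00
2,XXXXXXXXXXXXXXXX,1200.50
3,XXXXXXXXXXXXXXXX,980.25
"""


def load_export(csv_text):
    """Parse the exported CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def average_balance(rows):
    """Average the balance column; the masked column is never needed."""
    return statistics.mean(float(row["balance"]) for row in rows)


if __name__ == "__main__":
    rows = load_export(MASKED_EXPORT)
    print(average_balance(rows))
```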

This is how a data fabric architecture enables our trading platform to virtualize sources and access data across multiple environments, then organize this data and safely collaborate with key data personnel. If you’re curious as to how this demo was made and would like to see how our final trading platform effectively analyzes data, sign up for my 15 Minute Friday Session on July 8th in the form below.
