Security leaders are used to thinking about defense-in-depth and ensuring their security stack and overall architecture provide resilience and protection. While that paradigm still holds, it may be time to shift toward data-first security: data management that matches today’s use cases, in which data is the central asset requiring protection across its entire lifecycle, use and disposal. The evidence in the 2024 edition of the Cost of a Data Breach Report strongly supports such a paradigm shift.

The report presents research studying the causes, cost impacts and recovery from actual breaches at 604 organizations across the globe and in 17 industries. The findings show some interesting trends that can help solve the data puzzle, including impacts to security, privacy, governance and regulation. All of these areas already face elevated risks arising from the rush to provision new generative AI (gen AI) initiatives and take them to market rapidly, leaving security considerations behind. Alarmingly, a recent executive survey about gen AI security revealed that only 24% of new initiatives include a security component.

A data journey in the dark

Data has become the main asset that companies rely on. But while data is king, it is still not managed or protected in a way that matches its significance and the potential impact of its loss. Let’s look at some ways in which data, the data journey and the protection paradigms surrounding its lifecycle were major contributing factors in the cost of data breaches.

Multi-cloud hopping

First, data today exists at a scale that requires organizations to go beyond their old on-premises and private cloud infrastructures. The drivers are the scalability of data volume, but also traffic and workload demands that only grow over time. With data traveling through multi-cloud environments, the Cost of a Data Breach Report notably found that 40% of breaches involved data stored across multiple types of environments. When breached, public cloud environments incurred the highest average breach cost, at USD 5.17 million.

Why is this happening? The decentralized nature of multi-cloud complicates visibility into and control over data, and in the case of a breach, it simply takes longer to gather information, investigate and engage the cloud provider’s support to contain it. Clouds also host more data, and that scale means more data is breached at one time, potentially adding to the impact on customers and to recovery costs.

Shadow data

Data is spread out in more places than ever, and 35% of breaches this year involved data stored in unmanaged data sources—aka “shadow data.” This translated into data not being classified properly or at all, not being properly protected, and not being managed in terms of its lifecycle as it moves into and within the organization. Considering that 25% of breaches involving shadow data were solely on premises, this situation likely highlights unmanaged risk in the form of data governance gaps, data privacy issues, and impending regulatory impact.

Breaches involving shadow data also took 26.2% longer to identify and 20.2% longer to contain, averaging 291 days. This inevitably resulted in higher breach costs, averaging USD 5.27 million where shadow data was involved. But that figure is only the tip of the iceberg: the spillover effects of breaches on others in the ecosystem, potential contractual issues and lawsuits form a longer tail of costs that continues to add up two to three years after the breach.

Unclassified, unprotected

When data is not inventoried and catalogued effectively, it is not classified properly and therefore not protected adequately. That could easily include data that should have been tagged restricted or confidential, which leads to the next statistic from the report. Attackers were able to access far more sensitive data during breaches, leading to a 26.5% rise in IP theft. Lost IP also cost considerably more per record than last year, rising to USD 173 in 2024 from USD 156 in the 2023 report, an 11% uptick.

But let’s put that hard cost aside for a moment. The impact of IP theft can mean that the organization loses its competitive advantage outright. It can lose considerable market share and revenue it expected to generate from strategic IP. What shareholder would not be alarmed by this statistic, considering that most organizations are actively developing innovative gen AI applications they expect to exclusively monetize?

A costly side effect of deficient data protection is lost business and reputational damage, which averaged USD 1.47 million and accounted for the majority of the increase in the average cost of a breach in 2024.

Shadow data, shadow models, shadow AI

With gen AI the subject of a new gold rush, various stakeholders in the organization can easily expose it to unmanaged risk linked to unsanctioned data, models and overall use of AI. These uses can be invisible to IT and security teams, which can result in impactful incidents down the line.

Another risk factor is datasets destined for AI implementations, sourced from multiple third-party providers. Unmanaged by the security team, these external sources can introduce risks such as data poisoning and vulnerabilities. The more insidious risks, though, are shadow models and large volumes of unencrypted training data streaming into and out of cloud environments.

Consider this scenario: a healthcare organization uses gen AI to identify anomalies in chest X-rays. It sends the images to a cloud model to receive results, but the images travel and are used in unencrypted form. An attacker accesses the images and then extorts the healthcare provider for a ransom. The same can happen with plaintext or any other unprotected data that should be better guarded. Don’t be surprised if a lawsuit filed by impacted data subjects promptly follows.
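The fix for that scenario is client-side encryption: the images leave the premises only as ciphertext, and only the data owner holds the key. The sketch below illustrates the workflow with a toy keystream built from SHA-256; the cipher itself is a stand-in for illustration only, and a real deployment would use an authenticated cipher such as AES-GCM from a vetted library.

```python
# Sketch: encrypt an image on-premises before sending it to a cloud model.
# The SHA-256-based keystream below is a toy stand-in, NOT a real cipher;
# use an AEAD cipher (e.g., AES-GCM) from a vetted library in production.
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Derive a pseudorandom byte stream from key + nonce + counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)  # fresh nonce per message
    ks = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, ks))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))

key = secrets.token_bytes(32)        # stays with the data owner
xray = b"\x89PNG...pixel data..."    # pretend image bytes
nonce, blob = encrypt(key, xray)     # only blob + nonce leave the premises
assert decrypt(key, nonce, blob) == xray  # owner can still recover the image
```

An attacker who intercepts `blob` in transit or at rest in the cloud sees only ciphertext, which removes the extortion leverage in the scenario above.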


Recommendations—Pay data its (security) dues

Most organizations today would lose almost all productivity if they lost access to data. From the simplest form of employee productivity to the complexity of data-driven enterprises, companies no longer consider data a by-product of their business. Data is the main asset around which organizations align their culture, organization and technology for sustained innovation and sustainable business growth. It only stands to reason that data be managed and protected to the extent its classification warrants, using the right technologies to do so.

Encrypt

Identify, classify, encrypt. The better data is protected, the less leverage attackers have in the event of a data breach. Better protection also means less impact on data subjects and lower chances of regulatory fines. So encrypt, and do it smartly. Not all data is made equal: if your organization uses images or other specialized types of data, learn the best ways to encrypt them so you can use them securely and enjoy their benefits.
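The identify-classify-encrypt flow above can be expressed as a simple policy lookup that maps each classification tier to the controls it requires. The tier names and control values below are illustrative assumptions, not figures from the report; the one design point worth copying is that unclassified data defaults to the strictest tier rather than the weakest.

```python
# Sketch: classification tiers mapped to required protections.
# Tier names and control values are illustrative assumptions.
CLASSIFICATION_POLICY = {
    "public":       {"encrypt_at_rest": False, "encrypt_in_transit": True,  "key_rotation_days": None},
    "internal":     {"encrypt_at_rest": True,  "encrypt_in_transit": True,  "key_rotation_days": 365},
    "confidential": {"encrypt_at_rest": True,  "encrypt_in_transit": True,  "key_rotation_days": 90},
    "restricted":   {"encrypt_at_rest": True,  "encrypt_in_transit": True,  "key_rotation_days": 30},
}

def required_controls(tier: str) -> dict:
    # Fail closed: unknown or missing classifications get the strictest tier.
    return CLASSIFICATION_POLICY.get(tier, CLASSIFICATION_POLICY["restricted"])

print(required_controls("confidential")["key_rotation_days"])  # 90
print(required_controls("not-yet-classified") is CLASSIFICATION_POLICY["restricted"])  # True
```

Failing closed this way means shadow data that was never classified still ends up encrypted, instead of silently landing in the weakest bucket.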

The more innovative your organization is, the more it uses data, the more important encryption becomes. Consider confidential computing for your use cases, as well as post-quantum encryption to ensure protected data remains protected in the future.

Go DSPM

Since data is evidently spread out across environments and remains exposed in many cases, one way to regain control is via data security posture management (DSPM). DSPM is a cybersecurity technology that identifies sensitive data across multiple cloud environments and services, assessing its vulnerability to security threats and risk of regulatory non-compliance. Instead of securing the devices, systems and applications that house, move or process data, security teams can use DSPM to focus on protecting data directly.
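The discovery half of what DSPM does can be sketched as pattern detectors run across every environment in an inventory. The detector regexes, store names and records below are simplified illustrations under assumed names, not any real DSPM product's logic, but they show how findings tie a sensitive-data type to the environment it was discovered in.

```python
# Sketch of DSPM-style discovery: scan records across mock data stores in
# several environments for sensitive patterns. Store names, records and
# detector regexes are illustrative assumptions.
import re

DETECTORS = {
    "us_ssn":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email":       re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Mock multi-environment inventory: store name -> records it holds.
STORES = {
    "aws-s3://hr-exports":    ["name: Ada, ssn: 123-45-6789"],
    "azure-blob://logs":      ["GET /health 200", "user=ada@example.com"],
    "on-prem://legacy-share": ["backup of card 4111 1111 1111 1111"],
}

def scan(stores: dict) -> list[tuple[str, str]]:
    # Return (store, sensitive-data-type) pairs for every detector hit.
    findings = []
    for store, records in stores.items():
        for record in records:
            for label, pattern in DETECTORS.items():
                if pattern.search(record):
                    findings.append((store, label))
    return findings

for store, label in scan(STORES):
    print(f"{store}: found {label}")
```

Real DSPM tools add far more (risk scoring, lineage, compliance mapping), but even this skeleton shows why the technology targets the data itself rather than the systems around it: the same scan logic applies whether the store is public cloud, private cloud or on premises.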

Rethink data protection in the gen AI era

With the scale and use scenarios of data in gen AI solutions, organizations must rethink their data lifecycle and how to protect it at scale, in all its states. Think about securing training data by protecting it from theft and manipulation. Organizations can use data discovery and classification to detect sensitive data used in training or fine-tuning. They can also implement data security controls across encryption, access management and compliance monitoring. Extend posture management to AI models to protect sensitive AI training data, gain visibility into the use of unsanctioned or shadow AI models, and detect malicious drift, AI misuse or data leakage.
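One concrete slice of that visibility is spotting shadow AI: comparing the AI endpoints your hosts actually talk to against an allowlist of sanctioned model services. The hostnames and egress records below are illustrative assumptions; in practice the observed traffic would come from proxy or firewall logs.

```python
# Sketch: flag possible shadow-AI usage by checking observed outbound
# connections against sanctioned model endpoints. All hostnames and the
# egress records are illustrative assumptions.
SANCTIONED_AI_HOSTS = {
    "models.internal.example.com",
    "approved-vendor.example.com",
}

# Mock egress observations: (source host, destination AI endpoint).
observed_egress = [
    ("finance-laptop-12", "models.internal.example.com"),
    ("marketing-vm-03", "free-llm-playground.example.net"),  # unsanctioned
    ("dev-box-7", "approved-vendor.example.com"),
]

def shadow_ai_findings(egress):
    # Keep only connections to endpoints outside the allowlist.
    return [(src, dst) for src, dst in egress if dst not in SANCTIONED_AI_HOSTS]

for src, dst in shadow_ai_findings(observed_egress):
    print(f"possible shadow AI: {src} -> {dst}")
```

Each finding is a lead for the security team: either the endpoint gets vetted and added to the allowlist, or the data flowing to it gets cut off before it becomes the next unmanaged training set.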

Evolve with regulatory demands

The use of data already involves extensive requirements from data privacy regulators. These demands are becoming more elaborate and nuanced when it comes to data used in AI-enabled solutions and scenarios. This means that traditional data protection capabilities may not suffice and require enhanced classification, protection and monitoring mechanisms, as well as improved controls for auditability and oversight.

Better insights, better security

In its 19th edition this year, the Cost of a Data Breach Report provides IT, risk management and security leaders with timely, quantifiable evidence to guide them in their strategic decision-making. It also helps teams better manage their risk profiles and security investments. This year, the statistics provide insights from the experiences of 604 organizations and 3,556 cybersecurity and business leaders who faced a data breach. Download a copy of the report to empower yourself with real-world examples and expert recommendations on how to mitigate the risks.

