What is data loss prevention (DLP)?

12 August 2024

Authors

Matt Kosinski

Writer

What is DLP?

Data loss prevention (DLP) is the discipline of shielding sensitive data from theft, loss and misuse by using cybersecurity strategies, processes and technologies.

Data is a competitive differentiator for many businesses. A typical corporate network contains a trove of trade secrets, sales records, customers' personal data and other sensitive information. Hackers target this data, and organizations often struggle to keep their critical data secure.

In the meantime, hundreds, if not thousands, of authorized users access enterprise data across cloud storage and on-premises repositories every day. Preventing data loss while facilitating authorized access is a priority for most organizations.

Data loss prevention (DLP) helps organizations stop data leaks and losses by tracking data throughout the network and enforcing security policies on that data. Security teams try to ensure that only the right people can access the right data for the right reasons.

A DLP solution inspects data packets as they move across a network, detecting the use of confidential information such as credit card numbers, healthcare data, customer records and intellectual property. This way, organizations can apply the right access controls and usage policies to each type of data.

Why DLP is important

Data is at risk regardless of where it is stored, making information protection a significant priority for an organization. The cost of failure can be high. The latest Cost of a Data Breach Report from IBM® found that the global average cost of a data breach increased 10% over the previous year, reaching USD 4.88 million, the biggest jump since the pandemic.

Personally identifiable information (PII), in particular, is highly valuable to thieves and often targeted. The Cost of a Data Breach Report also found that nearly half of all breaches involved customer PII, which can include tax identification (ID) numbers, emails, phone numbers and home addresses. Intellectual property (IP) records came in a close second with 43% of breaches.

Protecting data is becoming ever more difficult because an organization’s data might be used or stored in multiple formats, in multiple locations, by various stakeholders across organizations. Moreover, different sets of data might need to follow different rules based on sensitivity levels or relevant data privacy regulations.

DLP policies and tools help organizations protect themselves by monitoring every piece of data throughout the network in all three states: in use, in motion and at rest.

  • Data in use: This is when data is accessed, processed, updated or deleted. Examples include an organization’s data being used for analysis or calculations, or a text document being edited by an end user.

  • Data in motion: Also known as data in transit, this involves data moving through a network, such as being transmitted by an event streaming server or a messaging app, or moved between networks. Data in motion is the least secure of the three states and requires special attention.

  • Data at rest: This is data in storage, such as sitting in a cloud drive, local hard disk drive or archive. Generally, data at rest is easier to protect, but security measures still need to be in place. Data at rest can be compromised through an act as simple as someone picking up a USB flash drive from an unattended desk.

Ideally, an organization’s data loss prevention solution is able to monitor all data in use, in motion and at rest across the entire variety of software in use. For example, this can mean adding DLP protection for archiving, business intelligence (BI) applications, email, team collaboration tools and operating systems such as macOS and Microsoft Windows.

Types of data loss

Data loss events are often described as data breaches, data leakage or data exfiltration. The terms are often used interchangeably, but they have distinct meanings.

  • Data breach: A data breach is any cyberattack or other security incident in which unauthorized parties gain access to sensitive data or confidential information.

  • Data leakage: This is the accidental exposure of sensitive data or confidential information to the public. Data leakage can result from a technical security vulnerability or procedural security error and can include both electronic and physical transfers.

  • Data exfiltration: Exfiltration refers to stealing data. It is any theft in which an attacker moves or copies someone else’s data to a device under the attacker’s control. All data exfiltration requires a data leak or a data breach, but not all data leaks or data breaches lead to data exfiltration.

Causes of data loss

Some losses arise from simple mistakes, while others are caused by cyberattacks such as distributed denial of service (DDoS) attacks and phishing. Almost any data loss can cause significant business disruptions.

Some of the most common causes of data loss include:

  • Human error and social engineering
  • Insider threats
  • Malware
  • Physical threats
  • Security vulnerabilities
  • Smartphone or PC theft
  • Weak or stolen credentials

Human error and social engineering

Data thieves use tactics that fool people into sharing data they shouldn’t share. Social engineering can be as artful as a phishing attack that convinces an employee to email confidential data, or as devious as leaving a malware-infected USB flash drive where an employee might find it and plug it into an organization-supplied device.

On the other hand, human error might be as simple as leaving a smartphone at a cash register or deleting files by mistake.

Insider threats

Authorized users—including employees, contractors, stakeholders and providers—might put data at risk through carelessness or malicious intent.

Malicious insiders are often motivated by personal gain or a grievance toward the company. Insider threats can also be unintentional, ranging from simple carelessness, such as not updating passwords, to something as dangerous as exposing sensitive enterprise data while using publicly available generative AI (gen AI).
 
Malicious insider attacks are common and costly. The latest Cost of a Data Breach Report from IBM found that compared to other vectors, malicious insider attacks resulted in the highest costs, averaging USD 4.99 million.

Malware

This is software created specifically to harm a computer system or its users. The best-known form of data-threatening malware is ransomware, which encrypts data so that it can’t be accessed and demands a ransom payment for the decryption key. Sometimes, attackers will even ask for a second payment to prevent the data from being exfiltrated or shared with other cybercriminals.

Physical threats

Depending on how well an organization’s data is backed up, a hard disk drive malfunction might be catastrophic. The cause might be a head crash or software corruption. Spilling a beverage in the office (coffee, tea, soda or water) might short-circuit the system board in a PC. An interruption in the power supply can shut down systems at the worst possible time, which might interrupt the saving of work or break transmissions.

Security vulnerabilities

Vulnerabilities are weaknesses or flaws in the structure, code or implementation of an application, device, network or other IT asset that hackers can exploit. These include coding errors, misconfigurations, zero-day vulnerabilities (unknown or as yet unpatched weaknesses) and out-of-date software, such as an old version of Microsoft Windows.

Smartphone or PC theft

Any digital device left unattended, whether on a desk, in a car or on a bus seat, can be a tempting target and can grant the thief access to a network and permission to access data. Even if the thief only wants to sell the equipment for cash, the organization still suffers the disruption of shutting off access to that device and replacing it.

Weak or stolen credentials

This includes passwords that hackers can easily guess, or passwords or other credentials—for example, ID cards—that hackers or cybercriminals might steal.

Data loss prevention strategies and policies

DLP policies can cover multiple topics, including data classification, access controls, encryption standards, data retention and disposal practices, incident response protocols and technical controls such as firewalls, intrusion detection systems and antivirus software.

A major benefit of data protection policies is that they set clear standards. Employees know their responsibilities for safeguarding sensitive information and often have training on data security practices, such as identifying phishing attempts, handling sensitive information securely and promptly reporting security incidents.

Also, data protection policies can enhance operational efficiency by offering clear processes for data-related activities such as access requests, user provisioning, incident reporting and security audits.

Rather than drafting a single policy for all data, information security teams typically create different policies for the different types of data in their networks. This is because different types of data often need to be handled differently for different use cases to meet compliance needs and avoid interfering with the approved behavior of authorized end users.

For example, personally identifiable information (PII)—such as credit card numbers, social security numbers and home and email addresses—is subject to data security regulations that dictate proper handling.

However, the company might do what it wishes with its own intellectual property (IP). Furthermore, the people who need access to PII might not be the same people who need access to company IP.

Both kinds of data need to be protected, but in different ways; hence, distinct DLP policies tailored to each type of data are needed.
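As a minimal sketch of what distinct per-type policies could look like in practice, the following Python snippet models one policy for PII and one for IP. The role names, field names and policy values are hypothetical and chosen only for illustration; real DLP products define policies in their own formats.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class DataType(Enum):
    PII = "pii"
    IP = "intellectual_property"


@dataclass(frozen=True)
class DlpPolicy:
    data_type: DataType
    allowed_roles: FrozenSet[str]   # hypothetical role names
    encrypt_in_transit: bool
    allow_external_sharing: bool
    regulated: bool                 # True if external regulations dictate handling


# One policy per data type, rather than a single policy for all data.
POLICIES = {
    DataType.PII: DlpPolicy(
        data_type=DataType.PII,
        allowed_roles=frozenset({"billing", "support"}),
        encrypt_in_transit=True,
        allow_external_sharing=False,
        regulated=True,
    ),
    DataType.IP: DlpPolicy(
        data_type=DataType.IP,
        allowed_roles=frozenset({"engineering", "legal"}),
        encrypt_in_transit=True,
        allow_external_sharing=False,
        regulated=False,
    ),
}


def may_access(role: str, data_type: DataType) -> bool:
    """Return True if the role is listed in the policy for this data type."""
    return role in POLICIES[data_type].allowed_roles
```

In this sketch, `may_access("billing", DataType.PII)` is true while `may_access("billing", DataType.IP)` is false, which mirrors the point that the people who need PII are not necessarily the people who need company IP.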

Types of DLP solutions

Organizations use DLP solutions to monitor network activities, identify and tag data and enforce DLP policies to prevent misuse or theft.

There are three main types of DLP solutions:

  • Network DLP
  • Endpoint DLP
  • Cloud DLP

Network DLP

Network DLP solutions focus on how data moves through, into and out of a network. They often use artificial intelligence (AI) and machine learning (ML) to detect anomalous traffic flows that might signal a data leak or loss. While network DLP tools are designed to monitor data in motion, many also offer visibility into data in use and at rest on the network.
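Commercial network DLP tools rely on far richer models than can be shown here. As a rough illustration of the idea of flagging anomalous traffic flows, the sketch below substitutes a simple statistical check (a z-score) for the AI and ML techniques mentioned above. The function name, the threshold and the sample byte counts are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard
    deviations above the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu  # any increase over a perfectly flat baseline
    return (current - mu) / sigma > threshold


# Hypothetical hourly outbound megabytes for one host.
baseline = [100, 110, 95, 105, 90]
print(is_anomalous(baseline, 108))    # within the normal range
print(is_anomalous(baseline, 5000))   # a sudden large upload
```

A spike like the second call might indicate a bulk transfer worth reviewing, though a real system would also weigh the destination, the user and the data involved.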

Endpoint DLP

Endpoint DLP tools monitor activity on laptops, servers, mobile devices and other devices accessing the network. These solutions are installed directly on the devices that they monitor, and they can stop users from committing prohibited actions on those devices. Some endpoint DLP tools also block unapproved data transfers between devices.

Cloud DLP

Cloud DLP solutions focus on data stored in and accessed by cloud services. They can scan, classify, monitor and encrypt data in cloud repositories. These tools can also help enforce access control policies on individual end users and any cloud services that might access company data.

Organizations might choose to use one type of solution or a combination of multiple solutions, depending on their needs and how their data is stored. The goal for all remains clear: to defend all sensitive data.

How DLP works

Security teams typically follow a 4-step process throughout the data lifecycle to put DLP policies into practice with the help of DLP tools:

  • Data identification and classification
  • Data monitoring
  • Applying data protections
  • Documenting and reporting DLP efforts

Data identification and classification

First, the organization catalogs all its structured and unstructured data.

  • Structured data is data with a standardized form, such as a credit card number. It is usually clearly labeled and stored in a database.

  • Unstructured data is free-form information, such as text documents or images, which may not be neatly organized in a central database. 

Security teams typically use DLP tools to scan the entire network to discover data wherever it is stored—in the cloud, on physical endpoint devices, on employees' personal devices and elsewhere.

Next, the organization classifies this data, sorting it into groups based on sensitivity level and shared characteristics. Classifying data enables the organization to apply the right DLP policies to the right kinds of data.

For example, some organizations might group data based on type, such as financial data, marketing data or intellectual property. Other organizations might group data based on relevant regulations, such as the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA).

Many DLP solutions can automate data classification. These tools use AI, machine learning and pattern matching to analyze structured and unstructured data to determine what type of data it is, whether it is sensitive and which policies should apply.
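The pattern-matching part of automated classification can be sketched with a few regular expressions. This is a simplified illustration, not how any particular product works: the labels, the rules and the `classify` function are hypothetical, and the AI and machine learning components are left out entirely.

```python
import re

# Hypothetical (label, pattern) rules; real tools ship far larger rule sets.
RULES = [
    ("financial", re.compile(r"\b(?:invoice|iban|account number)\b", re.I)),
    ("pii", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),          # US SSN-like format
    ("pii", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),       # email address
    ("intellectual_property",
     re.compile(r"\b(?:patent|trade secret|proprietary)\b", re.I)),
]


def classify(text):
    """Return the set of labels whose patterns appear in the text."""
    labels = {label for label, pattern in RULES if pattern.search(text)}
    return labels or {"unclassified"}


print(classify("Contact jane@example.com about the patent filing"))
print(classify("Lunch menu for Friday"))
```

Once a document carries labels such as these, the matching DLP policy for each label can be applied to it.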

Data monitoring

After data is classified, the security team monitors how it is handled. DLP tools can use several techniques to identify and track sensitive data being used. These techniques include: 

  • Content analysis, such as using AI and machine learning to parse an email message for confidential information.

  • Data matching, such as comparing file contents to known sensitive data.

  • Detecting labels, tags and other metadata that explicitly identify a file as sensitive. This technique is sometimes called “data fingerprinting.”

  • File matching, where a DLP tool compares the hashes—the file identities—of protected files.

  • Keyword matching, where DLP looks for keywords often found in sensitive data.

  • Pattern matching, such as looking for data that follows a certain format. For example, an American Express card will always have a 15-digit number and begin with “3.” But not all such numbers are for AmEx, so a DLP solution can also look for the corporate name, abbreviation or an expiration date nearby.
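Two of the techniques above, file matching and pattern matching, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the sample file contents, the 40-character context window and the function names are hypothetical. The sketch also goes slightly beyond the description above by requiring the AmEx prefixes 34 or 37 and a valid Luhn checksum, which helps reduce false positives.

```python
import hashlib
import re
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# File matching: compare a file's hash against hashes of protected files.
PROTECTED_HASHES = {sha256_hex(b"hypothetical protected file contents")}


def is_protected_file(data: bytes) -> bool:
    return sha256_hex(data) in PROTECTED_HASHES


# Pattern matching: 15 digits starting with 34 or 37.
AMEX_PATTERN = re.compile(r"\b3[47]\d{13}\b")
# Context clues: the brand name, its abbreviation or an expiration hint.
CONTEXT_PATTERN = re.compile(
    r"\b(?:american express|amex|exp(?:iry|ires|iration)?)\b", re.I
)


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_amex_numbers(text: str, window: int = 40) -> List[str]:
    """Return candidate numbers that pass the checksum and have context nearby."""
    hits = []
    for match in AMEX_PATTERN.finditer(text):
        nearby = text[max(0, match.start() - window): match.end() + window]
        if luhn_valid(match.group()) and CONTEXT_PATTERN.search(nearby):
            hits.append(match.group())
    return hits
```

For example, `find_amex_numbers("AmEx card 378282246310005 exp 12/27")` returns the number (a widely published test value), whereas the same digits in `"Order id 378282246310005 shipped"` are ignored because no card-related context appears nearby.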

When a DLP tool finds sensitive data, it looks for policy violations, abnormal user behavior, system vulnerabilities and other signs of potential data loss, including: 

  • Data leakages, such as a user trying to share a confidential file with someone outside the organization.

  • Unauthorized users attempting to access critical data or perform unapproved actions, such as editing, erasing or copying a sensitive file.

  • Malware signatures, traffic from unknown devices or other indicators of malicious activity.

 

Applying data protections

When DLP solutions detect policy violations, they can respond with real-time remediation efforts. Examples include: 

  • Encrypting data as it moves through the network.

  • Terminating unauthorized access to data.

  • Blocking unauthorized transfers and malicious traffic.

  • Warning users that they are violating policies.

  • Flagging suspicious behavior for the security team to review.

  • Triggering more authentication challenges before users can interact with critical data.

  • Enforcing least-privilege access to resources, such as in a zero-trust environment.
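How a tool might choose among responses such as these can be sketched as a small decision function. The event fields, sensitivity labels and the ordering of the rules are assumptions made for illustration; actual products expose their own policy engines.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"                 # warn the user about a policy violation
    REQUIRE_MFA = "require_mfa"   # trigger an extra authentication challenge
    BLOCK = "block"               # block the transfer or terminate access


@dataclass(frozen=True)
class Event:
    user_authorized: bool
    sensitivity: str              # hypothetical labels: "public", "internal", "confidential"
    destination_external: bool


def decide(event: Event) -> Action:
    """Pick a remediation action, checking the strictest rules first."""
    if not event.user_authorized:
        return Action.BLOCK
    if event.sensitivity == "confidential" and event.destination_external:
        return Action.BLOCK
    if event.sensitivity == "confidential":
        return Action.REQUIRE_MFA
    if event.sensitivity == "internal" and event.destination_external:
        return Action.WARN
    return Action.ALLOW
```

Under these assumed rules, an authorized user sending a confidential file outside the organization is blocked, while the same user opening it internally is asked for additional authentication.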

Some DLP tools also help with data recovery, automatically backing up information so it can be restored after a loss.

Organizations can take more proactive measures to enforce DLP policies as well. Effective identity and access management (IAM), including role-based access control policies, can restrict data access to the right people. Training employees on data security requirements and best practices can help prevent accidental data losses and leaks before they happen. 
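Role-based access control can be illustrated with a deliberately small model in which users hold roles and roles hold permissions. The users, roles and resources below are hypothetical, and a production IAM system would add features such as groups, conditions and auditing.

```python
# Each role maps to a set of (resource, action) permissions.
ROLE_PERMISSIONS = {
    "analyst": {("sales_records", "read")},
    "hr_manager": {("employee_pii", "read"), ("employee_pii", "update")},
}

# Each user maps to a set of roles.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"hr_manager"},
}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """Deny by default; allow only if one of the user's roles grants the permission."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "sales_records", "read"))   # granted by the analyst role
print(is_allowed("alice", "employee_pii", "read"))    # not granted to analysts
```

The deny-by-default behavior means unknown users and unlisted permissions are refused, which is consistent with least-privilege access.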

Documenting and reporting DLP efforts

DLP tools typically feature dashboards and reporting functions that security teams use to monitor sensitive data throughout the network. This documentation enables the security team to track DLP program performance over time so that policies and strategies can be adjusted as needed. 

DLP tools can also help organizations comply with relevant regulations by keeping records of their data security efforts. If there is a cyberattack or audit, the organization can use these records to prove that it followed the appropriate data handling procedures.
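The kind of record keeping described here might look like the following sketch, which serializes each DLP decision as one JSON line. The field names and the `audit_record` function are assumptions; actual tools use their own log schemas.

```python
import json
from datetime import datetime, timezone


def audit_record(user, action, resource, decision, when=None):
    """Serialize one DLP decision as a JSON string suitable for an append-only log."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": when.isoformat(),
            "user": user,
            "action": action,
            "resource": resource,
            "decision": decision,
        },
        sort_keys=True,
    )


# Hypothetical event: a blocked attempt to copy a sensitive file.
print(audit_record("alice", "copy", "payroll.xlsx", "blocked"))
```

Writing timestamps in UTC and keeping one record per line makes such logs easier to search and to hand over during an audit.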

DLP and regulatory compliance

DLP strategies are often aligned with compliance efforts. Many organizations craft their DLP policies specifically to comply with regulations such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). 

Different regulations impose different standards for different kinds of data. For example, HIPAA sets rules for personal health information, while PCI DSS dictates how organizations handle payment card data. A company that collects both kinds of data would likely need a separate DLP policy for each kind to meet compliance requirements.

Many DLP solutions include prewritten DLP policies aligned to the various data security and data privacy standards companies need to meet.

Trends in data loss prevention

From the rise of generative AI to emerging regulations, several factors are changing the data landscape. In turn, DLP policies and tools will need to evolve to meet these changes. Some of the most significant trends in DLP include:

  • Hybrid and multicloud environments
  • Generative AI
  • Increased regulation
  • Mobile workforce and remote work
  • Shadow IT and shadow data

Hybrid and multicloud environments

Many organizations now store data on premises and in multiple clouds, possibly even in multiple countries. These measures might add flexibility and cost savings, but they also increase the complexity of protecting that data.

For example, the Cost of a Data Breach Report found that 40% of breaches occur at organizations that store their data across multiple environments.

Generative AI

Large language models (LLMs) are, by definition, large, and they consume massive amounts of data that organizations must store, track and protect against threats such as prompt injections. Gartner has forecast that “By 2027, 17% of the total cyberattacks/data leaks will involve generative AI.”1

Increased regulation

With major data breaches and social media abuses come increased calls for government and industry regulation, which can add to the complexity of systems and compliance verifications. Recent developments such as the EU AI Act and the CCPA draft rules on AI are imposing some of the strictest data privacy and protection rules to date.

Mobile workforce and remote work

Managing data within a building or network is simpler than providing system access to a mobile workforce or remote workers, where the communication and access issues multiply the efforts required of the IT staff.

In addition, remote workers sometimes have multiple employers or contracts, so that “crossed wires” can create more data leaks. Gartner predicts that “by the end of 2026, democratization of technology, digitization and automation of work will increase the total available market of fully remote and hybrid workers to 64% of all employees, up from 52% in 2021.”1

Shadow IT and shadow data

With employees increasingly using personal hardware and software at work, this unmanaged shadow IT creates a major risk for organizations.

Employees might be sharing work files on a personal cloud storage account, meeting on an unauthorized video conferencing platform or creating an unofficial group chat without IT approval. Personal versions of Dropbox, Google Drive and Microsoft OneDrive might create security headaches for the IT team.

Organizations are also dealing with an increase in shadow data—that is, data in the enterprise network that the IT department does not know about or manage. The proliferation of shadow data is a major contributor to data breaches. According to the Cost of a Data Breach Report, 35% of breaches involve shadow data.
