Exploring privacy issues in the age of AI

Authors

Alice Gomstyn

IBM Content Contributor

Alexandra Jonker

Editorial Content Lead

It’s one of the difficult truths of innovation: As technology advances, so do the risks of using it.

For example, tools that enhance data collection and analysis also increase the likelihood that personal data and sensitive information will appear where it doesn’t belong.

This particular risk, privacy risk, is especially prevalent in the age of artificial intelligence (AI), as sensitive information is collected and used to create and fine-tune AI and machine learning systems. And as policymakers rush to address the issue with privacy regulations around the use of AI, they create new compliance challenges for businesses using AI technologies for decision-making.

Privacy and compliance concerns notwithstanding, companies continue to deploy AI models to boost productivity and unlock value. Let’s take a closer look at the AI privacy risks and safeguards affecting society and commerce today.

What is AI privacy?

AI privacy is the practice of protecting personal or sensitive information collected, used, shared or stored by AI.

AI privacy is closely linked to data privacy. Data privacy, also known as information privacy, is the principle that a person should have control over their personal data. This control includes the ability to decide how organizations collect, store and use their data. But the concept of data privacy predates AI and how people think of data privacy has evolved with the advent of AI.

“Ten years ago, most people thought about data privacy in terms of online shopping. They thought, ‘I don't know if I care if these companies know what I buy and what I'm looking for, because sometimes it's helpful,’” Jennifer King, a fellow at the Stanford University Institute for Human-Centered Artificial Intelligence, explained in an interview posted to the institute’s website.¹

“But now we've seen companies shift to this ubiquitous data collection that trains AI systems,” King said, “which can have major impact across society, especially our civil rights.”

Understanding the privacy risks of AI

We can often trace AI privacy concerns to issues regarding data collection, cybersecurity, model design and governance. Such AI privacy risks include:

Collection of sensitive data
Collection of data without consent
Use of data without permission
Unchecked surveillance and bias
Data exfiltration
Data leakage

Collection of sensitive data

One reason AI arguably poses a greater data privacy risk than earlier technological advancements is the sheer volume of information in play. Terabytes or petabytes of text, images or video are routinely included as training data, and inevitably some of that data is sensitive: healthcare information, personal data from social media sites, personal finance data, biometric data used for facial recognition and more. With more sensitive data being collected, stored and transmitted than ever before, the odds are greater that at least some of it will be exposed or deployed in ways that infringe on privacy rights.

Collection of data without consent

Controversy may ensue when data is procured for AI development without the express consent or knowledge of the people from whom it’s being collected. In the case of websites and platforms, users increasingly expect more autonomy over their own data and more transparency regarding data collection. Such expectations came to the fore recently, as the professional networking site LinkedIn faced backlash after some users noticed they were automatically opted into allowing their data to train generative AI models.²

Use of data without permission

Even when data is collected with individuals’ consent, privacy risks loom if the data is used for purposes beyond those initially disclosed. “We’re seeing data such as a resume or photograph that we’ve shared or posted for one purpose being repurposed for training AI systems, often without our knowledge or consent,” King said. In California, for instance, a former surgical patient reportedly discovered that photos related to her medical treatment had been used in an AI training dataset. The patient claimed that she had signed a consent form for her doctor to take the photos, but not for them to be included in a dataset.³

Unchecked surveillance and bias

Privacy concerns related to widespread and unchecked surveillance—whether through security cameras on public streets or tracking cookies on personal computers—surfaced well before the proliferation of AI. But AI can exacerbate these privacy concerns because AI models are used to analyze surveillance data. Sometimes, the outcomes of such analysis can be damaging, especially when they demonstrate bias. In the field of law enforcement, for example, a number of wrongful arrests of people of color have been linked to AI-powered decision-making.⁴

Data exfiltration

AI models contain a trove of sensitive data that can prove irresistible to attackers. “This [data] ends up with a big bullseye that somebody’s going to try to hit,” Jeff Crume, an IBM Security Distinguish Engineer, explained in a recent IBM Technology video (the link resides outside ibm.com). Bad actors can conduct such data exfiltration (data theft) from AI applications through various strategies. For instance, in prompt injection attacks, hackers disguise malicious inputs as legitimate prompts, manipulating generative AI systems into exposing sensitive data. Such as, a hacker using the right prompt might trick an LLM-powered virtual assistant into forwarding private documents.

Data leakage

Data leakage is the accidental exposure of sensitive data, and some AI models have proven vulnerable to such data breaches. In one headline-making instance, ChatGPT, the large language model (LLM) from OpenAI, showed some users the titles of other users’ conversation histories.⁵ Risks exist for small, proprietary AI models as well. For example, consider a healthcare company that builds an in-house, AI-powered diagnostic app based on its customers’ data. That app might unintentionally leak customers’ private information to other customers who happen to use a particular prompt. Even such unintentional data sharing can result in serious privacy breaches.

Tracking laws on privacy protection

Efforts by policymakers to prevent technological advancements from compromising individual privacy date back to at least the 1970s. However, rapid growth in commercialized data collection and the deployment of AI created a new urgency to enact data privacy laws. Such laws include:

The European Union’s General Data Protection Regulation (GDPR)

GDPR sets several principles that controllers and processors must follow when handling personal data. Under the principle of purpose limitation, companies must have a specific, lawful purpose in mind for any data they collect. They must convey that purpose to users and only collect the minimum amount of data required for that purpose.

Companies must also use data fairly. They must keep users informed about the processing of personal data and follow data protection rules. Under the principle of storage limitation, a company should only keep personal data until its purpose is fulfilled. Data should be deleted once it is no longer needed.

The EU Artificial Intelligence (AI) Act

Considered the world's first comprehensive regulatory framework for AI, the EU AI Act prohibits some AI uses outright and implements strict governance, risk management and transparency requirements for others.

Though the EU AI act doesn't specifically have separate, prohibited practices on AI privacy, the act does enforce limitations on the usage of data. Prohibited AI practices include:

Untargeted scraping of facial images from the internet or CCTV for facial recognition databases; and
Law enforcement use of real-time remote biometric identification systems in public (unless an exception applies, and pre-authorization by a judicial or independent administrative authority is required)

High-risk AI systems must comply with specific requirements, such as adopting rigorous data governance practices to ensure that training, validation and testing data meet specific quality criteria.

US Privacy Regulations

Laws on data privacy took effect in multiple American jurisdictions in recent years. Examples include the California Consumer Privacy Act and the Texas Data Privacy and Security Act. In March 2024, Utah enacted the Artificial Intelligence and Policy Act, which is considered the first major state statute to specifically govern AI use.

At the federal level, the US government has yet to implement new nationwide AI and data privacy laws. However, in 2022 the White House Office of Science and Technology Policy (OSTP) released its “Blueprint for an AI Bill of Rights.” The nonbinding framework delineates five principles to guide the development of AI, including a section dedicated to data privacy encouraging AI professionals to seek individuals’ consent on data use.

China’s Interim Measures for the Administration of Generative AI Services

China is among the first countries to enact AI regulations. In 2023, China issued its Interim Measures for the Administration of Generative Artificial Intelligence Services. Under the law, the provision and use of generative AI services must “respect the legitimate rights and interests of others” and are required to “not endanger the physical and mental health of others, and do not infringe upon others' portrait rights, reputation rights, honor rights, privacy rights, and personal information rights.”⁶

The latest AI News + Insights  

Expertly curated insights and news on AI, cloud and more in the weekly Think Newsletter.

Subscribe today

AI privacy best practices

Organizations can devise AI privacy approaches to help comply with regulations and build trust with their stakeholders.⁷ Recommendations from the OSTP include:

Conducting risk assessments
Limiting data collection
Seeking and confirming consent
Following security best practices
Providing more protection for data from sensitive domains
Reporting on data collection and storage

Conducting risk assessments

Privacy risks should be assessed and addressed throughout the development lifecycle of an AI system. These risks may include possible harm to those who aren’t users of the system but whose personal information might be inferred through advanced data analysis.

Limiting data collection

Organizations should limit the collection of training data to what can be collected lawfully and used “consistent with the expectations of the people whose data is collected.” In addition to such data minimization, companies should also establish timelines for data retention, with the goal of deleting data as soon as possible.

Seeking explicit consent

Organizations should provide the public with mechanisms for “consent, access, and control” over their data. Consent should be reacquired if the use case that prompted the data collection changes.

Following security best practices

Organizations that use AI should follow security best practices to avoid the leakage of data and metadata. Such practices might include using cryptography, anonymization and access-control mechanisms.

Providing more protection for data from sensitive domains

Data from certain domains should be subject to extra protection and used only in “narrowly defined contexts.” These “sensitive domains” include health, employment, education, criminal justice and personal finance. Data generated by or about children is also considered sensitive, even if it doesn’t fall under one of the listed domains.

Reporting on data collection and storage

Organizations should respond to individuals’ requests to learn which of their data is being used in an AI system. Organizations should also proactively provide general summary reports to the public about how people’s data is used, accessed and stored. Regarding data from sensitive domains, organizations should also report security lapses or breaches that caused data leaks.

Data governance tools and programs can help businesses follow OSTP recommendations and other AI privacy best practices. Companies can deploy software tools to:

Conduct privacy risk assessments on the models they use
Create dashboards with information on data assets and the status of privacy assessments
Enable privacy issue management, including collaboration between privacy owners and data owners
Enhance data privacy through approaches such as anonymizing training data, encrypting data and minimizing the data used by machine learning algorithms (Learn more here.)

As AI and data privacy laws evolve, emerging technology solutions can enable businesses to keep up with the regulatory changes and be prepared if regulators request audits. Cutting-edge solutions automate the identification of regulatory changes and conversion into enforceable policies.

Footnotes

(All links reside outside ibm.com.)

¹“Privacy in an AI Era: How Do We Protect Our Personal Information?” Stanford University Institute of Human-Centered Artificial Intelligence. 18 March 2024.

²“LinkedIn Is Quietly Training AI on Your Data—Here's How to Stop It.” PC Mag. 18 September 2024.

³ “Artist finds private medical record photos in popular AI training data set.” Ars Technica. 21 September 2022.

⁴ “When Artificial Intelligence Gets It Wrong.” Innocence Project. 19 September 2023.

⁵“OpenAI CEO admits a bug allowed some ChatGPT users to see others’ conversation titles.” CNBC. 17 April 2023.

⁶ Interim Measures for the Administration of Generative Artificial Intelligence Services, Cyberspace Administration of China. 13 July 2023.

⁷ “Blueprint for an AI Privacy Bill of Rights.” The White House Office of Science and Technology Policy. Accessed 19 September 2024.

AI governance for the enterprise

Learn how to get started with AI governance, the building blocks and best practices to help your teams accelerate responsible AI.