AI Innovation in Europe Requires a Balanced and Risk-Based Interpretation of the GDPR

On 5 November, 91pro视频 participated in the European Data Protection Board’s (EDPB) stakeholder workshop on AI Models. The workshop was meant to inform the EDPB’s opinion on the use of personal data for AI models, which is expected before the end of the year. Ahead of the workshop, 91pro视频 joined 14 other industry associations in a joint statement, calling on the EDPB to take a balanced, risk-based, and pro-innovation approach to data processing for AI development and deployment.

Companies must be able to access and use personal data for developing and deploying innovative and transformative AI technologies in Europe. Such access is crucial for the EU’s digital transformation and competitiveness goals. The opinion must recognize that this can be done in a way that is proportionate, risk-based, and aligned with the GDPR.

To do so, 91pro视频 shared the following recommendations:

  1. Legitimate interest is an appropriate legal basis for AI model training

The EDPB should recognize that legitimate interest is an appropriate and robust legal basis for processing personal data for the training of AI models, as also previously recognized by other European regulators like the French Data Protection Authority.

AI models have the potential to significantly benefit society, businesses, and individuals. AI can increase productivity, boost economic growth, support scientific discovery and healthcare, and enable transformative use cases. These considerations can help tip the legitimate interest balance in favor of the controller.

Of course, this will only be possible if the circumstances of the processing are assessed and appropriate safeguards are deployed where necessary. Importantly, the opinion should note that the commercial goal of developing an AI model should not prevent reliance on legitimate interest. As the Court of Justice of the EU has recognized, a wide range of interests may be considered legitimate, including those of a commercial nature.

Finally, the opinion should not present consent as a preferred legal basis. Given the scale of data needed for most AI models, obtaining consent would be impractical and disproportionate, and would reduce the representativeness of datasets.

  2. AI models are not searchable databases of personal data

AI models are mathematical representations of statistical patterns. They are not designed to memorize information about specific individuals, and once trained, they are not searchable databases of personal data.

The EDPB opinion should reflect this technical reality. When an AI model is trained, the training data is broken down into numerical ‘tokens’. For example, in the case of an LLM, tokens represent words, their meanings, and their relations (e.g., the word ‘dog’ is associated with ‘animal’ but not with ‘plant’). The model performs mathematical operations to correlate these tokens, and it learns those correlations in the form of numerical weights. When prompted, the model then provides an output that predicts the most likely answer to the user’s prompt, based on the patterns and correlations it has learned.
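To make this concrete, the sketch below uses a toy three-word vocabulary and hand-picked numbers to show what a trained model actually stores: token IDs and a matrix of numerical weights, not a retrievable copy of the training text. All names and values are purely illustrative.

```python
import numpy as np

# Toy vocabulary: each word is mapped to an integer token ID.
# Real LLM tokenizers work on sub-word units, but the principle is the same.
vocab = {"dog": 0, "animal": 1, "plant": 2}

# What the trained model actually stores: a matrix of numerical weights
# (here, hand-picked 3-dimensional vectors purely for illustration).
embeddings = np.array([
    [0.9, 0.1, 0.0],   # "dog"
    [0.8, 0.2, 0.1],   # "animal" – numerically close to "dog"
    [0.1, 0.9, 0.3],   # "plant"  – numerically distant from "dog"
])

def similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity between the learned vectors for two tokens."""
    a, b = embeddings[vocab[word_a]], embeddings[vocab[word_b]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity("dog", "animal"))  # high: the weights relate these tokens
print(similarity("dog", "plant"))   # low: weaker statistical association
# Nowhere in `embeddings` is there a retrievable record of any training sentence.
```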

As such, the model itself does not process personal data within the meaning of Article 4 of the GDPR, as also recognized in the discussion paper on Large Language Models (LLMs) from the Hamburg Commissioner for Data Protection. Where personal data is processed while the model is being trained or deployed, this must be done in compliance with the GDPR.

  3. Access to a wide variety of data is needed for robust AI

The opinion should recognize that processing a wide range of data – including data about sensitive topics – is fundamental to achieving robust and representative AI models that reflect the cultural, geographic, and linguistic diversity of Europe.

Crucially, the ability to lawfully process data about sensitive topics – that is, information related to sensitive attributes (such as gender or religion) but falling outside the scope of Article 9 of the GDPR – will be important for companies to identify and mitigate potential biases, as required by the EU AI Act.
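As a purely illustrative example of why such attributes are needed, the sketch below computes a simple approval-rate gap across gender groups. The records and the 0.2 threshold are hypothetical and carry no legal meaning; the point is only that this kind of bias check cannot be run at all without a lawfully processed sensitive attribute.

```python
# Hypothetical decision records; in practice these would come from model outputs.
approvals = [
    {"gender": "female", "approved": True},
    {"gender": "female", "approved": False},
    {"gender": "male", "approved": True},
    {"gender": "male", "approved": True},
]

def approval_rate(group: str) -> float:
    """Share of positive decisions for one demographic group."""
    rows = [r for r in approvals if r["gender"] == group]
    return sum(r["approved"] for r in rows) / len(rows)

# A basic disparity measure: the gap in approval rates between groups.
disparity = abs(approval_rate("female") - approval_rate("male"))
print(f"Approval-rate gap between groups: {disparity:.2f}")
if disparity > 0.2:  # illustrative threshold, not a legal standard
    print("Potential bias detected – investigate and mitigate.")
```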

Rather than focusing on limiting the use of data, which will have negative consequences on businesses and individuals, the opinion should focus on the identification and implementation of safeguards to ensure the safe and lawful processing of personal data.

  4. Recognize diversity of AI models and complexity of value chains

The exact safeguards applied to a specific AI project will depend on the nature of the model and the nature of the data being processed. Actors across the AI value chain – such as deployers and developers – will also have different roles and risk management responsibilities. This is why the opinion should not be prescriptive and should maintain a flexible approach.

Companies can implement different safeguards across different stages of the value chain to ensure the lawful processing of data. For example, data governance measures, de-duplication, minimization or pseudonymization can be helpful to address specific data protection risks at the training stage. On the other hand, measures like prompt and output filters can help address generative AI risks at the deployment stage.
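By way of illustration only, the sketch below shows what simple training-stage safeguards of this kind could look like in practice. The regular expression, hashing scheme, and sample records are hypothetical simplifications, not a production-grade data governance pipeline.

```python
import hashlib
import re

# Simplified pattern for e-mail addresses in free text (illustrative only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(text: str) -> str:
    """Replace e-mail addresses with a stable, non-reversible pseudonym."""
    return EMAIL_RE.sub(
        lambda m: "user_" + hashlib.sha256(m.group().encode()).hexdigest()[:8],
        text,
    )

def deduplicate(records: list[str]) -> list[str]:
    """Drop exact duplicates so no single record is over-represented in training."""
    seen, unique = set(), []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

raw = [
    "Contact jane.doe@example.com about the invoice.",
    "Contact jane.doe@example.com about the invoice.",  # exact duplicate
    "Weather report for Tuesday: sunny.",
]
training_ready = [pseudonymize(r) for r in deduplicate(raw)]
print(training_ready)
```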

Industry is continuously working to improve appropriate safeguards, and we hope to continue our collaboration with regulators to further explain the state of the art. A rigid and prescriptive approach may undermine efforts to develop and deploy new safeguards.

  5. Data subject rights can be subject to limitations

Data subject rights are of fundamental importance for the protection of data subjects. However, the opinion should acknowledge that data subject rights can be subject to limitations, both from a technical and legal perspective.

For example, it would be technically impossible to implement the right to rectification or erasure on the data after the model has been trained without re-training the model, which would be disproportionate, costly, and resource-intensive. Similarly, it would be technically impossible to locate and remove data about a specific individual once the data has been transformed and has become part of a training dataset.

Alternative measures for the protection of data subjects should be considered in these cases. Output-level filters, for example, can be an effective and proportionate way to suppress content relating to data subjects who have exercised their rights.
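As a purely illustrative sketch of such a measure, the snippet below redacts mentions of individuals on a hypothetical opt-out list before a model output is returned. The list, names, and redaction logic are placeholders, not a description of any deployed system.

```python
import re

# Hypothetical register of data subjects who have objected or requested erasure.
objection_list = {"Jane Doe", "John Smith"}

def filter_output(model_output: str) -> str:
    """Redact mentions of opted-out individuals before the output is returned."""
    for name in objection_list:
        model_output = re.sub(
            re.escape(name), "[redacted]", model_output, flags=re.IGNORECASE
        )
    return model_output

print(filter_output("According to the model, Jane Doe lives in Berlin."))
# -> "According to the model, [redacted] lives in Berlin."
```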

Tags: Artificial Intelligence
