
Generative AI vs. copyright law - What changes will the AI Act bring?

  • bengtpetersen
  • Jul 1, 2024
  • 4 min read


Since the AI boom triggered by ChatGPT, copyright conflicts have been on the rise. The reason for this is obvious: data from copyright-protected books, images, newspaper articles, etc. is needed to train AI. The AI Act is intended to create regulations to balance the different interests, at least in the EU.


Generative AI such as ChatGPT, Midjourney and others relies on training with huge amounts of data in order to generate new texts, images or videos. This training data is generally not created by the companies behind the generative AI themselves; it often has an author or other rights holder. Particularly since the AI boom triggered by ChatGPT at the end of 2022, legal conflicts between creators and providers of generative AI have been increasing. For example, a group of artists in the US filed a class action lawsuit against several providers of image generators in early 2023. At the end of last year, the New York Times took OpenAI and Microsoft to court for allegedly using several million newspaper articles to train ChatGPT. The first lawsuits of this kind have also been filed in Germany, such as one brought by a photographer against an association that, among other things, provided the photographer's images for generative AI training.


On 13 March 2024, the EU Parliament approved the AI Act. The introduction of the world's first comprehensive regulation of AI is thus on the home straight. During the legislative process, the transparency obligations of AI developers towards authors were a particular topic of discussion. European legislators faced the challenge of balancing legitimate copyright interests against the interests of AI developers in legal certainty and moderate regulation.


I. Content of the AI Act


What is artificial intelligence according to the AI Act? The definition of AI was the subject of much debate during the legislative process. Ultimately, a definition similar to that of the OECD prevailed. Three criteria in particular are decisive: the system operates with a varying degree of autonomy, it may adapt independently after deployment, and it infers from the inputs it receives how to generate outputs.


The AI Act divides AI systems into three categories: prohibited AI systems, high-risk AI systems and AI systems with no or minimal risk. At the heart of the AI Act are the high-risk AI systems, which are subject to comprehensive documentation, monitoring and quality requirements. High-risk AI systems include, for example, AI used in employment and personnel management or in critical infrastructure.


Special regulations also apply to general-purpose AI (GPAI). According to the AI Act, this covers AI models that display 'significant generality' and are capable of competently performing a wide range of distinct tasks. As a result, models such as the Large Language Model (LLM) GPT-4 behind the ChatGPT application are covered. The requirements for these models primarily comprise transparency and documentation obligations as well as compliance with EU copyright law. Stricter requirements apply to GPAI models 'with systemic risk', e.g. with regard to their quality and risk management. This risk is presumed for models trained with particularly high computational effort, so that GPT-4, for example, should fall into this category.


II. Copyright protection in the AI Act


Initially, in response to the appearance of ChatGPT at the end of 2022, Art. 28b AI Act-Parl was incorporated into the AI Act. This provision subjected what were then called foundation models to strict obligations such as preventive risk containment and public transparency regarding all copyright-protected training data. Ultimately, Art. 28b AI Act-Parl did not prevail, owing to the many concerns raised ('innovation killer', 'competitive disadvantage', 'overregulation'), and was replaced by the new Title 8a (Art. 52a et seq.).


Art. 52c AI Act regulates the obligation to put in place a policy to comply with EU copyright law. In particular, GPAI providers must observe rights reservations (opt-outs) for text and data mining in accordance with Art. 4 of the DSM Copyright Directive (implemented in Germany by Section 44b UrhG). Accordingly, authors can reserve their rights in order to prevent mining of their works for GPAI training purposes.


Although the provision continues to impose comprehensive transparency and documentation obligations on providers of GPAI models, these now largely apply not to the public but to authorities and to providers of AI systems wishing to integrate a GPAI model into their own system. The information obligations of GPAI providers vis-à-vis copyright holders, by contrast, have been significantly weakened. A 'sufficiently detailed summary of the content used for training' now suffices; it need not be technically detailed (the argument: protection of trade secrets) but is merely intended to make it easier for rights holders to enforce their rights (Art. 52c para. 1 d) AI Act).


Not explicitly regulated, but evident from Recital 66j of the AI Act, is the obligation of GPAI providers based outside the EU to comply with EU copyright law if they offer their products on the EU market. EU copyright law, which would not otherwise apply in those third countries, is thus extended to GPAI providers established there. The rationale is to protect EU providers from unfair competition by non-EU providers. For authors within the EU, this means an improved level of protection, because they can also enforce their rights within the EU against foreign GPAI providers.


An AI Office, yet to be established, will act as a market surveillance authority alongside the national market surveillance authorities and monitor compliance with the obligations of AI providers. It will also provide a template for the 'sufficiently detailed summary of the content used for training' to be published.


III. Conclusion


Although the EU legislator has scaled back the transparency obligations of GPAI providers towards authors in particular, it has also drawn clear red lines intended to make it easier for rights holders to enforce their rights. If the requirements for GPAI providers actually produce sufficient transparency for authors in practice, the legislator will have struck a good balance between the interests involved, and numerous otherwise inevitable legal disputes between authors and GPAI providers could be avoided in the future. The decisive factor is likely to be how the AI Office designs the template for the summary of the training data used.

 
 
 
