The use of media content to train generative AI models is the subject of intense controversy. Some media organisations are closing deals with companies such as OpenAI on the use of their content for training (Axel Springer, for example). Others are taking their objections to court (The New York Times v. OpenAI), and some may consider training their own models. All of them face the same challenge: setting the conditions for using media data to train AI models in a way that, on the one hand, respects the media's (intellectual) property rights and their economic and competitive interests, and, on the other hand, contributes to more responsible models trained on high-quality, multilingual content. More broadly, providing high-quality, publicly available data for AI training is seen as a necessary measure to break up concentrations of power, since these largely depend on asymmetries in access to data. Media is one such category of public-interest data source, alongside research and heritage data.
The goal of this panel is to map the competing interests and considerations at stake, and to explore the extent to which regulations such as the AI Act or the Directive on Copyright in the Digital Single Market offer workable solutions, and where there is room for improvement. As Europe works to build common data spaces for media, the panel will also outline possible strategies.