Last week, the Authors Guild sent an open letter to the leaders of some of the world’s biggest generative AI companies, urging them to obtain writers’ consent, credit their work, and compensate them fairly for the use of copyrighted materials in training AI. The letter, signed by more than 9,000 writers, including renowned authors such as George Saunders and Margaret Atwood, highlights creatives’ growing concern over the lack of recognition and compensation for their contributions to training generative AI systems.
Generative AI systems, such as large language models (LLMs), rely on extensive training data to produce coherent and contextually relevant outputs. However, the sources of this training data have remained mostly undisclosed, leading to speculation and growing dissatisfaction among writers and visual artists who have noticed similarities between their work and the output of these AI systems. Many have called on generative AI companies to disclose their data sources and compensate the creators whose works were used. Some have resorted to open letters and social media posts, while others have chosen to pursue legal action.
Copyright law sits at the center of these disputes, but on its own it is ill-equipped to tackle the full range of issues artists face. Those issues run from longstanding concerns about employment and compensation in a landscape already reshaped by the internet to newer anxieties about privacy and the use of personal, uncopyrightable characteristics. Copyright can provide some limited answers, but it falls short of addressing the broader implications of AI for society.
Mike Masnick, editor of the technology blog Techdirt, emphasizes the need for a holistic approach to the questions raised by AI’s impact on society, arguing that relying on copyright alone to address them is misplaced. The complexity of AI’s implications calls for a multi-faceted response that weighs not just copyright but other legal and ethical considerations as well.
Legal action has become an increasingly common route for artists seeking recognition and compensation for works used to train AI systems. Comedian Sarah Silverman, along with four other authors, recently filed separate class-action lawsuits against OpenAI, alleging that the company trained its ChatGPT system on their books without permission. These suits are part of a broader trend: the Joseph Saveri Law Firm, which specializes in antitrust litigation, is also representing visual artists suing the AI image generator companies Stability AI, Midjourney, and DeviantArt. The outcome of these cases carries significant implications for the legal status of the data sets used to train AI models and for the viability of fair use defenses.
The Silverman case specifically raises concerns about the scraping of copyrighted material, including Silverman’s memoir, “The Bedwetter,” from shadow libraries that host pirated ebooks and academic papers. The court’s ruling could set new precedent on whether AI models’ use of scraped copyrighted material qualifies as fair use. Law professor Matthew Sag of Emory University suggests that Silverman’s lawsuit holds more promise than others because it raises more compelling copyright infringement arguments, and the court’s decision could shape how future lawsuits involving AI and copyrighted materials are handled. OpenAI has not yet commented on the matter.
It is crucial to understand the nuanced nature of AI’s relationship with copyrighted works. While the lawsuits argue that LLMs “copy” protected works, experts contend that a more accurate characterization is that they “digest” the training data: an LLM learns statistical patterns from that data in order to predict the most likely next word in a sequence, rather than reproducing the text the way a scribe in a monastery would. This distinction matters when weighing the legal implications of AI’s use of copyrighted materials.
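To make the “digest, not copy” intuition concrete, here is a minimal, purely illustrative sketch. It is not how OpenAI or any company named above actually builds its models, and the corpus string and function names are invented for the example. It uses a toy bigram model: the “training” step reduces the text to word-pair counts, and generation samples a plausible next word from those counts. What the model retains is statistics about the text, not the text itself; production LLMs use neural networks at vastly larger scale, but the next-word-prediction objective is the same idea.

```python
import random
from collections import defaultdict, Counter

def train_bigram_model(text):
    """Count how often each word follows each other word in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts

def generate(model, seed_word, length=10):
    """Generate text by repeatedly sampling a likely next word from the counts."""
    word = seed_word
    output = [word]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break  # no observed continuation for this word
        # Sample in proportion to how often each word followed `word` during training.
        choices, weights = zip(*followers.items())
        word = random.choices(choices, weights=weights)[0]
        output.append(word)
    return " ".join(output)

# Hypothetical training text, for illustration only.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
print(generate(model, "the", length=6))
```

Even in this toy case, the generated output recombines patterns rather than storing the source verbatim, which is the distinction experts draw between digesting and copying. Whether that distinction persuades the courts is precisely what the pending lawsuits will test.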
In conclusion, the Authors Guild’s open letter and the subsequent lawsuits filed by artists against generative AI companies highlight the growing demand for recognition and fair compensation in the AI ecosystem. While copyright law plays a role, it is clear that a more comprehensive approach is needed to address the multifaceted challenges posed by AI. As these legal battles unfold, they will shape not only the future of AI but also the relationship between artists, their copyrighted works, and AI systems.