OpenAI has filed a motion to overturn a court order requiring it to hand over 20 million anonymized ChatGPT chat logs in a copyright infringement case brought by the New York Times and other major media outlets. The company has asked a federal judge in New York to reconsider the order, arguing that releasing so many conversations would violate user privacy and that the vast majority of the logs are not relevant to the copyright claims at hand. In a court filing, OpenAI warned that disclosure would expose personal conversations from people who have used the chatbot over the past three years and could enable a ‘speculative fishing expedition’ by the New York Times in search of any evidence of copyright misuse.
The New York Times and its co-plaintiffs maintain that the logs are essential to proving whether OpenAI improperly used their copyrighted material to train its AI system. The lawsuit alleges that OpenAI misused articles from the Times and other news outlets to train its AI without permission, leading to fabricated responses that mimic the style of these publications. The Times has argued that the logs are needed to establish whether ChatGPT’s responses contained unauthorized reproductions of its copyrighted content, and that they are critical to rebutting OpenAI’s assertion that the plaintiffs ‘hacked’ the chatbot’s responses to create evidence.
Magistrate Judge Ona Wang has ordered OpenAI to produce the transcripts, stating that the company’s process for de-identifying user data would protect user privacy. The company has been given until Friday to comply with the order, which has sparked debate over the balance between intellectual property rights and user privacy. OpenAI’s stance highlights growing tensions in the tech industry over how data is managed and shared, as well as the implications for companies that rely on vast amounts of user data to train their AI systems.