OpenAI claims NYT is not telling the full story in its copyright lawsuit

January 9, 2024

OpenAI on Monday said that The New York Times (NYT) is not telling the full story about the lawsuit it filed against the Sam Altman-led company and Microsoft on December 27.

“Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate,” OpenAI wrote in a blog post.

As part of the lawsuit, the NYT submitted approximately 100 examples of alleged copyright violations in which ChatGPT or its underlying model returned text nearly identical to paragraphs published in NYT articles or editorial content.

However, OpenAI has claimed that even when “manipulated” prompts are used, its models “don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

OpenAI said the prompting behind the NYT’s examples is neither typical nor allowed user activity. It also noted that the generated text is not a substitute for the newspaper.

OpenAI working on solving the regurgitation issue

The Sam Altman-led company said it has identified and is working on solving ChatGPT’s “regurgitation” issue, which it terms “memorization” and describes as a failure of the model training process.

Memorization, according to the company, tends to happen more often when particular content appears more than once in the training data. In this case, the NYT’s articles also appeared on other websites.

“So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” the company wrote in the blog post.

Experts argue over copyright claims

While there has been a lot of commentary about the NYT lawsuit against OpenAI, several technology innovators seem to be sympathizing with OpenAI’s logic.

“After reading the @nytimes lawsuit against @OpenAI and @Microsoft, I find my sympathies more with OpenAI and Microsoft than with the NYT,” Andrew Ng, one of the leading scientists in the field of AI, wrote on X, formerly Twitter.

Ng claimed that just as humans are allowed to read documents on the open internet, learn from them, and synthesize brand-new ideas, AI should be allowed to do so too.

“I would like to see training on the public internet covered under fair use — society will be better off this way — though whether it actually is will ultimately be up to legislators and the courts,” the AI scientist explained in DeepLearning.AI’s weekly newsletter.

Somewhat supporting OpenAI’s claims, Ng further said that the examples of violations put forth by NYT occurred due to a RAG-like mechanism where the user prompt causes the system to browse the web, retrieve a specific article, and then print it out.

Systems architect Daniel Jeffries also took to X to argue that the Times case has a near-zero probability of winning, somewhat supporting OpenAI’s claims.

Jeffries was reacting to a post on X by Jason Kint, which argued that the Times case was more likely to win. Kint is the CEO of Digital Content Next, a trade association for content companies.

The systems architect also pointed out that the Times case may go the same way the Sarah Silverman case went, wherein a US district judge had ruled that determining whether generated images may be in direct violation of copyright laws was “not plausible” at the moment.

OpenAI has already stated that it believes training AI models using publicly available internet materials is fair use. This practice, according to the company, is supported by long-standing and widely accepted precedents.

OpenAI said that before the NYT filed the lawsuit, the two had been negotiating a deal, with talks continuing through December 19.

“The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting,” it wrote in the blog post, adding that it had already explained to the Times that their content didn’t meaningfully contribute to the training of its existing models and also wouldn’t be sufficiently impactful for future training.

This story originally appeared on Computerworld
