Study of ChatGPT citations makes dismal reading for publishers

As more publishers cut content licensing deals with ChatGPT-maker OpenAI, a study put out this week by the Tow Center for Digital Journalism — looking at how the AI chatbot produces citations (i.e. sources) for publishers’ content — makes for interesting, or, well, concerning, reading.

In a nutshell, the findings suggest publishers remain at the mercy of the generative AI tool’s tendency to invent or otherwise misrepresent information, regardless of whether or not they’re allowing OpenAI to crawl their content.

The research, conducted at Columbia Journalism School, examined citations produced by ChatGPT after it was asked to identify the source of sample quotations plucked from a mix of publishers, some of which had inked deals with OpenAI and some of which had not.

The Center took block quotes from 10 stories apiece produced by a total of 20 randomly selected publishers (so 200 different quotes in all) — including content from The New York Times (which is currently suing OpenAI in a copyright claim); The Washington Post (which is unaffiliated with the ChatGPT maker); The Financial Times (which has inked a licensing deal); and others.

“We chose quotes that, if pasted into Google or Bing, would return the source article among the top three results and evaluated whether OpenAI’s new search tool would correctly identify the article that was the source of each quote,” wrote Tow researchers Klaudia Jaźwińska and Aisvarya Chandrasekar in a blog post explaining their approach and summarizing their findings.

“What we found was not promising for news publishers,” they go on. “Though OpenAI emphasizes its ability to provide users ‘timely answers with links to relevant web sources,’ the company makes no explicit commitment to ensuring the accuracy of those citations. This is a notable omission for publishers who expect their content to be referenced and represented faithfully.”

“Our tests found that no publisher — regardless of degree of affiliation with OpenAI — was spared inaccurate representations of its content in ChatGPT,” they added.

Unreliable sourcing

The researchers say they found “numerous” instances where publishers’ content was inaccurately cited by ChatGPT, along with what they dub “a spectrum of accuracy in the responses”. So while they found “some” entirely correct citations (i.e. ChatGPT accurately returned the publisher, date, and URL of the block quote shared with it), there were “many” citations that were entirely wrong, and “some” that fell somewhere in between.

In short, ChatGPT’s citations appear to be an unreliable mixed bag. The researchers also found very few instances where the chatbot didn’t project total confidence in its (wrong) answers.

Some of the quotes were sourced from publishers that have actively blocked OpenAI’s search crawlers. In those cases, the researchers say they were anticipating that ChatGPT would have issues producing correct citations. But they found this scenario raised another issue, as the bot “rarely” ‘fessed up to being unable to produce an answer. Instead, it fell back on confabulation in order to generate some sourcing (albeit incorrect sourcing).

“In total, ChatGPT returned partially or entirely incorrect responses on 153 occasions, though it only acknowledged an inability to accurately respond to a query seven times,” said the researchers. “Only in those seven outputs did the chatbot use qualifying words and phrases like ‘appears,’ ‘it’s possible,’ or ‘might,’ or statements like ‘I couldn’t locate the exact article’.”
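To put those figures in proportion, here is a rough back-of-the-envelope calculation. It assumes one query per quote, so that the 153 flawed responses are measured against the 200 quotes described above; that denominator is an inference from the figures reported here, not something the researchers are quoted as stating.

```python
# Figures as reported in this article.
TOTAL_QUOTES = 200   # 10 quotes from each of 20 publishers
INCORRECT = 153      # partially or entirely incorrect responses
ACKNOWLEDGED = 7     # responses that admitted uncertainty

# Assumption: one query per quote, so 200 is the denominator.
incorrect_share = INCORRECT / TOTAL_QUOTES

# Share of flawed responses that came with any hedging language.
hedged_share = ACKNOWLEDGED / INCORRECT

print(f"Partially or entirely incorrect: {incorrect_share:.1%}")
print(f"Flawed responses that hedged: {hedged_share:.1%}")
```

On that assumption, roughly three in four responses were at least partly wrong, and fewer than one in twenty of those flagged any doubt.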

They compare this unhappy situation with a standard internet search, where a search engine like Google or Bing would typically either locate an exact quote and point the user to the website(s) where it was found, or state that it found no results with an exact match.

ChatGPT’s “lack of transparency about its confidence in an answer can make it difficult for users to assess the validity of a claim and understand which parts of an answer they can or cannot trust,” they argue.

For publishers, there could also be reputation risks flowing from incorrect citations, they suggest, as well as the commercial risk of readers being pointed elsewhere.

Decontextualized data

The study also highlights another issue. It suggests ChatGPT could essentially be rewarding plagiarism. The researchers recount an instance where ChatGPT erroneously cited a website that had plagiarized a piece of “deeply reported” New York Times journalism (i.e. by copy-pasting the text without attribution) as the source of the NYT story. They speculate that, in that case, the bot may have generated this false response to fill an information gap that resulted from its inability to crawl the NYT’s website.

“This raises serious questions about OpenAI’s ability to filter and validate the quality and authenticity of its data sources, especially when dealing with unlicensed or plagiarized content,” they suggest.

In further findings that are likely to be concerning for publishers that have inked deals with OpenAI, the study found ChatGPT’s citations were not always reliable in their cases either, so letting its crawlers in doesn’t appear to guarantee accuracy.

The researchers argue that the fundamental issue is that OpenAI’s technology treats journalism “as decontextualized content”, with apparently little regard for the circumstances of its original production.

Another issue the study flags is the variation of ChatGPT’s responses. The researchers tested asking the bot the same query multiple times and found it “typically returned a different answer each time”. While that’s typical of GenAI tools, generally, in a citation context such inconsistency is obviously suboptimal if it’s accuracy you’re after.

While the Tow study is small scale — the researchers acknowledge that “more rigorous” testing is needed — it’s nonetheless notable given the high-level deals that major publishers are busy cutting with OpenAI.

If media businesses were hoping these arrangements would lead to special treatment for their content vs competitors, at least in terms of producing accurate sourcing, this study suggests OpenAI has yet to offer any such consistency.

For publishers that don’t have licensing deals but also haven’t outright blocked OpenAI’s crawlers, perhaps in the hopes of at least picking up some traffic when ChatGPT returns content about their stories, the study makes dismal reading too, since citations may not be accurate in their cases either.

In other words, there is no guaranteed “visibility” for publishers in OpenAI’s search engine even when they do allow its crawlers in.

Nor does completely blocking crawlers mean publishers can save themselves from reputational damage risks by avoiding any mention of their stories in ChatGPT. The study found the bot still incorrectly attributed articles to the New York Times despite the ongoing lawsuit, for example.

‘Little meaningful agency’

The researchers conclude that as it stands, publishers have “little meaningful agency” over what happens with and to their content when ChatGPT gets its hands on it (directly or, well, indirectly).

The blog post includes a response from OpenAI to the research findings — which accuses the researchers of running an “atypical test of our product”.

“We support publishers and creators by helping 250 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution,” OpenAI also told them, adding: “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results.”
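For readers wondering what “managing OAI-SearchBot in their robots.txt” looks like in practice, here is a minimal sketch using Python’s standard-library robots.txt parser. The robots.txt contents, the example.com URL and the “SomeOtherBot” user agent are hypothetical, made up for illustration; only the OAI-SearchBot name comes from OpenAI’s statement. As the study’s findings suggest, a rule like this governs crawling and does not by itself stop a publisher’s stories being cited, or miscited.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's search crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical story URL and a made-up second crawler for comparison.
STORY_URL = "https://example.com/news/some-story"
openai_allowed = is_allowed(ROBOTS_TXT, "OAI-SearchBot", STORY_URL)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", STORY_URL)

print(f"OAI-SearchBot allowed: {openai_allowed}")
print(f"SomeOtherBot allowed: {other_allowed}")
```

Robots.txt is a voluntary convention, so the sketch only shows what a well-behaved crawler would conclude from the file.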
