ChatGPT spewed out cancer treatment regimens that contained a “potentially dangerous” mixture of correct and false information, according to the results of a study published Thursday.
Researchers at Brigham and Women’s Hospital, a branch of Harvard Medical School, prompted OpenAI’s popular chatbot to provide treatment advice that aligned with guidelines established by the National Comprehensive Cancer Network.
While all of ChatGPT’s outputs “included at least 1 NCCN-concordant treatment,” about 34% also contained an incorrect treatment recommendation, the study found.
Additionally, about 12% of ChatGPT’s responses contained “hallucinations” — meaning outright false information with no links to accepted cancer treatments.
ChatGPT “speaks oftentimes in a very sure way that seems to make sense, and the way that it can mix incorrect and correct information is potentially dangerous,” researcher Danielle Bitterman, an oncologist at the Artificial Intelligence in Medicine program of the Mass General Brigham health system, told Bloomberg.
The study’s results support a common concern raised by critics, including billionaire Elon Musk, who have warned that advanced AI tools will rapidly spread misinformation if proper guardrails are not put in place.
The researchers performed the study by prompting ChatGPT to generate “breast, prostate and lung cancer treatment recommendations.”
“Language learning models can pass the US Medical Licensing Examination, encode clinical knowledge and provide diagnoses better than laypeople,” the researchers said. “However, the chatbot did not perform well at providing accurate cancer treatment recommendations.”
“Hallucinations were primarily recommendations for localized treatment of advanced disease, targeted therapy, or immunotherapy,” they added.
OpenAI has repeatedly stated that GPT-4, the current chatbot available to the public, is prone to making mistakes.
In a March blog post, the firm said GPT-4 “still is not fully reliable” and admitted that it “‘hallucinates’ facts and makes reasoning errors.”
“Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with additional context, or avoiding high-stakes uses altogether) matching the needs of a specific use-case,” OpenAI said.
“Developers should have some responsibility to distribute technologies that do not cause harm, and patients and clinicians need to be aware of these technologies’ limitations,” the researchers added.
ChatGPT has drawn intense scrutiny as its popularity has boomed this year.
Earlier this month, UK-based researchers determined that ChatGPT displayed a “significant” bias toward liberal political viewpoints.
Issues with inaccurate responses aren’t limited to OpenAI’s chatbot. Google’s version, Bard, has also been known to generate false information in response to user prompts.
As The Post reported, some experts say chatbots and other AI products could cause major disruption in the upcoming 2024 presidential election.
This story originally appeared on NYPost.