Study finds AI offers no advantage in patient health decision-making

LONDON, U.K.: People who turn to artificial intelligence for medical advice may not be making better health decisions than those who rely on traditional sources such as internet searches or official health websites, according to a new study published in Nature Medicine.

The findings come as AI chatbots are increasingly used by patients seeking guidance on symptoms and next steps, even though there is limited evidence that such tools are safer or more effective than existing methods.

Researchers at the University of Oxford's Internet Institute worked with doctors to develop 10 medical scenarios, ranging from minor ailments such as a common cold to emergencies such as a life-threatening haemorrhage causing bleeding on the brain.

In initial testing without human users, three large language models — OpenAI's ChatGPT-4o, Meta's Llama 3, and Cohere's Command R+ — correctly identified the medical condition in 94.9 percent of cases. However, they selected the appropriate course of action, such as calling an ambulance or seeking medical care, in only 56.3 percent of cases. The companies did not respond to requests for comment.

'Huge Gap' Between AI's Promise and Real-world Use

To assess how AI performs in practice, the researchers then recruited 1,298 participants in Britain. Participants were assigned to investigate symptoms and decide on next steps using either AI tools or their usual resources, such as general internet searches, personal experience, or the National Health Service website.

When people were involved, performance dropped sharply. Relevant conditions were identified in less than 34.5 percent of cases, and the correct course of action was chosen in less than 44.2 percent of cases. These results were no better than those achieved by participants using traditional sources.

Adam Mahdi, a co-author of the paper and associate professor at Oxford, said the findings revealed the "huge gap" between what AI systems are capable of and how they perform when used by the public.

"The knowledge may be in those bots; however, this knowledge doesn't always translate when interacting with humans," he said, adding that more work was needed to understand why this breakdown occurs.

Incomplete Information and Misleading Responses

The research team closely examined about 30 interactions between participants and AI systems. They found that users often provided incomplete or incorrect information about their symptoms. At the same time, the AI models sometimes produced misleading or inaccurate responses.

In one example, a patient describing symptoms consistent with a subarachnoid haemorrhage, including a stiff neck, light sensitivity, and the "worst headache ever", was correctly advised by AI to go to the hospital. Another participant described similar symptoms but referred to a "terrible" headache and was advised instead to lie down in a darkened room.

The researchers plan to expand the study to other countries and languages and to examine whether AI performance improves over time or varies across settings.

The study was supported by the data company Prolific, the German non-profit Dieter Schwarz Stiftung, and the governments of the United Kingdom and the United States.
