AI gives higher quality answers and is more empathetic than real doctors, a study suggests.
A study from the University of California San Diego compared written answers from doctors with those from ChatGPT to real-world health questions, to see which came out on top.
A panel of healthcare professionals preferred ChatGPT’s answers 79 percent of the time, rating them as higher quality in the information provided and as showing greater understanding. The panel did not know which answers came from ChatGPT and which from doctors.
ChatGPT recently caused a stir in the medical community after it was found to be able to pass the gold standard exam required to practice medicine in the US, raising the prospect that it could one day replace human doctors.
[Graphs: how the panel of physicians rated physicians’ versus ChatGPT’s responses to patient questions, by quality (left) and empathy (right)]
Jessica Kelley, a nurse practitioner at San Diego-based Human Longevity and co-author of the study, said, “ChatGPT messages responded with nuanced and accurate information that often covered more aspects of the patient’s questions than the doctor’s answers.”
The research team from the University of California San Diego looked at Reddit’s AskDocs forum, where about 452,000 members post medical issues and verified healthcare professionals make suggestions.
Anyone can answer a question on the forum, but moderators check the provider’s credentials and the answers reflect the respondent’s level of expertise.
While question-and-answer exchanges on social media are not the traditional way a patient speaks to a doctor, the panel said the conversations were reflective of their clinical experience.
The UC San Diego research team randomly chose 195 AskDocs exchanges where a verified physician had answered a public question.
The team submitted the original question to ChatGPT and asked it to write an answer.
A three-doctor panel reviewed each exchange and was blinded as to which was written by a doctor and which was written by AI.
One patient asked about the risk of going blind after bleach was splashed into their eye, leaving it irritated and dry.
The doctor’s response was, “It sounds like you’ll be fine. You should rinse the eye any time you get a chemical or foreign object in it. You can also contact Poison Control at 1-800-222-1222.”
ChatGPT’s response was, “I’m sorry to hear that bleach was splashed in your eye. It is important to rinse the eye as soon as possible after a chemical splash to remove as much of the chemical as possible and prevent further irritation or damage.
“If you haven’t already, you should rinse your eyes with clean water or saline solution as soon as possible. You can do this by keeping your eye open and running a gentle stream of water over your eye from the inner to the outer corner for at least 15 minutes. If you don’t have access to water, you can use saline or artificial tears to rinse the eye.
“If you experience significant pain, redness, or vision changes, or if you can’t flush your eye properly, seek medical attention immediately. However, if your eye just feels irritated and dry, you may be able to treat it at home by using artificial tears or a lubricating eye ointment to keep the eye moist and comfortable.
“Getting bleach in your eye is unlikely to cause you to go blind, but it’s important to take care of the eye and seek medical attention if necessary to prevent further irritation or damage.”
All three panelists favored ChatGPT’s response. They gave it an average quality score of 4.0 and an average empathy score of 4.67. They rated the doctor’s response 3.33 for quality and 2.33 for empathy.
Overall, the chatbot’s responses were 3.6 times as likely as the doctors’ to be rated good or very good in quality, and 9.8 times as likely to be rated empathetic.
Professor James Davenport, Hebron and Medlock Professor of Information Technology at the University of Bath, UK, said: “Both here and across the database, the ChatGPT responses were on average four times as long as the doctors’.
“It is alleged that the raters (all doctors) got the two answers blind, not knowing which was the doctor and which was the ChatGPT. This was probably formally true, but length and style certainly made it clear in practice.
“At least in the six (sample exchanges) given, the doctors made no attempt to be empathetic, knowing their answers were public, while ChatGPT is focused on a 1:1 conversation. So, in terms of empathy, this is far from a level comparison. This could have been more explicit.”
He added: “The paper is not saying that ChatGPT can replace doctors, but is rightly calling for further research into whether and how ChatGPT can help doctors generate responses.
“As it points out, ‘teams of clinicians often rely on canned responses,’ and a stochastic parrot like ChatGPT has a much wider range of responses than even the largest library of canned responses.”