Artificial Intelligence Chatbot Responses to Patient Queries on Traumatic Brain Injury: An Expert Assessment of Reliability and Accuracy

  • Patrick Schuss
  • Andreas S. Gonschorek
  • Michael Kämper
  • Johannes Lemcke
  • Hans-Jörg Meisel
  • Witold Rogge
  • Marc Schaan
  • Peter Schwenkreis
  • Martin Strowitzki
  • Kai Wohlfahrt
  • Ingo Schmehl
  • Neuro-Trauma Working Group

Research output: Contribution to journal › Original Article › peer-review

Abstract

The increasing use of artificial intelligence-driven chatbots for medical queries requires a systematic evaluation of their accuracy, reliability, and potential role in patient education. This study assesses the performance of three widely used chatbots (ChatGPT, Google Gemini, and Microsoft Copilot) in answering patient-oriented questions related to traumatic brain injury (TBI). A standardized set of TBI-related questions was developed, divided into eight subtopics, and presented to each chatbot using unified prompts. The responses were evaluated against reference answers prepared by a panel of specialists in neurology, neurosurgery, and neurorehabilitation, and subsequently assessed in a survey of patients undergoing rehabilitation for TBI. Performance was rated using a modified scoring framework across five key dimensions of quality. Statistical analysis included multivariate analysis of variance to compare chatbot performance and logistic regression to estimate the likelihood of chatbot responses being considered an adequate substitute for expert advice. Significant differences between the chatbots were found in several quality dimensions, with ChatGPT scoring higher than Gemini and Copilot on reliability, responsiveness, and perceived trustworthiness (p < 0.05). No chatbot consistently demonstrated an advantage in conveying empathy. Logistic regression revealed that responses from ChatGPT were significantly more likely to be rated as an adequate substitute for expert input (p < 0.0001, OR = 4.3, 95% CI: 2.4-7.6). AI-driven chatbots vary in their ability to provide high-quality medical information, with significant differences in reliability and responsiveness. While ChatGPT outperformed the other models in providing structured information, further improvements in context awareness and empathy are needed before broader clinical integration can be considered.
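As an illustration of how the reported effect size fits together, the odds ratio and its 95% confidence interval quoted in the abstract (OR = 4.3, 95% CI: 2.4-7.6) imply a particular log-odds coefficient and Wald standard error. The sketch below is not the authors' analysis code; it simply back-calculates those quantities from the published numbers under the usual normal-approximation assumption.

```python
import math

# Values reported in the abstract for ChatGPT responses being rated
# an adequate substitute for expert input:
or_point, ci_low, ci_high = 4.3, 2.4, 7.6

# Log-odds coefficient implied by the point estimate
beta = math.log(or_point)

# Wald standard error implied by the interval width on the log scale:
# the 95% CI spans beta +/- 1.96 * SE
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Reconstruct the interval on the odds-ratio scale as a consistency check
lower = math.exp(beta - 1.96 * se)
upper = math.exp(beta + 1.96 * se)
```

Running this reproduces bounds close to the published 2.4 and 7.6, confirming the reported interval is internally consistent with a standard Wald construction.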
Original language: English
Number of pages: 9
Journal: Journal of Neurotrauma
Early online date: Nov 2025
DOIs
Publication status: Published - 21 Nov 2025
