Chat-GPT in triage: Still far from surpassing human expertise - An observational study

  • Arian Zaboli
  • Francesco Brigo
  • Gloria Brigiari
  • Magdalena Massar
  • Marta Parodi
  • Norbert Pfeifer
  • Gabriele Magnarelli
  • Gianni Turcato

Research output: Contribution to journal › Original Article › peer-review

6 Citations (Web of Science)

Abstract

Background: Triage is essential in emergency departments (EDs) to prioritize patient care based on clinical urgency. Recent investigations have explored the role of large language models (LLMs) in triage, but their effectiveness compared to human triage remains uncertain. This study assessed the effectiveness of ChatGPT 4.0 in triaging ED patients.

Methods: This retrospective study analyzed data from 2658 patients. Triage codes assigned by human triage personnel were compared with those assigned by artificial intelligence (AI) triage using ChatGPT 4.0. Agreement between human and AI triage was assessed using Cohen's kappa statistic. The predictive accuracy of each triage system for clinical outcomes was evaluated with receiver operating characteristic (ROC) curves. Sensitivity and specificity of both triage systems were compared across different symptoms using 2 × 2 contingency tables.

Results: The Cohen's kappa statistic for agreement between human and AI triage was 0.125 (95% CI: 0.100-0.134). ROC analysis demonstrated that human triage outperformed AI in predicting all study outcomes, with statistically significant differences. For 30-day mortality, the area under the ROC curve was 0.88 for human triage and 0.70 for AI triage (p < 0.001). A similar result was observed for life-saving interventions, where the area under the ROC curve was 0.98 for human triage and 0.87 for AI triage (p = 0.014). For specific symptoms, human triage showed superior sensitivity and specificity.

Conclusions: LLMs like ChatGPT 4.0 have limited utility in ED triage, particularly due to their lower sensitivity for high-risk patients, which leads to under-triage. Human triage remains more reliable than ChatGPT.
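The agreement and accuracy measures named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: it assumes unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), the function names are hypothetical, and any inputs passed to it here are toy values, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per code.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # high-risk patients correctly flagged
    specificity = tn / (tn + fp)  # low-risk patients correctly not flagged
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy triage codes (1 = most urgent), invented purely for illustration.
    human_codes = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
    ai_codes = [2, 2, 3, 3, 3, 4, 4, 4, 5, 4]
    print("kappa:", round(cohens_kappa(human_codes, ai_codes), 3))
    print("sens, spec:", sensitivity_specificity(tp=8, fn=2, fp=5, tn=85))
```

A kappa near 0 means agreement is close to what chance alone would produce, which is how the reported value of 0.125 should be read. Low sensitivity in the 2 × 2 table corresponds to high-risk patients being assigned a less urgent code, that is, under-triage.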
Original language: English
Pages (from-to): 165-171
Number of pages: 7
Journal: The American Journal of Emergency Medicine
Volume: 92
Early online date: Mar 2025
Publication status: Published - Jun 2025

Keywords

  • Advanced nurse practice
  • Artificial intelligence
  • ChatGPT
  • Emergency department
  • LLM
  • Large language models
  • Triage
