TY - JOUR
T1 - Chat-GPT in triage
T2 - Still far from surpassing human expertise - An observational study
AU - Zaboli, Arian
AU - Brigo, Francesco
AU - Brigiari, Gloria
AU - Massar, Magdalena
AU - Parodi, Marta
AU - Pfeifer, Norbert
AU - Magnarelli, Gabriele
AU - Turcato, Gianni
N1 - Lehr-KH (SABES-ASDAA), Teaching Hospital of the Paracelsus Medical Private University (PMU), Bolzano, Italy und Hospital of Merano-Meran (SABES-ASDAA), Merano-Meran, Italy
PY - 2025/6
Y1 - 2025/6
N2 - Background: Triage is essential in emergency departments (EDs) to prioritize patient care based on clinical urgency. Recent investigations have explored the role of large language models (LLMs) in triage, but their effectiveness compared to human triage remains uncertain. This study assessed the effectiveness of ChatGPT 4.0 in triaging ED patients. Methods: This retrospective study analyzed data from 2658 patients. Triage codes assigned by human triage personnel were compared with those assigned by Artificial Intelligence (AI) triage using Chat-GPT 4.0. Agreement between human and AI triage was assessed using Cohen's kappa statistic. Clinical outcomes were evaluated through Receiver Operating Characteristic (ROC) curves to determine predictive accuracy. Sensitivity and specificity of both triage systems were compared across different symptoms using 2 x 2 contingency tables. Results: The Cohen's kappa statistic for agreement between human and AI triage was 0.125 (95% CI: 0.100-0.134). ROC analysis demonstrated that human triage outperformed AI in predicting all study outcomes, with statistically significant differences. For 30-day mortality, the ROC of human triage was 0.88, while for AI triage it was 0.70, p < 0.001. A similar result was observed for life-saving interventions, where human triage had an ROC of 0.98 and AI triage 0.87, p = 0.014. For specific symptoms, human triage showed superior sensitivity and specificity. Conclusions: LLMs like Chat-GPT 4.0 have limited utility in ED triage, particularly due to their lower sensitivity for high-risk patients, which leads to under-triage. Human triage remains more reliable than Chat-GPT.
KW - Advanced nurse practice
KW - Artificial intelligence
KW - ChatGPT
KW - Emergency department
KW - LLM
KW - Large language models
KW - Triage
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=pmu_pure&SrcAuth=WosAPI&KeyUT=WOS:001456639500001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1016/j.ajem.2025.03.028
DO - 10.1016/j.ajem.2025.03.028
M3 - Original Article
C2 - 40120387
SN - 0735-6757
VL - 92
SP - 165
EP - 171
JO - The American Journal of Emergency Medicine
JF - The American Journal of Emergency Medicine
ER -