TY - JOUR
T1 - Is artificial intelligence ready to replace specialist doctors entirely?
T2 - ENT specialists vs ChatGPT: 1-0, ball at the center
AU - Dallari, Virginia
AU - Sacchetto, Andrea
AU - Saetti, Roberto
AU - Calabrese, Luca
AU - Vittadello, Fabio
AU - Gazzini, Luca
N1 - Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Bolzano-Bozen, Italy
PY - 2024/2
Y1 - 2024/2
AB - PURPOSE: The purpose of this study was to evaluate ChatGPT's responses to Ear, Nose and Throat (ENT) clinical cases and compare them with the responses of ENT specialists. METHODS: We hypothesized 10 scenarios, based on daily ENT experience, each built around the same primary symptom, and constructed 20 clinical cases, 2 for each scenario. We described the cases to 3 ENT specialists and to ChatGPT. The difficulty of the clinical cases was assessed by the 5 ENT authors of this article, who also evaluated ChatGPT's responses for correctness and for consistency with the responses of the 3 ENT experts. To verify the stability of ChatGPT's responses, we repeated the queries, always from the same account, on 5 consecutive days. RESULTS: Among the 20 cases, 8 were rated as low complexity, 6 as moderate complexity and 6 as high complexity. The overall mean correctness and consistency scores of ChatGPT's responses were 3.80 (SD 1.02) and 2.89 (SD 1.24), respectively. We did not find a statistically significant difference in the mean ChatGPT correctness and consistency scores according to case complexity. The total intraclass correlation coefficient (ICC) for the stability of ChatGPT's correctness and consistency was 0.763 (95% confidence interval [CI] 0.553-0.895) and 0.837 (95% CI 0.689-0.927), respectively. CONCLUSIONS: Our results reveal the potential usefulness of ChatGPT in ENT diagnosis. Instability in its responses and an inability to recognise certain clinical elements are its main limitations.
KW - Humans
KW - Artificial Intelligence
KW - Pharynx
KW - Neck
KW - Nose
DO - 10.1007/s00405-023-08321-1
M3 - Original Article
C2 - 37962570
SN - 0937-4477
VL - 281
SP - 995
EP - 1023
JO - European Archives of Oto-Rhino-Laryngology
JF - European Archives of Oto-Rhino-Laryngology
IS - 2
ER -