TY - JOUR
T1 - Comparing deep learning and pathologist quantification of cell-level PD-L1 expression in non-small cell lung cancer whole-slide images
AU - van Eekelen, Leander
AU - Spronck, Joey
AU - Looijen-Salamon, Monika
AU - Vos, Shoko
AU - Munari, Enrico
AU - Girolami, Ilaria
AU - Eccher, Albino
AU - Acs, Balazs
AU - Boyaci, Ceren
AU - de Souza, Gabriel Silva
AU - Demirel-Andishmand, Muradije
AU - Meesters, Luca Dulce
AU - Zegers, Daan
AU - van der Woude, Lieke
AU - Theelen, Willemijn
AU - van den Heuvel, Michel
AU - Grünberg, Katrien
AU - van Ginneken, Bram
AU - van der Laak, Jeroen
AU - Ciompi, Francesco
N1 - Provincial Hospital of Bolzano (SABES-ASDAA), Bolzano-Bozen, Italy
PY - 2024/3/26
Y1 - 2024/3/26
AB - Programmed death-ligand 1 (PD-L1) expression is currently used in the clinic to assess eligibility for immune-checkpoint inhibitors via the tumor proportion score (TPS), but its efficacy is limited by high interobserver variability. Multiple papers have presented systems for the automatic quantification of TPS, but none report on the task of determining cell-level PD-L1 expression, and they often restrict their evaluation to a single PD-L1 monoclonal antibody or clinical center. In this paper, we report on a deep learning algorithm for detecting PD-L1-negative and PD-L1-positive tumor cells and evaluate it on a cell-level reference standard established by six readers on a multi-centric, multi-assay PD-L1 dataset. This reference standard also provides, for the first time, a benchmark for computer vision algorithms. In addition, in line with other papers, we evaluate our algorithm at slide level by measuring the agreement between the algorithm and six pathologists on TPS quantification. We find a moderately low interobserver agreement at cell level (mean reader-reader F1 score = 0.68), which our algorithm falls slightly below (mean reader-AI F1 score = 0.55), especially for cases from the clinical center not included in the training set. Despite this, we find good AI-pathologist agreement on quantifying TPS compared to the interobserver agreement (mean reader-reader Cohen's kappa = 0.54, 95% CI 0.26-0.81; mean reader-AI kappa = 0.49, 95% CI 0.27-0.72). In conclusion, our deep learning algorithm demonstrates promise in detecting PD-L1 expression at a cellular level and exhibits favorable agreement with pathologists in quantifying TPS. We publicly release our models for use via the Grand-Challenge platform.
KW - Humans
KW - Carcinoma, Non-Small-Cell Lung/pathology
KW - Lung Neoplasms/pathology
KW - Pathologists
KW - B7-H1 Antigen/metabolism
KW - Deep Learning
KW - Immunohistochemistry
KW - Biomarkers, Tumor/metabolism
U2 - 10.1038/s41598-024-57067-1
DO - 10.1038/s41598-024-57067-1
M3 - Original Article
C2 - 38531958
SN - 2045-2322
VL - 14
SP - 7136
JO - Scientific Reports
JF - Scientific Reports
IS - 1
ER -