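Large-scale pancreatic cancer detection via non-contrast CT and deep learning | Literature Express: General Vision Models and Disease Diagnosis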

Title

Large-scale pancreatic cancer detection via non-contrast CT and deep learning

Introduction

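Pancreatic ductal adenocarcinoma (PDAC), the deadliest solid malignancy, is typically detected at a late, inoperable stage. Early or incidental detection is associated with prolonged survival, but screening asymptomatic individuals for PDAC with a single test remains infeasible because of the low prevalence of the disease and the potential harms of false positives. Non-contrast computed tomography (CT), which is routinely performed for clinical indications, offers the potential for large-scale screening; however, identifying PDAC on non-contrast CT has long been considered impossible. Here, we develop a deep learning approach, pancreatic cancer detection with artificial intelligence (PANDA), that can detect and classify pancreatic lesions with high accuracy on non-contrast CT. PANDA was trained on a dataset of 3,208 patients from a single center. In a multicenter validation involving 6,239 patients across 10 centers, PANDA achieved an area under the receiver operating characteristic curve (AUC) of 0.986-0.996 for lesion detection and outperformed the mean radiologist performance by 34.1% in sensitivity and 6.3% in specificity for PDAC identification. In a real-world multi-scenario validation of 20,530 consecutive patients, it achieved a sensitivity of 92.9% and a specificity of 99.9% for lesion detection. Notably, PANDA with non-contrast CT was non-inferior to radiology reports (based on contrast-enhanced CT) in differentiating common pancreatic lesion subtypes. PANDA could potentially serve as a new tool for large-scale pancreatic cancer screening.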
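The point about false positives and low prevalence can be made concrete with Bayes' rule. The sketch below computes the positive predictive value (PPV) from sensitivity, specificity and prevalence. The sensitivity and specificity are the real-world lesion detection figures quoted above; the prevalence values are illustrative assumptions, not numbers from the paper, and the function name `ppv` is hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence                    # P(test+, disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+, no disease)
    return true_pos / (true_pos + false_pos)


# Lesion detection sensitivity and specificity quoted for the real-world validation.
SENS, SPEC = 0.929, 0.999

# Hypothetical prevalences, chosen only for illustration.
for prevalence in (0.0001, 0.001, 0.01):
    print(f"prevalence={prevalence:.4f}  PPV={ppv(SENS, SPEC, prevalence):.3f}")
```

With sensitivity and specificity held fixed, PPV falls as prevalence falls, which is why very high specificity matters for any test applied to a mostly healthy population.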

Method

The retrospective collection of the patient datasets in each cohort was approved by the institutional review board (IRB) at each institution with a waiver for informed consent: the Shanghai Institution of Pancreatic Diseases (SIPD) IRB, Shengjing Hospital of China Medical University (SHCMU) IRB, First Affiliated Hospital of Zhejiang University (FAHZU) IRB, Xinhua Hospital (XH) of Shanghai Jiao Tong University School of Medicine IRB, Fudan University Shanghai Cancer Center (FUSCC) IRB, Tianjin Medical University Cancer Institute and Hospital (TMUCIH) IRB, Sun Yat-Sen University Cancer Center (SYUCC) IRB, Guangdong Provincial People’s Hospital (GPPH) IRB, Linkou Chang Gung Memorial Hospital (CGMH) IRB, and General University Hospital in Prague (GUHP) IRB. All data in this study were de-identified prior to model training, testing and reader studies.

Results

We present a deep learning model, PANDA, to detect and diagnose PDAC and seven subtypes of non-PDAC lesions (Methods), that is, pancreatic neuroendocrine tumor (PNET), solid pseudopapillary tumor (SPT), intraductal papillary mucinous neoplasm (IPMN), mucinous cystic neoplasm (MCN), serous cystic neoplasm (SCN), chronic pancreatitis, and ‘other’ (cf. Supplementary Table 1), from abdominal and chest non-contrast CT scans. Our model can detect the presence or absence of a pancreatic lesion, segment the lesion, and classify the lesion subtypes (Fig. 1a).

Figure

Fig. 1 | Overview of PANDA’s development, evaluation and clinical translation. a, Model development. PANDA takes non-contrast CT as input and outputs the probability and the segmentation mask of possible pancreatic lesions, including PDAC and seven non-PDAC subtypes; PANDA was trained with pathology-confirmed patient-level labels and lesion masks annotated on contrast CT images. CP, chronic pancreatitis. b, Model evaluation. We evaluate the performance of PANDA on the internal test cohort, two reader studies (on non-contrast and contrast CT, respectively), external test cohorts consisting of nine centers, a chest CT cohort, and real-world multi-scenario studies (the clinical trial includes two real-world studies; chictr.org.cn, ChiCTR2200064645). c, Model clinical translation. The real-world clinical evaluation answers five critical questions to close the clinical translational gap for PANDA.

Fig. 2 | Internal and external validation. a,b, Receiver operating characteristic curves of lesion detection (a) and PDAC identification (b) for the internal and external test cohorts. c, Proportion of PDACs detected by PANDA in terms of American Joint Committee on Cancer (AJCC) T stage (left) and TNM (tumor, nodes, metastasis) stage (right) in the internal test cohort (n = 105) and external test cohort (n = 2,584). d, Sensitivity, specificity and AUC of lesion detection in the external center cohorts (sites A–I, n = 5,337). e, Proportion of different lesion subtypes detected by PANDA in the internal test cohort (n = 175) and external test cohort (n = 3,669). f, Confusion matrices of differential diagnosis in the internal differential diagnosis cohort (left) and external test cohorts (right). c–e, Error bars indicate 95% CI. The center shows the computed mean of the metric specified by its respective axis labels. The results of subgroups with too few samples to be studied reliably (≤10) are omitted and marked as not applicable (n/a).

Fig. 3 | Reader studies. a, Comparison between PANDA and 33 readers with different levels of expertise on non-contrast CT for lesion detection. b, Lesion detection performance of the same set of readers with the assistance of PANDA on non-contrast CT. c, Comparison between PANDA using non-contrast CT and 15 pancreas specialists using contrast-enhanced CT for lesion detection. d,e, Balanced accuracy improvement in radiologists with different levels of expertise for lesion detection (d) and PDAC identification (e). f, Examples of early-stage PDACs and a case of autoimmune pancreatitis (AIP) missed by readers on non-contrast CT and on contrast CT but detected by PANDA.

Fig. 4 | Validation on chest non-contrast CT. a, Schematic diagram of the proportion of the pancreatic lesion scanned in chest non-contrast CT. We categorize all cases into three categories, that is, lesion not scanned, lesion partially scanned, and lesion fully scanned, based on the relative position of the lowest scanned slice and the lesion. b, The proportion of the three categories in PDAC and non-PDAC cases. c, ROC curve for lesion detection on non-contrast chest CT. d, Proportion of lesions detected by PANDA in the PDAC (n = 63) and non-PDAC cases (n = 51). Error bars indicate 95% CI. The center shows the computed mean of the metric specified by the respective axis labels. The results of subgroups with too few samples to be studied reliably (≤10) are omitted and marked as ‘n/a’. e, Illustration of how PANDA can detect lesions that are not scanned in chest CT. Two scans of the same patient showing that PANDA can detect dilated pancreatic duct (usually caused by PDAC) even when the PDAC is not scanned. f, PANDA can detect early-stage PDACs and metastatic cancer that was initially misdetected by the radiologists on chest non-contrast CT (COVID-19 prevention CT).
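The three-way split described for panel a of Fig. 4 can be written as a comparison of positions along the scan axis. The sketch below is one possible reading of that rule, not the authors' code. It assumes positions are given in millimetres along the cranio-caudal axis with larger values toward the head, that a chest CT covers everything at or above its lowest slice, and that the lesion extent is known from another source. All names are hypothetical.

```python
from enum import Enum


class Coverage(Enum):
    NOT_SCANNED = "lesion not scanned"
    PARTIALLY_SCANNED = "lesion partially scanned"
    FULLY_SCANNED = "lesion fully scanned"


def lesion_coverage(lowest_slice_z: float,
                    lesion_top_z: float,
                    lesion_bottom_z: float) -> Coverage:
    """Classify how much of a lesion a chest CT covers.

    Assumes z increases toward the head, so the scan covers z >= lowest_slice_z.
    """
    if lesion_bottom_z > lesion_top_z:
        raise ValueError("lesion_bottom_z must not exceed lesion_top_z")
    if lowest_slice_z <= lesion_bottom_z:
        return Coverage.FULLY_SCANNED      # scan reaches the lesion's lower edge
    if lowest_slice_z >= lesion_top_z:
        return Coverage.NOT_SCANNED        # scan stops at or above the lesion's upper edge
    return Coverage.PARTIALLY_SCANNED      # scan ends somewhere inside the lesion


# Hypothetical lesion spanning z = -110 mm to z = -100 mm.
for lowest in (-120.0, -105.0, -90.0):
    print(lowest, lesion_coverage(lowest, lesion_top_z=-100.0, lesion_bottom_z=-110.0).value)
```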

Fig. 5 | Real-world clinical evaluation. a, The data collection process of two real-world datasets, that is, RW1 and RW2, for the original PANDA model and the upgraded PANDA Plus model, respectively. SOC, standard of care. b,c,e,f, The sensitivity, specificity and PPV on RW1 (n = 16,420) and RW2 (n = 4,110). The superscript * represents adjusted results if we exclude cases of (peri-)pancreatic findings. d, Proportion of different lesion types detected in RW1 (n = 179) and RW2 (n = 166). g, The comparison between PANDA and PANDA Plus on RW2 (n = 4,110). Error bars indicate 95% CI. The center shows the computed mean of the metric specified by the respective axis labels. The results of subgroups with too few samples to be studied reliably (≤10) are omitted and marked as ‘n/a’. h, Examples of (peri-)pancreatic findings (left) and the number detected by PANDA (right). CBD, common bile duct. i, Examples of cases in which the lesion was missed by the initial SOC but was detected by PANDA.
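Fig. 5 reports sensitivity, specificity and PPV with 95% CI error bars. As a rough illustration of how such numbers follow from a confusion matrix, the sketch below computes the three metrics and a Wilson score interval for each. The caption does not say which interval method the authors used, so Wilson is an assumption here, and the counts are made up for the example. The function names are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, ci_low, ci_high) for sensitivity, specificity and PPV."""
    pairs = {
        "sensitivity": (tp, tp + fn),   # detected lesions / all lesions
        "specificity": (tn, tn + fp),   # correct negatives / all lesion-free cases
        "ppv": (tp, tp + fp),           # true positives / all positive calls
    }
    return {name: (k / n, *wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts, not taken from the paper.
for name, (est, low, high) in detection_metrics(tp=90, fp=10, tn=9890, fn=10).items():
    print(f"{name}: {est:.3f} (95% CI {low:.3f}-{high:.3f})")
```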
