Title
Predicting Invasiveness of Lung Adenocarcinoma at Chest CT with Deep Learning Ternary Classification Models
Background
Preoperative discrimination of preinvasive, minimally invasive, and invasive adenocarcinoma at CT informs clinical management decisions but may be challenging for pure ground-glass nodules (pGGNs). Deep learning (DL) may improve ternary classification.
Method
In this retrospective study, six ternary models for classifying preinvasive, minimally invasive, and invasive adenocarcinoma were developed using a multicenter data set of lung nodules. The DL-based models were progressively modified through framework optimization, joint learning, and an adjudication strategy. The adjudication strategy simulates a multireader approach to resolving discordant nodule classifications: it integrates two binary classification models with a ternary classification model to resolve discordant classifications sequentially. The six ternary models were then tested on an external data set of pGGNs imaged between December 2019 and January 2021. Diagnostic performance, including accuracy, specificity, and sensitivity, was assessed. The χ² test was used to compare model performance in different subgroups stratified by clinical confounders.
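The adjudication step can be sketched as a small decision rule. This is a minimal illustration, not the authors' implementation: it assumes the fusion rule implied by the Figure 5 caption (binary task 1 separates IAC from the rest, binary task 2 separates AAH/AIS from the rest, and only the paradoxical combination is passed to the ternary model). The names `Invasiveness` and `adjudicate` and the boolean inputs are hypothetical.

```python
from enum import Enum


class Invasiveness(Enum):
    PREINVASIVE = "AAH/AIS"
    MIA = "MIA"
    IAC = "IAC"


def adjudicate(task1_is_iac: bool,
               task2_is_invasive: bool,
               ternary_label: Invasiveness) -> Invasiveness:
    """Fuse two binary predictions; fall back to the ternary model on conflict.

    task1_is_iac      -- binary task 1 (3v1): True if the nodule is predicted IAC
    task2_is_invasive -- binary task 2 (2v2): True if predicted MIA or IAC
    ternary_label     -- prediction of the ternary model, used only as a tiebreak
    """
    if task1_is_iac and task2_is_invasive:
        return Invasiveness.IAC          # both readers agree on invasive disease
    if not task1_is_iac and not task2_is_invasive:
        return Invasiveness.PREINVASIVE  # both readers agree on AAH/AIS
    if not task1_is_iac and task2_is_invasive:
        return Invasiveness.MIA          # not IAC, but not preinvasive either
    # Paradoxical case: IAC in task 1 but AAH/AIS in task 2.
    return ternary_label


if __name__ == "__main__":
    print(adjudicate(False, True, Invasiveness.IAC).value)
```

Under this reading, the ternary model acts like a third reader who is consulted only when the two binary models contradict each other.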
Conclusion
Combining framework optimization, joint learning, and an adjudication approach improved DL classification of adenocarcinoma invasiveness at chest CT.
Results
A total of 4929 nodules from 4483 patients (mean age, 50.1 years ± 9.5 [SD]; 2806 female) were divided into training (n = 3384), validation (n = 579), and internal test (n = 966) sets. A total of 361 pGGNs from 281 patients (mean age, 55.2 years ± 11.1 [SD]; 186 female) formed the external test set. The proposed strategy improved DL model performance in external testing (P < .001). For classifying minimally invasive adenocarcinoma, accuracy was 85% versus 79%, sensitivity was 75% versus 63%, and specificity was 89% versus 85% for the model with adjudication (model 6) versus the model without (model 3). Model 6 showed a relatively narrow range (maximum minus minimum) across diagnostic indexes (accuracy, 1.7%; sensitivity, 7.3%; specificity, 0.9%) compared with the other models (accuracy, 0.6%–10.8%; sensitivity, 14%–39.1%; specificity, 5.5%–17.9%).
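The per-class indexes and their range can be computed from a ternary confusion matrix as sketched below. This is an illustration only: it assumes each index is computed one-vs-rest for each of the three classes and that the range is the maximum minus the minimum across those classes. The function names and the toy counts are hypothetical and are not data from the study.

```python
def one_vs_rest_metrics(cm):
    """Return accuracy, sensitivity, and specificity for each class.

    cm[i][j] is the number of nodules whose true class is i and whose
    predicted class is j. Every class is assumed to have at least one
    true case and one true non-case, otherwise a division by zero occurs.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k predicted as other
        fp = sum(cm[i][k] for i in range(n)) - tp    # other predicted as class k
        tn = total - tp - fn - fp
        metrics.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return metrics


def index_ranges(metrics):
    """Range (maximum minus minimum) of each index across the classes."""
    return {
        name: max(m[name] for m in metrics) - min(m[name] for m in metrics)
        for name in metrics[0]
    }


if __name__ == "__main__":
    # Toy counts, rows = pathologic class, columns = model output,
    # in the order AAH/AIS, MIA, IAC.
    toy_cm = [[8, 2, 0],
              [1, 7, 2],
              [0, 1, 9]]
    for name, value in index_ranges(one_vs_rest_metrics(toy_cm)).items():
        print(f"{name}: {value:.3f}")
```

A narrow range means the model performs similarly on all three classes rather than doing well on one class at the expense of another.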
Figure
Figure 1: Overview of the framework of the ternary classification models and upgrade processes. Baseline: Flowchart of the radiomics-based model (model 1) (orange-dotted box). Upgrade 1: Framework of the deep learning (DL)–based model (model 2) (cyan-dotted box). Upgrade 2: Fusion model (model 3), generated by combining model 1 and model 2 through a joint learning method (purple-dotted box). Upgrade 3: Implementation of an adjudication strategy (simulating a multireader approach) in model 4 (upgraded from model 1), model 5 (upgraded from model 2), and model 6 (upgraded from model 3) (red-dotted box). In the proposed strategy, 3v1 represents binary task 1 (atypical adenomatous hyperplasia [AAH] and adenocarcinoma in situ [AIS] + minimally invasive adenocarcinoma [MIA] vs invasive adenocarcinoma [IAC]), 2v2 represents binary task 2 (AAH and AIS vs MIA + IAC), and 2v1v1 represents the ternary classification (AAH and AIS vs MIA vs IAC). AUC = area under the receiver operating characteristic curve, DFL = discriminative filter learning, 4D = four-dimensional, LASSO = least absolute shrinkage and selection operator, LD = linear discriminant, LR = logistic regression, MLP = multilayer perceptron, ROI = region of interest, SVM = support vector machine, Xgboost = extreme gradient boosting.
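The three task definitions in the Figure 1 caption amount to three different groupings of the same four pathologic subtypes. The sketch below derives a target for each task from a subtype string. The integer encodings and the name `task_targets` are assumptions made for illustration; the source does not state how labels were encoded.

```python
SUBTYPES = ("AAH", "AIS", "MIA", "IAC")


def task_targets(subtype: str) -> dict:
    """Map a pathologic subtype to the targets of the three tasks.

    "3v1"   -- binary task 1: 0 = AAH/AIS/MIA, 1 = IAC
    "2v2"   -- binary task 2: 0 = AAH/AIS,     1 = MIA/IAC
    "2v1v1" -- ternary task:  0 = AAH/AIS, 1 = MIA, 2 = IAC
    """
    if subtype not in SUBTYPES:
        raise ValueError(f"unknown subtype: {subtype!r}")
    preinvasive = subtype in ("AAH", "AIS")
    if preinvasive:
        ternary = 0
    elif subtype == "MIA":
        ternary = 1
    else:
        ternary = 2
    return {
        "3v1": int(subtype == "IAC"),
        "2v2": int(not preinvasive),
        "2v1v1": ternary,
    }


if __name__ == "__main__":
    for s in SUBTYPES:
        print(s, task_targets(s))
```

MIA is the only subtype whose binary labels differ between the two tasks (negative in 3v1, positive in 2v2), which is why agreement between the two binary models can single it out.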
Figure 2: Flowchart of patient inclusion and exclusion criteria for (A) training, validation, and internal test sets and (B) external test set. pGGN = pure ground-glass nodule.
Figure 3: Receiver operating characteristic curves obtained via the average method of the six ternary classification models of adenocarcinoma invasiveness in (A) internal and (B) external test sets. Model 1 is a radiomics-based model; model 2, a deep learning–based model; model 3, a fusion model generated by combining model 1 and model 2 through a joint learning method; model 4, upgraded from model 1 based on the adjudication strategy; model 5, upgraded from model 2 based on the adjudication strategy; model 6, upgraded from model 3 based on the adjudication strategy. Because models 4, 5, and 6 could generate only classification results rather than probabilities, each of these models produced a single point in the receiver operating characteristic space. AUC = area under the receiver operating characteristic curve.
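A model that outputs only hard labels has no threshold to sweep, so it yields one (false-positive rate, true-positive rate) pair instead of a curve. The sketch below computes that pair per class and averages it. It assumes "the average method" means an unweighted (macro) average over one-vs-rest classes, which the caption does not specify; the function names and the toy labels are hypothetical.

```python
def roc_point(y_true, y_pred, positive):
    """One-vs-rest (FPR, TPR) for hard predictions of a single class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Assumes the class has at least one positive and one negative case.
    return fp / (fp + tn), tp / (tp + fn)


def macro_average_roc_point(y_true, y_pred, classes):
    """Unweighted mean of the per-class ROC points."""
    points = [roc_point(y_true, y_pred, c) for c in classes]
    fpr = sum(p[0] for p in points) / len(points)
    tpr = sum(p[1] for p in points) / len(points)
    return fpr, tpr


if __name__ == "__main__":
    # Toy labels, not study data.
    truth = ["AAH/AIS", "AAH/AIS", "MIA", "MIA", "IAC", "IAC"]
    preds = ["AAH/AIS", "MIA", "MIA", "MIA", "IAC", "AAH/AIS"]
    print(macro_average_roc_point(truth, preds, ["AAH/AIS", "MIA", "IAC"]))
```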
Figure 4: Radar maps of the five diagnostic indexes (accuracy, sensitivity, specificity, precision, and F1 score) in the external test set for (A) all six models and for (B) ternary classification of invasiveness. Each line in the radar map represents the performance of one model for a single classification (atypical adenomatous hyperplasia [AAH]/adenocarcinoma in situ [AIS], minimally invasive adenocarcinoma [MIA], or invasive adenocarcinoma [IAC]) according to the five diagnostic indexes, and the area enclosed by the line can be used to visually compare the performance of different models for different classifications. Model 1 is a radiomics-based model; model 2, a deep learning–based model; model 3, a fusion model generated by combining model 1 and model 2 through a joint learning method; model 4, upgraded from model 1 based on the adjudication strategy; model 5, upgraded from model 2 based on the adjudication strategy; model 6, upgraded from model 3 based on the adjudication strategy.
Figure 5: Confusion matrices for ternary classification of invasiveness in the external test set for (A) conventional ternary models including model 1 (a radiomics-based model), model 2 (a deep learning–based model), and model 3 (a fusion model generated by combining model 1 and model 2) and for (B) ternary models designed with the adjudication strategy including model 4 (upgraded from model 1), model 5 (upgraded from model 2), and model 6 (upgraded from model 3). The density of each color in the confusion matrices indicates the number of nodules in a given classification; a darker color indicates a greater number. (B) For the ternary models augmented with the adjudication strategy, the confusion matrices were modified as follows. Top row: The rows of the matrices represent the result of binary classification task 1, and the columns represent the result of binary classification task 2. The four large cells at the intersections of the rows and columns show the classification results (circled in the rounded square) of atypical adenomatous hyperplasia (AAH)/adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), invasive adenocarcinoma (IAC), and paradoxical nodules based on the fusion rule of binary task 1 and binary task 2. Each large cell is then further divided into three parts according to the actual pathologic classification of the lesion: the upper left number is pathologic AAH/AIS, the upper right number is pathologic MIA, and the bottom number is pathologic IAC. Middle row: Confusion matrices of the ternary classification model that was used to more accurately identify the paradoxical nodules (ie, nodules simultaneously predicted as an IAC in binary task 1 and as an AAH/AIS in binary task 2). M = classification results generated by model, P = classification results at pathologic examination, T1 = binary classification task 1, T2 = binary classification task 2.
Figure 6: Heat maps of model 5 (upgraded from the conventional model 2 based on the adjudication strategy) generated by gradient-weighted class activation mapping, or Grad-CAM. Two examples are used to illustrate the mechanism of model classification. (A) The two binary classification models detect preinvasive features (binary classification task 1, model 3v1) and invasive features (binary classification task 2, model 2v2) separately. (B) Complex minimally invasive adenocarcinoma (MIA) nodule correctly classified with model 5 using the proposed strategy but incorrectly classified with the conventional ternary classification model (model 2). AAH = atypical adenomatous hyperplasia, AIS = adenocarcinoma in situ, IAC = invasive adenocarcinoma.
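For orientation, the standard Grad-CAM formulation is given below in its usual two-dimensional form. The symbols are not from this summary: y^c is the network score for class c, A^k is the k-th feature map of the chosen convolutional layer, and Z is the number of spatial positions in that map. How the authors adapted it to their network and to CT volumes is not stated here; for a volumetric network the average would presumably run over three spatial indices.

```latex
\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^{c} A^{k} \right)
```

Each feature map is weighted by the average gradient of the class score with respect to it, and the ReLU keeps only regions that push the score for class c upward, which is what the heat maps display.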
Table
Table 1: Baseline Characteristics of Patients and Pulmonary Nodules in the Data Sets
Table 2: Diagnostic Indexes of the Six Models in the External Test Set
Table 3: Range of Diagnostic Indexes in Ternary Classification for the Six Models in the External Test Set