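Efficient Anatomical Labeling of Pulmonary Tree Structures via Deep Point-Graph Representation-Based Implicit Fields (Paper Digest - Generative Models and Transformers in Medical Imaging)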

Title

Efficient anatomical labeling of pulmonary tree structures via deep point-graph representation-based implicit fields

Introduction

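In recent years, pulmonary diseases (Decramer et al., 2008; Marcus et al., 2000; Nunes et al., 2017) have become a leading cause of death worldwide (Quaderi and Hurst, 2018), and pulmonary research has drawn growing attention as a result. In studies of pulmonary disease, understanding pulmonary anatomy through medical imaging is important because of the known associations between pulmonary diseases and metrics derived from lung CT images (Charbonnier et al., 2019; Kirby et al., 2019; Qin et al., 2019; Shaw et al., 2002; Loud et al., 2001; Shen et al., 2014). Some studies also emphasize topology preservation and learning at key locations (Yu et al., 2023; Tan et al., 2021; Xie et al., 2022). However, extracting a usable surface from a point cloud or a graph is not trivial.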

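Given the challenges above, and as illustrated in Fig. 2(e), we aim to develop an optimized automated solution that is computationally efficient and reconstructs 3D models with a continuous surface definition and preserved topology, with particular attention to key structural points. To this end, we propose a novel approach in which skeleton-graph context is deeply injected into the point-based representation, and an implicit method generates a feature field, enabling efficient and denoised dense volume reconstruction. The proposed Implicit Point-Graph Network (IPGN) comprises point and graph encoders, combined with multiple Point-Graph Fusion layers that together form the Point-Graph Network (PGN) for deep feature fusion, and an Implicit Point Module that models the implicit surface in 3D, allowing fast classification inference at arbitrary locations. Thanks to the flexibility of the implicit representation, an IPGN trained for pulmonary tree labeling can be further extended to pulmonary segment reconstruction by simply modifying the inference procedure (Section 6).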

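The complex tree-shaped pulmonary structures (airways, arteries, and veins, shown in Fig. 1) have high branching factors and play an important role in the respiratory system. Multi-class semantic segmentation of pulmonary trees in volumetric images, where each class represents a specific branch according to the medical definition of the pulmonary segments, is an effective way to model their complexity; the main challenges lie at bifurcations and distal branches (Yu et al., 2023). In pulmonary tree labeling, the derived quantitative characteristics (Kirby et al., 2019; Smith et al., 2014; Shaw et al., 2002) are not only associated with pulmonary diseases and pulmonary-related medical applications (Loud et al., 2001; Shen et al., 2014) but are also crucial for surgical navigation (Qin et al., 2019). This work focuses on investigating and developing deep learning methods for efficient and accurate anatomical labeling of pulmonary airways, arteries, and veins.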

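Among deep learning methods, convolutional neural networks (CNNs) have become the de facto standard for semantic segmentation (Zhou et al., 2018, 2019). One of their advantages is that they produce volumes with well-defined surfaces. However, CNNs are computationally demanding on large 3D volumes, and they often yield unsatisfactory results: detail is lost when the resolution is reduced (Fig. 2(a)), and global context is lost when operating on local patches (Fig. 2(b)). In contrast, a point cloud is a sparse 3D shape representation with no usable surface, but it has lower computational requirements while retaining global structure (Fig. 2(c)). In addition, since the targets are inherently tree-shaped, graph modeling (Fig. 2(d)) is an effective approach.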

Abstract

Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. Traditional approaches using high-resolution image stacks and standard CNNs on dense voxel grids face challenges in computational efficiency, limited resolution, local context, and inadequate preservation of shape topology. Our method addresses these issues by shifting from dense voxel to sparse point representation, offering better memory efficiency and global context utilization. However, the inherent sparsity in point representation can lead to a loss of crucial connectivity in tree-shaped structures. To mitigate this, we introduce graph learning on skeletonized structures, incorporating differentiable feature fusion for improved topology and long-distance context capture. Furthermore, we employ an implicit function for efficient conversion of sparse representations into dense reconstructions end-to-end. The proposed method not only delivers state-of-the-art performance in labeling accuracy, both overall and at key locations, but also enables efficient inference and the generation of closed surface shapes. Addressing data scarcity in this field, we have also curated a comprehensive dataset to validate our approach.

Method

Given a binary volumetric image of a pulmonary tree (Fig. 6(a)), a graph (Fig. 6(b)) is constructed with VesselVio (Bumgarner and Nelson, 2022) from the original volume following the procedure in Section 3.2.2, and a set of points (6k) is randomly sampled from the foreground voxels to construct a point cloud (Fig. 6(c), referred to as the original point cloud). While the original point cloud is a sparse, downsampled version of the foreground volume, the graph represents a skeleton of the pulmonary tree.
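The random sampling of foreground voxels into a point cloud can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_point_cloud`, the nested-list volume layout `volume[z][y][x]`, the fixed seed, and the fallback to sampling with replacement when there are too few foreground voxels are all assumptions made for the example.

```python
import random


def sample_point_cloud(volume, num_points, seed=0):
    """Randomly sample foreground voxel coordinates from a binary 3D volume.

    `volume` is a nested list indexed as volume[z][y][x] with 0/1 entries.
    Sampling is without replacement when enough foreground voxels exist,
    otherwise with replacement (an assumption, not stated in the text).
    """
    foreground = [
        (x, y, z)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not foreground:
        raise ValueError("volume has no foreground voxels")
    rng = random.Random(seed)
    if len(foreground) >= num_points:
        return rng.sample(foreground, num_points)
    return [rng.choice(foreground) for _ in range(num_points)]


# Toy 4x4x4 volume with a short "branch" running along the x axis.
toy = [[[0] * 4 for _ in range(4)] for _ in range(4)]
for x in range(4):
    toy[1][2][x] = 1

points = sample_point_cloud(toy, num_points=3)
```

In the setting described above, `num_points` would be on the order of 6k and the volume would be the binary tree mask; a real pipeline would more likely use an array library than nested lists.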

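We first introduce the general notation for point and graph elements. The coordinates of the $M$ points and $N$ graph nodes are denoted $\mathbf{P}$ and $\mathbf{G}$, and a single point or graph element is written $\mathbf{p}$ or $\mathbf{g}$, where $\mathbf{P} = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_M\}$ and $\mathbf{G} = \{\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_N\}$. The superscript notation $\mathbf{p}^{(i)}$ denotes an element's feature at the $i$-th network layer.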

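At input, the 3-dimensional $\{x, y, z\}$ coordinates of the points, $\mathbf{P} \in \mathbb{R}^{M \times 3}$, and of the graph nodes, $\mathbf{G} \in \mathbb{R}^{N \times 3}$, are used as initial features. We use a point neural network and a graph neural network as initial feature encoders, from which we extract a 128-dimensional intermediate feature for each point and graph node, expressed as $\mathbf{P}^{(0)} \in \mathbb{R}^{M \times 128}$ and $\mathbf{G}^{(0)} \in \mathbb{R}^{N \times 128}$.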

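Subsequently, the initial features from both branches, $\mathbf{P}^{(0)}$ and $\mathbf{G}^{(0)}$, are passed through one or more Point-Graph Fusion layers, which allow two-way feature integration based on feature propagation (Rossi et al., 2021) and ball-query and grouping (Qi et al., 2017). Let the input to a Point-Graph Fusion layer be $\mathbf{P}^{(i-1)}$ and $\mathbf{G}^{(i-1)}$; the features output by the layer are $\mathbf{P}^{(i)}$ and $\mathbf{G}^{(i)}$. After $l$ Point-Graph Fusion layers for deep feature fusion, the last layer outputs $\mathbf{P}^{(l)}$ and $\mathbf{G}^{(l)}$.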
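To make the ball-query and grouping idea concrete, the sketch below shows one plausible direction of the fusion, in which each graph node gathers the features of nearby points. It is an illustration under stated assumptions rather than the layer described in the paper: the function name `ball_query_group`, the max-pooling aggregation, the zero fallback for empty balls, and the `radius` and `max_samples` parameters are all hypothetical, and the learned MLP and GNN components of the layer are omitted.

```python
import math


def ball_query_group(node_xyz, point_xyz, point_feat, radius, max_samples):
    """For each node, max-pool features of up to `max_samples` points within `radius`."""
    dim = len(point_feat[0])
    grouped = []
    for node in node_xyz:
        # Ball query: keep the features of points inside the sphere around the node.
        neighbours = [
            feat
            for xyz, feat in zip(point_xyz, point_feat)
            if math.dist(node, xyz) <= radius
        ][:max_samples]
        if neighbours:
            # Grouping: channel-wise max over the gathered point features.
            pooled = [max(f[d] for f in neighbours) for d in range(dim)]
        else:
            pooled = [0.0] * dim  # no point in the ball: fall back to zeros
        grouped.append(pooled)
    return grouped


nodes = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
pts = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (10.0, 0.0, 0.5), (5.0, 5.0, 5.0)]
feats = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [9.0, 9.0]]

node_context = ball_query_group(nodes, pts, feats, radius=1.5, max_samples=8)
```

Here the first node pools the two points within distance 1.5 of the origin and the second node sees only the point next to it; the far-away point at (5, 5, 5) contributes to neither.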

Finally, a lightweight MLP network and a GNN project the fused features to 19-dimensional vectors for graph (Fig. 6(d)) and point (Fig. 6(e)) predictions.

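An Implicit Point Module is further introduced to reconstruct the dense volumes; it consists of a feature propagation process and an MLP network. Once features have been extracted by the Point-Graph Network, the Implicit Point Module leverages the extracted multi-stage point features for fast dense volume segmentation. We define query points as points that are not necessarily in the original point cloud but lie in the foreground of the tree volume and therefore require classification. Given a query point $\mathbf{p}_q$ at arbitrary coordinates, the module locates the $k$ nearest point elements of $\mathbf{p}_q$ in the original point cloud, $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_k\}$, and extracts their multi-stage features $\{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_k\}$ from the backbone network; feature propagation then combines them into a multi-stage representation $\mathbf{z}_q$ of the query point $\mathbf{p}_q$. After the point feature $\mathbf{z}_q$ has been propagated, the MLP network is used to make class predictions. By applying this process to all foreground points, we can efficiently generate a dense volume reconstruction (Fig. 6(f)).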
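The query-point procedure can be sketched in a few lines. This is a simplified illustration under assumptions, not the module from the paper: inverse-distance weighting is assumed for the feature propagation step (a common choice, but the text does not specify the weighting), a single linear layer stands in for the MLP, and the names `propagate_features` and `classify` are hypothetical. The concatenation of multi-stage features is omitted; each point carries one feature vector.

```python
import math


def propagate_features(query, point_xyz, point_feat, k=3, eps=1e-8):
    """Interpolate a feature at `query` from its k nearest points (inverse-distance weights)."""
    nearest = sorted(
        zip(point_xyz, point_feat), key=lambda pair: math.dist(query, pair[0])
    )[:k]
    weights = [1.0 / (math.dist(query, xyz) + eps) for xyz, _ in nearest]
    total = sum(weights)
    dim = len(point_feat[0])
    return [
        sum(w * feat[d] for w, (_, feat) in zip(weights, nearest)) / total
        for d in range(dim)
    ]


def classify(feature, weight, bias):
    """Stand-in for the MLP: one linear layer followed by argmax over class logits."""
    logits = [
        sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weight, bias)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


cloud_xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (9.0, 9.0, 9.0)]
cloud_feat = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [5.0, 5.0]]
toy_weight = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class linear head
toy_bias = [0.0, 0.0]

# A query halfway between the first two points receives an even blend of their features.
z_q = propagate_features((1.0, 0.0, 0.0), cloud_xyz, cloud_feat, k=2)
label_near_first = classify(
    propagate_features((0.1, 0.0, 0.0), cloud_xyz, cloud_feat, k=2), toy_weight, toy_bias
)
```

Because the backbone features are computed once per tree, labeling a dense volume amounts to repeating this cheap lookup-and-classify step for every foreground voxel, which is where the efficiency claimed above would come from.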

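To avoid naming ambiguity, we refer to the complete network described above as the Implicit Point-Graph Network (IPGN), and to the network without the implicit part as the backbone network, the Point-Graph Network (PGN).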

Conclusion

In conclusion, we take an experimentally comprehensive deep dive into pulmonary tree segmentation based on the compiled PTL dataset. A novel architecture, the Implicit Point-Graph Network (IPGN), is presented for accurate and efficient pulmonary tree segmentation. Our method leverages a dual-branch point-graph fusion model to effectively capture the complex branching structure of the respiratory system. Extensive experimental results demonstrate that, by implicit modeling on point-graph features, the proposed model achieves state-of-the-art segmentation quality with minimal computational cost for practical dense volume reconstruction. The advancements made in this study could potentially enhance the diagnosis, management, and treatment of pulmonary diseases, ultimately improving patient outcomes in this critical area of healthcare.

Figure

Fig. 1. Pulmonary Tree Labeling. (a) The pulmonary tree consists of three anatomic structures (airway, artery, and vein). (b) Given a binary volume representing a tree structure as input, we label each voxel as one of 19 classes (18 pulmonary segments plus 1 out-of-lung background) based on branching regions.

Fig. 2. A Comparison of Data Representations for Pulmonary Tree Labeling. The CNN-based voxel methods are either low-resolution (down-sampled) or local (sliding-window). Standard sparse representations such as points and graphs are global, but reconstructing a high-quality dense volume from them is not trivial. Our method, which combines point, graph, and implicit fields, produces high-quality dense reconstructions efficiently.

Fig. 3. Anatomy of Pulmonary Trees and Pulmonary Segments. Each pulmonary tree branch corresponds to a pulmonary segment. The intersegmental vein, which lies along the pulmonary segment border, is highlighted in red.

Fig. 4. Sample Illustration with Voxel-level and Graph-level Metrics. Due to the large number of voxels, the voxel-level metric often overlooks labeling at key regions. Given two test samples with similar voxel-level performance, the sample with better graph-level performance (Case A and B) tends to perform better at key points than its counterpart (Case A' and B').
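The contrast between the two metric levels in Fig. 4 can be illustrated with a small per-class Dice routine applied once to voxel labels and once to graph-node labels. This is a generic sketch of the Dice coefficient, 2|A∩B| / (|A| + |B|); the function name `dice_per_class`, the toy label lists, and the choice to return `None` for classes absent from both prediction and ground truth are assumptions, and the averaging used for the reported scores is not specified here.

```python
def dice_per_class(pred, target, num_classes):
    """Return the Dice score of each class; None when a class is absent from both."""
    scores = []
    for c in range(num_classes):
        pred_c = [p == c for p in pred]
        target_c = [t == c for t in target]
        overlap = sum(p and t for p, t in zip(pred_c, target_c))
        denom = sum(pred_c) + sum(target_c)
        scores.append(2.0 * overlap / denom if denom else None)
    return scores


# Toy labels: many voxels, few graph nodes (class ids are illustrative only).
voxel_pred = [0, 1, 1, 2, 2, 2]
voxel_true = [0, 1, 2, 2, 2, 2]
node_pred = [1, 1, 2]
node_true = [1, 2, 2]

voxel_dice = dice_per_class(voxel_pred, voxel_true, num_classes=3)
node_dice = dice_per_class(node_pred, node_true, num_classes=3)
```

A single mislabeled element moves the node-level score far more than the voxel-level score, simply because there are far fewer nodes than voxels; this is one way to read the observation in the caption that voxel-level metrics can hide errors at key regions.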

Fig. 5. Skeleton Graphs Directly from Bumgarner and Nelson (2022) and Two Types of Imperfections. (1) The centerline (CL) points, which represent the path of a tree branch as discrete, connected coordinates, might contain cliques. (2) The coordinates of leaf graph nodes might fall outside the foreground volume.

Fig. 6. Overview of the Proposed Implicit Point-Graph Network (IPGN) for Pulmonary Tree Labeling. The pipeline pre-processes the dense volume into graph and point cloud inputs. The Point-Graph Fusion layers enhance point features with graph context through differentiable feature-fusion learning, and the Implicit Point Module efficiently produces dense predictions based on the deep sparse representation.

Fig. 7. The Detailed Operation of a Point-Graph Fusion Layer. The figure shows the operations for only one node and one point, which are highlighted as the node/point of interest. In practice, this process runs in parallel for all nodes and points. Here, the two modules with subscripts 1 and 2 are MLP networks, and the remaining module is a graph neural network.

Fig. 8. Implicit Point Module. For any query point, the Implicit Point Module consumes multi-stage features from a Point-Graph Network and uses feature propagation and a neural network to provide a label.

Fig. 9. Visualization of Segmentation Results. This figure displays segmentation results from GAT (Velickovic et al., 2017) with post-processing (Section 5.1.3), PointNet++ (Qi et al., 2017), and IPGN (with GAT and PointNet++ backbones), along with the ground truth. The initial-form predictions (graph or point cloud) before dense volume prediction are also presented. Below each dense volume reconstruction, we show the voxel (V) Dice and node (N) Dice results.

Fig. 10. Pulmonary Segments Visualization. A sample reconstruction of pulmonary segments based on the airway, artery, and vein is presented against the ground truth in axial, sagittal, and coronal views.

Table

Table 1. Notations and corresponding descriptions. This table presents the notations used to represent point and graph elements within the architecture. The notations are organized into two stages: Point-Graph Feature Fusion and Implicit Dense Volume Reconstruction.

Table 2. Model performance in Dice score at voxel and graph level on the PTL dataset. Baseline methods using different feature representations are compared against the proposed methods on the Pulmonary Tree Labeling (PTL) dataset. PGN = Point-Graph Network without implicit fields; IPGN = Implicit Point-Graph Network; KP = key point. Graph-level predictions from voxel/point networks are acquired by running inference at the node and edge locations.

Table 3. Inference speed and segmentation metrics for dense volume reconstruction. This table compares the dense volume segmentation test time and Dice score across voxel-based, point-based, graph-based, and point-graph fusion methods. Test times are measured in seconds, and the Dice score indicates segmentation quality.

Table 4. Impact of input for graph modeling. We show the results using MLP and GAT (Velickovic et al., 2017) backbones on different feature inputs.

Table 5. Multi-stage point feature input for the Implicit Point Module. This table presents the Dice results using different combinations of concatenated multi-stage features from the backbone network as input to the Implicit Point Module. (Point encoder: Point Transformer (Zhao et al., 2021); graph encoder: GAT (Velickovic et al., 2017).)

Table 6. Model performance on the original/corrupted data. We present experimental results on the original and corrupted versions (Weng et al., 2023) of the PTL dataset for Point Transformer (Zhao et al., 2021) and the proposed IPGN, with voxel-level Dice score (%) as the metric.

Table 7. Model complexity. This table highlights the parameter counts for key competitive methods. Performance is reported as voxel-level Dice (%).