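Literature Express: SAM-Based Medical Image Segmentation -- SAMUS: Adapting the Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation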

news2025/2/23 10:38:09



SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation




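Medical image segmentation is a key technique for identifying and highlighting specific organs, tissues, and lesions in medical images, and it is an integral part of computer-aided diagnosis systems (Liu et al., 2021). Numerous deep learning models have been proposed for automatic medical image segmentation and have shown great potential (Ronneberger, Fischer, and Brox, 2015; Wu et al., 2022). However, these models are tailored to specific objects and must be retrained before they can be applied to other objects, which makes clinical use highly inconvenient.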






Segment anything model (SAM), an eminent universal image segmentation model, has recently gathered considerable attention within the domain of medical image segmentation. Despite the remarkable performance of SAM on natural images, it grapples with significant performance degradation and limited generalization when confronted with medical images, particularly with those involving objects of low contrast, faint boundaries, intricate shapes, and diminutive sizes. In this paper, we propose SAMUS, a universal model tailored for ultrasound image segmentation. In contrast to previous SAM-based universal models, SAMUS pursues not only better generalization but also lower deployment cost, rendering it more suitable for clinical applications. Specifically, based on SAM, a parallel CNN branch is introduced to inject local features into the ViT encoder through cross-branch attention for better medical image segmentation. Then, a position adapter and a feature adapter are developed to adapt SAM from natural to medical domains and from requiring large-size inputs (1024×1024) to small-size inputs (256×256) for more clinical-friendly deployment. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate SAMUS's superiority against the state-of-the-art task-specific models and universal foundation models under both task-specific evaluation and generalization evaluation. Moreover, SAMUS is deployable on entry-level GPUs, as it has been liberated from the constraints of long sequence encoding. The code, data, and models will be released at https://github.com/xianlin7/SAMUS.
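To make the effect of shrinking the input from 1024×1024 to 256×256 concrete, here is a small back-of-the-envelope sketch. It assumes a ViT patch size of 16 with non-overlapping patches (an assumption made for illustration; an overlapped patch embedding would give different counts) and it only considers a global self-attention block, where the attention score matrix has one entry per pair of tokens. The function name `num_tokens` is hypothetical.

```python
def num_tokens(image_size, patch_size=16):
    """Sequence length for a square image split into non-overlapping patches."""
    side = image_size // patch_size  # patches along one side of the image
    return side * side


large_tokens = num_tokens(1024)  # 64 x 64 patches
small_tokens = num_tokens(256)   # 16 x 16 patches

# A global self-attention block scores every token against every other token,
# so the score matrix grows with the square of the sequence length.
score_matrix_ratio = (large_tokens ** 2) / (small_tokens ** 2)

print(f"tokens at 1024x1024: {large_tokens}")
print(f"tokens at 256x256:   {small_tokens}")
print(f"attention score matrix is {score_matrix_ratio:.0f}x smaller")
```

Under these assumptions the sequence drops from 4096 to 256 tokens, which is the sense in which the model is freed from long sequence encoding. Actual memory savings also depend on the parts of the network that do not scale quadratically.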






In this paper, we propose SAMUS, a universal foundation model derived from SAM, for clinically-friendly and generalizable ultrasound image segmentation. Specifically, we present a parallel CNN branch image encoder, a feature adapter, a position adapter, and a cross-branch attention module to enrich the features for small-size objects and boundary areas while reducing GPU consumption. Furthermore, we construct a large ultrasound image dataset US30K, consisting of 30,106 images and 68,570 masks for evaluation and potential clinical usage. Experiments on both seeable and unseen domains demonstrate the outstanding segmentation ability and strong generalization ability of SAMUS. Moreover, the GPU memory cost of SAMUS is merely 28% of that required to train the entire SAM, and SAMUS is about 3× faster than SAM for inference.




As depicted in Fig. 2, the overall architecture of SAMUS is inherited from SAM, retaining the structure and parameters of the prompt encoder and the mask decoder without any adjustment. Comparatively, the image encoder is carefully modified to address the challenges of inadequate local features and excessive computational memory consumption, making it more suitable for clinically-friendly segmentation. Major modifications include reducing the input size, overlapping the patch embedding, introducing adapters to the ViT branch, adding a CNN branch, and introducing cross-branch attention (CBA). Specifically, the input spatial resolution is scaled down from 1024 × 1024 pixels to 256 × 256 pixels, resulting in a substantial reduction in GPU memory cost due to the shorter input sequence in transformers. The overlapped patch embedding uses the same parameters as the patch embedding in SAM, while its patch stride is half the original stride, which preserves the information at patch boundaries. Adapters in the ViT branch include a position adapter and five feature adapters. The position adapter accommodates the global position embedding to the shorter sequences that result from the smaller input size. The first feature adapter follows the overlapped patch embedding to align input features with the required feature distribution of the pre-trained ViT image encoder. The remaining feature adapters are attached to the residual connections of the feed-forward network in the global transformer to fine-tune the pre-trained image encoder. The CNN branch runs parallel to the ViT branch, providing complementary local information to the latter through the CBA module, which takes the ViT branch features as the query and builds global dependency with features from the CNN branch. It should be noted that CBA is only integrated into each global transformer. Finally, the outputs of the two branches are combined as the final image feature embedding of SAMUS.
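The cross-branch attention idea, where ViT features act as the query and attend over CNN features, can be sketched in a few lines of NumPy. This is a minimal single-head illustration under stated assumptions, not the SAMUS implementation: the projection weights are random, the token counts and channel width are arbitrary, and the residual addition at the end is an assumed way of merging the result back into the ViT branch. The function name `cross_branch_attention` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_branch_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: ViT tokens query the CNN tokens.

    vit_tokens: (N, C) features from the ViT branch (source of queries)
    cnn_tokens: (M, C) features from the CNN branch (source of keys/values)
    w_q, w_k, w_v: (C, D) projection matrices
    Returns the attended features (N, D) and the attention weights (N, M).
    """
    q = vit_tokens @ w_q
    k = cnn_tokens @ w_k
    v = cnn_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each ViT token sums to 1 over CNN tokens
    return weights @ v, weights


# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
n_vit, n_cnn, channels = 256, 256, 32
vit = rng.standard_normal((n_vit, channels))
cnn = rng.standard_normal((n_cnn, channels))
w_q, w_k, w_v = (rng.standard_normal((channels, channels)) / np.sqrt(channels)
                 for _ in range(3))

local_info, attn = cross_branch_attention(vit, cnn, w_q, w_k, w_v)
fused = vit + local_info  # assumed residual merge into the ViT branch
print(fused.shape, attn.shape)
```

Because every ViT token attends over all CNN tokens, each query position can pull in local detail from anywhere in the CNN feature map, which matches the "global dependency" described above.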




Figure 1: Structure comparison of different SAM-based foundation models for medical image segmentation.



Figure 2: Overview of the proposed SAMUS.



Figure 3: Comparison between SAMUS and task-specific methods evaluated on seeable (marked in blue) and unseen datasets (marked in orange).



Figure 4: Qualitative comparisons between SAMUS and task-specific methods. From top to bottom are examples of segmenting thyroid nodule, breast cancer, and myocardium.



Figure 5: Qualitative comparisons between SAMUS and foundation models. From top to bottom are examples of segmenting thyroid nodule, breast cancer, and myocardium.



Figure 6: Segmentation and generalization ability comparison of our SAMUS and other foundation models on seeable (in light color) and unseen (in dark color) US30K data.



Figure 7: Comparison of SAMUS and foundation models on GPU memory cost, model parameters, computational complexity, inference speed, performance, and generalization.




Table 1: Summary of the datasets in US30K. LV, MYO, and LA are short for the left ventricle, myocardium, and left atrium.



Table 2: Quantitative comparison of our SAMUS and SOTA task-specific methods on segmenting thyroid nodule (TN3K), breast cancer (BUSI), left ventricle (CAMUS-LV), myocardium (CAMUS-MYO), and left atrium (CAMUS-LA). The performance is evaluated by the Dice score (%) and Hausdorff distance (HD). The best results are marked in bold.
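For readers unfamiliar with the two metrics named in the table captions, the following sketch computes a Dice score (in percent) and a symmetric Hausdorff distance for a pair of binary masks. It is an illustration only: the Hausdorff distance here is taken over all foreground pixels with a brute-force pairwise computation, and the handling of empty masks is an assumed convention; the exact variant used in the paper's evaluation is not specified in this post.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks, as a percentage."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 100.0  # assumed convention: two empty masks agree perfectly
    return 100.0 * 2.0 * intersection / total


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets."""
    a = np.argwhere(pred)    # (Na, 2) row/col coordinates
    b = np.argwhere(target)  # (Nb, 2)
    if len(a) == 0 or len(b) == 0:
        return float("inf")  # assumed convention when one mask is empty
    # Pairwise Euclidean distances; fine for small masks, costly for large ones.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


# Toy example: a 4x4 square and the same square shifted one pixel to the right.
target = np.zeros((8, 8), dtype=np.uint8)
target[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=np.uint8)
pred[2:6, 3:7] = 1

print(dice_score(pred, target))          # 75.0
print(hausdorff_distance(pred, target))  # 1.0
```

Higher Dice is better, while lower HD is better: Dice rewards area overlap, and HD penalizes the single worst boundary disagreement.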



Table 3: Quantitative comparison of our SAMUS and other foundation models on seeable US30K data. The performance is evaluated by the Dice score (%) and Hausdorff distance (HD).



Table 4: Ablation study on different component combinations of SAMUS on the thyroid nodule and breast cancer segmentation. F-Adapter and P-Adapter represent the feature adapter and the position adapter, respectively.



Table 5: Ablation study of different prompts. Pt1, pt2, and pt3 represent the single-point prompt in different (randomly determined) foreground positions. Multipoint prompts are generated by random sampling on the foreground areas.
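The prompt setup described in this caption, single or multiple points drawn at random from the foreground, can be sketched as follows. This is a hypothetical helper written for illustration, not code from the SAMUS repository: the function name `sample_point_prompts`, the `(x, y)` output order, and the use of label `1` for foreground points are assumptions chosen to resemble the point-prompt format commonly used with SAM.

```python
import numpy as np


def sample_point_prompts(mask, num_points=1, seed=None):
    """Randomly sample point prompts from the foreground of a binary mask.

    Returns (points_xy, labels): points_xy has shape (K, 2) in (x, y) pixel
    order, and labels is an array of ones marking every point as foreground.
    """
    rng = np.random.default_rng(seed)
    foreground = np.argwhere(mask > 0)  # (N, 2) in (row, col) order
    if len(foreground) == 0:
        raise ValueError("mask has no foreground pixels to sample from")
    count = min(num_points, len(foreground))
    chosen = rng.choice(len(foreground), size=count, replace=False)
    points_xy = foreground[chosen][:, ::-1]  # flip (row, col) to (x, y)
    labels = np.ones(count, dtype=np.int64)
    return points_xy, labels


# Toy mask with a rectangular foreground region.
mask = np.zeros((256, 256), dtype=np.uint8)
mask[100:140, 60:120] = 1

single_point, single_label = sample_point_prompts(mask, num_points=1, seed=1)
multi_points, multi_labels = sample_point_prompts(mask, num_points=3, seed=2)
print(single_point.shape, multi_points.shape)
```

Calling the helper with different seeds gives different single-point prompts on the same object, which is the kind of variation the pt1, pt2, and pt3 columns compare.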





