LD-Pruner, EdgeFusion (On-Device T2I), FreeDiff, TextCenGen, MemLLM

2025/1/22 12:53:01

This article was first published on the WeChat official account 机器感知 (Machine Perception).

https://mp.weixin.qq.com/s/KiyNfwYWU-wBiCO-hE9qkA

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Foundation models pre-trained on large amounts of data have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks that rely heavily on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose Zip (Z from "Zips", ip from "CLIP"), which zips up CLIP and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocab......
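
The clustering observation above can be illustrated with a minimal sketch, assuming a synthetic grid of per-patch feature vectors in place of real CLIP intermediate-layer features; which layer and which clustering algorithm Zip actually uses are not stated in the truncated abstract. It runs plain k-means and marks patches whose cluster id differs from a 4-neighbour as boundary candidates. All helper names are hypothetical.

```python
# Minimal sketch: cluster per-patch feature vectors with k-means, then mark
# patches whose cluster id differs from a 4-neighbour as "boundary".
# The feature grid is synthetic; it stands in for CLIP patch features.

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Deterministic farthest-point initialisation avoids duplicate centres.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def boundary_mask(labels, h, w):
    grid = [labels[y * w:(y + 1) * w] for y in range(h)]
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] != grid[y][x]:
                    mask[y][x] = True
    return mask

# Toy 4x4 "feature map": the left two and right two columns differ.
H, W = 4, 4
feats = [[0.0, 0.0] if x < 2 else [1.0, 1.0] for y in range(H) for x in range(W)]
labels = kmeans(feats, k=2)
mask = boundary_mask(labels, H, W)
```

On this toy grid the two middle columns are flagged, which is exactly where the two clusters meet.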

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward and task-agnostic method for evaluating model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance, independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster co......
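
The general idea of scoring components by their effect in latent space can be sketched as follows, assuming a toy model whose components each add an update to a latent vector, and assuming Euclidean distance as the measure; the abstract does not give LD-Pruner's actual metric or pruning granularity. Because the score only compares output latents of the full and ablated model, it needs no task labels or downstream metric.

```python
import math

# Hypothetical toy: a "model" is a list of components, each mapping a latent
# vector to an additive update. Pruning a component = switching it off.

def forward(components, active, z):
    out = list(z)
    for comp, on in zip(components, active):
        if on:
            out = [o + d for o, d in zip(out, comp(z))]
    return out

def latent_distance(a, b):
    # Assumed metric: Euclidean distance between output latents.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def importance_scores(components, latents):
    full = [True] * len(components)
    scores = []
    for i in range(len(components)):
        ablated = full.copy()
        ablated[i] = False
        total = sum(latent_distance(forward(components, full, z),
                                    forward(components, ablated, z)) for z in latents)
        scores.append(total / len(latents))
    return scores

def prune(components, latents, n_remove):
    # Remove the components whose absence changes the output latent least.
    scores = importance_scores(components, latents)
    ranked = sorted(range(len(components)), key=lambda i: scores[i])
    removed = set(ranked[:n_remove])
    kept = [i for i in range(len(components)) if i not in removed]
    return kept, scores

components = [
    lambda z: [1.0 * x for x in z],
    lambda z: [0.01 * x for x in z],   # barely changes the latent
    lambda z: [0.5 * x for x in z],
]
latents = [[1.0, -2.0, 0.5], [0.3, 0.3, -1.0]]
kept, scores = prune(components, latents, n_remove=1)
```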

EdgeFusion: On-Device Text-to-Image Generation

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start with a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through our thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices......
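
The abstract mentions quantization without details. As a generic illustration only, not EdgeFusion's scheme, symmetric per-tensor int8 weight quantization stores each weight as an 8-bit integer plus one shared float scale, which is 4x smaller than fp32 and 2x smaller than fp16:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantisation: w ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

w = [0.5, -1.27, 0.0, 0.01]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Each reconstructed weight is off by at most half a quantization step (scale / 2).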

SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up

Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt-tuning methods still use the entire model architecture, so they fail to accelerate inference in practical applications. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient at inference time. Our method significantly enhances inference efficiency by identifying and using a skill-localized subnetwork in a language model. Surprisingly, our method improves inference speed by up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, thereby confirming its practicality and scalability. ......
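
Speed-ups from pruning depend on structured removal: dropping whole intermediate neurons shrinks the weight matrices, so the dense matrix multiplications themselves get smaller. The sketch below removes the lowest-scoring FFN neurons; the per-neuron scores are a hypothetical input, since the abstract does not say how SKIP localizes skills.

```python
def prune_ffn(w_in, w_out, scores, keep_ratio):
    """Structured pruning of FFN intermediate neurons.

    w_in:   d_ff rows of length d_model (row j feeds intermediate neuron j)
    w_out:  d_model rows of length d_ff (column j reads neuron j)
    scores: one hypothetical 'skill relevance' score per intermediate neuron
    """
    n = len(scores)
    n_keep = max(1, round(n * keep_ratio))
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:n_keep]
    keep = sorted(top)  # preserve original neuron order
    new_in = [w_in[j] for j in keep]                    # drop rows
    new_out = [[row[j] for j in keep] for row in w_out]  # drop matching columns
    return new_in, new_out, keep

def param_count(*matrices):
    return sum(len(row) for m in matrices for row in m)

w_in = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # d_ff=4, d_model=2
w_out = [[1, 2, 3, 4], [5, 6, 7, 8]]                       # d_model=2, d_ff=4
scores = [0.9, 0.1, 0.5, 0.05]
new_in, new_out, keep = prune_ffn(w_in, w_out, scores, keep_ratio=0.5)
```

Keeping 2 of 4 neurons halves this toy layer's parameters, from 16 to 8.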

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

With large language models (LLMs) widely deployed in long content generation recently, there has emerged an increasing demand for efficient long-sequence inference support. However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache will be loaded for every generated token, resulting in low utilization of computational cores and high latency. While various compression methods for KV cache have been proposed to alleviate this issue, they suffer from degradation in generation quality. We introduce TriForce, a hierarchical speculative decoding system that is scalable to long sequence generation. This approach leverages the original model weights and dynamic sparse KV cache via retrieval as a draft model, which serves as an intermediate layer in the hierarchy and is further speculated by a smaller model to reduce its......
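
The draft-then-verify loop behind speculative decoding can be sketched as follows. This is plain single-level greedy speculative decoding with two toy functions standing in for models; TriForce's hierarchy (a retrieval-based sparse-KV draft, itself drafted by a smaller model) is not modelled. The output is identical to the target model's own greedy decoding, which is the sense in which such acceleration is lossless.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target verifies the proposal. A real system scores all k
        #    positions in one batched forward pass; here we count one round.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's token
            if tok != expected:
                break  # first mismatch: discard the rest of the proposal
        else:
            # Every draft token matched, so the target adds one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new], rounds

def target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10

def draft(prefix: List[int]) -> int:
    # Agrees with the target except right after a 5.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10

out, rounds = speculative_decode(target, draft, [0], n_new=12)
```

With these toy models, 12 tokens are produced in 4 verification rounds instead of 12 sequential target steps.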

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended, precise editing target regions and the broader area that the guidance actually affects in practice. Although excellent attention-based methods have been developed to refine the editing guidance, they require modifying complex network architectures and are limited to specific editing tasks. In this work, we re-examine the diffusion process and the misalignment problem from a frequency perspective, revealing that, due to the power law of natural images and the decaying noise schedule, the denoising network primarily recovers low-frequency image components during the earlier timesteps and thus introduces excessive low-frequency signals into the edit. Leveraging this insight, we introduce a novel fine-tuning free a......
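
To make the frequency view concrete, here is a minimal 1-D sketch of frequency truncation, assuming a fixed cutoff and a plain DFT. FreeDiff works on 2-D image guidance, and its progressive cutoff schedule is not given in the truncated abstract. Zeroing the lowest-frequency bins removes the slowly varying component and keeps the detail:

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def truncate_low_freq(signal, cutoff):
    """High-pass: zero every bin whose distance from DC is below `cutoff`."""
    n = len(signal)
    spec = dft(signal)
    for f in range(n):
        # For real signals, bins f and n - f carry the same absolute frequency.
        if min(f, n - f) < cutoff:
            spec[f] = 0
    return [v.real for v in idft(spec)]

n = 16
signal = [3.0 + math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
filtered = truncate_low_freq(signal, cutoff=2)
```

Here the constant offset (the DC bin) is removed and only the cosine survives.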

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Recent advancements in text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces a dynamic adaptation of the blank region for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for pre-defined text areas, even for text or icons at the golden ratio. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint for smooth attention in whitespace areas. Experiments on this novel graphic-design task indicate that TextCenGen outperforms existing methods with......
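
Force-directed repulsion can be sketched on plain 2-D points, assuming each point stands for the centre of an object's attention region and the text area is an axis-aligned box. TextCenGen applies its forces to cross-attention maps inside the diffusion model, which this toy does not model; the function names, strength, and softening constant are hypothetical.

```python
import math

def inside(p, box):
    x0, y0, x1, y1 = box
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def repel(points, box, strength=0.001, softening=0.1, max_steps=500):
    """Push each point out of `box` with a softened inverse-square force
    directed away from the box centre (hypothetical toy)."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    moved = []
    for x, y in points:
        for _ in range(max_steps):
            if not inside((x, y), box):
                break
            dx, dy = x - cx, y - cy
            d = math.hypot(dx, dy)
            # Softening caps the force near the centre.
            mag = strength / max(d, softening) ** 2
            # Unit direction away from the centre; pick "down" if exactly at it.
            ux, uy = (dx / d, dy / d) if d > 1e-12 else (0.0, -1.0)
            x, y = x + mag * ux, y + mag * uy
        moved.append((x, y))
    return moved

text_box = (0.3, 0.3, 0.7, 0.7)  # normalised (x0, y0, x1, y1)
objects = [(0.5, 0.45), (0.1, 0.1), (0.5, 0.5)]
result = repel(objects, text_box)
```

Points inside the box are pushed out along the line from the box centre; points already outside are left untouched.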

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with ......
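
The abstract is cut off before it describes the attention rule precisely. One plausible reading, stated here as an assumption, is that novel prompts may attend to base prompts but base prompts may not attend to novel ones, so learning novel classes cannot change what base prompts compute. A minimal sketch of such a one-directional mask in single-head self-attention:

```python
import math

def build_mask(n_base, n_novel):
    """mask[i][j] is True when prompt i may attend to prompt j.
    Assumed rule: novel prompts see everything; base prompts never see novel ones."""
    n = n_base + n_novel
    mask = [[True] * n for _ in range(n)]
    for i in range(n_base):
        for j in range(n_base, n):
            mask[i][j] = False
    return mask

def masked_softmax(scores, allowed):
    m = max(s for s, ok in zip(scores, allowed) if ok)  # for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

def self_attend(prompts, mask):
    """Single-head scaled dot-product self-attention over prompt vectors (q = k = v)."""
    dim = len(prompts[0])
    out = []
    for q, allowed in zip(prompts, mask):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in prompts]
        weights = masked_softmax(scores, allowed)
        out.append([sum(w * v[d] for w, v in zip(weights, prompts)) for d in range(dim)])
    return out

mask = build_mask(n_base=2, n_novel=1)
before = self_attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], mask)
after = self_attend([[1.0, 0.0], [0.0, 1.0], [5.0, -3.0]], mask)
```

Changing the novel prompt changes only the novel output; the two base outputs are identical.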

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augmented Generation (RAG), though non-parametric, has its own limitations: it lacks structure, complicates interpretability, and makes it hard to effectively manage stored knowledge. In this paper, we introduce MemLLM, a novel method of enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our......
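
A minimal sketch of an explicit, structured read-write store, assuming facts are kept as (subject, relation, object) triples. The schema, the class, and its method names are hypothetical, and how MemLLM finetunes the model to issue memory calls is not shown. Unlike parametric memory, every stored fact can be listed, queried, and deleted:

```python
class TripleMemory:
    """Explicit read-write memory of (subject, relation, object) facts.
    Hypothetical schema and method names, for illustration only."""

    def __init__(self):
        self._facts = set()

    def write(self, subject, relation, obj):
        self._facts.add((subject, relation, obj))

    def read(self, subject=None, relation=None, obj=None):
        """Return all facts matching every non-None field; None is a wildcard."""
        query = (subject, relation, obj)
        return sorted(f for f in self._facts
                      if all(q is None or q == v for q, v in zip(query, f)))

    def delete(self, subject, relation, obj):
        """Remove a stale fact; returns True if it was present."""
        fact = (subject, relation, obj)
        if fact in self._facts:
            self._facts.remove(fact)
            return True
        return False

mem = TripleMemory()
mem.write("Marie Curie", "award", "Nobel Prize in Physics")
mem.write("Marie Curie", "award", "Nobel Prize in Chemistry")
mem.write("Pierre Curie", "spouse", "Marie Curie")
```

A read with wildcards returns every matching fact, which keeps what the model "knows" inspectable and editable.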

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. To address this issue, we propose a novel graph-aware neuron-level pruning method, Structured Neuron-level Pruning (SNP). SNP prunes neurons with less informative attention scores and eliminates redundancy among heads. Specifically, it prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. Our proposed method effectively compresses and accelerates Transformer-based models for both edge devices and server......
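
Each raw attention score is a sum over dimensions, q_i · k_j = Σ_c Q[i][c] K[j][c]. Removing output neuron c from both the query and the key projection therefore deletes exactly one term from every score, which is why the two layers have to be pruned together. The sketch ranks dimensions by a simple magnitude criterion; that criterion is an assumption standing in for SNP's own, and head structure, scaling, and softmax are ignored.

```python
def attention_scores(Q, K):
    """Raw scores S[i][j] = q_i . k_j (scaling and softmax omitted)."""
    return [[sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]

def dim_importance(Q, K):
    """Total absolute contribution of dimension c to all scores:
    sum_{i,j} |Q[i][c] * K[j][c]| = (sum_i |Q[i][c]|) * (sum_j |K[j][c]|)."""
    d = len(Q[0])
    return [sum(abs(q[c]) for q in Q) * sum(abs(k[c]) for k in K) for c in range(d)]

def prune_qk(Q, K, n_keep):
    """Drop the same output neurons from the query and key projections."""
    imp = dim_importance(Q, K)
    keep = sorted(sorted(range(len(imp)), key=lambda c: imp[c], reverse=True)[:n_keep])
    return [[q[c] for c in keep] for q in Q], [[k[c] for c in keep] for k in K], keep

Q = [[1.0, 0.01, 2.0], [0.5, 0.02, -1.0]]
K = [[1.0, 0.03, 0.5], [2.0, -0.01, 1.0]]
Qp, Kp, keep = prune_qk(Q, K, n_keep=2)
S, Sp = attention_scores(Q, K), attention_scores(Qp, Kp)
max_err = max(abs(a - b) for ra, rb in zip(S, Sp) for a, b in zip(ra, rb))
```

Dropping the weak middle dimension changes no score by more than 0.0006 in this toy example.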
