PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

news2024/12/26 6:40:41

本文首发于公众号:机器感知

PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

图片

PTQ4SAM: Post-Training Quantization for Segment Anything

图片

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-quantized normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......

AniTalker: Animate Vivid and Diverse Talking Faces through  Identity-Decoupled Facial Motion Encoding

图片

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......

Matten: Video Generation with Mamba-Attention

图片

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten has competitive performance with the current Transformer-based and GAN-based models in benchmark performance, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

图片

Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

图片

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out-of-the-box without requiring retraining. We experiment using our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks and demonstrate a greater speedup of 2.73x - 7.63x while retaining 98.6% - 99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......

Efficient Text-driven Motion Generation via Latent Consistency Training

图片

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are challenging to infer in real time due to the multi-step sampling mechanism that involves tens or hundreds of repeat function evaluation iterations. To this end, we investigate a motion latent consistency Training (MLCT) for motion generation to alleviate the computation and time consumption during iteration inference. It applies diffusion pipelines to low-dimensional motion latent spaces to mitigate the computational burden of each function evaluation. Explaining the diffusion process with probabilistic flow ordinary differential equation (PF-ODE) theory, the MLCT allows extremely few steps infer between the prior distribution to the motion latent representation distribution via maintaining consistency of the outputs over the trajectory of PF-ODE. Especially, we introduce a quantization constraint to optimize motion latent r......

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

图片

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy experiment by comparing a U-Net architectured DiT with an isotropic one. It turns out that the U-Net architecture only gain a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention and bring further improvements despite a considerable amount of reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......

From Generalization Analysis to Optimization Designs for State Space  Models

图片

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results. ......

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

图片

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: Correcting predictions from residuals and hierarchical learning. However, these models do not show the enhancement of prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1649766.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

【typescript 小秘籍 - 类型自动推导】

今天发现个typescript的小技巧,原来在vscode里面 typescript是可以根据数据,自动推导其类型的,这样就不用自己去手敲定义了。比如 鼠标移动到person上,可以看到 其自动推导了person的类型 然后直接复制下来 直接使用即可。

Redis学习(十)|使用消息队列的重试机制实现 MySQL 和 Redis 的数据一致性

文章目录 介绍原理整体方案实现步骤示例代码总结其他:Kafka 重试策略配置1. 生产者重试策略配置2. 消费者重试策略配置 介绍 在分布式系统中,保持 MySQL 和 Redis 之间的数据一致性是至关重要的。为了确保数据的一致性,我们通常采取先更新数…

【Three.js基础学习】15.scroll-based-animation

提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档 前言 课程要点 结合html等场景 做滚动动画 1.遇到的问题, 在向下滚动时,下方会显白(部分浏览器) 解决:alpha:true …

什么是多模态大模型,有了大模型,为什么还要多模态大模型?

随着人工智能技术的愈演愈烈,其技术可以说是日新月异,每隔一段时间就会有新的技术和理念被创造出来;而多模态大模型也是其中之一。 什么是多模态 想弄明白什么是多模态大模型,那么首先就要弄明白什么是多模态。 简单来说&#x…

shell常用文件处理命令

1. 解压 1.1 tar 和 gz 文件 如果你有一个 .tar 文件,你可以使用以下命令来解压: tar -xvf your_file.tar在这个命令中,-x 表示解压缩,-v 表示详细输出(可选),-f 后面跟着要解压的文件名。 如果你的 .tar 文件同时被 gzip 压缩了(即 .tar.gz 文件),你可以使用以下…

PHP 匿名函数和闭包在数据结构中的应用

匿名函数和闭包在数据结构处理中的应用php 中的匿名函数和闭包可用于处理数组、链表和队列等数据结构。针对数组,匿名函数可用于过滤元素;针对链表,闭包可用于创建节点;针对队列,匿名函数和闭包可实现 fifo 队列操作。…

2005-2021年全国各地级市生态环境注意力/环保注意力数据(根据政府报告文本词频统计)

2005-2021年全国各地级市生态环境注意力/环保注意力数据(根据政府报告文本词频统计) 2005-2021年全国各地级市生态环境注意力/环保注意力数据(根据政府报告文本词频统计) 1、时间:2005-2021年 2、范围:2…

初始Linux(基础命令)

前言: 我们不能总沉浸在编程语言中,虽然代码能力提升了,但是也只是开胃小菜。我们要朝着更高的方向发展。 最近小编一直在刷力扣,以至于博客更新的比较少。今天就带各位开始学习全新的知识——Linux.至于为啥要学? Lin…

基于FPGA的多路彩灯控制器VHDL代码Quartus仿真

名称:基于FPGA的多路彩灯控制器VHDL代码Quartus仿真(文末获取) 软件:Quartus 语言:VHDL 代码功能: 多路彩灯控制器 综合训练内容要求 设计一台基于FPGA的多路彩灯控制器的设计。要求如下 1.彩灯从左…

IOS自动化—将WDA打包ipa批量安装驱动

前言 CSDN: ios自动化-Xcode、WebDriverAgent环境部署 ios获取原生系统应用的包 如果Mac电脑没有配置好Xcode相关环境,可以参考以上文章。 必要条件 Mac电脑,OS版本在12.4及以上(低于这个版本无法安装Xcode14,装不了Xcode14就…

20230507,LIST容器

学了又忘学了又忘,明知道会忘又不想复习又还得学 LIST容器 1.1 基本概念 链表是一种物理存储单元上非连续的存储结构,数据元素的逻辑顺序是通过链表中的指针链接实现的;链表由一系列结点组成 结点:一个是存储数据元素的数据域&a…

《ESP8266通信指南》12-Lua 固件烧录

往期 《ESP8266通信指南》11-Lua开发环境配置-CSDN博客 《ESP8266通信指南》10-MQTT通信(Arduino开发)-CSDN博客 《ESP8266通信指南》9-TCP通信(Arudino开发)-CSDN博客 《ESP8266通信指南》8-连接WIFI(Arduino开发…

循环链表 -- c语言实现

#pragma once // 带头双向循环链表增删查改实现 #include<stdlib.h> #include<stdio.h> #include<assert.h>typedef int LTDataType;typedef struct ListNode {LTDataType data;struct ListNode* next;struct ListNode* prev; }ListNode;//双链表申请一个新节…

【Python】PTA 查验身份

知识点&#xff1a; 1.这里的加权求和就是指每一位乘以题目给的对应位置上的数字 在python中&#xff0c;对于int(10)这样的转换而来的直接是整数10&#xff0c;但是在c语言中会转换成ASCII值&#xff0c;所以要特别注意 2.本题中有两种情况是错误的&#xff0c;就是要直接输…

DES加密解密算法(简单、易懂、超级详细)

目录 一、基础补充 二、什么是DES算法 &#xff08;1&#xff09;对称加密算法 &#xff08;2&#xff09;非对称加密算法 &#xff08;3&#xff09;对称加密算法的应用 三、DES算法的基础操作步骤 1.明文的加密整体过程 2.F轮函数解析 3.密钥的形成过程 四、AC代码 五、D…

自然语言(NLP)

It’s time for us to learn how to analyse natural language documents, using Natural Language Processing (NLP). We’ll be focusing on the Hugging Face ecosystem, especially the Transformers library, and the vast collection of pretrained NLP models. Our proj…

JuiceFS v1.2-beta1,Gateway 升级,多用户场景权限管理更灵活

JuiceFS v1.2-beta1 今天正式发布。在这个版本中&#xff0c;除了进行了大量使用体验优化和 bug 修复外&#xff0c;新增三个特性&#xff1a; Gateway 功能扩展&#xff1a;新增了“身份和访问管理&#xff08;Identity and Access Management&#xff0c;IAM&#xff09;” 与…

WHM中如何查看磁盘使用情况

今日看到有用户在论坛留言反馈他买了Hostease 独立服务器并购买cPanel面板&#xff0c;想要通过面板查看当前服务器使用的磁盘情况&#xff0c;但是不知道如何查看。因为这边也是对于cPanel即WHM面板有是有所了解的&#xff0c;对于这个用户的问题&#xff0c; 操做步骤如下&am…

【Linux】Docker 安装部署 Nacos

个人简介&#xff1a;Java领域新星创作者&#xff1b;阿里云技术博主、星级博主、专家博主&#xff1b;正在Java学习的路上摸爬滚打&#xff0c;记录学习的过程~ 个人主页&#xff1a;.29.的博客 学习社区&#xff1a;进去逛一逛~ 【Linux】Docker 安装部署 Nacos docker搜索na…

看完这篇文章我奶奶都懂Opentracing了(一)

前言 如果要基于Opentracing开发分布式链路追踪Java客户端工具包&#xff0c;首先肯定需要了解Opentracing中的各种概念&#xff0c;包括但不限于Span和Scope等&#xff0c;其实这些概念在Opentracing的官方文档中是有比较详尽的说明的&#xff0c;英文不好也能靠着机器翻译读…