MambaMixer、NeSLAM、Talk3D、BundledSLAM、ShapeFusion

news2024/11/26 21:22:13

本文首发于公众号:机器感知

MambaMixer、NeSLAM、Talk3D、BundledSLAM、ShapeFusion

ShapeFusion: A 3D diffusion model for localized shape editing

图片

In the realm of 3D computer vision, parametric models have emerged as a ground-breaking methodology for the creation of realistic and expressive 3D avatars. Traditionally, they rely on Principal Component Analysis (PCA), given its ability to decompose data to an orthonormal space that maximally captures shape variations. However, due to the orthogonality constraints and the global nature of PCA's decomposition, these models struggle to perform localized and disentangled editing of 3D shapes, which severely affects their use in applications requiring fine control such as face sculpting. In this paper, we leverage diffusion models to enable diverse and fully localized edits on 3D meshes, while completely preserving the un-edited regions. We propose an effective diffusion masking training strategy that, by design, facilitates localized manipulation of any shape region, without being limited to predefined regions or to sparse sets of predefined control vertices. Following our fra......

BundledSLAM: An Accurate Visual SLAM System Using Multiple Cameras

图片

Multi-camera SLAM systems offer a plethora of advantages, primarily stemming from their capacity to amalgamate information from a broader field of view, thereby resulting in heightened robustness and improved localization accuracy. In this research, we present a significant extension and refinement of the state-of-the-art stereo SLAM system, known as ORB-SLAM2, with the objective of attaining even higher precision.To accomplish this objective, we commence by mapping measurements from all cameras onto a virtual camera termed BundledFrame. This virtual camera is meticulously engineered to seamlessly adapt to multi-camera configurations, facilitating the effective fusion of data captured from multiple cameras. Additionally, we harness extrinsic parameters in the bundle adjustment (BA) process to achieve precise trajectory estimation.Furthermore, we conduct an extensive analysis of the role of bundle adjustment (BA) in the context of multi-camera scenarios, delving into its impac......

MambaMixer: Efficient Selective State Space Models with Dual Token and  Channel Selection

图片

Recent advances in deep learning have mainly relied on Transformers due to their data dependency and ability to learn at scale. The attention module in these architectures, however, exhibits quadratic time and space in input size, limiting their scalability for long-sequence modeling. Despite recent attempts to design efficient and effective architecture backbone for multi-dimensional data, such as images and multivariate time series, existing models are either data independent, or fail to allow inter- and intra-dimension communication. Recently, State Space Models (SSMs), and more specifically Selective State Space Models, with efficient hardware-aware implementation, have shown promising potential for long sequence modeling. Motivated by the success of SSMs, we present MambaMixer, a new architecture with data-dependent weights that uses a dual selection mechanism across tokens and channels, called Selective Token and Channel Mixer. MambaMixer connects selective mixers using......

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models  for Image Inpainting

图片

Denoising diffusion probabilistic models for image inpainting aim to add the noise to the texture of image during the forward process and recover masked regions with unmasked ones of the texture via the reverse denoising process.Despite the meaningful semantics generation,the existing arts suffer from the semantic discrepancy between masked and unmasked regions, since the semantically dense unmasked texture fails to be completely degraded while the masked regions turn to the pure noise in diffusion process,leading to the large discrepancy between them.In this paper,we aim to answer how unmasked semantics guide texture denoising process;together with how to tackle the semantic discrepancy,to facilitate the consistent and meaningful semantics generation.To this end,we propose a novel structure-guided diffusion model named StrDiffusion,to reformulate the conventional texture denoising process under structure guidance to derive a simplified denoising objective for image inpaintin......

Diff-Reg v1: Diffusion Matching Model for Registration Problem

图片

Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, which rely on single-pass prediction, may struggle with local minima in complex scenarios. To mitigate these challenges, we introduce a diffusion matching model for robust correspondence construction. Our approach treats correspondence estimation as a denoising diffusion process within the doubly stochastic matrix space, which gradually denoises (refines) a doubly stochastic matching matrix to the ground-truth one for high-quality correspondence estimation. It involves a forward diffusion process that gradually introduces Gaussian noise into the ground truth matching matrix and a rever......

HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban  Scenes

图片

The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, the previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion(SfM) points and difficulties in rendering distant, sky and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates the Point Densitification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative for spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across di......

NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking  With Depth Completion and Denoising

图片

In recent years, there have been significant advancements in 3D reconstruction and dense RGB-D SLAM systems. One notable development is the application of Neural Radiance Fields (NeRF) in these systems, which utilizes implicit neural representation to encode 3D scenes. This extension of NeRF to SLAM has shown promising results. However, the depth images obtained from consumer-grade RGB-D sensors are often sparse and noisy, which poses significant challenges for 3D reconstruction and affects the accuracy of the representation of the scene geometry. Moreover, the original hierarchical feature grid with occupancy value is inaccurate for scene geometry representation. Furthermore, the existing methods select random pixels for camera tracking, which leads to inaccurate localization and is not robust in real-world indoor environments. To this end, we present NeSLAM, an advanced framework that achieves accurate and dense depth estimation, robust camera tracking, and realistic synthe......

Transformer-Lite: High-efficiency Deployment of Large Language Models on  Mobile Phone GPUs

图片

The Large Language Model (LLM) is widely employed for tasks such as intelligent assistants, text summarization, translation, and multi-modality on mobile phones. However, the current methods for on-device LLM deployment maintain slow inference speed, which causes poor user experience. To facilitate high-efficiency LLM deployment on device GPUs, we propose four optimization techniques: (a) a symbolic expression-based approach to support dynamic shape model inference; (b) operator optimizations and execution priority setting to enhance inference speed and reduce phone lagging; (c) an FP4 quantization method termed M0E4 to reduce dequantization overhead; (d) a sub-tensor-based technique to eliminate the need for copying KV cache after LLM inference. Furthermore, we implement these methods in our mobile inference engine, Transformer-Lite, which is compatible with both Qualcomm and MTK processors. We evaluated Transformer-Lite's performance using LLMs with varied architectures and......

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

图片

Novel View Synthesis (NVS) for street scenes play a critical role in the autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at the viewpoint that deviates significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views d......

StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image  Translation

图片

Most image-to-image translation models postulate that a unique correspondence exists between the semantic classes of the source and target domains. However, this assumption does not always hold in real-world scenarios due to divergent distributions, different class sets, and asymmetrical information representation. As conventional GANs attempt to generate images that match the distribution of the target domain, they may hallucinate spurious instances of classes absent from the source domain, thereby diminishing the usefulness and reliability of translated images. CycleGAN-based methods are also known to hide the mismatched information in the generated images to bypass cycle consistency objectives, a process known as steganography. In response to the challenge of non-bijective image translation, we introduce StegoGAN, a novel model that leverages steganography to prevent spurious features in generated images. Our approach enhances the semantic consistency of the translated ima......

Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D  Generative Prior

图片

Recent methods for audio-driven talking head synthesis often optimize neural radiance fields (NeRF) on a monocular talking portrait video, leveraging its capability to render high-fidelity and 3D-consistent novel-view frames. However, they often struggle to reconstruct complete face geometry due to the absence of comprehensive 3D information in the input monocular videos. In this paper, we introduce a novel audio-driven talking head synthesis framework, called Talk3D, that can faithfully reconstruct its plausible facial geometries by effectively adopting the pre-trained 3D-aware generative prior. Given the personalized 3D generative model, we present a novel audio-guided attention U-Net architecture that predicts the dynamic face variations in the NeRF space driven by audio. Furthermore, our model is further modulated by audio-unrelated conditioning tokens which effectively disentangle variations unrelated to audio features. Compared to existing methods, our method excels in ......

Motion Inversion for Video Customization

图片

In this research, we present a novel approach to motion customization in video generation, addressing the widespread gap in the thorough exploration of motion representation within video generative models. Recognizing the unique challenges posed by video's spatiotemporal nature, our method introduces Motion Embeddings, a set of explicit, temporally coherent one-dimensional embeddings derived from a given video. These embeddings are designed to integrate seamlessly with the temporal transformer modules of video diffusion models, modulating self-attention computations across frames without compromising spatial integrity. Our approach offers a compact and efficient solution to motion representation and enables complex manipulations of motion characteristics through vector arithmetic in the embedding space. Furthermore, we identify the Temporal Discrepancy in video generative models, which refers to variations in how different motion modules process temporal relationships between......

U-VAP: User-specified Visual Appearance Personalization via Decoupled  Self Augmentation

图片

Concept personalization methods enable large text-to-image models to learn specific subjects (e.g., objects/poses/3D models) and synthesize renditions in new contexts. Given that the image references are highly biased towards visual attributes, state-of-the-art personalization models tend to overfit the whole subject and cannot disentangle visual characteristics in pixel space. In this study, we proposed a more challenging setting, namely fine-grained visual appearance personalization. Different from existing methods, we allow users to provide a sentence describing the desired attributes. A novel decoupled self-augmentation strategy is proposed to generate target-related and non-target samples to learn user-specified visual attributes. These augmented data allow for refining the model's understanding of the target attribute while mitigating the impact of unrelated attributes. At the inference stage, adjustments are conducted on semantic space through the learned target and no......

Relation Rectification in Diffusion Model

图片

Despite their exceptional generative abilities, large text-to-image diffusion models, much like skilled but careless artists, often struggle with accurately depicting visual relationships between objects. This issue, as we uncover through careful analysis, arises from a misaligned text encoder that struggles to interpret specific relationships and differentiate the logical order of associated objects. To resolve this, we introduce a novel task termed Relation Rectification, aiming to refine the model to accurately represent a given relationship it initially fails to generate. To address this, we propose an innovative solution utilizing a Heterogeneous Graph Convolutional Network (HGCN). It models the directional relationships between relation terms and corresponding objects within the input prompts. Specifically, we optimize the HGCN on a pair of prompts with identical relational words but reversed object orders, supplemented by a few reference images. The lightweight HGCN ad......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1561055.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

全套医院手术麻醉系统源码 人工智能麻醉系统源码 医疗管理系统源码

全套医院手术麻醉系统源码 人工智能麻醉系统源码 医疗管理系统源码 手术麻醉临床信息系统有着完善的临床业务功能,能够涵盖整个围术期的工作,能够采集、汇总、存储、处理、展现所有的临床诊疗资料。通过该系统的实施,能够规范麻醉科的工作流…

Qt实现通过.css样式文件实时加载QSS样式

初学Qt时要想通过QSS修改控件QWidget,QPushButton等原生基础控件的样式,一般都是直接在.ui文件中直接添加qss,或者在代码中通过setStyleSheet(QString qss)来设置。当程序很大时,很多地方需要复用样式时会非常麻烦,qss…

C++教学——从入门到精通 6.ASCII码与字符型

如何把小写字母转换成大写字母呢? 这个问题问的好,首先我们要新学一个类型——char 这个类型就是字符型 再来说说ASCII码 给大家举几个例子 空格————32 0————48 9————57 A————65 Z————90 a————97 z————122 我们…

LeetCode-560. 和为 K 的子数组【数组 哈希表 前缀和】

LeetCode-560. 和为 K 的子数组【数组 哈希表 前缀和】 题目描述:解题思路一:一边算前缀和一边统计。这里用哈希表统计前缀和出现的次数,那么和为k的子数组的个数就是当前前缀和-k的个数,即preSums[presum - k]。画个图表述就是&a…

NGINX 反向代理 CORS

我遇到了一个问题就是 Nginx 是作为反向代理服务器部署的,但因为 Nginx 的配置导致 CORS 问题。 在这个时候我们可以对 Nginx 的配置文件进行修改: 在 location 后添加下面的内容: add_header Access-Control-Allow-Origin *; add_header A…

Kafka入门到实战-第二弹

Kafka入门到实战 Kafka快速开始官网地址Kafka概述Kafka术语Kafka初体验更新计划 Kafka快速开始 官网地址 声明: 由于操作系统, 版本更新等原因, 文章所列内容不一定100%复现, 还要以官方信息为准 https://kafka.apache.org/Kafka概述 Apache Kafka 是一个开源的分布式事件流…

ES6学习之路:迭代器Iterator和生成器Generator

迭代器 一、知识背景 什么是迭代器 迭代器就是在一个数据集合中不断取出数据的过程迭代和遍历的区别 遍历是把所有数据都取出迭代器注重的是依次取出数据,它不会在意有多少数据,也不会保证能够取出多少或者能够把数据都取完。比如斐波那契额数列&#…

离散时间动态系统的集成自适应动态规划智能控制-北科大博毕

主要内容: 传统值迭代产生迭代控制策略,给出稳定性和吸引域判据;传统值迭代则迭代过程中得到可容许策略折扣因子对迭代控制策略可容许的影响,神经网络对未知系统建模,讨论模型网络权重更新情况下参数误差和系统状态估…

TikTok零播放?可能是海外代理IP的问题

在当今社交媒体的蓬勃发展中,TIKTOK作为一款备受欢迎的短视频平台,其直播功能也逐渐受到用户的青睐。然而,有时候跨境电商商家在进行直播时却面临着一个令人头疼的问题:没有观众。这时候,海外代理IP可能是一个潜在的原…

吴恩达:别光盯着GPT-5,用GPT-4做个智能体可能提前达到GPT-5的效果

最近,斯坦福大学教授吴恩达在演讲中提到,他们发现,基于 GPT-3.5 构建的智能体工作流在应用中表现比 GPT-4 要好。 AI 智能体是去年很火的一个话题,但是 AI 智能体到底有多大的潜力,很多人可能没有概念。 最近&#x…

【环境搭建】(四)ubuntu22.04系统安装Opencv4.8.0+Opencv-contrib4.8.0

一个愿意伫立在巨人肩膀上的农民...... 一、安装下载所需工具 1.打开终端,输入以下命令来更新软件源: sudo apt-get update 2.安装wget: sudo apt-get install wget 3.下载opencv和opencv-contrib包: wget -O opencv-4.8.0.…

ISP-VPN实验

文章目录 ISP-VPN实验一,实验拓扑二、实验要求三、IP规划四、实验配置1、IP配置R1的配置R2的配置R3的配置R4的配置R5的配置 2、配置缺省路由3、认证与被认证配置4、HDLC封装5、构建MGRE和GRE6、整个私有网络基于RIP全网可达7、查看路由配置和PC端配置8、PC端pingR5的…

Vue element-plus 导航栏 [el-menu]

导航栏 [el-menu] Menu 菜单 | Element Plus el-menu有很多属性和子标签,为网站提供导航功能的菜单。 常用标签: 它里面有两个子标签。el-menu-item,它其实就是el-menu每一个里面的item,item就是真实匹配到路由的每个栏目&#…

【Python使用】嘿马头条完整开发md笔记第3篇:数据库,1 新增【附代码文档】

嘿马头条项目从到完整开发笔记总结完整教程(附代码资料)主要内容讲述:课程简介,ToutiaoWeb虚拟机使用说明1 产品介绍,2 原型图与UI图,3 技术架构,4 开发,1 需求,2 注意事项。数据库,理解ORM1 简介,2 安装,3 数据库连接…

使用CRXjs、Vite、Vue 开发 Chrome 多页面插件,手动配置 vite.config.ts 和 manifest.json 文件

一、使用CRXjs、Vite、Vue 开发 Chrome 多页面插件,手动配置 vite.config.ts 和 manifest.json 文件 一、创建 Vue 项目 1. 使用 Vite 创建 Vue 项目 npm create vitelatest # npm yarn create vite # yarn pnpm create vite # pnpm选择 Vue 和 TS 进入项目…

SpringBoot微服务实现深度学习:构建AGI道路的基石+实战案例演示

🎉🎉欢迎光临,终于等到你啦🎉🎉 🏅我是苏泽,一位对技术充满热情的探索者和分享者。🚀🚀 🌟持续更新的专栏《Spring 狂野之旅:从入门到入魔》 &a…

4月4日生效!管控升级!继续围堵 | 百能云芯

据路透社29日报道,美国拜登政府当天修改规则,升级针对中国人工智能(AI)芯片和相关工具出口管制的措施。报道声称,这是美国以国家安全为由阻碍中国芯片制造业发展的努力的一部分。 美国商务部下属的工业与安全局&#x…

移动硬盘损坏打不开?别慌,数据恢复有妙招!

移动硬盘作为我们日常存储和传输数据的重要工具,一旦遇到损坏打不开的情况,无疑会让我们倍感焦虑。毕竟,其中存储的文件可能关乎工作、学习或生活的方方面面,一旦丢失,后果不堪设想。那么,当移动硬盘损坏打…

UE5 SQLite笔记

开发环境: 系统:Windows 10 64 bit 引擎:Unreal Engine 5.1.1 IDE:JetBrains Rider 2023.2.1 语言:C 工具:DB Browser for SQLite SQLite数据类型: //INTEGER TEXT BLOB REAL NUMERIC/*integer…

使用Pilotfish扩展Sui执行能力

Pilotfish第一个多机智能合约执行引擎,使Sui网络的验证节点可以利用多台机器,并在负载增加时自动扩展以执行更多的交易。这一目标实现不会影响可靠性或功能完整性。 Pilotfish可以从内部执行机器的故障中恢复,并支持Sui的全面动态操作。其流…