Voice Conversion、DreamScene、X-SLAM、Panoptic-SLAM、DiffMap、TinySeg

news2025/1/12 3:47:36

本文首发于公众号:机器感知

Voice Conversion、DreamScene、X-SLAM、Panoptic-SLAM、DiffMap、TinySeg

图片

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a  Conditional Diffusion Model

图片

Expressive voice conversion (VC) conducts speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on the performance of vocoders. A major challenge of expressive VC lies in emotion prosody modeling. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We utilize speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Codes and samples are publicly available......

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular  Videos

图片

Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be infe......

X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD

图片

We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of cam......

Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic  Segmentation

图片

The majority of visual SLAM systems are not robust in dynamic scenarios. The ones that deal with dynamic objects in the scenes usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Also, experiments were performed using a quadruped ro......

Characterized Diffusion and Spatial-Temporal Interaction Network for  Trajectory Prediction in Autonomous Driving

图片

Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on t......

Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose  Estimation

图片

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and 2D to 3D ill-posed challenges, which have drawn great attention to Multi-Hypothesis HPE research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE model and then reverse-map the distribution to the 2D pose input through an adaptive noise sampling strategy to generate reasonable multi-hypothesis samples effectively. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com......

DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model

图片

Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced and certain structural errors present in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately d......

TinySeg: Model Optimizing Framework for Image Segmentation on Tiny  Embedded Systems

图片

Image segmentation is one of the major computer vision tasks, which is applicable in a variety of domains, such as autonomous navigation of an unmanned aerial vehicle. However, image segmentation cannot easily materialize on tiny embedded systems because image segmentation models generally have high peak memory usage due to their architectural characteristics. This work finds that image segmentation models unnecessarily require large memory space with an existing tiny machine learning framework. That is, the existing framework cannot effectively manage the memory space for the image segmentation models. This work proposes TinySeg, a new model optimizing framework that enables memory-efficient image segmentation for tiny embedded systems. TinySeg analyzes the lifetimes of tensors in the target model and identifies long-living tensors. Then, TinySeg optimizes the memory usage of the target model mainly with two methods: (i) tensor spilling into local or remote storage and (ii) ......

Efficient and Economic Large Language Model Inference with Attention  Offloading

图片

Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading. This approach leverages a collection of cheap, memory-optimized devices for the attention operator while still utilizing high-end accelerators for other parts of the model. This heterogeneous setup ensures that each component is tailored to its specific workload, maximizing overall performance and cost efficienc......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1645311.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

速卖通自养号测评海外环境:成本、步骤、技巧全掌握

相信不少涉足跨境业务的企业和商家都对速卖通耳熟能详。作为当下炙手可热的跨境电商平台,速卖通在国内电商市场渐趋饱和的背景下,吸引了众多国内卖家的目光。他们纷纷入驻速卖通,希望借助这一平台的力量,成功打通跨境业务渠道。然…

腾讯会议崩溃解决

突然腾讯会议就罢工了,腾讯会议的主界面可以登陆上去,不会异常退出: 这时无论是通过别人提供的会议号“加入会议” 还是 “快速会议”,都会出现下面的异常,并崩溃退出: 在网上搜“SteinwayMSVCRT”导致的腾讯会议的错误,会告诉你使用金山毒霸的XX医生解决,下载了金山毒…

新代数控Syntec网络IP配置设定教程

点击面板【维护】→【网络设定】→【IP地址取得方法:直接指定IP地址】→【IP地址:输入采集需要设定的IP】→【子网掩码:255.255.255.0】→【预设网关】 输入方法:点击面板上的【ENTER】输入键,输入相关参数即可。

git使用注意事项事项

以下操作均在gitee平台上实现 文章目录 1、本地仓库和远程仓库有冲突2、git提交自动忽略某些文件3、git无法push提交到远程仓库 1、本地仓库和远程仓库有冲突 在web端修改了文件内容或者删除了文件,本地仓库需要重新把远程仓库拉取到本地,或者强制提交到…

Mars3d实现用一个button控制一个map.control的显示与隐藏

原生js,想做一个button,控制比如compass的显示与隐藏 点一下显示 再次单击的时候就隐藏掉 写了一个function控制显示隐藏 function addCompass(){ if(compass.showtrue) { compass.showfalse; } else{ compass.showtrue; } } 功能示例(Vue版) | Mars3D三维可视化平台 | 火星…

面试中算法(无序数组排序后最大相邻差)

有一个无序整型数组,求该数组排序后的任意两个相邻元素的最大差值;要求时间复杂度和空间复杂度尽可能低。 (1)任意一种时间复杂度为O (nlogn)的排序算法(如快速排序)给原数组排序,然…

知识库工具:付费的HelpLook AI知识库比免费的牵牛易帮好在哪里

在知识管理的领域中,选择合适的知识库工具对于企业来说很重要。市面上有很多知识库产品,有付费的和免费的,但是还是有很多企业会选择使用付费的,而不是免费的。这是为什么呢?这就是今天要探讨的问题,下面就…

机器学习(二) ----------K近邻算法(KNN)+特征预处理+交叉验证网格搜索

目录 1 核心思想 1.1样本相似性 1.2欧氏距离(Euclidean Distance) 1.3其他距离 1.3.1 曼哈顿距离(Manhattan Distance) 1.3.2 切比雪夫距离(Chebyshev distance) 1.3.3 闵式距离(也称为闵…

1.4 初探JdbcTemplate操作

实战目的 掌握Spring框架中JdbcTemplate的使用,实现对数据库的基本操作。理解数据库连接池的工作原理及其在实际开发中的重要性。通过实际操作,加深对Spring框架中ORM(对象关系映射)的理解。 关键技术点 JdbcTemplate操作&…

AI人才争夺战,华尔街入局:豪掷百万美元年薪抢人 | 最新快讯

量子位公众号 QbitAI 继硅谷之后,华尔街也入局“AI 人才争夺大战”。 他们的目标非常明确——抢的就是高精尖的 AI 专家。 △图源:Business Insider 现在这条“街”上,不论是银行、对冲基金还是私募股权公司都已纷纷下场,可谓是豪…

Netty进阶-协议设计与解析

Netty进阶 一、黏包半包的深入理解(本质原因:TCP是流式协议,消息无边界)1、TCP滑动窗口2、协议设计与解析2.1、Redis协议2.2、HTTP协议2.3、自定义协议2.3.1、自定义协议要求2.3.2、自定义消息对象(编解码器、消息抽象…

谷歌明年6月关闭 Google Fit 运动记录API,要求开发者迁移至Android Health平台 | 最新快讯

5 月 6 日消息,谷歌近日发布官方新闻稿,宣布将在明年 6 月使用 Android Health 平台取代 Google Fit 运动记录 API,开发人员应当尽早启动迁移计划。 谷歌自 2022 年起逐渐扩大对 Android Health 平台的投资,旨在减少平台碎片化&am…

Python turtle库 实现 随机彩色文字平面批量输出

# -*- coding: utf-8 -*- """ Spyder Editor This is a temporary script file. """ import turtle import random import turtle as t t.colormode(255) turtle.bgcolor("white") h255 l50#字号 m60#间隔 n500 t.penup() turtle.hide…

2024-05-06 问AI: 介绍一下深度学习中的LSTM网络

文心一言 当谈到深度学习中的LSTM(Long Short-Term Memory)网络时,它是一种特殊的循环神经网络(RNN)架构,旨在解决传统RNN在处理长序列时遇到的梯度消失和梯度爆炸问题。LSTM网络因其能够捕捉序列数据中的…

面试官:关于HTTPS/HTTP2/HTTP3你懂多少?

公众号:程序员白特,欢迎一起交流学习~ HTTPS是什么 HTTP为什么不安全? https被认为是通信安全的http,除了http多了s和默认端口改成了443之外,其他都是沿用的http(除了明文和不安全)&#xff0…

Qt QInputDialog详解

1.简介 QInputDialog是一个对话框类,用于从用户那里获取一个单一的值。这个值可以是字符串、数字、或者一个列表中的选项。QInputDialog提供了一个方便的方式来快速创建一个输入对话框,无需自己从头开始构建。 QInputDialog支持多种输入类型&#xff1…

软件设计师-应用技术-数据流图题1

基础知识及技巧: 0. 概念: 在结构化分析中,数据流图用来记录系统中的数据和数据在特定的过程中的流动,即数据如何被采集、处理、保存和使用的(围绕信息系统的功能)。 1. 元素实例: 补充知识:** 外部实体…

K. 子串翻转回文串

给一个串 s  s1s2... sn,你可以选定其一个非空子串,然后将该子串翻转。具体来说,若选定的子串区间为 [l, r](1 ≤ l ≤ r ≤ n),则翻转后该串变为 s1s2... sl - 1srsr - 1... slsr  1... sn…

【企业动态】爱尔兰客户到访东胜物联,共拓能源管理等解决方案

近日,来自爱尔兰的房屋数据监测客户莅临东胜物联(杭州黄龙国际中心)进行参观考察,双方就未来的广泛合作进行了深入的沟通交流。 来访期间,东胜物联CEO支江峰先生热情接待了客户,并陪同他们参观了产品展厅&…

C语言数组介绍

文章目录 一、数组的概念二、一维数组1.一维数组的创建2.一维数组的初始化3.数组的类型4.一维数组的使用5.一维数组在内存中的存储6.sizeof计算数组元素个数 三、二维数组1.二维数组的概念2.二维数组的创建3.二维数组的初始化4.二维数组的使用5.二维数组的输入和输出6.二维数组…