GenVideo、SkelFormer、EfficientGS、HOLD、Motion Synthesis、Learn2Talk

news2024/12/26 23:28:12

本文首发于公众号:机器感知

GenVideo、SkelFormer、EfficientGS、HOLD、Motion Synthesis、Learn2Talk

图片

Enabling Stateful Behaviors for Diffusion-based Policy Learning

图片

While imitation learning provides a simple and effective framework for policy learning, acquiring consistent actions during robot execution remains a challenging task. Existing approaches primarily focus on either modifying the action representation at data curation stage or altering the model itself, both of which do not fully address the scalability of consistent action generation. To overcome this limitation, we introduce the Diff-Control policy, which utilizes a diffusion-based model to learn the action representation from a state-space modeling viewpoint. We demonstrate that we can reduce diffusion-based policies' uncertainty by making it stateful through a Bayesian formulation facilitated by ControlNet, leading to improved robustness and success rates. Our experimental results demonstrate the significance of incorporating action statefulness in policy learning, where Diff-Control shows improved performance across various tasks. Specifically, Diff-Control achieves an ave......

GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I  Diffusion Models

图片

Video editing methods based on diffusion models that rely solely on a text prompt for the edit are hindered by the limited expressive power of text prompts. Thus, incorporating a reference target image as a visual guide becomes desirable for precise control over edit. Also, most existing methods struggle to accurately edit a video when the shape and size of the object in the target image differ from the source object. To address these challenges, we propose "GenVideo" for editing videos leveraging target-image aware T2I models. Our approach handles edits with target objects of varying shapes and sizes while maintaining the temporal consistency of the edit using our novel target and shape aware InvEdit masks. Further, we propose a novel target-image aware latent noise correction strategy during inference to improve the temporal consistency of the edits. Experimental analyses indicate that GenVideo can effectively handle edits with objects of varying shapes, where existing appr......

Does Gaussian Splatting need SFM Initialization?

图片

3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome. To this end, we investigate various initialization strategies for Gaussian Splatting and delve into how volumetric reconstructions from Neural Radiance Fields (NeRF) can be utilized to bypass the dependency on SFM data. Our findings demonstrate that random initialization can perform much better if carefully designed and that by employing a combination of improved initialization strategies and structure distillation from low-cost NeRF models, it is possible to achieve equivalent results, or at times even superior, to those obtained from SFM initialization. ......

SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal  Transformers

图片

We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation. Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions. Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations. This module integrates prior knowledge about pose space and infers the full pose state at runtime. Separating the 3D keypoint detection and inverse-kinematic problems, along with the expressive representations learned by our skeletal transformer, enhance the generalization of our method to unseen noisy data. We evaluate our method on three public datasets in both in-distribution and out-of-distribution settings using three datasets, and observe strong performance with respect to prior works. Moreover, ablation experiments demonstrate the impact of each of the mo......

Improving Chinese Character Representation with Formation Tree

图片

Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, which requires models to handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). This model utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking for to both character images and tree nodes, enabling efficient and effective training. This ......

EfficientGS: Streamlining Gaussian Splatting for Large-Scale  High-Resolution Scene Representation

图片

In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k$\times$4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-resolution, large-scale scenes. We analyze the densification process in 3DGS and identify areas of Gaussian over-proliferation. We propose a selective strategy, limiting Gaussian increase to key primitives, thereby enhancing the representational efficiency. Additionally, we develop a pruning mechanism to remove redundant Gaussians, those that are merely auxiliary to adjacent ones. For further enhancement, we integrate a sparse order increment for Spherical Harmonics (SH), designed to alleviate storage constraints and reduce training overhead. Our empirical evaluations, conducted on a ra......

Generative Modelling with High-Order Langevin Dynamics

图片

Diffusion generative modelling (DGM) based on stochastic differential equations (SDEs) with score matching has achieved unprecedented results in data generation. In this paper, we propose a novel fast high-quality generative modelling method based on high-order Langevin dynamics (HOLD) with score matching. This motive is proved by third-order Langevin dynamics. By augmenting the previous SDEs, e.g. variance exploding or variance preserving SDEs for single-data variable processes, HOLD can simultaneously model position, velocity, and acceleration, thereby improving the quality and speed of the data generation at the same time. HOLD is composed of one Ornstein-Uhlenbeck process and two Hamiltonians, which reduce the mixing time by two orders of magnitude. Empirical experiments for unconditional image generation on the public data set CIFAR-10 and CelebA-HQ show that the effect is significant in both Frechet inception distance (FID) and negative log-likelihood, and achieves the ......

MCM: Multi-condition Motion Synthesis Framework

图片

Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions. Text and audio represent the two predominant modalities employed as HMS control conditions. While existing research has primarily focused on single conditions, the multi-condition human motion synthesis remains underexplored. In this study, we propose a multi-condition HMS framework, termed MCM, based on a dual-branch structure composed of a main branch and a control branch. This framework effectively extends the applicability of the diffusion model, which is initially predicated solely on textual conditions, to auditory conditions. This extension encompasses both music-to-dance and co-speech HMS while preserving the intrinsic quality of motion and the capabilities for semantic association inherent in the original model. Furthermore, we propose the implementation of a Transformer-based diffusion model, designated as MWNet, as the main branch. This model adeptl......

Learn2Talk: 3D Talking Face Learns from 2D Talking Face

图片

Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which attract considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go deeper as 2D talking face, in the aspect of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield more 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex ......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1615912.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

web前端框架设计第五课-计算属性与监听属性

web前端框架设计第五课-计算属性与监听属性 一.预习笔记 1.计算属性 computed split():拆分 reverse():倒序 join():拼接 计算属性与方法,两者效果一致,但是computed 是基于它的依赖缓存,只有相关依赖发生改变时才会重新取值。而使用 met…

Rumble Club加速器哪个好用 稳定好用的联机加速器推荐

Rumble Club加速器哪个好用 稳定好用的联机加速器推荐 说到Rumble Club这款游戏,各位休闲玩家肯定不陌生,这是一款基于物理定律的在线玩家对战游戏,玩法独特且充满乐趣。玩家可以使用各种富有想象力的方式推搡、击打和超越对手,以…

路由过滤,路由策略小实验

目录 一,实验拓扑: 二,实验要求: 三,实验思路: 四,实验过程: 1,IP配置: 2、R1 和R2 运行 RIPv2,R2,R3 和R4运行 oSPF&#xff0…

卫星导航简介

本文旨在对卫星导航系统进行简要介绍,包括其基本原理、发展历程以及在现代社会中的广泛应用。文章首先阐述了卫星导航的基本原理,即利用卫星发射的信号进行定位和导航。接着,回顾了卫星导航技术的发展历程,从早期的试验阶段到如今…

OneNote插件推荐(OneMore)

使用OneNote编辑笔记时希望有一个插件能够实现markdown的功能,于是发现了OneMark,后面用着用着,OneMark竟然收费了,于是苦苦找寻好用的markdown插件,无果,此时发现我的目标主要是实现对代码的格式化&#x…

vue3中web前端JS动画案例(二)多物体运动-多值运动

<script setup> import { ref, onMounted, watch } from vue // ----------------------- 01 js 动画介绍--------------------- // 1、匀速运动 // 2、缓动运动&#xff08;常见&#xff09; // 3、透明度运动 // 4、多物体运动 // 5、多值动画// 6、自己的动画框架 // …

DBUnit增强:填充随机数据和相对时间数据

痛点 测试环境验证时&#xff0c;遇到与当前相对时间相关的测试吗&#xff1f;准备一份SQL&#xff1f;隔一段时间就不能用了。每过一段时间去更新脚本或重置系统时间&#xff1f;看上去也不是很合适的解决方案。依赖数据测试时要重新做&#xff0c;演示时候得全部改&#xff…

Ubuntu-18.04本地化部署Rustdesk服务器

提示&#xff1a;文章写完后&#xff0c;目录可以自动生成&#xff0c;如何生成可参考右边的帮助文档 文章目录 前言一、配置防火墙二、安装三大件1.下载三大件2.安装三大件 三、安装客户端1.下载客户端1.Windows2.Linux 四、配置客户端连接服务器五、总结 前言 如果你是想数据…

腾讯云轻量2核2G4M服务器优惠价格99元一年,多配置报价单

腾讯云轻量2核2G4M服务器优惠价格99元一年&#xff0c;多配置报价单。腾讯云服务器价格表2024年最新价格&#xff0c;轻量2核2G3M服务器61元一年、2核2G4M服务器99元1年&#xff0c;三年560元、2核4G5M服务器165元一年、3年900元、轻量4核8M12M服务器646元15个月、4核16G10M配置…

JavaWeb开发06-原理-Spring配置优先级-Bean管理-SpringBoot原理-Maven继承和聚合-私服

一、Spring配置优先级 不同配置文件&#xff0c;配置同一个属性谁有效 properties>yml>yaml 命令行参数>Java系统属性 项目打包后要改变属性&#xff1a; 红色是Java系统属性&#xff0c;绿色是命令行参数 ‘ 二、Bean管理 1.获取bean 获取IOC容器&#xff1a;ap…

SpringAOP从入门到源码分析大全(三)ProxyFactory源码分析

文章目录 系列文档索引五、ProxyFactory源码分析1、案例2、认识TargetSource&#xff08;1&#xff09;何时用到TargetSource&#xff08;2&#xff09;Lazy的原理&#xff08;3&#xff09;应用TargetSource 3、ProxyFactory选择cglib或jdk动态代理原理4、jdk代理获取代理方法…

内存泄漏详解

一、什么是内存泄漏&#xff1f;二、内存泄漏的原因三、内存泄漏的影响四、如何检测和解决内存泄漏&#xff1f;五、总结 一、什么是内存泄漏&#xff1f; 内存泄漏指的是程序中已分配的内存没有被正确释放&#xff0c;导致这部分内存无法被再次利用&#xff0c;最终导致内存资…

【Java框架】SpringBoot(一)基本入门

目录 SpringBoot介绍Spring Boot的诞生SpringBoot特点Spring和Spring Boot的关系Spring Boot的优点和缺点Spring Boot优点Spring Boot缺点 快速创建Spring Boot项目 IDEA创建SpringBoot工程1.打开IDEA&#xff0c;选择Spring Initlializr2.选择SpringBoot版本和初始化依赖3.更改…

病理验证mIF和TMA路线(自学)

目录 技术 使用配对病理切片 mIF验证 单基因使用TMA验证 技术 多重荧光免疫组化技术 (Multiplex immunohistochemical&#xff0c;mIHC) 也称作酪氨酸信号放大 (Tyramide dignal amplification&#xff0c;TSA) 技术&#xff0c;是一类利用辣根过氧化酶 (Horseradish Pero…

(一)Java EE企业级应用开发实战之Servlet教程——JDK安装

首先打开清华大学开源软件镜像站&#xff0c;清华大学开源镜像网站地址为&#xff1a; https://mirrors.tuna.tsinghua.edu.cn/ 打开该地址后的界面显示如下图所示 找到8版本对应的SDK安装包&#xff0c;我现在用的开发机器是Windows&#xff0c;所以我找的是Windows对应的版本…

Spring AOP (一)

本篇主要介绍Spring AOP的基础概念和入门使用 一、AOP的基本概念 AOP是一种面向切面编程的思想&#xff0c;它与IOC并称为Spring 的两大核心思想。什么是面向切面编程呢&#xff0c;具体来说就是对一类事情进行集中统一处理。这听起来像不像前面篇章中所介绍的统一功能处理&am…

Vue2 移动端(H5)项目封装弹窗组件

前言 因vant-ui的dialog组件没有自定义footer插槽 效果 参数配置 1、代码示例&#xff1a; <t-dialog :visible.sync"show" :title"title" submit"submit"></t-dialog>2、配置参数&#xff08;t-dialog Attributes&#xff09; 参…

JAVA基础之垃圾收集器

一 JVM垃圾收集 分代收集思想 当前虚拟机的垃圾收集一般采用分代收集算法&#xff0c;这种算法本身没有创新性&#xff0c;只是根据对象存活周期的不同将内存分为几块。一般将java堆内存分为新生代和老年代&#xff0c;这样我们就可以根据不同年龄到的特点选择不同的垃圾收集…

自动驾驶控制算法

本文内容来源是B站——忠厚老实的老王&#xff0c;侵删。 三个坐标系和一些有关的物理量 使用 frenet坐标系可以实现将车辆纵向控制和横向控制解耦&#xff0c;将其分开控制。使用右手系来进行学习。 一些有关物理量的基本概念&#xff1a; 运动学方程 建立微分方程 主要是弄…

Agent 智能体食用指南

Agent 智能体食用指南 三年前都在 ALL in AI&#xff0c;一年前都在 ALL in LLM&#xff0c;现在都在 ALL in AgentAutoGEN分析MetaGPT 分析RAG 分析MOE 多专家分析 三年前都在 ALL in AI&#xff0c;一年前都在 ALL in LLM&#xff0c;现在都在 ALL in Agent 科技圈焦点&…