StoryImager、Face Morph、Hash3D、DreamView、Magic-Boost、SmartControl

news2024/11/25 20:41:09

本文首发于公众号:机器感知

StoryImager、Face Morph、Hash3D、DreamView、Magic-Boost、SmartControl

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

图片

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer ......

StoryImager: A Unified and Efficient Framework for Coherent Story  Visualization and Completion

图片

Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability in many scenarios. 2) The additional introduced story history encoders bring an extremely high computational cost. 3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly. To these ends, we propose a bidirectional, unified, and efficient framework, namely StoryImager. The StoryImager enhances the storyboard generative ability inherited from the pre-trained text-to-image model for a bidirectional generation. Specifically, we introduce a Target Frame Masking Strategy to extend and unify different story image generation tasks.......

Greedy-DiM: Greedy Algorithms for Unreasonably Effective Face Morphs

图片

Morphing attacks are an emerging threat to state-of-the-art Face Recognition (FR) systems, which aim to create a single image that contains the biometric information of multiple identities. Diffusion Morphs (DiM) are a recently proposed morphing attack that has achieved state-of-the-art performance for representation-based morphing attacks. However, none of the existing research on DiMs have leveraged the iterative nature of DiMs and left the DiM model as a black box, treating it no differently than one would a Generative Adversarial Network (GAN) or Varational AutoEncoder (VAE). We propose a greedy strategy on the iterative sampling process of DiM models which searches for an optimal step guided by an identity-based heuristic function. We compare our proposed algorithm against ten other state-of-the-art morphing algorithms using the open-source SYN-MAD 2022 competition dataset. We find that our proposed algorithm is unreasonably effective, fooling all of the tested FR system......

Hash3D: Training-free Acceleration for 3D Generation

图片

The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speed up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments covering 5 text-to-3D and 3 image-to-3D models,......

DreamView: Injecting View-specific Text Guidance into Text-to-3D  Generation

图片

Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has significantly progressed. However, a challenge arises when the specific appearances need customizing at designated viewpoints but referring solely to the overall description for generating 3D objects. For instance, ambiguity easily occurs when producing a T-shirt with distinct patterns on its front and back using a single overall text guidance. In this work, we propose DreamView, a text-to-image approach enabling multi-view customization while maintaining overall consistency by adaptively injecting the view-specific and overall text guidance through a collaborative text guidance injection module, which can also be lifted to 3D generation via score distillation sampling. DreamView is trained with large-scale rendered multi-view images and their corresponding view-specific texts to learn to balance the separate content manipulation in each view and the global consistency of the over......

3D Geometry-aware Deformable Gaussian Splatting for Dynamic View  Synthesis

图片

In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussian, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, ......

Policy-Guided Diffusion

图片

In many real-world settings, agents must learn from an offline dataset gathered by some prior behavior policy. Such a setting naturally leads to distribution shift between the behavior policy and the target policy being trained - requiring policy conservatism to avoid instability and overestimation bias. Autoregressive world models offer a different solution to this by generating synthetic, on-policy experience. However, in practice, model rollouts must be severely truncated to avoid compounding error. As an alternative, we propose policy-guided diffusion. Our method uses diffusion models to generate entire trajectories under the behavior distribution, applying guidance from the target policy to move synthetic experience further on-policy. We show that policy-guided diffusion models a regularized form of the target distribution that balances action likelihood under both the target and behavior policies, leading to plausible trajectories with high target policy probability, wh......

ZeST: Zero-Shot Material Transfer from a Single Image

图片

We propose ZeST, a method for zero-shot material transfer to an object in the input image given a material exemplar image. ZeST leverages existing diffusion adapters to extract implicit material representation from the exemplar image. This representation is used to transfer the material using pre-trained inpainting diffusion model on the object in the input image using depth estimates as geometry cue and grayscale object shading as illumination cues. The method works on real images without any training resulting a zero-shot approach. Both qualitative and quantitative results on real and synthetic datasets demonstrate that ZeST outputs photorealistic images with transferred materials. We also show the application of ZeST to perform multiple edits and robust material assignment under different illuminations. Project Page: https://ttchengab.github.io/zest ......

Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion

图片

Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently. One promising solution involves the fine-tuning of pre-trained 2D diffusion models to harness their capacity for producing multi-view images, which are then lifted into accurate 3D models via methods like fast-NeRFs or large reconstruction models. However, as inconsistency still exists and limited generated resolution, the generation results of such methods still lack intricate textures and complex geometries. To solve this problem, we propose Magic-Boost, a multi-view conditioned diffusion model that significantly refines coarse generative results through a brief period of SDS optimization ($\sim15$min). Compared to the previous text or single image based diffusion models, Magic-Boost exhibits a robust capability to generate images with high consistency from pseudo synthesized multi-view images. It provides precise SDS guidance that well aligns with the i......

SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

图片

Human visual imagination usually begins with analogies or rough sketches. For example, given an image with a girl playing guitar before a building, one may analogously imagine how it seems like if Iron Man playing guitar before Pyramid in Egypt. Nonetheless, visual condition may not be precisely aligned with the imaginary result indicated by text prompt, and existing layout-controllable text-to-image (T2I) generation models is prone to producing degraded generated results with obvious artifacts. To address this issue, we present a novel T2I generation method dubbed SmartControl, which is designed to modify the rough visual conditions for adapting to text prompt. The key idea of our SmartControl is to relax the visual condition on the areas that are conflicted with text prompts. In specific, a Control Scale Predictor (CSP) is designed to identify the conflict regions and predict the local control scales, while a dataset with text prompts and rough visual conditions is construc......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1582790.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

今日arXiv最热大模型论文:Dataverse,针对大模型的开源ETL工具,数据清洗不再难!

引言:大数据时代下的ETL挑战 随着大数据时代的到来,数据处理的规模和复杂性不断增加,尤其是在大语言模型(LLMs)的开发中,对海量数据的需求呈指数级增长。这种所谓的“规模化法则”表明,LLM的性…

Python爬虫之Scrapy框架基础

Scrapy爬虫框架介绍 文档 英文文档中文文档 什么是scrapy 基于twisted搭建的异步爬虫框架. scrapy爬虫框架根据组件化设计理念和丰富的中间件, 使其成为了一个兼具高性能和高扩展的框架 scrapy提供的主要功能 具有优先级功能的调度器去重功能失败后的重试机制并发限制ip使用次…

基于Spring Boot的网上商城购物系统设计与实现

基于Spring Boot的网上商城购物系统设计与实现 开发语言:Java框架:springbootJDK版本:JDK1.8数据库工具:Navicat11开发软件:eclipse/myeclipse/idea 系统部分展示 商品信息界面,在商品信息页面可以查看商…

谷歌(Google)历年编程真题——生命游戏

谷歌历年面试真题——数组和字符串系列真题练习。 生命游戏 根据 百度百科 , 生命游戏 ,简称为 生命 ,是英国数学家约翰何顿康威在 1970 年发明的细胞自动机。 给定一个包含 m n 个格子的面板,每一个格子都可以看成是一个细胞…

Python 全栈体系【四阶】(二十五)

第五章 深度学习 三、计算机视觉基本理论 11. 图像梯度处理 11.1 什么是图像梯度 图像梯度计算的是图像变化的速度。对于图像的边缘部分,其灰度值变化较大,梯度值也较大;相反,对于图像中比较平滑的部分,其灰度值变化…

蓝桥杯复习笔记

文章目录 gridflexhtml表格合并单元格 表单表单元素input类型 select h5文件上传拖拽apiweb Storage css块元素和行内元素转换positionfloat溢出显示隐藏外边距过渡和动画动画变形选择器属性选择伪类选择器 css3边框圆角边框阴影渐变text-overflow与word-wrap jsdom操作documen…

一键下载安装并自动绑定,Xinstall让您的应用推广更高效

在如今的移动互联网时代,应用的下载安装与绑定是用户体验的关键一环。然而,繁琐的操作步骤和复杂的绑定流程往往让用户望而却步,降低了应用的下载和使用率。为了解决这一难题,Xinstall应运而生,为用户提供了一种全新的…

gradio简单搭建——关键词匹配筛选

gradio简单搭建——关键词匹配筛选 界面搭建数据处理过程执行效果展示 上一节使用DataFrame中的apply方法提升了表格数据的筛选效率,本节使用gradio结合apply方法搭建一个关键词匹配筛选的交互界面。 界面搭建 import gradio as gr import pandas as pd from file…

C语言指针—二级指针和指针数组

二级指针和指针数组 二级指针 指针变量也是变量,是变量就有地址,那指针变量的地址存放在哪里? 这就是二级指针 。 int main() {int a 10;int* pa &a;//pa是一个指针变量,同时也是一个一级指针变量*pa 20;//此时解引用pa…

021——搭建TCP网络通信环境(c服务器python客户端)

目录 前言 服务器程序 服务器程序验证过程 客户端程序 前言 驱动开发暂时告一段落了。后面在研究一下OLED和GPS的驱动开发,并且优化前面已经移植过来的这些驱动,我的理念是在封装个逻辑处理层来处理这些驱动程序。server直接操作逻辑处理层的程序。 …

统信UOS(Linux)安装nvm node管理工具

整篇看完再操作,有坑!! 官网 nvm官网 按照官网方式安装,一直报 错 经过不断研究,正确步骤如下 1、下载安装包 可能因为网络安全不能访问github,我是链接热点下载的 wget https://github.com/nvm-sh/…

基于springboot+vue+Mysql的职称评审管理系统

开发语言:Java框架:springbootJDK版本:JDK1.8服务器:tomcat7数据库:mysql 5.7(一定要5.7版本)数据库工具:Navicat11开发软件:eclipse/myeclipse/ideaMaven包:…

Mac安装配置ElasticSearch和Kibana 8.13.2

系统环境:Mac M1 (MacOS Sonoma 14.3.1) 一、准备 从Elasticsearch:官方分布式搜索和分析引擎 | Elastic上下载ElasticSearch和Kibana 笔者下载的是 elasticsearch-8.13.2-darwin-aarch64.tar.gz kibana-8.13.2-darwin-aarch64.tar.gz 并放置到个人…

序列化、反序列化:将对象以字节流的方式,进行写入或读取

序列化:将指定对象,以"字节流"的方式写入一个文件或网络中。 反序列化:从一个文件或网络中,以"字节流"的方式读取到对象。 package com.ztt.Demo01;import java.io.FileNotFoundException; import java.io.Fi…

C++的stack和queue类(一):适配器模式、双端队列与优先级队列

目录 基本概念 适配器模式 stack.h test.cpp 双端队列-deque 仿函数 优先级队列 基本概念 1、stack和queue不是容器是容器适配器,它们没有迭代器 2、stack的quque的默认容器是deque,因为: stack和queue不需要遍历&#xff0…

基于SSM+Jsp+Mysql的农产品供销服务系统

开发语言:Java框架:ssm技术:JSPJDK版本:JDK1.8服务器:tomcat7数据库:mysql 5.7(一定要5.7版本)数据库工具:Navicat11开发软件:eclipse/myeclipse/ideaMaven包…

0基础想进入IT行业,可以从这个框架入手

行业现状 IT、AI都是很多年来的热门话题,以至于时至今日,哪怕IT行业已经卷成狗,依然有无数的人想要挤进这个行业。 大模型、云原生、react等等,无数的技术、概念应运而生。那么作为一个没有基础的人,该如何进入这个行…

第十四届蓝桥杯模拟考试II_物联网设计

还是要稳妥啊,写A板的时候感觉很简单所以将模块都混在一起了,结果不出意外就出BUG了又得从头开始查BUG,多简单的题模块最好都分块写写完就检查,这样一步一个脚印多稳 这个模块出了俩BUG第一个是要检查有没有数据进入if语句,不然标…

Kubernetes(k8s)监控与报警(qq邮箱+钉钉):Prometheus + Grafana + Alertmanager(超详细)

Kubernetes(k8s)监控与报警(qq邮箱钉钉):Prometheus Grafana Alertmanager(超详细) 1、部署环境2、基本概念简介2.1、Prometheus简介2.2、Grafana简介2.3、Alertmanager简介2.4、Prometheus …

OpenCV | 图像读取与显示

OpenCV 对图像进行处理时,常用API如下: API描述cv.imread根据给定的磁盘路径加载对应的图像,默认使用BGR方式加载cv.imshow展示图像cv.imwrite将图像保存到磁盘中cv.waitKey暂停一段时间,接受键盘输出后,继续执行程序…