ArtNeRF, Attention Control, Pixel is a Barrier, FilterPrompt

news2025/1/12 8:48:27

This article was first published on the WeChat official account 机器感知.

ArtNeRF: A Stylized Neural Field for 3D-Aware Cartoonized Face Synthesis

Recent advances in generative visual models and neural radiance fields have greatly boosted 3D-aware image synthesis and stylization tasks. However, previous NeRF-based work is limited to single-scene stylization; training a model to generate 3D-aware cartoon faces with arbitrary styles remains unsolved. We propose ArtNeRF, a novel face stylization framework derived from 3D-aware GAN to tackle this problem. In this framework, we utilize an expressive generator to synthesize stylized faces and a triple-branch discriminator module to improve the visual quality and style consistency of the generated faces. Specifically, a style encoder based on contrastive learning is leveraged to extract robust low-dimensional embeddings of style images, empowering the generator with the knowledge of various styles. To smooth the training process of cross-domain transfer learning, we propose an adaptive style blending module which helps inject style information and allows users to freely tune t......
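The abstract above mentions a style encoder trained with contrastive learning. As a rough illustration of that general idea (not ArtNeRF's actual loss, which the abstract does not specify), here is a minimal InfoNCE-style sketch in plain Python. The function names `cosine` and `info_nce`, the temperature value, and the toy 2-D embeddings are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log(exp(s_pos/t) / sum of exp(s/t) over positive and negatives)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy embeddings (hypothetical): the loss is small when the positive lies
# close to the anchor and large when a negative lies closer than the positive.
anchor = [1.0, 0.0]
close_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
far_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(close_loss, far_loss)
```

Minimizing such a loss pulls embeddings of images in the same style together and pushes different styles apart, which is one common way to obtain the kind of low-dimensional style embedding the abstract describes.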

Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning

Arbitrary style transfer holds widespread attention in research and boasts numerous practical applications. The existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce an innovative technique to improve the quality of stylized images. Firstly, we propose Style Consistency Instance Normalization (SCIN), a method to refine the alignment between content and style features. In addition, we have developed an Instance-based Contrastive Learning (ICL) approach designed to understand the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are more adept at extracting classification features and are less suited to capturing style features, we have also introduced the Perception Encoder (PE) to capture style fe......
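For readers unfamiliar with the "adaptive normalization" baseline that the abstract contrasts itself with, the sketch below shows plain adaptive instance normalization (AdaIN) on a single feature channel. It is not the paper's SCIN module; the helper names `channel_stats` and `adain_channel` and the toy values are assumptions for illustration only.

```python
import math

def channel_stats(values, eps=1e-5):
    """Return the mean and (eps-stabilized) standard deviation of one channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)

def adain_channel(content, style, eps=1e-5):
    """Normalize the content channel, then rescale it to the style channel's statistics."""
    c_mean, c_std = channel_stats(content, eps)
    s_mean, s_std = channel_stats(style, eps)
    return [s_std * (c - c_mean) / c_std + s_mean for c in content]

# Toy feature channels (hypothetical values, flattened over spatial positions).
content = [0.0, 1.0, 2.0, 3.0]
style = [10.0, 10.5, 11.0, 11.5, 12.0]
stylized = adain_channel(content, style)

# The output keeps the content's spatial pattern but adopts the style statistics.
out_mean, out_std = channel_stats(stylized, eps=0.0)
print(out_mean, out_std)
```

In a real network the same operation is applied independently to every channel of an encoder feature map; the abstract's point is that matching only these first- and second-order statistics can leave content and style features poorly aligned.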

Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap

The recent advancements in Text-to-Video Artificial Intelligence Generated Content (AIGC) have been remarkable. Compared with traditional videos, the assessment of AIGC videos encounters various challenges: visual inconsistencies that defy common sense, discrepancies between content and the textual prompt, and the distribution gap between various generative models, etc. To address these challenges, in this work, we categorize the assessment of AIGC video quality into three dimensions: visual harmony, video-text consistency, and domain distribution gap. For each dimension, we design specific modules to provide a comprehensive quality assessment of AIGC videos. Furthermore, our research identifies significant variations in visual quality, fluidity, and style among videos generated by different text-to-video models. Predicting the source generative model can make the AIGC video features more discriminative, which enhances the quality assessment performance. The proposed method was used......

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation

Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit diverse images that convey highly complex visual concepts according to the textual guidance. Despite being promising, existing methods focus on texture- or non-rigid-based visual manipulation and struggle to produce fine-grained animation of smooth text-conditioned image morphing without fine-tuning, owing to their highly unstructured latent space. In this paper, we introduce a tuning-free LLM-driven attention control framework, encapsulated by the progressive process of LLM planning, prompt-Aware editing, StablE animation geneRation, abbreviated as LASER. LASER employs a large language model (LLM) to refine coarse descriptions into detailed prompts, guiding pre-trained text-to-image models for subsequent image generation. We manipulate the model's spatial features and self-attention mecha......

Generalizable Novel-View Synthesis using a Stereo Camera

In this paper, we propose the first generalizable view synthesis approach that specifically targets multi-view stereo-camera images. Since recent stereo matching has demonstrated accurate geometry prediction, we introduce stereo matching into novel-view synthesis for high-quality geometry reconstruction. To this end, this paper proposes a novel framework, dubbed StereoNeRF, which integrates stereo matching into a NeRF-based generalizable view synthesis approach. StereoNeRF is equipped with three key components to effectively exploit stereo matching in novel-view synthesis: a stereo feature extractor, a depth-guided plane-sweeping, and a stereo depth loss. Moreover, we propose the StereoNVS dataset, the first multi-view dataset of stereo-camera images, encompassing a wide variety of both real and synthetic scenes. Our experimental results demonstrate that StereoNeRF surpasses previous approaches in generalizable view synthesis. ......

Motion-aware Latent Diffusion Models for Video Frame Interpolation

With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, the motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames with the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in generating both visually smooth and realistic results. Extensive experiments conducted on benchmark ......

Music Consistency Models

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. They have proven to be advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music Consistency Models (MusicCM), which leverages the concept of consistency models to efficiently synthesize mel-spectrograms for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with shared constraints. Experimental results reveal the effectiveness of our model in terms of......

EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment

We introduce EC-SLAM, a real-time dense RGB-D simultaneous localization and mapping (SLAM) system utilizing Neural Radiance Fields (NeRF). Although recent NeRF-based SLAM systems have demonstrated encouraging outcomes, they have yet to completely leverage NeRF's capability to constrain pose optimization. By employing an effectively constrained global bundle adjustment (BA) strategy, our system makes use of NeRF's implicit loop closure correction capability. This improves the tracking accuracy by reinforcing the constraints on the keyframes that are most pertinent to the optimized current frame. In addition, by implementing a feature-based and uniform sampling strategy that minimizes the number of ineffective constraint points for pose optimization, we mitigate the effects of random sampling in NeRF. EC-SLAM utilizes sparse parametric encodings and the truncated signed distance field (TSDF) to represent the map in order to facilitate efficient fusion, resulting in reduced mode......
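The abstract refers to a truncated signed distance field (TSDF) as the map representation. As background only, the sketch below shows the classical per-voxel TSDF truncation and weighted-average fusion; EC-SLAM itself represents the TSDF with learned parametric encodings, which this example does not attempt to reproduce. The function names, the 0.1 m truncation band, and the sample distances are assumptions.

```python
def truncate_sdf(sdf, trunc):
    """Scale a signed distance by the truncation band and clamp it to [-1, 1]."""
    return max(-1.0, min(1.0, sdf / trunc))

def fuse(tsdf_old, weight_old, tsdf_new, weight_new=1.0):
    """Weighted running average of TSDF observations for a single voxel."""
    weight = weight_old + weight_new
    value = (tsdf_old * weight_old + tsdf_new * weight_new) / weight
    return value, weight

# One voxel observed three times at hypothetical signed distances (metres).
trunc = 0.1
value, weight = 0.0, 0.0
for sdf in [0.05, 0.03, 0.25]:
    value, weight = fuse(value, weight, truncate_sdf(sdf, trunc))

print(value, weight)
```

The third observation lies outside the truncation band, so it contributes the clamped value 1.0 rather than 2.5; this clamping is what keeps distant, unreliable measurements from dominating the fused surface estimate.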

Pixel is a Barrier: Diffusion Models Are More Adversarially Robust Than We Think

Adversarial examples for diffusion models are widely used as solutions for safety concerns. By adding adversarial perturbations to personal images, attackers cannot edit or imitate them easily. However, it is essential to note that all these protections target latent diffusion models (LDMs); adversarial examples for diffusion models in the pixel space (PDMs) are largely overlooked. This may mislead us to think that the diffusion models are vulnerable to adversarial attacks like most deep models. In this paper, we show novel findings that: even though gradient-based white-box attacks can be used to attack the LDMs, they fail to attack PDMs. This finding is supported by extensive experiments with a wide range of attack methods on various PDMs and LDMs with different model structures, which means diffusion models are indeed much more robust against adversarial attacks. We also find that PDMs can be used as an off-the-shelf purifier to effectively remove the adver......

FilterPrompt: Guiding Image Transfer in Diffusion Models

In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentangling image attributes within feature space. However, the complex distribution present in real-world data often makes the application of such decoupling algorithms to other datasets challenging. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. Upon scrutinizing the characteristics of various generative models, we have observed that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively fused with the explicit decomposition operation in pixel space. This integration enables the ima......

BACS: Background Aware Continual Semantic Segmentation

Semantic segmentation plays a crucial role in enabling comprehensive scene understanding for robotic systems. However, generating annotations is challenging, requiring labels for every pixel in an image. In scenarios like autonomous driving, there's a need to progressively incorporate new classes as the operating environment of the deployed agent becomes more complex. For enhanced annotation efficiency, ideally, only pixels belonging to new classes would be annotated. This approach is known as Continual Semantic Segmentation (CSS). Besides the common problem of classical catastrophic forgetting in the continual learning setting, CSS suffers from the inherent ambiguity of the background, a phenomenon we refer to as the "background shift", since pixels labeled as background could correspond to future classes (forward background shift) or previous classes (backward background shift). As a result, continual learning approaches tend to fail. This paper proposes a Backward Backgro......
