Latent Guard、Tokenization in LLM、​3D Human Scan、FusionPortableV2

news2024/12/24 8:47:33

本文首发于公众号:机器感知

图片

https://mp.weixin.qq.com/s/HlVV3VnqocBI4XBOT6RFHg

A Multi-Level Framework for Accelerating Training Transformer Models

图片

The fast growing capabilities of large-scale deep learning models, such as Bert, GPT and ViT, are revolutionizing the landscape of NLP, CV and many other domains. Training such models, however, poses an unprecedented demand for computing power, which incurs exponentially increasing energy cost and carbon dioxide emissions. It is thus critical to develop efficient training solutions to reduce the training costs. Motivated by a set of key observations of inter- and intra-layer similarities among feature maps and attentions that can be identified from typical training processes, we propose a multi-level framework for training acceleration. Specifically, the framework is based on three basic operators, Coalescing, De-coalescing and Interpolation, which can be orchestrated to build a multi-level training framework. The framework consists of a V-cycle training process, which progressively down- and up-scales the model size and projects the parameters between adjacent levels of mode......

Latent Guard: a Safety Framework for Text-to-image Generation

图片

With the ability to generate high-quality images, text-to-image (T2I) models can be exploited for creating inappropriate content. To prevent misuse, existing safety measures are either based on text blacklists, which can be easily circumvented, or harmful content classification, requiring large datasets for training and offering low flexibility. Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation. Inspired by blacklist-based approaches, Latent Guard learns a latent space on top of the T2I model's text encoder, where it is possible to check the presence of harmful concepts in the input text embeddings. Our proposed framework is composed of a data generation pipeline specific to the task using large language models, ad-hoc architectural components, and a contrastive learning strategy to benefit from the generated data. The effectiveness of our method is verified on three datasets and against four baselines. Code and data w......

Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic  Segmentation

图片

Despite the significant progress in deep learning for dense visual recognition problems, such as semantic segmentation, traditional methods are constrained by fixed class sets. Meanwhile, vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks, owing to their robust generalizability. Recently, a body of work has investigated utilizing these models in open-vocabulary semantic segmentation (OVSS). However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks. In this work, we propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP), representing a straightforward adaptation of CLIP tailored for this scenario. Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature. By incorporati......

Toward a Theory of Tokenization in LLMs

图片

While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models. In this paper, we investigate tokenization from a theoretical point of view by studying the behavior of transformers on simple data generating processes. When trained on data drawn from certain simple $k^{\text{th}}$-order Markov processes for $k > 1$, transformers exhibit a surprising phenomenon - in the absence of tokenization, they empirically fail to learn the right distribution and predict characters according to a unigram model (Makkuva et al., 2024). With the addition of tokenization, however, we empirically observe that transformers break through this barrier and are able to model the probabilities of sequences drawn from the source near-optimally, achieving small cross-entropy loss. With this observation a......

OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering

图片

Rendering dynamic 3D human from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous method utilizing NeRF for surface rendering to recover the occluded areas, but it requiring more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions, the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the fe......

3D Human Scan With A Moving Event Camera

图片

Capturing the 3D human body is one of the important tasks in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which imposes constraints in real-world application setups. Event cameras have the advantages of high temporal resolution and high dynamic range (HDR), but the development of event-based methods is necessary to handle data with different characteristics. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery require frames (images) as well as event data. The proposed method solely relies on events; it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh by attenuated rays, and fit statistical body models, preserving high-frequency details. The experimental results show that the proposed meth......

FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM  Across Diverse Platforms and Scalable Environments

图片

Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM dataset featuring notable sensor diversity, varied motion patterns, and a wide range of environmental scenarios. Our dataset comprises $27$ sequences, spanning over $2.5$ hours and collected from four distinct platforms: a handheld suite, wheeled and legged robots, and vehicles. These sequences cover diverse settings, including buildings, campuses, and urban areas, with a total length of $38.7km$. Additionally, the dataset includes ground-truth (GT) trajectories and RGB point cloud maps covering approximately $0.3km^2$. To validate the utility of our dataset in advancing SLAM research, w......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1602366.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

WIN7用上最新版Chrome

1.下载WIN10最新版Chrome的离线安装包 谷歌浏览器 Chrome 最新版离线安装包下载地址 v123.0.6312.123 - 每日自动更新 | 异次元软件 文件名称:123.0.6312.123_chrome_installer.exe。 123.0.6312.123_chrome_installer.exe 文件右键解压缩得到 chrome.7z&#x…

为什么用云渲染农场?3D云渲染农场助力影视动画行业发展

​计算机图形技术的进步使得3D渲染成为多个产业发展的重要推动力。设计师和艺术家利用这项技术将创意实现,创造出震撼的视觉作品。但是,高质量的渲染需要大量的计算资源。云渲染农场通过提供这些资源,有效提高了渲染的速度和效率,…

【数据挖掘】实验8:分类与预测建模

实验8:分类与预测建模 一:实验目的与要求 1:学习和掌握回归分析、决策树、人工神经网络、KNN算法、朴素贝叶斯分类等机器学习算法在R语言中的应用。 2:了解其他分类与预测算法函数。 3:学习和掌握分类与预测算法的评…

【VTKExamples::Meshes】第十二期 QuadricDecimation

很高兴在雪易的CSDN遇见你 VTK技术爱好者 QQ:870202403 公众号:VTK忠粉 前言 本文分享VTK样例QuadricDecimation,并解析接口vtkQuadricDecimation,希望对各位小伙伴有所帮助! 感谢各位小伙伴的点赞+关注,小易会继续努力分享,一起进步! 你的点赞就是我的动力(…

服务器raid卡,守护数据安全,赋能新质生产力

RAID卡,全称为独立冗余磁盘阵列卡,在数据中心、服务器、网络存储等领域得到广泛应用,RAID卡通过不同的RAID级别实现数据容错和冗余。例如,RAID 0主要适用于需要高速数据传输但对数据安全要求不高的场景,如数据的缓存&a…

C++ 一些编程问题解决 (C++ some programming error solutions)

电脑配置:window10, 64位操作系统,基于x64的处理器,Microsoft Visual Studio Community 2019 Version 16.4.5 问题1:Unhandled exception at 0x00007FFDB39AA839 in TesseractLACadd1.exe: Microsoft C exception: boost::filesy…

Oracle执行计划优化SPM案例

1.现象 执行下面这段代码,发现子库存表走了全表扫描 SELECT msi.secondary_inventory_name, --子库存msi.description --库存说明FROM inv.mtl_secondary_inventories msi,csi_item_instances ciiWHERE msi.secondary_inventory_name cii.inv_subinve…

【Linux--多线程】

1 . Linux线程概念 1.1 什么是线程 在一个程序里的一个执行路线就叫做线程(thread)。更准确的定义是:线程是“一个进程内部的控制序列” 一切进程至少都有一个执行线程 线程在进程内部执行,本质是在进程地址空间内运行 Linux系…

P1157 组合的输出 (dfs深搜)

题目连接&#xff1a;P1157 组合的输出 - 洛谷 | 计算机科学教育新生态 (luogu.com.cn) 思路&#xff1a; AC代码&#xff1a; #include<iostream> #include<cstring>using namespace std;const int N 30; int st[N];//用来存这个数用没用过&#xff08;1~n个…

第十五届蓝桥杯复盘python大学A组——试题B 召唤数学精灵

按照正常思路解决&#xff0c;由于累乘消耗大量时间&#xff0c;因此这不是一个明智的解决方案。 这段代码执行速度非常慢的原因在于它试图计算非常大的数的阶乘&#xff08;累乘&#xff09;&#xff0c;并且对于每一个i的值都执行这个计算。阶乘的增长是极其迅速的&#xff…

graylog使用Sidecars方式收集springboot程序的日志

1、部署graylog后台服务 使用docker-compose启动三个服务程序&#xff0c;包括graylog、mongodb、opensearch。 docker-compose.yml内容如下 version: 3 services: # MongoDB: https://hub.docker.com/_/mongo/ mongodb: image: mongo:6.0.14 privileged: true …

Mybatis-plus动态数据源

由于服务没有做微服务部署&#xff0c;需要在后台管理系统访问其他服务的库&#xff0c;所以需要用到动态数据源切换 引入依赖 mybatis-plus动态数据源依赖 <dependency><groupId>com.baomidou</groupId><artifactId>dynamic-datasource-spring-boot…

掼蛋小技巧(下篇)

一、记断张和单牌 如果我们手上有断张&#xff0c;那么外面有九成概率成炸&#xff1b;我们的单张外面有七张&#xff0c;有七成概率成炸&#xff0c;实战中两家同时断同一张牌的概率很低&#xff0c;所以要时刻关注自己的断张和单张。 二、情况不明&#xff0c;对子先行 对子可…

【进程地址空间】进程的独立性 | 虚拟地址物理地址 | 页表 | 写时拷贝

目录 前言 基本概念 进程的独立性 虚拟地址&物理地址 进程地址空间 页表&#xff08;虚拟地址☞物理地址&#xff09; 写时拷贝 基本理解 地址空间 写时拷贝&#xff08;浅拷贝&#xff09; 数据独立性的保证☞写时拷贝 写时拷贝的优点 图解分析 前言 我们…

MySQL-多表查询:多表查询分类、SQL99语法实现多表查询、UNION的使用、7种SQL JOINS的实现、SQL99语法新特性、多表查询SQL练习

多表查询 1. 一个案例引发的多表连接1.1 案例说明1.2 笛卡尔积&#xff08;或交叉连接&#xff09;的理解1.3 案例分析与问题解决 2. 多表查询分类讲解分类1&#xff1a;等值连接 vs 非等值连接等值连接非等值连接 分类2&#xff1a;自连接 vs 非自连接分类3&#xff1a;内连接…

JavaSE图书管理系统实战

代码仓库地址&#xff1a;Java图书管理系统 1.前言 该项目将JavaSE的封装继承多态三大特性&#xff0c;使用了大量面向对象的操作&#xff0c;有利于巩固理解 &#xff08;1&#xff09;实现效果 2.实现步骤 第一步先把框架搭建起来&#xff0c;即创建出人&#xff1a;管理员和…

九、OOP面向对象程序设计(四)

1、this、super、static和final关键字的使用 (1)this关键字的使用 当成员变量和局部变量重名时,在方法中使用this时,表示的是该方法所在类中的成员变量。 把当前对象当作参数传递时,可以用this。 有时候,我们会用到一些内部类和匿名类,如事件处理。当在匿名类中用thi…

第20天:信息打点-红蓝队自动化项目资产侦察企查产权武器库部署网络空间

第二十天 一、工具项目-红蓝队&自动化部署 自动化-武器库部署-F8x 项目地址&#xff1a;https://github.com/ffffffff0x/f8x 介绍&#xff1a;一款红/蓝队环境自动化部署工具,支持多种场景,渗透,开发,代理环境,服务可选项等.下载&#xff1a;wget -O f8x https://f8x.io…

从入门到精通C++之类和对象(续)

目录 初始化列表构造函数&#xff1f;拷贝构造&#xff1f;浅谈explicit关键字友元 内部类static成员总结 初始化列表 引入初始化列表&#xff1a;简化代码&#xff0c;提高效率 在编程中&#xff0c;初始化列表是一种用于在创建对象时初始化成员变量的快捷方式。通过初始化列…

神仙级Python入门教程(超级详细),从零基础入门到精通!!

关于Python学习指南 学好 Python 不论是就业还是做副业赚钱都不错&#xff0c;但要学会 Python 还是要有一个学习规划。最后给大家分享一份全套的 Python 学习资料&#xff0c;给那些想学习 Python 的小伙伴们一点帮助&#xff01; 包括&#xff1a;Python激活码安装包、Pyth…