Speech and Language Processing Technology Meetup (Shenzhen)


Speakers

Speaker bio: Yi Luo (罗艺) received his Ph.D. from Columbia University in the United States in 2021 and then joined Tencent AI Lab Shenzhen as a Senior Researcher. His research focuses on audio front-end processing, including but not limited to audio separation and single-/multi-channel speech enhancement.

Talk title: Progress in Audio and Speech Front-End Processing at Tencent AI Lab

Abstract: This talk presents the research progress of the audio and speech front-end processing team at Tencent AI Lab in areas such as audio separation, speech enhancement, and multi-channel speech processing, including Tencent AI Lab's exploration of data simulation, model design, and application scenarios.
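To make the data-simulation idea mentioned above concrete, here is a minimal, hypothetical sketch (not Tencent AI Lab's actual pipeline) of how a noisy/clean training pair for speech enhancement can be simulated by mixing clean speech with noise at a target SNR; the signals below are random placeholders.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix a clean speech signal with noise at the requested SNR (in dB)."""
    # Loop or trim the noise so it matches the length of the clean signal.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    # Scale the noise so that 10*log10(P_clean / P_noise) == snr_db.
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


# Example: simulate a 5 dB SNR training pair from random placeholder signals.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # 1 s of "speech" at 16 kHz (placeholder)
noise = rng.standard_normal(8000)    # shorter noise clip (placeholder)
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```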

Speaker bio: Wei Xue (雪巍) is currently an Assistant Professor at the Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology (HKUST). He received the Bachelor's degree in automatic control from Huazhong University of Science and Technology in 2010, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2015. From August 2015 to September 2018 he was first a Marie Curie Experienced Researcher and then a Research Associate in the Speech and Audio Processing Group, Department of Electrical & Electronic Engineering, Imperial College London, UK. He was a Senior Research Scientist at JD AI Research, Beijing, from November 2018 to December 2021, where he led the R&D on front-end speech processing and acoustic modelling for robust speech recognition. From January 2022 to April 2023 he was an Assistant Professor at the Department of Computer Science, Hong Kong Baptist University. He was a visiting scholar at Université de Toulon and KU Leuven. Wei's research interests are in speech and music intelligence, including AI music generation, speech enhancement and separation, room acoustics, as well as speech and audio event recognition. He is a former Marie Curie Fellow and was selected into the Beijing Overseas Talent Aggregation Project. He currently leads the AI music research in the theme-based Art-Tech project, which received a total of HK$52.8 million from the Hong Kong RGC.

Talk title: Audio Content Generation: Building Digitalized Human and Humanized AI

Abstract: We are entering a new era in which the real and virtual worlds are indistinguishable; interactions between the real and virtual worlds remove the physical barriers between people and define new ways of entertainment, healthcare, and communication. Building a new generation of audio-based content generation and interaction across humans, machines, and the environment is essential. In this talk we will introduce our progress over recent months. Specifically, we will introduce how to digitalize the voice of an arbitrary person to produce a virtual singer, which empowers the AI choir in the world's first human-machine collaborative symphony orchestra in Hong Kong. We will also introduce CoMoSpeech, which adopts the consistency model for speech synthesis and achieves an inference speed more than 150 times faster than real-time on a single NVIDIA A100 GPU, making diffusion-sampling-based speech synthesis truly practical.
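For readers unfamiliar with consistency models, the following sketch (under assumed interfaces, not the released CoMoSpeech code) illustrates why sampling is so fast: a trained consistency function `f_theta` maps pure noise to a data estimate in a single network call, instead of the many denoising iterations a standard diffusion sampler needs.

```python
import torch


def one_step_sample(f_theta, cond, shape, sigma_max=80.0, device="cpu"):
    """Single-step sampling with a consistency model: start from pure noise at
    the largest noise level and map it directly to an estimate of clean data.

    f_theta is a placeholder for a trained consistency function
    f_theta(x_t, t, cond) -> x_0; in a TTS setting cond would be text/phoneme
    features and the output an acoustic representation such as a mel-spectrogram.
    """
    x_T = sigma_max * torch.randn(shape, device=device)
    t_T = torch.full((shape[0],), sigma_max, device=device)
    with torch.no_grad():
        x_0 = f_theta(x_T, t_T, cond)
    return x_0


# Toy stand-in for a trained consistency model, just to show the call pattern.
f_theta = lambda x, t, cond: x / (1.0 + t.view(-1, 1, 1))
mel_estimate = one_step_sample(f_theta, cond=None, shape=(1, 80, 200))
```

A conventional diffusion sampler would instead loop over tens or hundreds of noise levels, calling the network once per step; removing that loop is what yields the reported real-time factor.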

Speaker bio: My name is Yuancheng Wang. I am currently a senior student majoring in computer science at The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Prof. Zhizheng Wu. I am now a research intern at Microsoft Research Asia, working closely with Xu Tan. Recently, I have been focusing on diffusion-based audio and speech generation.

Talk title: Audio Editing by Following Instructions with Latent Diffusion Models

Abstract: Audio editing is applicable for various purposes, such as adding background sound effects, replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-based methods have achieved zero-shot audio editing by using a diffusion and denoising process conditioned on the text description of the output audio. However, these methods still have some problems: 1) they have not been trained on editing tasks and cannot ensure good editing effects; 2) they can erroneously modify audio segments that do not require editing; 3) they need a complete description of the output audio, which is not always available or necessary in practical scenarios. In this work, we propose AUDIT, an instruction-guided audio editing model based on latent diffusion models. Specifically, AUDIT has three main design features: 1) we construct triplet training data (instruction, input audio, output audio) for different audio editing tasks and train a diffusion model using the instruction and the input (to-be-edited) audio as conditions to generate the output (edited) audio; 2) it can automatically learn to modify only the segments that need to be edited by comparing the difference between the input and output audio; 3) it only needs edit instructions instead of full target audio descriptions as text input. AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution).
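As a hypothetical illustration of the triplet construction described in point 1 (not the actual AUDIT data pipeline), the sketch below builds one (instruction, input audio, output audio) example for the "adding" task: the output is the input mixed with an event clip, and the instruction names the event.

```python
import numpy as np


def make_add_triplet(input_audio: np.ndarray, event_audio: np.ndarray,
                     event_label: str, gain: float = 0.5):
    """Construct one (instruction, input, output) training triplet for the
    'adding' edit task: output = input + scaled event, instruction as text."""
    # Pad or trim the event clip to the length of the input audio.
    if len(event_audio) < len(input_audio):
        event_audio = np.pad(event_audio, (0, len(input_audio) - len(event_audio)))
    event_audio = event_audio[: len(input_audio)]

    output_audio = input_audio + gain * event_audio
    instruction = f"Add {event_label} in the background"
    return instruction, input_audio, output_audio


# Example with placeholder waveforms; in practice these would come from
# labelled audio datasets.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
dog_bark = rng.standard_normal(4000)
triplet = make_add_triplet(speech, dog_bark, "a dog barking")
```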

Speaker bio: Dr. Yannan Wang received his B.S. and Ph.D. degrees in Electrical Engineering and Information Science from the University of Science and Technology of China in 2011 and 2017, respectively. He is currently a Senior Researcher at Tencent Ethereal Lab. His research interests include speech enhancement, speech separation, voice conversion, voice activity detection, speech dereverberation, acoustic scene classification, and sound event localization and detection.

Talk title: Speech Signal Improvement in Real-Time Communication

Abstract: Real-time communication (RTC) systems have become a necessity in people's lives and work, especially teleconferencing systems. Speech quality is the key element of the communication experience. However, various problems degrade speech quality, including acoustic capture conditions, noise/reverberation corruption, poor device acquisition performance, and network congestion. In this talk I will present our attempts to improve the speech signal, especially in far-field scenarios where the environment is complex and the SNR is low. In the future we would like to devote more effort to more types of speech quality improvement tasks.
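As a rough illustration of one common ingredient of such front-end processing (not Tencent's actual system), the sketch below applies a predicted mask to the magnitude spectrum of a noisy frame; `mask_model` stands in for any learned gain estimator.

```python
import numpy as np
from numpy.fft import rfft, irfft


def enhance_frame(noisy_frame: np.ndarray, mask_model) -> np.ndarray:
    """Denoise one frame by masking its magnitude spectrum.

    mask_model is a placeholder: any function mapping a magnitude spectrum
    to per-bin gains in [0, 1] (e.g. a trained neural network).
    """
    spectrum = rfft(noisy_frame)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    mask = mask_model(magnitude)          # per-frequency gains in [0, 1]
    enhanced = mask * magnitude * np.exp(1j * phase)
    return irfft(enhanced, n=len(noisy_frame))


# Example with a trivial "model" that keeps only high-energy bins.
def toy_mask(mag):
    return (mag > mag.mean()).astype(float)


frame = np.random.default_rng(0).standard_normal(512)
clean_estimate = enhance_frame(frame, toy_mask)
```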

Speaker bio: Wan Ding (丁万) is currently an expert engineer in the Humanoid Robotics Business Unit at UBTECH Robotics. He received his bachelor's degree from Wuhan University and his Ph.D. from Central China Normal University, and previously held postdoctoral and Scientist I positions at the Institute for Infocomm Research, A*STAR, Singapore, where his main research areas were multimodal emotion recognition and multimodal speech synthesis. He joined UBTECH in 2019, where he is mainly responsible for the core algorithm R&D and productization of UBTECH's online/offline speech synthesis technology. He participated in drafting the technical specification 《支持语音和视觉交互的虚拟数字人技术规范》 (Technical Specification for Virtual Digital Humans Supporting Speech and Visual Interaction). His honors include first place in the EmotioNet 2017 facial action unit recognition challenge, second place in the MEC 2017 multimodal emotion recognition competition, and the ACII Asia 2018 Outstanding Paper Award.

Talk title: Multimodal Machine Learning Technologies at UBTECH

Abstract: Humanoid robot products need multimodal information to perceive and express themselves accurately. Compared with traditional methods, deep-learning-based multimodal recognition and synthesis achieve better results, but issues such as overfitting and real-time performance still need attention when deploying in practice. This talk introduces some of UBTECH's work on multimodal machine learning, including multimodal emotion recognition, multimodal depression detection, and 2D digital human synthesis.
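As a generic illustration of multimodal recognition (not UBTECH's implementation), the sketch below combines audio and visual emotion scores with simple late fusion; the per-modality models and class scores are placeholders.

```python
import numpy as np


def late_fusion_predict(audio_scores: np.ndarray, visual_scores: np.ndarray,
                        audio_weight: float = 0.5) -> int:
    """Fuse per-modality class scores by a weighted average and return the
    predicted emotion index. The per-modality models are assumed to exist."""
    fused = audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores
    return int(np.argmax(fused))


# Example with placeholder scores over 4 emotion classes
# (e.g. neutral / happy / sad / angry).
audio_scores = np.array([0.1, 0.6, 0.2, 0.1])
visual_scores = np.array([0.2, 0.3, 0.4, 0.1])
emotion_index = late_fusion_predict(audio_scores, visual_scores)
```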

How to Participate

This event will be held in a combined online and offline format.

1. Offline participation

Click the link below and fill in the registration information.

👇👇👇

Questionnaire system (registration form)

2. Online participation

The live stream will be hosted on CSDN and can be watched on both mobile and PC.

👇👇👇

https://live.csdn.net/room/weixin_48827824/YOuhJcTg

Event Prizes

Offline attendees will have the chance to win, through a lucky draw,

a 语音之家 baseball cap.
