【语音之家公开课】SRD: A Dataset and Benchmark Perspective

news2026/3/31 6:05:06

本次语音之家公开课邀请到陈果果进行分享Speech Recognition Development: A Dataset and Benchmark Perspective。

公开课简介

主题：Speech Recognition Development: A Dataset and Benchmark Perspective

时间：12月15日（周四）14:00-15:00

陈果果

嘉宾介绍

Dr. Guoguo Chen holds a Ph.D. degree in Electrical and Computer Engineering from the Johns Hopkins University and a B.Eng. degree in Electronic Engineering from Tsinghua University. During his Ph.D., he spent 5 years at the Center for Language and Speech Processing, Johns Hopkins University, where he worked on various aspects of speech recognition and was one of the key contributors to the open source speech recognition toolkit Kaldi, and the open source deep learning toolkit CNTK. He was the author of LibriSpeech, one of the most cited (3,500+ Google Scholar citations) speech recognition dataset/benchmark. He also spent two summers at Google Inc. where he developed the prototype of Android's wake word detection engine for "Okay Google", serving billions of Android/Google Home users. After graduation, Dr. Chen co-founded KITT.AI, a CBInsights AI 100 company in 2017, that was funded by Amazon’s Alexa Fund, Paul Allen’s Allen Institute for Artificial Intelligence, Madrona Venture Group, Founders’ Co-op, and A Level Capital. The company released two products: a customizable wake word engine and a conversation AI toolkit. It had more than 100,000 developers and customers over 20 countries in 4 continents. In 2017 KITT.AI was acquired by Baidu, which set up its first Seattle office upon the KITT.AI deal. In 2020, Dr. Chen co-founded Seasalt.ai. Dr. Chen also initiated SpeechColab, a volunteer organization for the speech recognition community, which released one of the largest speech recognition dataset GigaSpeech, covering 10,000 hours of transcribed audio and 33,000 hours of total audio for speech recognition research.

课程摘要

The previous decade saw remarkable development in automatic speech recognition technologies. While there are a lot of technical articles explaining the improvements from the model point of view, the impact of datasets and benchmarks to speech recognition development is not well studied. In this talk, we first investigate the contribution of datasets and benchmarks to speech recognition development. We then introduce a large scale English speech recognition dataset named GigaSpeech. We will demonstrate the data creation pipeline, as well as initial benchmarks on this dataset. Finally, we close this talk by outlining our on-going work for speech recognition benchmarks.