=================================================================
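Audio/Video Basics: WAV series:
Audio/Video Basics: WAV Series (1): Generating WAV Audio Files with FFmpeg Commands
Audio/Video Basics: WAV Series (2): Introduction to the WAV Format
Audio/Video Basics: WAV Series (3): How FFmpeg's Source Code Determines Whether a File Is a WAV File
Audio/Video Basics: WAV Series (4): How FFmpeg's Source Code Obtains a WAV File's Audio Codec Format, Sample Rate, Channel Count, Bits per Sample, and Bit Rate
Audio/Video Basics: WAV Series (5): How FFmpeg's Source Code Decodes the WAV Header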
=================================================================
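1. Introduction
An FFmpeg command can report a WAV file's audio codec format, sample rate, channel count, bits per sample, bit rate, and other information.
VLC can show the same information (VLC also relies on FFmpeg for decoding).
So how do FFmpeg and VLC obtain this information? They read it from the subchunk whose ID is "fmt " (note the trailing space) in the WAV header. The article "Audio/Video Basics: WAV Series (2): Introduction to the WAV Format" described the WAV format and its header. The WAV header is organized into chunks, its smallest unit; the subchunk with ID "fmt " is called the Format chunk, and it records the channel count, sample rate, and similar information.
In the FFmpeg source code (this article uses FFmpeg 5.0.3), the function ff_get_wav_header decodes the Format chunk and extracts this information.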
2. Declaration of ff_get_wav_header
ff_get_wav_header is declared in the FFmpeg header file libavformat/riff.h:
int ff_get_wav_header(AVFormatContext *s, AVIOContext *pb, AVCodecParameters *par, int size, int big_endian);
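Purpose: decode the Format chunk of the WAV header and extract the information it contains.
Parameter s: input. A pointer to an AVFormatContext; here it is mainly used for logging and can be ignored.
Parameter pb: both input and output. Points to an AVIOContext that provides the binary data of the WAV file.
pb->buffer: always points to the start of the input buffer, which holds the first bytes of the WAV file. Because the WAV header sits at the very beginning of the file, this buffer normally contains the whole WAV header.
pb->buffer_size: the size, in bytes, of the buffer that pb->buffer points to. When decoding the WAV header, FFmpeg does not read the whole file, only its beginning, e.g. the first 32768 bytes, which is enough to work out the format. So pb->buffer_size is usually 32768.
pb->buf_ptr: points to the current read position in the input buffer. Since the Format chunk is about to be read, pb->buf_ptr points to data that begins with the Format chunk's AudioFormat field. After ff_get_wav_header returns, pb->buf_ptr has advanced, meaning the Format chunk has been consumed.
pb->buf_end: points to the end of the data currently held in the input buffer (this may be before pb->buffer + pb->buffer_size if less data was read than requested).
Parameter par: output. Points to an AVCodecParameters, the structure that describes the properties of an encoded stream. After ff_get_wav_header runs, the channel count, sample rate, and other values from the Format chunk are available through the members of par. Some members of AVCodecParameters: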
/**
 * This struct describes the properties of an encoded stream.
 *
 * sizeof(AVCodecParameters) is not a part of the public ABI, this struct must
 * be allocated with avcodec_parameters_alloc() and freed with
 * avcodec_parameters_free().
 */
typedef struct AVCodecParameters {
    /**
     * General type of the encoded data.
     */
    enum AVMediaType codec_type;
    enum AVCodecID codec_id;
    /**
     * The average bitrate of the encoded data (in bits per second).
     */
    int64_t bit_rate;
    /**
     * The number of bits per sample in the codedwords.
     *
     * This is basically the bitrate per sample. It is mandatory for a bunch of
     * formats to actually decode them. It's the number of bits for one sample in
     * the actual coded bitstream.
     *
     * This could be for example 4 for ADPCM
     * For PCM formats this matches bits_per_raw_sample
     * Can be 0
     */
    int bits_per_coded_sample;
    /**
     * Audio only. The number of audio channels.
     */
    int channels;
    /**
     * Audio only. The number of audio samples per second.
     */
    int sample_rate;
    /**
     * Audio only. The number of bytes per coded audio frame, required by some
     * formats.
     *
     * Corresponds to nBlockAlign in WAVEFORMATEX.
     */
    int block_align;
    //...
} AVCodecParameters;
After ff_get_wav_header runs:
par->codec_type is set to AVMEDIA_TYPE_AUDIO, meaning the corresponding stream is audio.
par->codec_id is set to the decoder ID. For PCM, for example, the possible decoder IDs are:
    /* various PCM "codecs" */
    AV_CODEC_ID_FIRST_AUDIO = 0x10000, ///< A dummy id pointing at the start of audio codecs
    AV_CODEC_ID_PCM_S16LE = 0x10000,
    AV_CODEC_ID_PCM_S16BE,
    AV_CODEC_ID_PCM_U16LE,
    AV_CODEC_ID_PCM_U16BE,
    AV_CODEC_ID_PCM_S8,
    AV_CODEC_ID_PCM_U8,
    AV_CODEC_ID_PCM_MULAW,
    AV_CODEC_ID_PCM_ALAW,
    AV_CODEC_ID_PCM_S32LE,
    AV_CODEC_ID_PCM_S32BE,
    AV_CODEC_ID_PCM_U32LE,
    AV_CODEC_ID_PCM_U32BE,
    AV_CODEC_ID_PCM_S24LE,
    AV_CODEC_ID_PCM_S24BE,
    AV_CODEC_ID_PCM_U24LE,
    AV_CODEC_ID_PCM_U24BE,
    AV_CODEC_ID_PCM_S24DAUD,
    AV_CODEC_ID_PCM_ZORK,
    AV_CODEC_ID_PCM_S16LE_PLANAR,
    AV_CODEC_ID_PCM_DVD,
    AV_CODEC_ID_PCM_F32BE,
    AV_CODEC_ID_PCM_F32LE,
    AV_CODEC_ID_PCM_F64BE,
    AV_CODEC_ID_PCM_F64LE,
    AV_CODEC_ID_PCM_BLURAY,
    AV_CODEC_ID_PCM_LXF,
    AV_CODEC_ID_S302M,
    AV_CODEC_ID_PCM_S8_PLANAR,
    AV_CODEC_ID_PCM_S24LE_PLANAR,
    AV_CODEC_ID_PCM_S32LE_PLANAR,
    AV_CODEC_ID_PCM_S16BE_PLANAR,
    AV_CODEC_ID_PCM_S64LE,
    AV_CODEC_ID_PCM_S64BE,
    AV_CODEC_ID_PCM_F16LE,
    AV_CODEC_ID_PCM_F24LE,
    AV_CODEC_ID_PCM_VIDC,
    AV_CODEC_ID_PCM_SGA,
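par->bit_rate is set to the audio bit rate in bits/s (the Format chunk's ByteRate multiplied by 8).
par->bits_per_coded_sample is set to the audio's bits per sample.
par->channels is set to the channel count.
par->sample_rate is set to the audio sample rate in Hz.
par->block_align is set to the block alignment (BlockAlign), i.e. the number of bytes in one sample frame, which holds one sample for every channel.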
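Parameter size: input. The size of the Format chunk, i.e. its Subchunk1Size field.
Parameter big_endian: input. Indicates whether the data inside the chunks of the WAV header is stored little-endian or big-endian. If the WAV file follows the RIFF rules, big_endian is 0 (little-endian); if it follows the RIFX rules, big_endian is 1 (big-endian).
3. Definition of ff_get_wav_header
ff_get_wav_header is defined in the FFmpeg source file libavformat/riffdec.c: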
/* "big_endian" values are needed for RIFX file format */
int ff_get_wav_header(AVFormatContext *s, AVIOContext *pb,
                      AVCodecParameters *par, int size, int big_endian)
{
    int id;
    uint64_t bitrate = 0;
    if (size < 14) {
        avpriv_request_sample(s, "wav header size < 14");
        return AVERROR_INVALIDDATA;
    }
    par->codec_type = AVMEDIA_TYPE_AUDIO;
    if (!big_endian) {
        id = avio_rl16(pb);
        if (id != 0x0165) {
            par->channels = avio_rl16(pb);
            par->sample_rate = avio_rl32(pb);
            bitrate = avio_rl32(pb) * 8LL;
            par->block_align = avio_rl16(pb);
        }
    } else {
        id = avio_rb16(pb);
        par->channels = avio_rb16(pb);
        par->sample_rate = avio_rb32(pb);
        bitrate = avio_rb32(pb) * 8LL;
        par->block_align = avio_rb16(pb);
    }
    if (size == 14) { /* We're dealing with plain vanilla WAVEFORMAT */
        par->bits_per_coded_sample = 8;
    } else {
        if (!big_endian) {
            par->bits_per_coded_sample = avio_rl16(pb);
        } else {
            par->bits_per_coded_sample = avio_rb16(pb);
        }
    }
    if (id == 0xFFFE) {
        par->codec_tag = 0;
    } else {
        par->codec_tag = id;
        par->codec_id = ff_wav_codec_get_id(id,
                                            par->bits_per_coded_sample);
    }
    if (size >= 18 && id != 0x0165) { /* We're obviously dealing with WAVEFORMATEX */
        int cbSize = avio_rl16(pb); /* cbSize */
        if (big_endian) {
            avpriv_report_missing_feature(s, "WAVEFORMATEX support for RIFX files");
            return AVERROR_PATCHWELCOME;
        }
        size -= 18;
        cbSize = FFMIN(size, cbSize);
        if (cbSize >= 22 && id == 0xfffe) { /* WAVEFORMATEXTENSIBLE */
            parse_waveformatex(s, pb, par);
            cbSize -= 22;
            size -= 22;
        }
        if (cbSize > 0) {
            if (ff_get_extradata(s, par, pb, cbSize) < 0)
                return AVERROR(ENOMEM);
            size -= cbSize;
        }
        /* It is possible for the chunk to contain garbage at the end */
        if (size > 0)
            avio_skip(pb, size);
    } else if (id == 0x0165 && size >= 32) {
        int nb_streams, i;
        size -= 4;
        if (ff_get_extradata(s, par, pb, size) < 0)
            return AVERROR(ENOMEM);
        nb_streams = AV_RL16(par->extradata + 4);
        par->sample_rate = AV_RL32(par->extradata + 12);
        par->channels = 0;
        bitrate = 0;
        if (size < 8 + nb_streams * 20)
            return AVERROR_INVALIDDATA;
        for (i = 0; i < nb_streams; i++)
            par->channels += par->extradata[8 + i * 20 + 17];
    }
    par->bit_rate = bitrate;
    if (par->sample_rate <= 0) {
        av_log(s, AV_LOG_ERROR,
               "Invalid sample rate: %d\n", par->sample_rate);
        return AVERROR_INVALIDDATA;
    }
    if (par->codec_id == AV_CODEC_ID_AAC_LATM) {
        /* Channels and sample_rate values are those prior to applying SBR
         * and/or PS. */
        par->channels = 0;
        par->sample_rate = 0;
    }
    /* override bits_per_coded_sample for G.726 */
    if (par->codec_id == AV_CODEC_ID_ADPCM_G726 && par->sample_rate)
        par->bits_per_coded_sample = par->bit_rate / par->sample_rate;
    return 0;
}
4. Inside ff_get_wav_header
ff_get_wav_header first checks whether the Format chunk size (Subchunk1Size) is less than 14. If it is, the function returns AVERROR_INVALIDDATA, meaning the data is invalid:
if (size < 14) {
    avpriv_request_sample(s, "wav header size < 14");
    return AVERROR_INVALIDDATA;
}
A valid Format chunk can never be smaller than 14 bytes, but it can be exactly 14 bytes. In that case the Format chunk has no BitsPerSample (bit depth) field.
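The size thresholds used in this function (14, 16, 18, and the 22 extra bytes of WAVEFORMATEXTENSIBLE) add up from the field sizes. The structure names in parentheses are the usual Windows names for these layouts; ValidBits, ChannelMask and SubFormat are shorthand for the WAVEFORMATEXTENSIBLE fields wValidBitsPerSample, dwChannelMask and SubFormat.

```latex
\begin{aligned}
\underbrace{2}_{\text{AudioFormat}} + \underbrace{2}_{\text{NumChannels}} + \underbrace{4}_{\text{SampleRate}} + \underbrace{4}_{\text{ByteRate}} + \underbrace{2}_{\text{BlockAlign}} &= 14 \ \text{bytes (WAVEFORMAT)} \\
14 + \underbrace{2}_{\text{BitsPerSample}} &= 16 \ \text{bytes (PCMWAVEFORMAT)} \\
16 + \underbrace{2}_{\text{cbSize}} &= 18 \ \text{bytes (WAVEFORMATEX without extension)} \\
18 + \underbrace{2}_{\text{ValidBits}} + \underbrace{4}_{\text{ChannelMask}} + \underbrace{16}_{\text{SubFormat GUID}} &= 40 \ \text{bytes (WAVEFORMATEXTENSIBLE)}
\end{aligned}
```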
Next, par->codec_type is set to AVMEDIA_TYPE_AUDIO, marking the stream as audio.
par->codec_type = AVMEDIA_TYPE_AUDIO;
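Depending on whether the WAV file follows the RIFF or the RIFX rules, the channel count, sample rate, bit rate, and block alignment (BlockAlign) are read in little-endian or big-endian order. This uses the avio_rXXX family of functions; for their usage see "FFmpeg Source: Analysis of the avio_r8, avio_rl16, avio_rl24, avio_rl32 and avio_rl64 Functions". Because the Format chunk stores the rate in bytes per second (ByteRate), the value is multiplied by 8 to obtain the bit rate in bits/s. Note that in the little-endian branch these fields are not read when the format tag is 0x0165; that case is handled further down: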
if (!big_endian) {
    id = avio_rl16(pb);
    if (id != 0x0165) {
        par->channels = avio_rl16(pb);
        par->sample_rate = avio_rl32(pb);
        bitrate = avio_rl32(pb) * 8LL;
        par->block_align = avio_rl16(pb);
    }
} else {
    id = avio_rb16(pb);
    par->channels = avio_rb16(pb);
    par->sample_rate = avio_rb32(pb);
    bitrate = avio_rb32(pb) * 8LL;
    par->block_align = avio_rb16(pb);
}
When the Format chunk size is 14, this is the most basic Format chunk, which carries no bits-per-sample field, so par->bits_per_coded_sample defaults to 8. Otherwise, the bits per sample are read from the chunk:
if (size == 14) { /* We're dealing with plain vanilla WAVEFORMAT */
    par->bits_per_coded_sample = 8;
} else {
    if (!big_endian) {
        par->bits_per_coded_sample = avio_rl16(pb);
    } else {
        par->bits_per_coded_sample = avio_rb16(pb);
    }
}
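The format tag is then mapped to a decoder ID. For 0xFFFE (WAVE_FORMAT_EXTENSIBLE) the actual format is stored later in the chunk, so codec_tag is set to 0 here and codec_id is left for parse_waveformatex to fill in. For any other tag, codec_tag is set to the tag and ff_wav_codec_get_id looks up the codec ID, using bits_per_coded_sample to choose the PCM variant: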
if (id == 0xFFFE) {
    par->codec_tag = 0;
} else {
    par->codec_tag = id;
    par->codec_id = ff_wav_codec_get_id(id,
                                        par->bits_per_coded_sample);
}
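The Format chunk size is usually 16. If it is 18 or larger, the chunk is a WAVEFORMATEX: the 2-byte cbSize field after BitsPerSample gives the length of the extension that follows. The code below reads cbSize (RIFX files are rejected here with AVERROR_PATCHWELCOME), clamps it to the bytes actually left in the chunk, parses the 22-byte WAVEFORMATEXTENSIBLE part with parse_waveformatex when the format tag is 0xFFFE, copies any remaining extension bytes into par->extradata with ff_get_extradata, and skips trailing garbage. A separate branch handles format tag 0x0165: it copies the rest of the chunk into extradata and derives the sample rate and the total channel count (summed over all streams) from it: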
if (size >= 18 && id != 0x0165) { /* We're obviously dealing with WAVEFORMATEX */
    int cbSize = avio_rl16(pb); /* cbSize */
    if (big_endian) {
        avpriv_report_missing_feature(s, "WAVEFORMATEX support for RIFX files");
        return AVERROR_PATCHWELCOME;
    }
    size -= 18;
    cbSize = FFMIN(size, cbSize);
    if (cbSize >= 22 && id == 0xfffe) { /* WAVEFORMATEXTENSIBLE */
        parse_waveformatex(s, pb, par);
        cbSize -= 22;
        size -= 22;
    }
    if (cbSize > 0) {
        if (ff_get_extradata(s, par, pb, cbSize) < 0)
            return AVERROR(ENOMEM);
        size -= cbSize;
    }
    /* It is possible for the chunk to contain garbage at the end */
    if (size > 0)
        avio_skip(pb, size);
} else if (id == 0x0165 && size >= 32) {
    int nb_streams, i;
    size -= 4;
    if (ff_get_extradata(s, par, pb, size) < 0)
        return AVERROR(ENOMEM);
    nb_streams = AV_RL16(par->extradata + 4);
    par->sample_rate = AV_RL32(par->extradata + 12);
    par->channels = 0;
    bitrate = 0;
    if (size < 8 + nb_streams * 20)
        return AVERROR_INVALIDDATA;
    for (i = 0; i < nb_streams; i++)
        par->channels += par->extradata[8 + i * 20 + 17];
}
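A sample rate must be greater than 0. If par->sample_rate is 0 or negative, the function logs an error and returns AVERROR_INVALIDDATA, meaning the data is invalid: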
if (par->sample_rate <= 0) {
    av_log(s, AV_LOG_ERROR,
           "Invalid sample rate: %d\n", par->sample_rate);
    return AVERROR_INVALIDDATA;
}