Decoding AAC Audio with FFmpeg

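FFmpeg can decode AAC audio out of the box, but if you need PCM16 (signed 16-bit) samples you have to add a sample-format conversion step. First open the decoder, then send AAC frames (without ADTS headers) to it, and then receive the decoded frames from it. The decoded samples are float; if needed, run them through the conversion step to turn the floats into integers.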

I. AAC Audio

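AAC stands for Advanced Audio Coding. It appeared in 1997 as an audio coding technology originally based on MPEG-2, jointly developed by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony and others with the goal of replacing MP3. With the MPEG-4 standard (around 2000), AAC was carried over into MPEG-4 Audio and later extended with additional tools such as SBR and PS. To distinguish it from traditional MPEG-2 AAC, AAC with SBR or PS features is also called MPEG-4 AAC.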

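AAC is a newer-generation lossy audio compression technology. Through additional coding tools (such as PS and SBR) it has given rise to three main encodings: LC-AAC, HE-AAC and HE-AACv2. LC-AAC is the more traditional AAC and is mainly used at medium-to-high bit rates (>= 80 kbps). HE-AAC (roughly AAC + SBR) is mainly used at medium-to-low bit rates (<= 80 kbps), and the more recent HE-AACv2 (roughly AAC + SBR + PS) is mainly used at low bit rates (<= 48 kbps). In practice, most encoders automatically enable PS when set to <= 48 kbps and leave it off above 48 kbps, which then amounts to plain HE-AAC.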

1.1 Profiles

FFmpeg defines ten AAC profiles in total: the two with MPEG2 in their names are MPEG-2 profiles, and the rest are MPEG-4 profiles.

avcodec.h

#define FF_PROFILE_AAC_MAIN 0
#define FF_PROFILE_AAC_LOW  1
#define FF_PROFILE_AAC_SSR  2
#define FF_PROFILE_AAC_LTP  3
#define FF_PROFILE_AAC_HE   4
#define FF_PROFILE_AAC_HE_V2 28
#define FF_PROFILE_AAC_LD   22
#define FF_PROFILE_AAC_ELD  38
#define FF_PROFILE_MPEG2_AAC_LOW 128
#define FF_PROFILE_MPEG2_AAC_HE  131

MAIN: Main profile

LOW: Low Complexity profile (LC)

SSR: Scalable Sample Rate profile

LTP: Long Term Prediction profile

HE: High Efficiency profile, also known as AAC+

HE_V2: High Efficiency v2 profile, also known as Enhanced AAC+

LD: Low Delay profile

ELD: Enhanced Low Delay profile
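The numeric values above are not arbitrary: for the MPEG-4 profiles, each FF_PROFILE_AAC_* constant equals the MPEG-4 Audio Object Type (AOT) minus one (LC is AOT 2, SBR is AOT 5, LD is AOT 23, PS is AOT 29, ELD is AOT 39). The same "AOT - 1" encoding is used by the 2-bit profile field of an MPEG-4 ADTS header, which is why only Main, LC, SSR and LTP fit there. The sketch below captures that relationship; profileToAudioObjectType and profileName are hypothetical helpers, not FFmpeg APIs, and the constants are copied from the avcodec.h excerpt above.

```cpp
#include <cassert>
#include <string>

// Values copied from the avcodec.h excerpt above.
constexpr int kProfileAacMain = 0;
constexpr int kProfileAacLow = 1;
constexpr int kProfileAacSsr = 2;
constexpr int kProfileAacLtp = 3;
constexpr int kProfileAacHe = 4;
constexpr int kProfileAacHeV2 = 28;
constexpr int kProfileAacLd = 22;
constexpr int kProfileAacEld = 38;
constexpr int kProfileMpeg2AacLow = 128;
constexpr int kProfileMpeg2AacHe = 131;

// Hypothetical helper: MPEG-4 Audio Object Type for an FFmpeg AAC profile.
// Returns -1 for the MPEG-2 profiles and for unknown values.
int profileToAudioObjectType(int profile) {
    switch (profile) {
        case kProfileAacMain:
        case kProfileAacLow:
        case kProfileAacSsr:
        case kProfileAacLtp:
        case kProfileAacHe:
        case kProfileAacHeV2:
        case kProfileAacLd:
        case kProfileAacEld:
            return profile + 1; // MPEG-4 profiles: constant == AOT - 1
        default:
            return -1;
    }
}

// Hypothetical helper: human-readable profile name, e.g. for logging.
std::string profileName(int profile) {
    switch (profile) {
        case kProfileAacMain: return "Main";
        case kProfileAacLow: return "LC";
        case kProfileAacSsr: return "SSR";
        case kProfileAacLtp: return "LTP";
        case kProfileAacHe: return "HE-AAC (AAC+SBR)";
        case kProfileAacHeV2: return "HE-AACv2 (AAC+SBR+PS)";
        case kProfileAacLd: return "LD";
        case kProfileAacEld: return "ELD";
        case kProfileMpeg2AacLow: return "MPEG-2 LC";
        case kProfileMpeg2AacHe: return "MPEG-2 HE";
        default: return "unknown";
    }
}
```

The MPEG-2 constants (128 and 131) sit outside this scheme, so the helper reports -1 for them rather than guessing.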

1.2 Formats

See ISO/IEC 13818-7 for the full details of the AAC format. AAC audio files come in two formats: ADIF and ADTS.

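ADIF: Audio Data Interchange Format

The defining feature of this format is that the start of the audio data can be located unambiguously and decoding never begins in the middle of the stream; in other words, decoding must start at a clearly defined starting point. This is why ADIF is commonly used for files on disk.

An ADIF sequence consists of the ADIF header, byte alignment, and the actual audio data.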

adif_id: the ID of the Audio Data Interchange Format. Its value is 0x41444946 (most significant bit first), the ASCII representation of the string "ADIF".

copyright_id_present: indicates whether copyright_id is present.

copyright_id: consists of an 8-bit copyright_identifier and a 64-bit copyright_number.

original_copy: see ISO/IEC 11172-3, subclause 2.4.2.3, for the definition of copyright.

home: see ISO/IEC 11172-3, subclause 2.4.2.3, for the definition of original_copy.

bitstream_type: flag indicating the bitstream type:

'0': constant-rate bitstream;

'1': variable-rate bitstream.

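bitrate: a 23-bit unsigned integer giving the bitrate of a constant-rate bitstream, or the maximum peak bitrate (measured per frame) of a variable-rate bitstream. A value of 0 means the bitrate is unknown.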

num_program_config_elements: the number of program_config_element()s.

adif_buffer_fullness: the state of the bit reservoir after encoding the first raw_data_block() in adif_sequence().
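A minimal sketch of reading the fields above from the start of an ADIF file with an MSB-first bit reader. It assumes the ISO/IEC 13818-7 field widths (32-bit adif_id, optional 72-bit copyright_id, 23-bit bitrate, 4-bit num_program_config_elements) and that a 20-bit adif_buffer_fullness follows only for constant-rate streams. BitReader, AdifHeader and parseAdifHeader are hypothetical names, and parsing of program_config_element() itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical MSB-first bit reader (not an FFmpeg API).
class BitReader {
public:
    BitReader(const uint8_t *data, size_t size) : data_(data), size_(size) {}

    // Reads n (0..32) bits, most significant bit first.
    // Returns false without consuming anything if not enough bits remain.
    bool read(unsigned n, uint32_t &out) {
        if (n > 32 || pos_ + n > size_ * 8) return false;
        uint32_t v = 0;
        for (unsigned i = 0; i < n; ++i, ++pos_) {
            v = (v << 1) | ((data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u);
        }
        out = v;
        return true;
    }

    // Skips n bits; returns false if not enough bits remain.
    bool skip(size_t n) {
        if (pos_ + n > size_ * 8) return false;
        pos_ += n;
        return true;
    }

private:
    const uint8_t *data_;
    size_t size_;
    size_t pos_ = 0;
};

struct AdifHeader {
    bool copyrightIdPresent = false;
    bool originalCopy = false;
    bool home = false;
    uint32_t bitstreamType = 0;            // 0 = constant rate, 1 = variable rate
    uint32_t bitrate = 0;                  // 23 bits; 0 = unknown
    uint32_t numProgramConfigElements = 0; // raw 4-bit field value
    uint32_t adifBufferFullness = 0;       // first value, constant-rate streams only
};

// Parses the header fields, stopping before the first program_config_element().
bool parseAdifHeader(const uint8_t *buf, size_t size, AdifHeader &h) {
    BitReader br(buf, size);
    uint32_t v = 0;
    if (!br.read(32, v) || v != 0x41444946u) return false; // "ADIF"
    if (!br.read(1, v)) return false;
    h.copyrightIdPresent = (v != 0);
    // copyright_id = 8-bit copyright_identifier + 64-bit copyright_number
    if (h.copyrightIdPresent && !br.skip(72)) return false;
    if (!br.read(1, v)) return false;
    h.originalCopy = (v != 0);
    if (!br.read(1, v)) return false;
    h.home = (v != 0);
    if (!br.read(1, h.bitstreamType)) return false;
    if (!br.read(23, h.bitrate)) return false;
    if (!br.read(4, h.numProgramConfigElements)) return false;
    h.adifBufferFullness = 0;
    // Assumption: for constant-rate streams a 20-bit adif_buffer_fullness precedes
    // each program_config_element(); only the first one is read here.
    if (h.bitstreamType == 0 && !br.read(20, h.adifBufferFullness)) return false;
    return true;
}
```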

ADTS: Audio Data Transport Stream

This format is a bitstream with sync words, so decoding can start at any position in the stream. In this respect it resembles the MP3 stream format.

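An ADTS header is normally 7 bytes long (9 bytes when protection_absent is 0 and a 16-bit CRC follows) and is split into two parts:

adts_fixed_header(): the fixed part

adts_variable_header(): the variable part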

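syncword: the sync word, always 0xFFF; it marks the start of an ADTS frame.

ID: MPEG version. 0 means MPEG-4, 1 means MPEG-2.

layer: always '00'.

profile: which AAC profile (object type) is used.

sampling_frequency_index: index of the sampling frequency; look it up in the Sampling Frequencies[] table to get the actual rate.

channel_configuration: the channel configuration, i.e. the number of channels (for example 1 = mono, 2 = stereo).

frame_length: the length of the ADTS frame in bytes, including the ADTS header and the raw AAC data.

adts_buffer_fullness: the state of the bit reservoir when the ADTS frame was encoded; 0x7FF indicates a variable-bitrate stream.

protection_absent: indicates whether error_check() data is present (1 = no CRC, 0 = a 16-bit CRC follows the header).

private_bit: see ISO/IEC 11172-3, subclause 2.4.2.3.

copyright_identification_bit: one bit of the 72-bit copyright identification field.

copyright_identification_start: indicates that the copyright_identification_bit in this audio frame is the first bit of the 72-bit copyright identification. If no copyright identification is transmitted, this bit should be kept at '0'.

number_of_raw_data_blocks_in_frame: the number of raw_data_block()s multiplexed in the frame, minus one.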
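The field widths above add up to 56 bits (28 in the fixed header, 28 in the variable header), which is where the 7-byte header size comes from. The sketch below unpacks them with plain shifts and masks. AdtsHeader and parseAdtsHeader are hypothetical names, and the sampling-frequency table is the standard 13-entry MPEG-4 table with indices 13 to 15 treated as reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sampling Frequencies[] table indexed by sampling_frequency_index (13..15 reserved).
static const uint32_t kAdtsSampleRates[16] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350, 0, 0, 0};

struct AdtsHeader {
    uint32_t id = 0;                 // 0 = MPEG-4, 1 = MPEG-2
    uint32_t layer = 0;              // always 0
    bool protectionAbsent = true;    // false: a 16-bit CRC follows the header
    uint32_t profile = 0;            // 2-bit field: Audio Object Type - 1
    uint32_t samplingFrequencyIndex = 0;
    uint32_t sampleRate = 0;         // 0 if the index is reserved
    uint32_t channelConfiguration = 0;
    uint32_t frameLength = 0;        // header + raw AAC data, in bytes
    uint32_t bufferFullness = 0;     // 0x7FF = variable bitrate
    uint32_t numRawDataBlocks = 0;   // raw field; blocks in frame = value + 1
    size_t headerSize = 7;           // 7, or 9 with CRC
};

// Hypothetical helper: parses the ADTS header at p. Returns false on a bad
// syncword, a non-zero layer or an impossible frame length.
bool parseAdtsHeader(const uint8_t *p, size_t size, AdtsHeader &h) {
    if (size < 7) return false;
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0) return false; // 12-bit syncword 0xFFF
    // Fixed header: ID, layer, protection_absent, profile, sampling index,
    // private_bit (bit 1 of p[2]), channel_configuration, original_copy, home.
    h.id = (p[1] >> 3) & 0x1;
    h.layer = (p[1] >> 1) & 0x3;
    h.protectionAbsent = (p[1] & 0x1) != 0;
    h.profile = (p[2] >> 6) & 0x3;
    h.samplingFrequencyIndex = (p[2] >> 2) & 0xF;
    h.channelConfiguration = ((p[2] & 0x1) << 2) | ((p[3] >> 6) & 0x3);
    // Variable header: two copyright bits (bits 3 and 2 of p[3]), frame_length,
    // adts_buffer_fullness, number_of_raw_data_blocks_in_frame.
    h.frameLength = ((p[3] & 0x3) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x7);
    h.bufferFullness = ((p[5] & 0x1F) << 6) | ((p[6] >> 2) & 0x3F);
    h.numRawDataBlocks = p[6] & 0x3;
    h.sampleRate = kAdtsSampleRates[h.samplingFrequencyIndex];
    h.headerSize = h.protectionAbsent ? 7 : 9;
    return h.layer == 0 && h.frameLength >= h.headerSize;
}
```

For example, the header bytes FF F1 50 80 20 FF FC describe an MPEG-4, AAC-LC (profile field 1), 44100 Hz, stereo, VBR frame of 263 bytes, that is, 256 bytes of raw AAC after the 7-byte header.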

II. AAC Audio Decoding

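1. Get the AAC decoder (AVCodec) by calling avcodec_find_decoder(AV_CODEC_ID_AAC);

2. Call avcodec_alloc_context3(…) to allocate an AVCodecContext, the decoder's context;

3. Call avcodec_parameters_alloc() to allocate an AVCodecParameters, which is used to pass the required parameters to the decoder;

4. Fill the required decoding parameters into AVCodecParameters: sample rate, channel count, output sample format and so on (note that the AAC decoder in fact always outputs AV_SAMPLE_FMT_FLTP, whatever format is requested);

5. Call avcodec_parameters_to_context(…) to copy the parameters from AVCodecParameters into AVCodecContext; AVCodecParameters has then served its purpose, so release it with avcodec_parameters_free(…);

6. Call avcodec_open2(…) to open the decoder.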
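The steps above give the decoder only a sample rate, a channel count and a profile. FFmpeg's AAC decoder also accepts the MPEG-4 AudioSpecificConfig through AVCodecContext::extradata, which states the object type explicitly. The sketch below builds the shortest 2-byte form of that structure from the same three inputs; the helper names are hypothetical, and the audio object type is the FF_PROFILE_AAC_* value plus one (2 for LC). It does not emit explicit SBR/PS signalling. If you copy the bytes into extradata, FFmpeg expects that buffer to be allocated with av_malloc() and padded with AV_INPUT_BUFFER_PADDING_SIZE zero bytes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Returns the 4-bit sampling frequency index for a rate, or -1 if it has none.
int samplingFrequencyIndex(uint32_t sampleRate) {
    static const uint32_t kRates[13] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                        22050, 16000, 12000, 11025, 8000, 7350};
    for (int i = 0; i < 13; ++i) {
        if (kRates[i] == sampleRate) return i;
    }
    return -1;
}

// Hypothetical helper: builds the minimal 2-byte AudioSpecificConfig
//   audioObjectType (5 bits) | samplingFrequencyIndex (4 bits) |
//   channelConfiguration (4 bits) | 3 GASpecificConfig flag bits, all 0.
// Only valid for object types below 31 and for rates listed in the table.
bool makeAudioSpecificConfig(int audioObjectType, uint32_t sampleRate,
                             int channelConfiguration, std::array<uint8_t, 2> &out) {
    const int freqIndex = samplingFrequencyIndex(sampleRate);
    if (audioObjectType < 1 || audioObjectType > 30 || freqIndex < 0 ||
        channelConfiguration < 0 || channelConfiguration > 7) {
        return false;
    }
    const uint16_t bits = static_cast<uint16_t>(
            (audioObjectType << 11) | (freqIndex << 7) | (channelConfiguration << 3));
    out[0] = static_cast<uint8_t>(bits >> 8);
    out[1] = static_cast<uint8_t>(bits & 0xFF);
    return true;
}
```

For AAC-LC at 44100 Hz stereo this yields the familiar bytes 0x12 0x10.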

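From then on, AAC frames can be pulled continuously from the PacketQueue (a queue of AAC-encoded frames without ADTS headers, which are unnecessary because the decoder has already been given the required parameters) and fed to the decoder. A frame is sent to the decoder with avcodec_send_packet(…), and decoded frames are then fetched with avcodec_receive_frame(…). One packet can yield zero or more frames, so avcodec_receive_frame(…) is called in a loop until it returns AVERROR(EAGAIN).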
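When the source is an ADTS stream (for example a raw .aac file), the headers have to be stripped before frames go into a PacketQueue that carries raw AAC as described above. A minimal sketch, assuming the whole stream is already in memory and each ADTS frame carries a single raw_data_block(); splitAdts and RawAacFrame are hypothetical names. It returns pointers into the input buffer rather than copies and stops at the first bytes that do not look like a valid ADTS header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A view of one raw AAC payload inside the caller's buffer.
struct RawAacFrame {
    const uint8_t *data;
    size_t size;
};

// Walks concatenated ADTS frames and returns the raw AAC payloads with the
// 7-byte header (9 bytes when a CRC is present) stripped.
std::vector<RawAacFrame> splitAdts(const uint8_t *buf, size_t size) {
    std::vector<RawAacFrame> frames;
    size_t pos = 0;
    while (pos + 7 <= size) {
        const uint8_t *p = buf + pos;
        // 12-bit syncword 0xFFF and layer == 0 (ID and protection_absent ignored here)
        if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0) break;
        const size_t header = (p[1] & 0x1) ? 7 : 9;
        const size_t frameLen = ((p[3] & 0x3) << 11) | (p[4] << 3) | (p[5] >> 5);
        if (frameLen <= header || pos + frameLen > size) break; // truncated or corrupt
        frames.push_back({p + header, frameLen - header});
        pos += frameLen;
    }
    return frames;
}
```

Each RawAacFrame can then be copied into a PACKET_STRUCT and pushed onto the queue; the pointers stay valid only as long as the input buffer does.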

Because some platforms cannot play AV_SAMPLE_FMT_FLTP (planar 32-bit float) PCM directly, the float PCM has to be converted to AV_SAMPLE_FMT_S16 (interleaved signed 16-bit).
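Conceptually, the FLTP to S16 conversion does two things: it scales each float sample (nominally in [-1.0, 1.0]) to a 16-bit integer with clipping, and it interleaves the separate channel planes (AVFrame::data[0] for the first channel, data[1] for the second, and so on) into a single L R L R ... buffer. The plain C++ sketch below is for illustration only; fltpToS16 is a hypothetical name, it assumes every plane holds the same number of samples, and libswresample's own rounding and dithering options may produce slightly different values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// planes[c][i] is sample i of channel c, laid out like AVFrame::data[c] for FLTP.
// Returns interleaved S16 samples: ch0[0], ch1[0], ch0[1], ch1[1], ...
std::vector<int16_t> fltpToS16(const std::vector<std::vector<float>> &planes) {
    std::vector<int16_t> out;
    if (planes.empty()) return out;
    const size_t channels = planes.size();
    const size_t samples = planes[0].size();
    out.resize(channels * samples);
    for (size_t i = 0; i < samples; ++i) {
        for (size_t c = 0; c < channels; ++c) {
            // Scale to the 16-bit range, round to nearest, then clip.
            long s = std::lrintf(planes[c][i] * 32768.0f);
            if (s > 32767) s = 32767;
            if (s < -32768) s = -32768;
            out[i * channels + c] = static_cast<int16_t>(s);
        }
    }
    return out;
}
```

In the decoder below this work is delegated to swr_convert(), which also handles channel remapping and resampling when the input and output layouts or rates differ.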

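Conversion steps

1. Call swr_alloc() to allocate a SwrContext conversion context;

2. Call swr_alloc_set_opts(…) to set the required input and output parameters on it (FFmpeg 5.1 and later deprecate this function in favor of swr_alloc_set_opts2(…), which takes AVChannelLayout);

3. Call swr_init(…) to initialize the SwrContext;

4. Call swr_convert(…) repeatedly to perform the conversion.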
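Sizing the swr_convert() output buffer is where it is easy to go wrong: interleaved S16 needs nb_samples * channels * 2 bytes per frame. An AAC-LC frame decodes to 1024 samples per channel, while HE-AAC (SBR) frames decode to 2048, so a buffer of 1024 * 2 bytes only fits mono LC. The hypothetical helpers below illustrate that arithmetic, plus the frame duration that the decode loop's sleep interval is derived from; FFmpeg's av_samples_get_buffer_size() computes the buffer size (with optional alignment) in real code.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one frame of interleaved samples.
// samplesPerChannel corresponds to AVFrame::nb_samples; bytesPerSample is 2 for S16.
size_t interleavedFrameBytes(size_t samplesPerChannel, size_t channels,
                             size_t bytesPerSample = 2) {
    return samplesPerChannel * channels * bytesPerSample;
}

// Duration of one frame in microseconds (integer division truncates).
unsigned long long frameDurationUs(unsigned long long samplesPerChannel,
                                   unsigned long long sampleRate) {
    return samplesPerChannel * 1000000ULL / sampleRate;
}
```

At 48000 Hz a 1024-sample frame lasts about 21.3 ms, so the decoder's sleepDelta (a quarter of a frame) is about 5.3 ms.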

Finally, every SwrContext, AVFrame and AVCodecContext that is no longer in use must be released with its corresponding free function.

Code: Decoding AAC Audio with FFmpeg

The AAC decoding logic is wrapped in an AudioDecoder class.

//
// Created by liuhongwei on 2021/12/7.
//
 
#ifndef AUDIODECODER_H
#define AUDIODECODER_H
 
#include <atomic>
#include <pthread.h>

extern "C" {
// FFmpeg codec and resampling APIs
#include "libavcodec/avcodec.h"
#include <libswresample/swresample.h>
}

#include "PacketQueue.h"
#include "cb/FrameDataCallback.h"

class AudioDecoder {
public:
    AudioDecoder(PacketQueue *packetQueue);

    ~AudioDecoder();

    // profile uses the FF_PROFILE_AAC_* values; 1 = FF_PROFILE_AAC_LOW (LC)
    bool open(unsigned int sampleFreq, unsigned int channels, unsigned int profile = 1);

    // Stops the decode thread and releases all FFmpeg resources
    void close();

    // Decode loop; runs on decodeThread
    void decode();

    // pthread entry point: forwards to decode() on the given instance
    static void *_decode(void *self) {
        static_cast<AudioDecoder *>(self)->decode();
        return nullptr;
    }

    void setFrameDataCallback(FrameDataCallback *frameDataCallback);

private:
    PacketQueue *pPacketQueue;
    AVCodecContext *pAudioAVCodecCtx;
    AVFrame *pFrame;
    unsigned int gSampleFreq;

    // Written by open()/close() and read by the decode thread, so it must be atomic
    std::atomic<bool> isDecoding;
    pthread_t decodeThread;
    pthread_mutex_t *pFrameDataCallbackMutex;
    FrameDataCallback *pFrameDataCallback;

    SwrContext *pSwrContext;
    uint8_t *pPCM16OutBuf;
};
 
 
#endif //AUDIODECODER_H

Implementation

//
// Created by liuhongwei on 2021/12/7.
//
 
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include "AudioDecoder.h"
 
AudioDecoder::AudioDecoder(PacketQueue *packetQueue) {
    pPacketQueue = packetQueue;
    // Initialize every member so close() is safe even if open() never succeeded
    pAudioAVCodecCtx = nullptr;
    pFrame = nullptr;
    gSampleFreq = 0;
    isDecoding = false;
    pFrameDataCallbackMutex = (pthread_mutex_t *) malloc(sizeof(pthread_mutex_t));
    int ret = pthread_mutex_init(pFrameDataCallbackMutex, nullptr);
    if (ret != 0) {
        LOGE("audio FrameDataCallbackMutex init failed.\n");
    }

    pFrameDataCallback = nullptr;
    pSwrContext = nullptr;
    pPCM16OutBuf = nullptr;
}
 
AudioDecoder::~AudioDecoder() {
    // Destroy the mutex only if it was actually allocated
    if (nullptr != pFrameDataCallbackMutex) {
        pthread_mutex_destroy(pFrameDataCallbackMutex);
        free(pFrameDataCallbackMutex);
        pFrameDataCallbackMutex = nullptr;
    }
}
 
bool AudioDecoder::open(unsigned int sampleFreq, unsigned int channels, unsigned int profile) {
    gSampleFreq = sampleFreq;
 
    int ret;
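    const AVCodec *dec = avcodec_find_decoder(AV_CODEC_ID_AAC);
    if (dec == nullptr) {
        LOGE("%s AAC decoder not found", __FUNCTION__);
        return false;
    }
    LOGI("%s audio decoder name: %s", __FUNCTION__, dec->name);
    enum AVSampleFormat sample_fmt = AV_SAMPLE_FMT_FLTP;// note: other values have no effect, the AAC decoder always outputs FLTP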
    int bytesPerSample = av_get_bytes_per_sample(sample_fmt);
 
    pAudioAVCodecCtx = avcodec_alloc_context3(dec);
 
    if (pAudioAVCodecCtx == nullptr) {
        LOGE("%s AudioAVCodecCtx alloc failed", __FUNCTION__);
        return false;
    }
 
    AVCodecParameters *par = avcodec_parameters_alloc();
    if (par == nullptr) {
        LOGE("%s audio AVCodecParameters alloc failed", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        return false;
    }
 
    par->codec_type = AVMEDIA_TYPE_AUDIO;
    par->sample_rate = (int) sampleFreq;
    par->channel_layout = av_get_default_channel_layout((int) channels);
    par->channels = (int) channels;
    par->format = sample_fmt;
    par->profile = (int) profile;
 
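    ret = avcodec_parameters_to_context(pAudioAVCodecCtx, par);
    avcodec_parameters_free(&par);
    if (ret < 0) {
        LOGE("%s avcodec_parameters_to_context failed ret=%d", __FUNCTION__, ret);
        avcodec_free_context(&pAudioAVCodecCtx);
        return false;
    }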
 
    LOGI("%s sample_rate=%d channels=%d bytesPerSample=%d", __FUNCTION__, sampleFreq, channels,
         bytesPerSample);
    ret = avcodec_open2(pAudioAVCodecCtx, dec, nullptr);
    if (ret < 0) {
        LOGE("%s Can not open audio decoder", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        return false;
    }
    LOGI("%s avcodec_open2 audio SUCC", __FUNCTION__);
    pFrame = av_frame_alloc();
    if (pFrame == nullptr) {
        LOGE("%s audio av_frame_alloc failed", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        return false;
    }
 
    pSwrContext = swr_alloc();
    if (pSwrContext == nullptr) {
        LOGE("%s swr_alloc failed", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        av_frame_free(&pFrame);
        return false;
    }
 
    swr_alloc_set_opts(
            pSwrContext,
            pAudioAVCodecCtx->channel_layout,
            AV_SAMPLE_FMT_S16,
            pAudioAVCodecCtx->sample_rate,
            pAudioAVCodecCtx->channel_layout,
            pAudioAVCodecCtx->sample_fmt,
            pAudioAVCodecCtx->sample_rate,
            0, nullptr
    );
 
    ret = swr_init(pSwrContext);
    if (ret != 0) {
        LOGE("%s swr_init failed", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        av_frame_free(&pFrame);
        swr_free(&pSwrContext);
        return false;
    }
 
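    // One decoded frame holds up to 2048 samples per channel (HE-AAC with SBR; LC uses 1024),
    // and interleaved S16 needs samples * channels * 2 bytes.
    pPCM16OutBuf = (uint8_t *) malloc(
            av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) * 2048 * channels);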
 
    if (pPCM16OutBuf == nullptr) {
        LOGE("%s PCM16OutBufs malloc failed", __FUNCTION__);
        avcodec_free_context(&pAudioAVCodecCtx);
        av_frame_free(&pFrame);
        swr_free(&pSwrContext);
        return false;
    }
 
    isDecoding = true;
    ret = pthread_create(&decodeThread, nullptr, &AudioDecoder::_decode, (void *) this);
    if (ret != 0) {
        LOGE("audio decode-thread create failed.\n");
        isDecoding = false;
        avcodec_free_context(&pAudioAVCodecCtx);
        av_frame_free(&pFrame);
        swr_free(&pSwrContext);
 
        free(pPCM16OutBuf);
        pPCM16OutBuf = nullptr;
        return false;
    }
 
    return true;
}
 
void AudioDecoder::close() {
    // Join only if open() actually started the decode thread
    if (isDecoding) {
        isDecoding = false;
        pthread_join(decodeThread, nullptr);
    }
 
    if (pPCM16OutBuf != nullptr) {
        free(pPCM16OutBuf);
        pPCM16OutBuf = nullptr;
        LOGI("%s PCM16OutBuf free", __FUNCTION__);
    }
 
    if (pSwrContext != nullptr) {
        swr_free(&pSwrContext);
        LOGI("%s SwrContext free", __FUNCTION__);
    }
 
    if (pFrame != nullptr) {
        av_frame_free(&pFrame);
        LOGI("%s audio Frame free", __FUNCTION__);
    }
 
    if (pAudioAVCodecCtx != nullptr) {
        avcodec_free_context(&pAudioAVCodecCtx);
        LOGI("%s audio avcodec_free_context", __FUNCTION__);
    }
}
 
void AudioDecoder::setFrameDataCallback(FrameDataCallback *frameDataCallback) {
    pthread_mutex_lock(pFrameDataCallbackMutex);
    pFrameDataCallback = frameDataCallback;
    pthread_mutex_unlock(pFrameDataCallbackMutex);
}
 
void AudioDecoder::decode() {
    int ret;
    unsigned sleepDelta = 1024 * 1000000 / gSampleFreq / 4;// a quarter of one 1024-sample frame, in microseconds
 
    while (isDecoding) {
        if (pPacketQueue == nullptr) {
            usleep(sleepDelta);
            continue;
        }
 
        AVPacket *pkt = av_packet_alloc();
        if (pkt == nullptr) {
            usleep(sleepDelta);
            continue;
        }
 
        PACKET_STRUCT *packetStruct = nullptr;
        bool isDone = pPacketQueue->Take(packetStruct);
        if (isDone && packetStruct != nullptr && packetStruct->data != nullptr &&
            packetStruct->data_size > 0) {
            ret = av_new_packet(pkt, packetStruct->data_size);
            if (ret < 0) {
                av_packet_free(&pkt);
                free(packetStruct->data);
                free(packetStruct);
                
                continue;
            }
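        } else {
            // Nothing usable was dequeued; release whatever the queue handed over
            if (isDone && packetStruct != nullptr) {
                free(packetStruct->data);
                free(packetStruct);
            }
            av_packet_free(&pkt);
            usleep(sleepDelta);
            continue;
        }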
 
        memcpy(pkt->data, packetStruct->data, packetStruct->data_size);
 
        pkt->pts = packetStruct->timestamp;
        pkt->dts = packetStruct->timestamp;
 
 
        /* send the packet for decoding */
        ret = avcodec_send_packet(pAudioAVCodecCtx, pkt);
        //LOGD("%s send the audio packet for decoding pkt size=%d", __FUNCTION__, pkt->size);
        free(packetStruct->data);
        free(packetStruct);
 
        av_packet_unref(pkt);
        av_packet_free(&pkt);
 
        if (ret < 0) {
            LOGE("%s Error sending the audio pkt to the decoder ret=%d", __FUNCTION__, ret);
            usleep(sleepDelta);
            continue;
        } else {
            // Encoding and decoding work the same way: send once, then receive repeatedly
            // until AVERROR(EAGAIN) (the decoder needs more input) or AVERROR_EOF
            while (ret >= 0) {
                ret = avcodec_receive_frame(pAudioAVCodecCtx, pFrame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
                    break; // no more frames for this packet; fetch the next packet
                } else if (ret < 0) {
                    LOGE("%s Error receive decoding audio frame ret=%d", __FUNCTION__, ret);
                    break;
                }
 
                // The AAC decoder always outputs AV_SAMPLE_FMT_FLTP, so convert it to AV_SAMPLE_FMT_S16.
                // FLTP is planar: channel c is in data[c]. linesize[0] may include alignment
                // padding, so the output size is computed from the sample count instead.
                const int planeNum = 1; // interleaved S16 fits in a single plane
                int dataLen[planeNum];
                // Output buffer for the interleaved S16 samples
                uint8_t *pcmOut[1] = {nullptr};
                pcmOut[0] = pPCM16OutBuf;
                // Convert FLTP -> S16 (same rate and layout, so no real resampling happens)
                int number = swr_convert(
                        pSwrContext,
                        pcmOut,
                        pFrame->nb_samples,
                        (const uint8_t **) pFrame->data,
                        pFrame->nb_samples
                );

                if (number != pFrame->nb_samples) {
                    LOGE("%s swr_convert returned %d samples, expected %d", __FUNCTION__, number,
                         pFrame->nb_samples);
                } else {
                    // Interleaved S16: samples per channel * channels * 2 bytes
                    dataLen[0] = number * pAudioAVCodecCtx->channels *
                                 av_get_bytes_per_sample(AV_SAMPLE_FMT_S16);
                    pthread_mutex_lock(pFrameDataCallbackMutex);
                    if (pFrameDataCallback != nullptr) {
                        //LOGD("%s receive the decode frame size=%d nb_samples=%d", __FUNCTION__, dataLen[0], pFrame->nb_samples);
                        pFrameDataCallback->onDataArrived(StreamType::AUDIO,
                                                          (long long) pFrame->pts,
                                                          (char **) pcmOut,
                                                          dataLen,
                                                          planeNum,
                                                          pAudioAVCodecCtx->channels,
                                                          pAudioAVCodecCtx->sample_rate,
                                                          -1,
                                                          -1);
                    }
                    pthread_mutex_unlock(pFrameDataCallbackMutex);
 
                }
 
 
                av_frame_unref(pFrame);
            }
        }
 
    }
}