[Audio/Video] FFmpeg Demuxing

Demuxing

Muxer, for example MP4/FLV

Demuxer, for example MP4/FLV

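Container format functions

avformat_alloc_context(): allocates an AVFormatContext and performs basic initialization.
avformat_free_context(): frees everything inside the context as well as the context itself.
avformat_close_input(): closes the demuxer. It also frees the context, so a separate avformat_free_context() call is not needed afterwards.
avformat_open_input(): opens the input media file or stream.
avformat_find_stream_info(): reads and analyzes the stream information of the media file.
av_read_frame(): reads the next audio/video packet.
avformat_seek_file(): seeks within the file to a timestamp inside a given min/max range.
av_seek_frame(): seeks within the file to the keyframe at a given timestamp.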

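Demuxing workflow

The usual sequence is avformat_open_input(), avformat_find_stream_info(), a loop over av_read_frame(), and finally avformat_close_input().

Relationships between FFmpeg data structures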

Distinguishing between streams

AVMEDIA_TYPE_VIDEO: video stream

video_index = av_find_best_stream(ic, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

AVMEDIA_TYPE_AUDIO: audio stream

audio_index = av_find_best_stream(ic, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);

AVPacket has a stream_index field that records which stream the packet belongs to.

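Key points

avformat_open_input and avformat_find_stream_info open a stream and analyze its stream information, respectively.
When the initial information is insufficient (for example with FLV and raw H264 files), avformat_find_stream_info has to call read_frame_internal internally to read stream data (audio/video frames), analyze it, and then fill in the core data structure AVFormatContext.
Because it has to read packets, avformat_find_stream_info can add noticeable latency.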

Implementation

Adding the media files

Place the mp4, flv, and ts test files in the build directory.
Set the input file as the program argument.

![[Pasted image 20250330141825.png|600]]

`avformat_open_input`

int ret = avformat_open_input(&ifmt_ctx, in_filename, NULL, NULL);

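This function opens the input media file or stream and initializes the associated context:

- First parameter (ps): a pointer to an AVFormatContext pointer. The function allocates the AVFormatContext and stores its address in *ps. All later operations on the media, such as reading packets and looking up stream information, go through this context.
- Second parameter: the file name or URL of the input. It can be a local path such as "/path/to/your/file.mp4" or a network stream URL such as "http://example.com/stream.m3u8".
- Third parameter: the input format. It is usually NULL so that FFmpeg probes the format automatically. If you know the format for certain, you can pass a specific AVInputFormat instead.
- Fourth parameter: a pointer to an AVDictionary pointer, used to pass extra options such as demuxer and protocol options. Pass NULL if no extra options are needed.

With an mp4 input, this step already yields a lot of information, such as the codec ID and the media duration.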

![[Pasted image 20250330142428.png|200]]

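With an flv input, much less information is available at this point because the FLV header carries little metadata.

For example, duration has not been set here. It typically still holds AV_NOPTS_VALUE, a very large negative number that can look like a random value.

With a ts input, the result is similar to flv, with slightly more information available.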

`avformat_find_stream_info`

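This function analyzes the streams of the input media and fills in missing fields of the context, such as the bit rate.

After avformat_open_input has opened the media file, AVFormatContext contains only basic file information. The details of each stream (audio, video, subtitle, and so on), such as codec, frame rate, sample rate, and resolution, have often not been parsed yet. avformat_find_stream_info reads a certain number of packets from the media file and analyzes them to fill in the per-stream details in AVFormatContext.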

ret = avformat_find_stream_info(ifmt_ctx, NULL);

After the call, fields such as bit_rate and duration have been filled in on the context.

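If the header information is insufficient to begin with, this call is relatively slow because it has to read and analyze video frames internally.

Timing the call on the ts, flv, and mp4 test files shows that it does take measurable time, and the delay may grow with larger files.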
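
The complete program later in this post times the call with clock(), which the C standard defines as processor time, so on some platforms time spent waiting on disk or network I/O may be undercounted. Below is a minimal wall-clock sketch, assuming a C11 library that provides timespec_get. The helper names elapsed_seconds and time_call are hypothetical, and fn stands for a callback that would wrap the call being measured.

```c
#include <time.h>

/* Difference between two timespec values, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run fn(arg) once and return how long it took in wall-clock seconds.
 * fn is a hypothetical callback wrapping the call being measured. */
double time_call(void (*fn)(void *), void *arg)
{
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    fn(arg);
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end);
}
```

TIME_UTC is not a monotonic clock, so this is only suitable for rough measurements like the ones discussed here.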

`av_dump_format`

This function prints the information held in the context.

av_dump_format(ifmt_ctx, 0, in_filename, 0);

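Calling it right after avformat_open_input prints the information available at that point.

Calling it again after avformat_find_stream_info prints the completed information.

The fields of the context structure can also be printed directly: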

    // url: path/name of the media file passed to avformat_open_input
    printf("media name:%s\n", ifmt_ctx->url);
    // nb_streams: number of streams in the media file
    printf("stream number:%d\n", ifmt_ctx->nb_streams);
    // bit_rate: overall bit rate of the media file in bps (1 kbps = 1000 bps)
    printf("media average bitrate:%lldkbps\n", (long long)(ifmt_ctx->bit_rate / 1000));
    // time
    int total_seconds, hour, minute, second;
    // duration: media duration in microseconds
    total_seconds = (int)(ifmt_ctx->duration / AV_TIME_BASE);  // 1000us = 1ms, 1000ms = 1s
    hour = total_seconds / 3600;
    minute = (total_seconds % 3600) / 60;
    second = (total_seconds % 60);
    // the values above give the total duration of the media file
    printf("total duration: %02d:%02d:%02d\n", hour, minute, second);
    printf("\n");

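Note that duration here is the total media duration in microseconds, so converting it to seconds means dividing by 1e6, which is the value of the AV_TIME_BASE macro.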
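
The microsecond-to-HH:MM:SS conversion can be tried on its own. The sketch below is an illustration only: DEMO_TIME_BASE and DEMO_NOPTS_VALUE are local stand-ins for FFmpeg's AV_TIME_BASE and AV_NOPTS_VALUE so that it builds without the FFmpeg headers, and format_duration_us is a hypothetical helper name. Unlike the snippet above, it also handles a duration that has not been set.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for FFmpeg's AV_TIME_BASE and AV_NOPTS_VALUE so the sketch builds
 * without the FFmpeg headers; real code should use the library macros. */
#define DEMO_TIME_BASE   1000000
#define DEMO_NOPTS_VALUE INT64_MIN

/* Write "HH:MM:SS" for a duration given in microseconds into out.
 * Returns 0 on success, -1 if the duration is unset or negative. */
int format_duration_us(int64_t duration_us, char *out, size_t out_size)
{
    if (duration_us == DEMO_NOPTS_VALUE || duration_us < 0) {
        snprintf(out, out_size, "unknown");
        return -1;
    }
    int64_t total_seconds = duration_us / DEMO_TIME_BASE;
    int hours   = (int)(total_seconds / 3600);
    int minutes = (int)((total_seconds % 3600) / 60);
    int seconds = (int)(total_seconds % 60);
    snprintf(out, out_size, "%02d:%02d:%02d", hours, minutes, seconds);
    return 0;
}
```

For example, a duration of 3723000000 microseconds is 3723 seconds, which formats as 01:02:03.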

![[Pasted image 20250330145622.png|400]]

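Getting the streams

You can find the stream you want, such as the audio or video stream, by iterating over the context's streams array (an array of AVStream pointers).

AVStream is a key FFmpeg structure that describes one stream in a media file (video, audio, subtitle, and so on). It stores a lot of information about the stream: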

1. `int index`

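This member is the index of the stream in the streams array of AVFormatContext. When a media file has several streams, the index tells them apart. For example, when reading packets, the stream_index field of AVPacket holds this index and identifies the stream the packet belongs to.

2. `AVRational time_base`

AVRational is a structure that represents a fraction, and time_base defines the basic unit of the timestamps in the stream. Timestamps such as PTS and DTS are expressed in units of time_base. For example, if time_base is {1, 1000}, one timestamp unit is 1/1000 of a second. Any timestamp conversion or calculation needs this time_base.

3. `AVCodecParameters *codecpar`

AVCodecParameters holds the codec-related parameters of the stream, such as resolution, frame rate, and pixel format for a video stream, or sample rate, channel count, and sample format for an audio stream. This information is essential for choosing a suitable decoder and for decoding.

4. `int64_t duration`

This member is the total duration of the stream in time_base units. To convert it to seconds: duration_in_seconds = (double)duration * av_q2d(time_base).

5. `int64_t start_time`

The start time of the stream, also in time_base units. A stream does not always start at 0, and this member gives its actual start time.

6. `AVRational avg_frame_rate` and `AVRational r_frame_rate`

avg_frame_rate is the average frame rate of the stream, estimated over the stream's frames.
r_frame_rate is the real base frame rate of the stream: the lowest frame rate at which all timestamps can be represented accurately. FFmpeg documents it as a guess; for constant-frame-rate video it is usually accurate.

7. `void *priv_data`

A pointer to private data that stores format-specific extra information. Different formats may use it for their own data, and you normally do not need to touch it directly.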
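
The time_base arithmetic described for time_base, duration, and start_time can be sketched without FFmpeg. In the block below, DemoRational is a local stand-in for AVRational, and demo_q2d and ticks_to_ms are hypothetical helpers written for this illustration. It assumes the intermediate product fits in 64 bits.

```c
#include <stdint.h>

/* Local stand-in for FFmpeg's AVRational (a num/den fraction). */
typedef struct DemoRational {
    int num;
    int den;
} DemoRational;

/* Same idea as av_q2d(): turn the fraction into a double. */
double demo_q2d(DemoRational q)
{
    return q.num / (double)q.den;
}

/* Convert a timestamp or duration counted in time_base ticks to
 * milliseconds, truncating toward zero. */
int64_t ticks_to_ms(int64_t ticks, DemoRational time_base)
{
    return ticks * 1000 * time_base.num / time_base.den;
}
```

With a time base of {1, 90000}, 90000 ticks come out as 1000 ms. In real code the library function av_rescale_q performs this kind of conversion between two time bases and also guards against overflow.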

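In our example we iterate over all streams in the context. Each stream has a unique index, and the codec parameters of each stream tell us its audio or video format:

Get the stream structure at the current index

AVStream *in_stream = ifmt_ctx->streams[i]; // audio, video, or subtitle stream

Read the media type from the codec parameters; it is one of the AVMEDIA_TYPE_* constants

Audio: AVMEDIA_TYPE_AUDIO

if (AVMEDIA_TYPE_AUDIO == in_stream->codecpar->codec_type)

Video: AVMEDIA_TYPE_VIDEO

else if (AVMEDIA_TYPE_VIDEO == in_stream->codecpar->codec_type)

For an audio stream we can print details such as the sample rate, sample format (for example FLTP or S16P), channel count, codec (for example AAC or MP3), and total audio duration.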

printf("----- Audio info:\n");
// index: every stream gets a unique index once FFmpeg has demuxed the file
printf("index:%d\n", in_stream->index);
// sample_rate: audio sample rate in Hz
printf("samplerate:%dHz\n", in_stream->codecpar->sample_rate);
// codecpar->format: audio sample format
if (AV_SAMPLE_FMT_FLTP == in_stream->codecpar->format)
{
    printf("sampleformat:AV_SAMPLE_FMT_FLTP\n");
}
else if (AV_SAMPLE_FMT_S16P == in_stream->codecpar->format)
{
    printf("sampleformat:AV_SAMPLE_FMT_S16P\n");
}
// channels: number of audio channels
// (FFmpeg 5.1 and later expose this as codecpar->ch_layout.nb_channels)
printf("channel number:%d\n", in_stream->codecpar->channels);
// codec_id: audio codec
if (AV_CODEC_ID_AAC == in_stream->codecpar->codec_id)
{
    printf("audio codec:AAC\n");
}
else if (AV_CODEC_ID_MP3 == in_stream->codecpar->codec_id)
{
    printf("audio codec:MP3\n");
}
else
{
    printf("audio codec_id:%d\n", in_stream->codecpar->codec_id);
}
// Total audio duration in seconds. At millisecond or microsecond precision,
// the audio and video durations are not necessarily equal.
if (in_stream->duration != AV_NOPTS_VALUE)
{
    int duration_audio = (int)(in_stream->duration * av_q2d(in_stream->time_base));
    // print the total audio duration as hours:minutes:seconds
    printf("audio duration: %02d:%02d:%02d\n",
           duration_audio / 3600, (duration_audio % 3600) / 60, (duration_audio % 60));
}
else
{
    printf("audio duration unknown");
}

printf("\n");

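Note that when computing the audio duration, the duration in AVStream uses a different unit from the duration in AVFormatContext. Here the unit is the stream's time_base, which can differ between media files. If the time base is 1/1000 s, for example, converting to seconds works like this:
$s = \text{AVStream->duration} \times \text{av\_q2d}(\text{AVStream->time\_base})$

av_q2d simply converts the fraction to a double, so the conversion amounts to duration * time_base.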

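For a video stream we can likewise extract codec information such as the frame rate (FPS), the codec (H264, MPEG4), and the frame width and height (for example 1080x720). The duration is converted the same way as for audio. Note that the time_base values usually differ, and the actual value is chosen by the demuxer and depends on the container:

Video

Typical values: {1, 25} (25 FPS), {1, 30} (30 FPS), {1, 90000} (high-resolution time base)
Meaning: with {1, 25} or {1, 30}, one tick is one frame interval (the reciprocal of the frame rate); {1, 90000} is the 90 kHz clock used by MPEG-TS.

Audio

Typical values: {1, 44100} (44.1 kHz sample rate), {1, 48000} (48 kHz sample rate)
Meaning: one tick is one sample period, so timestamps count samples.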

printf("----- Video info:\n");
printf("index:%d\n", in_stream->index);
// avg_frame_rate: video frame rate in fps (frames per second)
printf("fps:%lffps\n", av_q2d(in_stream->avg_frame_rate));
if (AV_CODEC_ID_MPEG4 == in_stream->codecpar->codec_id) // video codec
{
    printf("video codec:MPEG4\n");
}
else if (AV_CODEC_ID_H264 == in_stream->codecpar->codec_id) // video codec
{
    printf("video codec:H264\n");
}
else
{
    printf("video codec_id:%d\n", in_stream->codecpar->codec_id);
}
// frame width and height
printf("width:%d height:%d\n", in_stream->codecpar->width,
       in_stream->codecpar->height);
// Total video duration in seconds. At millisecond or microsecond precision,
// the audio and video durations are not necessarily equal.
if (in_stream->duration != AV_NOPTS_VALUE)
{
    int duration_video = (int)(in_stream->duration * av_q2d(in_stream->time_base));
    // print the total video duration as hours:minutes:seconds
    printf("video duration: %02d:%02d:%02d\n",
           duration_video / 3600,
           (duration_video % 3600) / 60,
           (duration_video % 60));
}
else
{
    printf("video duration unknown");
}

printf("\n");
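
The snippet above prints the frame rate by converting the avg_frame_rate fraction directly. If a stream's frame rate is unknown, the fraction may have a zero numerator or denominator, and a plain division then gives no usable number. A minimal guard, with fps_or_zero as a hypothetical helper that takes the num and den fields of the fraction:

```c
/* Convert a frame-rate fraction to frames per second.
 * Returns 0.0 when the fraction is unset or invalid instead of dividing by zero. */
double fps_or_zero(int num, int den)
{
    if (num <= 0 || den <= 0)
        return 0.0;
    return (double)num / (double)den;
}
```

It could be called as fps_or_zero(in_stream->avg_frame_rate.num, in_stream->avg_frame_rate.den), treating a result of 0.0 as "unknown".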

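Reading packets (`Packet`)

Through the context we can also read the compressed packets stored in the file, for example H264 and AAC packets.

First allocate an AVPacket:

 AVPacket *pkt = av_packet_alloc();

A loop then reads the packets one at a time into the AVPacket. After each read, the internal read position advances to the next packet.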

while (1)
    {
        ret = av_read_frame(ifmt_ctx, pkt);
        if (ret < 0)    // end of file or read error
            break;
        // ... inspect pkt here ...
        av_packet_unref(pkt);
    }

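Check the packet's stream index (video or audio) and act accordingly, for example print pts, dts, the packet size, and pos (the packet's byte offset in the file). The stream index also selects the AVStream whose time_base converts the duration of the current packet to seconds:

Audio packet duration

pkt->duration * av_q2d(ifmt_ctx->streams[audioindex]->time_base)

Video packet duration

pkt->duration * av_q2d(ifmt_ctx->streams[videoindex]->time_base)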
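
To illustrate how the stream index picks the time base, here is a small sketch that sums packet durations for one stream and converts the total to microseconds. DemoPacket is a toy stand-in for the two AVPacket fields involved, and stream_duration_us is a hypothetical helper; neither is part of FFmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two AVPacket fields used here. */
typedef struct DemoPacket {
    int     stream_index;  /* which stream the packet belongs to */
    int64_t duration;      /* packet duration in that stream's time_base ticks */
} DemoPacket;

/* Sum the durations of all packets of one stream and return the total in
 * microseconds, given that stream's time base tb_num/tb_den. */
int64_t stream_duration_us(const DemoPacket *pkts, size_t count,
                           int stream_index, int tb_num, int tb_den)
{
    int64_t ticks = 0;
    for (size_t i = 0; i < count; i++) {
        if (pkts[i].stream_index == stream_index)
            ticks += pkts[i].duration;
    }
    return ticks * 1000000 * tb_num / tb_den;
}
```

Three audio packets of 1024 ticks each at a time base of {1, 48000} add up to 3072 ticks, which is 64000 microseconds. The same tick count under a different time base would give a different result, which is why the conversion must use the time base of the packet's own stream.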

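Once you are done with the current packet, you must release it, otherwise memory leaks. Calling av_packet_unref drops the packet's reference; when the reference count reaches 0, the packet's buffer is freed automatically.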

av_packet_unref(pkt);
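
The reference-counting idea can be shown with a toy model. The code below is not FFmpeg's implementation (the real AVPacket and AVBufferRef are more involved); every name in it is hypothetical. It only demonstrates why a buffer survives until the last reference is dropped.

```c
#include <stdlib.h>

/* Toy model of a reference-counted packet buffer. */
typedef struct ToyBuffer {
    int            refcount;
    unsigned char *data;
} ToyBuffer;

typedef struct ToyPacket {
    ToyBuffer *buf;
    int        size;
} ToyPacket;

int toy_live_buffers = 0;  /* number of buffers currently allocated */

/* Give pkt a fresh buffer of `size` bytes with a reference count of 1. */
int toy_packet_fill(ToyPacket *pkt, int size)
{
    ToyBuffer *buf = malloc(sizeof(*buf));
    if (!buf)
        return -1;
    buf->data = malloc((size_t)size);
    if (!buf->data) {
        free(buf);
        return -1;
    }
    buf->refcount = 1;
    pkt->buf  = buf;
    pkt->size = size;
    toy_live_buffers++;
    return 0;
}

/* Make dst share src's buffer: no copy, just one more reference. */
void toy_packet_ref(ToyPacket *dst, const ToyPacket *src)
{
    dst->buf  = src->buf;
    dst->size = src->size;
    dst->buf->refcount++;
}

/* Drop pkt's reference; free the buffer when the last reference goes away,
 * then reset pkt so it can be reused for the next read. */
void toy_packet_unref(ToyPacket *pkt)
{
    if (pkt->buf && --pkt->buf->refcount == 0) {
        free(pkt->buf->data);
        free(pkt->buf);
        toy_live_buffers--;
    }
    pkt->buf  = NULL;
    pkt->size = 0;
}
```

If two toy packets share one buffer, unreferencing the first leaves the buffer alive and only the second unref frees it. Skipping the unref in a read loop would leave one buffer allocated per packet, which is the leak described above.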

After all packets have been read, free the AVPacket structure itself:

if(pkt)
	av_packet_free(&pkt);

Releasing resources

When everything is done, the context must be freed and the opened file or network connection closed.

A single call to avformat_close_input does both:

if(ifmt_ctx)
	avformat_close_input(&ifmt_ctx);

Complete code

main.c

#include <stdio.h>
#include <libavformat/avformat.h>
#include <time.h>

int main(int argc, char **argv)
{
    // Initialize network support. Not needed when only local media files are read.
//    avformat_network_init();

    const char *default_filename = "believe.mp4";

    // const: the name may point at a string literal
    const char *in_filename = NULL;

    if (argc < 2)
    {
        in_filename = default_filename;
    }
    else
    {
        in_filename = argv[1];
    }
    printf("in_filename = %s\n", in_filename);

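    // AVFormatContext describes the structure and basic information of a media file or stream
    AVFormatContext *ifmt_ctx = NULL;           // demuxer context for the input file

    int videoindex = -1;        // video stream index
    int audioindex = -1;        // audio stream index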


    // Open the input: probes the protocol and format; for a network URL this also creates the connection
    int ret = avformat_open_input(&ifmt_ctx, in_filename, NULL, NULL);
    if (ret < 0)  // opening the media failed: print the reason
    {
        char buf[1024] = { 0 };
        av_strerror(ret, buf, sizeof(buf) - 1);
        printf("open %s failed:%s\n", in_filename, buf);
        goto failed;
    }
    printf("\n==== av_dump_format in_filename:%s ===\n", in_filename);
    av_dump_format(ifmt_ctx, 0, in_filename, 0);
    printf("\n==== av_dump_format finish =======\n\n");

    // clock() is defined as processor time, so on some platforms time spent waiting on I/O may be undercounted
    clock_t started = clock();
    ret = avformat_find_stream_info(ifmt_ctx, NULL);
    clock_t ended = clock();
    double elapsed_time = (double)(ended - started) / CLOCKS_PER_SEC;

    printf("avformat_find_stream_info took %f seconds to execute.\n", elapsed_time);

    if (ret < 0)  // analyzing the stream information failed: print the reason
    {
        char buf[1024] = { 0 };
        av_strerror(ret, buf, sizeof(buf) - 1);
        printf("avformat_find_stream_info %s failed:%s\n", in_filename, buf);
        goto failed;
    }

    // the stream information is now available
    printf("\n==== av_dump_format in_filename:%s ===\n", in_filename);
    av_dump_format(ifmt_ctx, 0, in_filename, 0);
    printf("\n==== av_dump_format finish =======\n\n");
    // url: path/name of the media file passed to avformat_open_input
    printf("media name:%s\n", ifmt_ctx->url);
    // nb_streams: number of streams in the media file
    printf("stream number:%d\n", ifmt_ctx->nb_streams);
    // bit_rate: overall bit rate of the media file in bps (1 kbps = 1000 bps)
    printf("media average bitrate:%lldkbps\n", (long long)(ifmt_ctx->bit_rate / 1000));
    // time
    int total_seconds, hour, minute, second;
    // duration: media duration in microseconds
    total_seconds = (int)(ifmt_ctx->duration / AV_TIME_BASE);  // 1000us = 1ms, 1000ms = 1s
    hour = total_seconds / 3600;
    minute = (total_seconds % 3600) / 60;
    second = (total_seconds % 60);
    // the values above give the total duration of the media file
    printf("total duration: %02d:%02d:%02d\n", hour, minute, second);
    printf("\n");
    /*
     * Older code finds the video and audio streams by iterating over all streams, as below.
     * Newer FFmpeg versions also provide av_find_best_stream, which achieves the same result.
     */
    for (uint32_t i = 0; i < ifmt_ctx->nb_streams; i++)
    {
        AVStream *in_stream = ifmt_ctx->streams[i]; // audio, video, or subtitle stream
        // audio stream: print the audio information
        if (AVMEDIA_TYPE_AUDIO == in_stream->codecpar->codec_type)
        {
            printf("----- Audio info:\n");
            // index: every stream gets a unique index once FFmpeg has demuxed the file
            printf("index:%d\n", in_stream->index);
            // sample_rate: audio sample rate in Hz
            printf("samplerate:%dHz\n", in_stream->codecpar->sample_rate);
            // codecpar->format: audio sample format
            if (AV_SAMPLE_FMT_FLTP == in_stream->codecpar->format)
            {
                printf("sampleformat:AV_SAMPLE_FMT_FLTP\n");
            }
            else if (AV_SAMPLE_FMT_S16P == in_stream->codecpar->format)
            {
                printf("sampleformat:AV_SAMPLE_FMT_S16P\n");
            }
            // channels: number of audio channels
            // (FFmpeg 5.1 and later expose this as codecpar->ch_layout.nb_channels)
            printf("channel number:%d\n", in_stream->codecpar->channels);
            // codec_id: audio codec
            if (AV_CODEC_ID_AAC == in_stream->codecpar->codec_id)
            {
                printf("audio codec:AAC\n");
            }
            else if (AV_CODEC_ID_MP3 == in_stream->codecpar->codec_id)
            {
                printf("audio codec:MP3\n");
            }
            else
            {
                printf("audio codec_id:%d\n", in_stream->codecpar->codec_id);
            }
            // Total audio duration in seconds. At millisecond or microsecond precision,
            // the audio and video durations are not necessarily equal.
            if (in_stream->duration != AV_NOPTS_VALUE)
            {
                int duration_audio = (int)(in_stream->duration * av_q2d(in_stream->time_base));
                // print the total audio duration as hours:minutes:seconds
                printf("audio duration: %02d:%02d:%02d\n",
                       duration_audio / 3600, (duration_audio % 3600) / 60, (duration_audio % 60));
            }
            else
            {
                printf("audio duration unknown");
            }

            printf("\n");

            audioindex = (int)i; // remember the audio stream index
        }
        else if (AVMEDIA_TYPE_VIDEO == in_stream->codecpar->codec_type)  // video stream: print the video information
        {
            printf("----- Video info:\n");
            printf("index:%d\n", in_stream->index);
            // avg_frame_rate: video frame rate in fps (frames per second)
            printf("fps:%lffps\n", av_q2d(in_stream->avg_frame_rate));
            if (AV_CODEC_ID_MPEG4 == in_stream->codecpar->codec_id) // video codec
            {
                printf("video codec:MPEG4\n");
            }
            else if (AV_CODEC_ID_H264 == in_stream->codecpar->codec_id) // video codec
            {
                printf("video codec:H264\n");
            }
            else
            {
                printf("video codec_id:%d\n", in_stream->codecpar->codec_id);
            }
            // frame width and height
            printf("width:%d height:%d\n", in_stream->codecpar->width,
                   in_stream->codecpar->height);
            // Total video duration in seconds. At millisecond or microsecond precision,
            // the audio and video durations are not necessarily equal.
            if (in_stream->duration != AV_NOPTS_VALUE)
            {
                int duration_video = (int)(in_stream->duration * av_q2d(in_stream->time_base));
                // print the total video duration as hours:minutes:seconds
                printf("video duration: %02d:%02d:%02d\n",
                       duration_video / 3600,
                       (duration_video % 3600) / 60,
                       (duration_video % 60));
            }
            else
            {
                printf("video duration unknown");
            }

            printf("\n");
            videoindex = (int)i; // remember the video stream index
        }
    }

    AVPacket *pkt = av_packet_alloc();
    if (!pkt)
    {
        printf("av_packet_alloc failed\n");
        goto failed;
    }

    int pkt_count = 0;
    int print_max_count = 10;   // only print the first 10 packets
    printf("\n-----av_read_frame start\n");
    while (1)
    {
        ret = av_read_frame(ifmt_ctx, pkt);
        if (ret < 0)    // end of file or read error
        {
            printf("av_read_frame end\n");
            break;
        }

        if(pkt_count++ < print_max_count)
        {
            if (pkt->stream_index == audioindex)
            {
                printf("audio pts: %lld\n", (long long)pkt->pts);
                printf("audio dts: %lld\n", (long long)pkt->dts);
                printf("audio size: %d\n", pkt->size);
                printf("audio pos: %lld\n", (long long)pkt->pos);
                printf("audio duration: %lf\n\n",
                       pkt->duration * av_q2d(ifmt_ctx->streams[audioindex]->time_base));
            }
            else if (pkt->stream_index == videoindex)
            {
                printf("video pts: %lld\n", (long long)pkt->pts);
                printf("video dts: %lld\n", (long long)pkt->dts);
                printf("video size: %d\n", pkt->size);
                printf("video pos: %lld\n", (long long)pkt->pos);
                printf("video duration: %lf\n\n",
                       pkt->duration * av_q2d(ifmt_ctx->streams[videoindex]->time_base));
            }
            else
            {
                printf("unknown stream_index:%d\n", pkt->stream_index);
            }
        }

        // release the packet's buffer before the next read
        av_packet_unref(pkt);
    }

    if(pkt)
        av_packet_free(&pkt);
failed:
    if(ifmt_ctx)
        avformat_close_input(&ifmt_ctx);


    getchar(); // keeps the console open so the output can be read before the program exits
    return 0;
}