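This article covers decoding a raw stream: reading an AAC bitstream from a local file and decoding it into PCM.
graph LR
A[Audio format] --> B[Audio decoder] --> C[PCM data]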

1. FFmpeg audio decoding flow

The diagram above shows the flow of decoding audio with FFmpeg.

2. Hands-on code

2.1 Getting the decoder

enum AVCodecID audio_codec_id = AV_CODEC_ID_AAC;
const AVCodec *codec = avcodec_find_decoder(audio_codec_id);
//    const AVCodec *codec = avcodec_find_decoder_by_name("libfdk_aac");
if (!codec) {
   fprintf(stderr, "Codec not found\n");
   return;
}

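avcodec_find_decoder looks up a registered decoder by ID. The IDs are defined in the AVCodecID enum in libavcodec/codec_id.h in the FFmpeg source; for AAC audio decoding, AV_CODEC_ID_AAC is the one to use. Alternatively, avcodec_find_decoder_by_name looks a decoder up by name: avcodec_find_decoder_by_name("libfdk_aac"), for example, returns the fdk-aac decoder, provided FFmpeg was built with libfdk_aac support.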

2.2 Initializing the raw-stream parser

// Get the raw-stream parser: AVCodecParserContext (data) + AVCodecParser (methods)
AVCodecParserContext *parser = av_parser_init(codec->id);
if (!parser) {
    fprintf(stderr, "Parser not found\n");
    return;
}

av_parser_init initializes a raw-stream parser, an AVCodecParserContext. Its argument is the codec ID.
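To give a feel for what a raw-stream parser has to do, here is a minimal sketch in plain C. It assumes the input file uses ADTS framing, which is common for .aac files but is not stated in this article, and adts_frame_length is a hypothetical helper, not an FFmpeg function. It checks for the 12-bit ADTS syncword and reads the 13-bit frame length from the header, which is the information needed to cut a byte stream into packets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg), assuming ADTS framing.
 * Returns the total frame length in bytes (header + payload) if an
 * ADTS header starts at p, or 0 if the bytes do not look like one. */
size_t adts_frame_length(const uint8_t *p, size_t len)
{
    if (len < 7)
        return 0;                       /* a header without CRC is 7 bytes */

    /* 12-bit syncword 0xFFF, and the 2 layer bits must be 00 */
    if (p[0] != 0xFF || (p[1] & 0xF6) != 0xF0)
        return 0;

    /* 13-bit frame length: low 2 bits of byte 3, all of byte 4,
     * high 3 bits of byte 5 */
    return ((size_t)(p[3] & 0x03) << 11) |
           ((size_t)p[4] << 3) |
           ((size_t)p[5] >> 5);
}
```

A real parser such as the one behind av_parser_parse2 does considerably more (resynchronization, buffering of partial frames across calls), so this is only an illustration of the framing idea.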

2.3 Creating the context

// Allocate the codec context
AVCodecContext *codec_ctx = avcodec_alloc_context3(codec);
if(!codec_ctx) {
    fprintf(stderr, "Could not allocate audio codec context\n");
    return;
}

// Open the decoder and bind it to the codec context
int ret = avcodec_open2(codec_ctx, codec, NULL);
if(ret < 0) {
    fprintf(stderr, "Could not open codec\n");
    return;
}

avcodec_alloc_context3 allocates and initializes an AVCodecContext; avcodec_open2 then opens the decoder and binds it to that context.

2.4 Opening the files

// Open the input file
FILE *infile = fopen(in_file, "rb");
if (!infile) {
    fprintf(stderr, "Could not open %s\n", in_file);
    return;
}

// Open the output file
FILE *outfile = fopen(out_file, "wb");
if (!outfile) {
    fprintf(stderr, "Could not open %s\n", out_file);
    return;
}


in_file is the path of the input, i.e. the local AAC file; out_file is the path of the file that receives the PCM data decoded from the AAC stream.

2.5 Creating the AVPacket and AVFrame

AVPacket *pkt = av_packet_alloc();
if(!pkt) {
    fprintf(stderr, "Could not alloc avpacket\n");
    return;
}

AVFrame *decoded_frame = av_frame_alloc();
if(!decoded_frame) {
    fprintf(stderr, "Could not allocate audio frame\n");
    return;
}

2.6 Reading data and decoding

// Size of the input buffer
#define AUDIO_INBUF_SIZE 20480
// Threshold below which more data is read from the input file
#define AUDIO_REFILL_THRESH 4096

uint8_t inbuf[AUDIO_INBUF_SIZE + AV_INPUT_BUFFER_PADDING_SIZE];
uint8_t *data = inbuf;
size_t data_size = 0;
// Read up to AUDIO_INBUF_SIZE bytes into inbuf
data_size = fread(inbuf, 1, AUDIO_INBUF_SIZE, infile);

while(data_size > 0) {
    // Parse out one packet
    ret = av_parser_parse2(parser, codec_ctx, &pkt->data, &pkt->size, data, (int)data_size, AV_NOPTS_VALUE, AV_NOPTS_VALUE,  0);
    if (ret < 0){
        fprintf(stderr, "Error while parsing\n");
        break;
    }
    data += ret;
    data_size -= ret;
    if(pkt->size)
        decoder(codec_ctx, pkt, decoded_frame, outfile);
    // Running low on data: read more from the file
    if(data_size < AUDIO_REFILL_THRESH) {
        // Move the remaining data to the front of the buffer
        memmove(inbuf, data, data_size);
        data = inbuf;
        // Read from the file, appending after the data already buffered
        size_t len = fread(data + data_size, 1, AUDIO_INBUF_SIZE - data_size, infile);
        if (len > 0)
            data_size += len;
    }

}

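The code above reads the local data and decodes it. It first creates an input buffer of AUDIO_INBUF_SIZE + AV_INPUT_BUFFER_PADDING_SIZE bytes; the extra AV_INPUT_BUFFER_PADDING_SIZE bytes guard against optimized bitstream readers that read several bytes at once and could otherwise run past the end of the buffer. fread then fills the buffer with up to AUDIO_INBUF_SIZE bytes from the file. Whenever fewer than AUDIO_REFILL_THRESH unparsed bytes remain, the leftover bytes are moved to the front of the buffer and the free space behind them is refilled from the file.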
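The refill step can be looked at in isolation. The following is a minimal sketch in plain C with no FFmpeg dependency; refill_input is a hypothetical helper name, and INBUF_SIZE and REFILL_THRESH simply mirror the two macros used in this article.

```c
#include <stdio.h>
#include <string.h>

#define INBUF_SIZE    20480   /* mirrors AUDIO_INBUF_SIZE */
#define REFILL_THRESH 4096    /* mirrors AUDIO_REFILL_THRESH */

/* Hypothetical helper: if fewer than REFILL_THRESH unread bytes remain,
 * move them to the front of inbuf, point *data back at inbuf, and top
 * the buffer up from the file. Returns the new count of unread bytes. */
size_t refill_input(unsigned char *inbuf, unsigned char **data,
                    size_t data_size, FILE *in)
{
    if (data_size >= REFILL_THRESH)
        return data_size;               /* enough data is still buffered */

    /* Source and destination may overlap, so memmove rather than memcpy */
    memmove(inbuf, *data, data_size);
    *data = inbuf;

    /* fread returns 0 at end of file, which leaves data_size unchanged */
    data_size += fread(inbuf + data_size, 1, INBUF_SIZE - data_size, in);
    return data_size;
}
```

Inside the parsing loop this would be called as data_size = refill_input(inbuf, &data, data_size, infile); after each call to the parser. Once the file is exhausted the count only shrinks, so the loop ends when the last buffered bytes have been consumed.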

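av_parser_parse2 extracts one complete packet from the stream and is one of the core functions in the decoding process. The official parameter documentation is as follows.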

/**

 * Parse a packet.
 * @param s             parser context.
 * @param avctx         codec context.
 * @param poutbuf       set to pointer to parsed buffer or NULL if not yet finished.
 * @param poutbuf_size  set to size of parsed buffer or zero if not yet finished.
 * @param buf           input buffer.
 * @param buf_size      buffer size in bytes without the padding. I.e. the full buffer
                        size is assumed to be buf_size + AV_INPUT_BUFFER_PADDING_SIZE.
                        To signal EOF, this should be 0 (so that the last frame
                        can be output).
 * @param pts           input presentation timestamp.
 * @param dts           input decoding timestamp.
 * @param pos           input byte position in stream.
 * @return the number of bytes of the input bitstream used.

 */

int av_parser_parse2(AVCodecParserContext *s,
                     AVCodecContext *avctx,
                     uint8_t **poutbuf, int *poutbuf_size,
                     const uint8_t *buf, int buf_size,
                     int64_t pts, int64_t dts,
                     int64_t pos);

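1. s and avctx are the parser context and the codec context, respectively
2. poutbuf: address of the output data
3. poutbuf_size: size of the output data. If it is 0 after the call, parsing is not finished yet, and av_parser_parse2() must be called again with more data before a parsed frame becomes available
4. buf and buf_size are the input data and its size
5. The return value is the number of bytes of the input bitstream that were consumed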

2.7 Decoding

The code of the decoder function is as follows:

   
static void decoder(AVCodecContext *codec_ctx, AVPacket *pkt, AVFrame *frame, FILE *out_file){
    int data_size;
    // Send the compressed AVPacket to the decoder
    int ret = avcodec_send_packet(codec_ctx, pkt);
    if(ret == AVERROR(EAGAIN)){
        fprintf(stderr, "Receive_frame and send_packet both returned EAGAIN, which is an API violation.\n");
    }else if (ret < 0){
        // av_err2str() is declared in libavutil/error.h
        fprintf(stderr, "Error submitting the packet to the decoder, err:%s, pkt_size:%d\n", av_err2str(ret), pkt->size);
        return;
    }

    while (ret >= 0) {
        // Fetch a decoded AVFrame
        ret = avcodec_receive_frame(codec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return;
        else if (ret < 0){
            fprintf(stderr, "Error during decoding\n");
            return;
        }
        // Size in bytes of a single sample
        data_size = av_get_bytes_per_sample(codec_ctx->sample_fmt);
        if (data_size < 0){
            fprintf(stderr, "Failed to calculate data size\n");
            return;
        }

        /**
            "P" stands for planar. Planar data is laid out as:
            LLLLLLRRRRRRLLLLLLRRRRRRLLLLLLRRRRRRL... (each LLLLLLRRRRRR is one audio frame)
            Formats without the P (i.e. interleaved) are laid out as:
            LRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLRL... (each LR is one audio sample)
         */

        // This loop assumes a planar format; packed formats only use data[0].
        // On FFmpeg 5.1 and later, use codec_ctx->ch_layout.nb_channels instead of channels.
        for (int i = 0; i < frame->nb_samples; i++){
            for (int ch = 0; ch < codec_ctx->channels; ch++)
                // Write interleaved; the output is float in most cases
                fwrite(frame->data[ch] + data_size * i, 1, data_size, out_file);
        }
    }
}

2.7.1 The functions in detail

avcodec_send_packet and avcodec_receive_frame always appear as a pair.

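1. Call avcodec_send_packet() to hand the decoder an AVPacket containing the raw compressed data. Note that the input buffer avpkt->data must be AV_INPUT_BUFFER_PADDING_SIZE larger than the bytes actually read, because some optimized bitstream readers read 32 or 64 bits at once and could read past the end. The AVCodecContext must already have been opened with avcodec_open2 before a packet is sent. The input AVPacket usually holds a single video frame or several complete audio frames. Ownership of the packet stays with the caller, and the decoder does not modify it. The decoder may create a reference to the packet data, or copy it if the packet is not reference-counted. Unlike the older APIs, the packet is always consumed in full; if it contains several frames, avcodec_receive_frame must be called repeatedly until it returns AVERROR(EAGAIN) or AVERROR_EOF. The packet argument may be NULL, or an AVPacket whose data field is NULL or whose size field is 0; this is a flush packet and signals the end of the stream. The first flush packet always succeeds; sending further flush packets is unnecessary and returns AVERROR_EOF. If the decoder still has frames buffered, it returns them after the flush packet has been sent.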

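Return values:

0: success
AVERROR(EAGAIN): input is not accepted in the current state; the user must read output with avcodec_receive_frame() first;
AVERROR_EOF: the decoder has been flushed and no new packets can be sent to it;
AVERROR(EINVAL): the codec is not opened, it is an encoder, or it requires a flush;
AVERROR(ENOMEM): the packet could not be added to the internal queue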

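2. Call avcodec_receive_frame to get decoded output from the decoder. Receive the output in a loop, i.e. keep calling avcodec_receive_frame() until it returns AVERROR(EAGAIN) or another error. Note that the function always calls av_frame_unref(frame) internally to release resources before it does anything else.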

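Return values:

0: success, a frame was returned
AVERROR(EAGAIN): no frame is available in the current state; send a new packet to the decoder with avcodec_send_packet;
AVERROR_EOF: the decoder has been fully flushed and there will be no more output frames;
AVERROR(EINVAL): the codec is not opened;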

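2.7.2 Audio sample formats

Here the samples are ultimately written to the file in interleaved form. Audio sample formats come in two layouts, interleaved and planar. In FFmpeg a "P" in the format name denotes planar; AV_SAMPLE_FMT_S16P, for example, is signed 16-bit planar.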

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16, ///< signed 16 bits
    AV_SAMPLE_FMT_S32, ///< signed 32 bits
    AV_SAMPLE_FMT_FLT, ///< float
    AV_SAMPLE_FMT_DBL, ///< double

    AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP, ///< float, planar
    AV_SAMPLE_FMT_DBLP, ///< double, planar
    AV_SAMPLE_FMT_S64, ///< signed 64 bits
    AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
    AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically

};

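In planar mode the data is split by channel, and in general each channel occupies its own block of memory. The layout is LLLLLLRRRRRRLLLLLLRRRRRRLLLLLLRRRRRR... (each LLLLLLRRRRRR is one audio frame). The figure below shows how planar data is organized.

Interleaved mode can be regarded as a special case of planar mode; its layout is LRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLR... (each LR is one audio sample). The raw audio we usually deal with, such as wav/PCM, is almost always interleaved. The figure below shows how interleaved data is organized.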
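The conversion from planar to interleaved is what the nested fwrite loop in decoder does implicitly. As a minimal sketch in plain C, the same reordering can be written into a memory buffer; planar_to_interleaved is a hypothetical helper, not an FFmpeg function, and it assumes every plane holds nb_samples samples of bytes_per_sample bytes each.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of FFmpeg): copy samples from one
 * buffer per channel (LLLL..., RRRR...) into a single interleaved
 * buffer (LRLR...). out must hold
 * channels * nb_samples * bytes_per_sample bytes. */
void planar_to_interleaved(const unsigned char *const *planes,
                           int channels, int nb_samples,
                           size_t bytes_per_sample,
                           unsigned char *out)
{
    for (int i = 0; i < nb_samples; i++) {
        for (int ch = 0; ch < channels; ch++) {
            /* sample i of channel ch goes next in the output */
            memcpy(out, planes[ch] + (size_t)i * bytes_per_sample,
                   bytes_per_sample);
            out += bytes_per_sample;
        }
    }
}
```

With a decoded planar frame, planes would correspond to frame->data and bytes_per_sample to the result of av_get_bytes_per_sample. Writing the interleaved buffer with a single fwrite per frame also avoids one fwrite call per sample.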

2.8 Flushing the decoder

pkt->data = NULL; 
pkt->size = 0;
decoder(codec_ctx, pkt, decoded_frame, outfile);

2.9 Releasing resources

fclose(outfile);
fclose(infile);
avcodec_free_context(&codec_ctx);
av_parser_close(parser);
av_frame_free(&decoded_frame);
av_packet_free(&pkt);

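2.10 Playing the PCM file

Decoding leaves us with a pcm file. The following command can be used to check that the output is correct (adjust -ar and -ac to the sample rate and channel count of your stream).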

 ffplay -ar 48000 -ac 2 -f f32le out.pcm
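A quick sanity check that needs no playback is to compare the size of the output file with the expected duration. Headerless PCM has a fixed byte rate, so the duration is the file size divided by sample rate × channels × bytes per sample. The sketch below is plain C; pcm_duration_seconds is a hypothetical helper, and the example values assume the same parameters as the ffplay command above (48000 Hz, 2 channels, 4-byte float samples).

```c
#include <stdint.h>

/* Hypothetical helper: playback duration in seconds of a headerless
 * PCM file, given its size and the stream parameters.
 * Returns 0.0 if the parameters give a zero byte rate. */
double pcm_duration_seconds(uint64_t file_bytes, int sample_rate,
                            int channels, int bytes_per_sample)
{
    uint64_t bytes_per_second =
        (uint64_t)sample_rate * (uint64_t)channels * (uint64_t)bytes_per_sample;

    if (bytes_per_second == 0)
        return 0.0;
    return (double)file_bytes / (double)bytes_per_second;
}
```

For example, 48000 Hz stereo f32le is 384000 bytes per second, so a 3840000-byte out.pcm should last 10 seconds. A result far from the duration of the source AAC file suggests the wrong sample rate, channel count, or sample format.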
