QtPlayer——基于FFmpeg的Qt音视频播放器

本文主要讲解一个基于Qt GUI的，使用FFmpeg音视频库解码的音视频播放器，同时也是记录一点学习心得，本人也是多媒体初学者，也欢迎大家交流，程序运行图如下:
这里写图片描述

QtPlayer基于FFmpeg的Qt音视频播放器

闲话

平常没事干就想多学习学习新东西，然后想想现在的软件全都是一堆广告，所以呢就想着自己做一个播放器。本来Qt5也有现成的QMediaPlayer类，也没去研究过，不过我猜放放普通格式的音视频文件应该没问题，对于多格式的文件就不知道能不能支持了。

那么为什么用FFmpeg呢，因为网上一搜全是这个，没错，就是瞎搞，还有就是播放音频是使用SDL，也是网上的资料比较多而已。其实吧，还有就是考虑到以后说不定还能移植到我的ARM板上玩，总之多学一点总是没错的。

音视频基础

在做这之前完全对音视频方面没有任何专业知识，相信很多人也是一样，这里所要讲的知识也并不什么对某个音视频格式的讲解，只是大概说明一下，所要做的工作，如图：

这里是从雷神那边窃取过来的知识，不知道雷神是谁的请点击。整个音视频播放的流程就是从这四层一步一步往下走。

协议层

协议层主要是说明获取到视频文件的协议，说简单一点就是什么HTTP、RTSP、RTMP或者是本地文件。前面的网络协议自然不用说，本地文件嘛，本来获取文件都是通过地址（URL）获取的，就是平常本地文件的路径。
FFmpeg库已经支持协议层的文件获取，所以这也是极大的方便，所以用别人造好的轮子就是这么舒服，当然最好是了解轮子是怎么造的。

封装层

封装层就是说明多媒体文件的封装格式，例如什么.avi（滑稽），.mp4，.mkv之类的文件格式。一个视频文件其实是由图像和声音两部分封装而成的，当然也可以没声音部分，反正就是把这两个封装成一个文件就是封装层的任务。

压缩层

压缩层所讲述的是我们所看到的视频文件的压缩格式。视频采集到的原始数据，我们不可能一帧一帧的原封不动的保存下来，因为这样保存下载的文件大的吓人，比如平常我们看到的一个10M的视频文件，按原始数据保存下来说不定大几十倍都有可能（我瞎猜的），所以为了在这节省空间，需要对原始数据进行压缩。
当前流行的压缩格式当属H264，不过H265也出了这么多年了，也不知道现在发展的怎么样了。

图像层

图像层也就是原始数据层，主要是描述组成图像数据的格式，大多数时候也就是采集设备，采集到的数据格式，最常用的当属YUV420格式。不过Qt显示图像的格式不支持YUV的格式，所以需要转换成RGB格式。

FFmpeg的音视频处理

FFmpeg但凡搞多媒体的应该都听说过，一个很大的音视频编解码库，想啃下来还是要花点时间，毕竟一个ffplay就是3700行代码，对不起，我晕代码。。。不过为了搞比利，还是要去看，而且不难发现，网上的例子全是用的别人的代码，好歹自己改个变量名啊。而且很多人用的老版本的库，很多方法很不幸都deprecated了。虽然现在我用的方法以后说不定也会过时，不过还是得赶一波新潮。本文用到的FFmpeg版本为3.4。

使用FFmpeg最主要就是用它那强大的编解码方法，首先，我们需要对它进行初始化：

void MainWindow::initFFmpeg()
{
//    av_log_set_level(AV_LOG_INFO);

    avfilter_register_all();

    /* ffmpeg init */
    av_register_all();

    /* ffmpeg network init for rtsp */
    if (avformat_network_init()) {
        qDebug() << "avformat network init failed";
    }

    /* init sdl audio */
    if (SDL_Init(SDL_INIT_AUDIO | SDL_INIT_TIMER)) {
        qDebug() << "SDL init failed";
    }
}

最上面的av_log_set_level()是用来控制FFmpeg的打印等级的，就像Linux Kernel的打印控制方法一样。
avfilter_register_all();注册滤镜，filter是ffmpeg的重要部分啊，可是我也刚入手，也不是很熟悉。
emmm最主要的就是av_register_all()这个方法，注册了所有的编解码混合器，麻麻再也不用担心我的播放器有不支持的格式了。
然后就是avformat_network_init()网络模块初始化，如果想用什么rtsp之类的网络直播视频就必须加这一句。

然后就是处理的主体：

首先需要一个格式化输入输出上下文，就是靠这玩意儿打开文件，所以是核心的结构体：

pFormatCtx = avformat_alloc_context();
if (avformat_open_input(&pFormatCtx, currentFile.toLocal8Bit().data(), NULL, NULL) != 0) {
    qDebug() << "Open file failed.";
    return ;
}

if (avformat_find_stream_info(pFormatCtx, NULL) < 0) {
    qDebug() << "Could't find stream infomation.";
    avformat_free_context(pFormatCtx);
    return;
}

打开视频文件成功之后就需要获取到音视频流的索引（还有一个subtitle，至今还不懂怎么用，望告知）：

/* find video & audio stream index
 */
for (unsigned int i = 0; i < pFormatCtx->nb_streams; i++) {
    if (pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoIndex = i;
        qDebug() << "Find video stream.";
    }

    if (pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
        audioIndex = i;
        qDebug() << "Find audio stream.";
    }

    if (pFormatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_SUBTITLE) {
        subtitleIndex = i;
        qDebug() << "Find subtitle stream.";
    }
}

有了各个类型的数据流索引后就可以获取到解码器和数据流的结构体，以备后面处理：

/* find video decoder */
pCodecCtx = avcodec_alloc_context3(NULL);
avcodec_parameters_to_context(pCodecCtx, pFormatCtx->streams[videoIndex]->codecpar);
videoStream = pFormatCtx->streams[videoIndex];

东西准备好了就可以开始解码，要解码首先当然需要从文件中读数据出来，而且解码这种耗时的东西当然是放在子线程里面，开个死循环慢慢来：

while (true) {
    ...
    /* judge haven't read all frame */
    if (av_read_frame(pFormatCtx, packet) < 0) {
        qDebug() << "Read file completed.";
        isReadFinished = true;
        emit readFinished();
        SDL_Delay(10);
        break;
    }
    ...
}

把数据包读出来过后，就把packet分类，到对应的部分去处理它们：

if (packet->stream_index == videoIndex && currentType == "video") {
    videoQueue.enqueue(packet); // video stream
} else if (packet->stream_index == audioIndex) {
    audioDecoder->packetEnqueue(packet); // audio stream
} else if (packet->stream_index == subtitleIndex) {
      subtitleQueue.enqueue(packet);
    av_packet_unref(packet);    // subtitle stream
} else {
    av_packet_unref(packet);
}

当然解码的速度肯定跟不上你读的速度，所以先把读出来的数据放在队列里，慢慢搞。

视频解码

视频解码相对来说比较简单，把我们刚才读的数据从队列里面取出来，放解码器里面，然后就得到想要的数据帧了= =！

decoder->videoQueue.dequeue(&packet, true);

ret = avcodec_send_packet(decoder->pCodecCtx, &packet);
if ((ret < 0) && (ret != AVERROR(EAGAIN)) && (ret != AVERROR_EOF)) {
    qDebug() << "Video send to decoder failed, error code: " << ret;
    av_packet_unref(&packet);
    continue;
}

ret = avcodec_receive_frame(decoder->pCodecCtx, pFrame);
if ((ret < 0) && (ret != AVERROR_EOF)) {
    qDebug() << "Video frame decode failed, error code: " << ret;
    av_packet_unref(&packet);
    continue;
}

if (av_buffersrc_add_frame(decoder->filterSrcCxt, pFrame) < 0) {
    qDebug() << "av buffersrc add frame failed.";
    av_packet_unref(&packet);
    continue;
}

if (av_buffersink_get_frame(decoder->filterSinkCxt, pFrame) < 0) {
    qDebug() << "av buffersink get frame failed.";
    av_packet_unref(&packet);
    continue;
} else {
    QImage tmpImage(pFrame->data[0], decoder->pCodecCtx->width, decoder->pCodecCtx->height, QImage::Format_RGB32);
    /* deep copy, otherwise when tmpImage data change, this image cannot display */
    QImage image = tmpImage.copy();
    decoder->displayVideo(image);
}

上面的代码主要注意的有两点：

使用avcodec_send_packet()和avcodec_receive_frame()替换原先的一个什么什么decode函数，因为那个方法deprecated了，但是网上一堆代码还是用的那个。
这里我用了avfilter直接对frame进行处理，然后得到处理后的RGB格式的frame后，直接实例一个QImage送去显示。对于得到的Image还是deep copy一份，不然还没显示完，QImage指向的data pointer值被改了就麻烦了。

音频解码

至于音频，因为用到了SDL去play sound所以就按照SDL的步骤走吧，首先需要open一个sound device，其实就是设置音频解码的一些参数：

int AudioDecoder::openAudio(AVFormatContext *pFormatCtx, int index)
{
    AVCodec *codec;
    SDL_AudioSpec wantedSpec;
    int wantedNbChannels;
    const char *env;

    /*  soundtrack array use to adjust */
    int nextNbChannels[]   = {0, 0, 1, 6, 2, 6, 4, 6};
    int nextSampleRates[]  = {0, 44100, 48000, 96000, 192000};
    int nextSampleRateIdx = FF_ARRAY_ELEMS(nextSampleRates) - 1;

    isStop = false;
    isPause = false;
    isreadFinished = false;

    audioSrcFmt = AV_SAMPLE_FMT_NONE;
    audioSrcChannelLayout = 0;
    audioSrcFreq = 0;

    audioBufIndex = 0;
    audioBufSize = 0;
    audioBufSize1 = 0;

    clock = 0;

    pFormatCtx->streams[index]->discard = AVDISCARD_DEFAULT;

    stream = pFormatCtx->streams[index];

    codecCtx = avcodec_alloc_context3(NULL);
    avcodec_parameters_to_context(codecCtx, pFormatCtx->streams[index]->codecpar);

    /* find audio decoder */
    if ((codec = avcodec_find_decoder(codecCtx->codec_id)) == NULL) {
        avcodec_free_context(&codecCtx);
        qDebug() << "Audio decoder not found.";
        return -1;
    }

    /* open audio decoder */
    if (avcodec_open2(codecCtx, codec, NULL) < 0) {
        avcodec_free_context(&codecCtx);
        qDebug() << "Could not open audio decoder.";
        return -1;
    }

    totalTime = pFormatCtx->duration;

    env = SDL_getenv("SDL_AUDIO_CHANNELS");
    if (env) {
        qDebug() << "SDL audio channels";
        wantedNbChannels = atoi(env);
        audioDstChannelLayout = av_get_default_channel_layout(wantedNbChannels);
    }

    wantedNbChannels = codecCtx->channels;
    if (!audioDstChannelLayout ||
        (wantedNbChannels != av_get_channel_layout_nb_channels(audioDstChannelLayout))) {
        audioDstChannelLayout = av_get_default_channel_layout(wantedNbChannels);
        audioDstChannelLayout &= ~AV_CH_LAYOUT_STEREO_DOWNMIX;
    }

    wantedSpec.channels    = av_get_channel_layout_nb_channels(audioDstChannelLayout);
    wantedSpec.freq        = codecCtx->sample_rate;
    if (wantedSpec.freq <= 0 || wantedSpec.channels <= 0) {
        avcodec_free_context(&codecCtx);
        qDebug() << "Invalid sample rate or channel count, freq: " << wantedSpec.freq << " channels: " << wantedSpec.channels;
        return -1;
    }

    while (nextSampleRateIdx && nextSampleRates[nextSampleRateIdx] >= wantedSpec.freq) {
        nextSampleRateIdx--;
    }

    wantedSpec.format      = audioDeviceFormat;
    wantedSpec.silence     = 0;
    wantedSpec.samples     = FFMAX(SDL_AUDIO_MIN_BUFFER_SIZE, 2 << av_log2(wantedSpec.freq / SDL_AUDIO_MAX_CALLBACKS_PER_SEC));
    wantedSpec.callback    = &AudioDecoder::audioCallback;
    wantedSpec.userdata    = this;

    /* This function opens the audio device with the desired parameters, placing
     * the actual hardware parameters in the structure pointed to spec.
     */
    while (1) {
        while (SDL_OpenAudio(&wantedSpec, &spec) < 0) {
            qDebug() << QString("SDL_OpenAudio (%1 channels, %2 Hz): %3")
                    .arg(wantedSpec.channels).arg(wantedSpec.freq).arg(SDL_GetError());
            wantedSpec.channels = nextNbChannels[FFMIN(7, wantedSpec.channels)];
            if (!wantedSpec.channels) {
                wantedSpec.freq = nextSampleRates[nextSampleRateIdx--];
                wantedSpec.channels = wantedNbChannels;
                if (!wantedSpec.freq) {
                    avcodec_free_context(&codecCtx);
                    qDebug() << "No more combinations to try, audio open failed";
                    return -1;
                }
            }
            audioDstChannelLayout = av_get_default_channel_layout(wantedSpec.channels);
        }

        if (spec.format != audioDeviceFormat) {
            qDebug() << "SDL audio format: " << wantedSpec.format << " is not supported"
                     << ", set to advised audio format: " <<  spec.format;
            wantedSpec.format = spec.format;
            audioDeviceFormat = spec.format;
            SDL_CloseAudio();
        } else {
            break;
        }
    }

    if (spec.channels != wantedSpec.channels) {
        audioDstChannelLayout = av_get_default_channel_layout(spec.channels);
        if (!audioDstChannelLayout) {
            avcodec_free_context(&codecCtx);
            qDebug() << "SDL advised channel count " << spec.channels << " is not supported!";
            return -1;
        }
    }

    /* set sample format */
    switch (audioDeviceFormat) {
    case AUDIO_U8:
        audioDstFmt    = AV_SAMPLE_FMT_U8;
        break;

    case AUDIO_S16SYS:
        audioDstFmt    = AV_SAMPLE_FMT_S16;
        break;

    case AUDIO_S32SYS:
        audioDstFmt    = AV_SAMPLE_FMT_S32;
        break;

    case AUDIO_F32SYS:
        audioDstFmt    = AV_SAMPLE_FMT_FLT;
        break;

    default:
        audioDstFmt    = AV_SAMPLE_FMT_S16;
        break;
    }

    /* open sound */
    SDL_PauseAudio(0);

    return 0;
}

其中需要一个SDL的callback函数，在这个函数里面去处理音频信息，并且play出来：

void AudioDecoder::audioCallback(void *userdata, quint8 *stream, int SDL_AudioBufSize)
{
    AudioDecoder *decoder = (AudioDecoder *)userdata;

    int decodedSize;
    /* SDL_BufSize means audio play buffer left size
     * while it greater than 0, means counld fill data to it
     */
    while (SDL_AudioBufSize > 0) {
        if (decoder->isStop) {
            return ;
        }

        if (decoder->isPause) {
            SDL_Delay(10);
            continue;
        }

        /* no data in buffer */
        if (decoder->audioBufIndex >= decoder->audioBufSize) {

            decodedSize = decoder->decodeAudio();
            /* if error, just output silence */
            if (decodedSize < 0) {
                /* if not decoded data, just output silence */
                decoder->audioBufSize = 1024;
                decoder->audioBuf = nullptr;
            } else {
                decoder->audioBufSize = decodedSize;
            }
            decoder->audioBufIndex = 0;
        }

        /* calculate number of data that haven't play */
        int left = decoder->audioBufSize - decoder->audioBufIndex;
        if (left > SDL_AudioBufSize) {
            left = SDL_AudioBufSize;
        }

        if (decoder->audioBuf) {
            memset(stream, 0, left);
            SDL_MixAudio(stream, decoder->audioBuf + decoder->audioBufIndex, left, decoder->volume);
        }

        SDL_AudioBufSize -= left;
        stream += left;
        decoder->audioBufIndex += left;
    }
}

这个callback需要传入的三个参数：

第一个是用户数据，一般就传你当前的数据结构进去啦，对于我这种C++写的，直接在open的时候就传了个this进去；
第二个参数是一个指向播放数据的pointer，解码后的audio data就需要copy到这个pointer播放；
第三个参数是播放数据的空间剩余大小，如果大于0，我们就可以继续copy data到前面的stream里面。

然后就是我们的解码主体，里面基本上和视频解码是相同的，不过是视频转码用sws，音频用swr而已。

需要注意的是有时候一个数据packet里面可能包含多个frame数据，视频的我没遇到，音频的最典型的就是.ape的文件（拥有音乐梦想的人，听歌都是ape和flac的，不知道装逼会不会挨打_ (:з」∠)_）。所以在avcodec_send_packet(）需要对返回值进行判断，如果packet还有其他数据，下次解码的时候就不去读其他的packet，继续搞同一个。

音视频同步

解码了视频和音频，当然要放啊，放出来就GG了，视频那速度快的都不知道是几倍速，我之前试了一下delay了25个ms大概才是正常的速度，这样明显不行嘛，所以我们就需要进行音视频同步。

对于音视频同步我用的最常用的方法，就是视频等音频，毕竟视频放的那么快。那么它们同步的标准呢，就是一个叫做pts(显示时间戳)的东西，当我们读了一个音频和一个视频frame的pts后，比较一下，如果视频的pts大了，证明视频快了，就让它delay一下：

while (1) {
    if (decoder->isStop) {
        break;
    }

    double audioClk = decoder->audioDecoder->getAudioClock();
    pts = decoder->videoClk;

    if (pts <= audioClk) {
         break;
    }
    int delayTime = (pts - audioClk) * 1000;

    delayTime = delayTime > 5 ? 5 : delayTime;

    SDL_Delay(delayTime);
}

因为pts的单位是us，一般延时有ms级别就够了，反正人眼就这么瞎，快了也看不出来，就像打游戏一样其实上了30FPS和你300FPS效果都是差不多的，不过最好就是电脑显示屏的刷新率60Hz就enough了。而且一般的视频帧率也就是25左右，所以用ms级的delay妥妥的。

至于其他的界面和播放控制请参考我的代码（写的差，见谅，还有就是要吐槽CSDN，自己上传的资源自己还不能管理这是什么道理，我这传的是用sws进行视频图像转码的，需要参考avfliter的同学请移步GitHub，我就懒得传2遍了）：

CSDN：
http://download.csdn.net/download/q294971352/10104287

GitHub：
https://github.com/DragonPang/QtPlayer