Basics and Practice of Video Quality Evaluation

Editor's note: Video quality evaluation has been a popular area of basic research in recent years and is gradually being deployed in a variety of business scenarios. For this open class, we invited Mr. Zeng Kai, co-founder and chief researcher of SSIMWAVE, to introduce the basic concepts and algorithms of video quality evaluation in detail. Using an end-to-end video quality monitoring system as an example, he explains how quality evaluation solutions are applied in practice and the benefits they bring.

Text: Zeng Kai

Edited by: LiveVideoStack

Hello everyone, I am Zeng Kai. In 2020, I shared with you some of my understanding of quality evaluation and the story of founding SSIMWAVE. Two years later, we have some new insights, so today I will share our latest thinking at both the technical and product levels.

I went to the University of Waterloo in Canada in 2009 to study for a Ph.D. After graduating in 2013, we commercialized our research results on video and image quality evaluation. I personally have been researching this direction for more than ten years; in fact, I was already working deeply in this area during my master's studies at Xidian University. In 2013, we felt this was a very useful direction. By then we had already been in contact with some big companies in the industry, such as Facebook and Apple, and saw that many technical departments were interested in it. We also felt it had commercial value, so I founded SSIMWAVE together with my Ph.D. advisor and another Ph.D. student; it has been nearly 10 years since then. We have been reasonably successful, and we now have dozens of large customers. The business model is mainly ToB; our customers are mainly broadcasters, OTT, and streaming media companies, and our core value is the quality evaluation algorithm.

The direct feedback from customers is that the quality evaluation algorithm is something we can provide that others cannot. The core value of the algorithm is that it saves them time; beyond saving manpower, it improves work efficiency. On the basis of the algorithm, we turn it into a practical technology, polish it into a product, and then package it into a quality monitoring solution that we push to customers. At the beginning, clients were most interested in quality monitoring of live video, because live broadcast is very sensitive to quality: once something goes wrong, there is no way to go back and correct it. In response to this pain point, we wanted the customer to know about a problematic live stream before it reached the final consumer, so it could be corrected in time, or even automatically. So initially we mainly built live broadcast products, which required our products not only to monitor video streams in real time but also to cover thousands of channels. After that, we extended the product to on-demand services and made it a cloud-based SaaS offering. Now we are improving further: not only diversifying the product (supporting both on-demand and live broadcast), but also providing Video Analytics on top of it. At the customer level, we are also diversifying as much as possible. Because our main customers are broadcasters, mainly in the European and American markets, the number of such customers is limited, so we have gradually expanded from large customers to small and medium ones, making our customer base larger and more diverse.

Speaking of the company's name, the SSIM in SSIMWAVE is the abbreviation of Structural SIMilarity. At the time, we felt the company's name should at least include that part. For the second half, since we work on streaming media, we also considered SSIMSTREAM and SSIMFLOW, but finally decided SSIMWAVE sounded cooler, so that is what we named it. The name really is useful: especially when we go out to exhibitions, many people recognize SSIMWAVE as soon as they see SSIM, because SSIM itself is very famous, so it does help.

What I want to share today is the basics and practice of quality evaluation, introduced from three aspects: first, the basic concepts of quality evaluation, because not everyone understands what quality evaluation is about; second, some existing algorithms and popular practices in quality evaluation; third, our monitoring practice, where I will talk about the feedback we have received from customers. Finally, a brief summary.

01 Introduction to Video Quality Evaluation

Let's first talk about what video quality evaluation is.

e0fc5e762c01abcc29bf33c6fa4dbab6.jpeg

Video quality evaluation means giving a score to the quality of a video the way human eyes would. Whether the scale is five-point or hundred-point, we hope to have an objective, quantitative algorithm whose score tells us whether a video looks better or worse. There are some complicated concepts here, such as: how do we define quality? When people watch a video and think it is better or worse, we each have our own feelings, but quantifying that with a mathematical model or formula is complicated in itself, because the subjectivity is too strong. Quality may include sharpness, and how to compute sharpness with a formula is itself a problem; quality also includes richness of color, contrast, and so on. Today any display lets users adjust various modes, but none has a button that improves quality with one click. That is because there is nothing we can adjust once that will make everyone agree the quality has become better.

Research on quality evaluation is generally divided into subjective quality evaluation and objective quality evaluation. Subjective quality evaluation means collecting subjective scores; its task is to study how to run subjective experiments so that the scores we collect are accurate and the experiments are efficient. The subjective score is the video quality score. We conduct a subjective experiment by inviting people to sit in a controlled environment, like the picture in the lower right corner, whether an office, a movie theater, or a scene designed around the algorithm's actual application. We have them sit there, relax, and watch a video normally, and then tell us how high or low they would rate that video, which of course we have specially prepared. Such a subjective experiment is very time-consuming and labor-intensive and hard to run at scale; watching hundreds or thousands of videos every day is unrealistic. So subjective evaluation is often only used in specific situations, but for research it is indispensable, because it is the benchmark of quality, and the scores it collects are called the Golden Score. If we want to research or develop objective algorithms, this Golden Score is essential: the score output by the algorithm is meant to approximate the subjective score. The subjective score is what we call Ground Truth; without it, there is no way to measure the performance of the objective algorithms we design.
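As a rough illustration of how raw viewer ratings become a Golden Score, here is a minimal sketch. The function name and the 1-5 ratings are invented for illustration; real procedures such as those in BT.500 and P.913 also prescribe subject screening and outlier rejection, which are omitted here.

```python
import statistics

def mos_with_ci(ratings):
    """Aggregate one clip's raw subjective ratings into a mean opinion
    score (MOS) plus a 95% confidence interval (normal approximation)."""
    n = len(ratings)
    mos = statistics.fmean(ratings)
    ci = 1.96 * statistics.stdev(ratings) / n ** 0.5
    return mos, ci

# Hypothetical example: ten viewers rate one clip on a 1-5 scale
mos, ci = mos_with_ci([4, 5, 4, 3, 4, 5, 4, 4, 3, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")  # MOS = 4.00 +/- 0.41
```

The confidence interval matters in practice: two algorithms whose predictions both fall inside the interval cannot be distinguished by that experiment.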

Among the customers we have worked with, many have actually used our algorithm to replace their previous subjective experiments. Let me give you an example. Rogers, a Canadian video operator, must run a large-scale subjective quality evaluation in the laboratory whenever they make a big change such as upgrading their video equipment. For example, when they want to switch to a new video encoder or transcoder: their customer base is large, so the encoder's influence is very wide, and before actually launching the new encoder or transcoder they check its performance in the laboratory. They will invite, say, 100 people to rate videos transcoded by the new and the old transcoder to see whether quality has really improved. When we won this client, they also conducted the corresponding subjective experiments, compared the objective scores generated by SSIMWAVE's algorithm with their subjective scores, and felt the accuracy was high enough, so they were willing to use SSIMWAVE's product.

Objective quality evaluation is the larger research direction, because the ultimate goal of quality evaluation is to develop software that replaces human subjective scoring: fully automated, saving time and effort, and accurately predicting subjective quality scores. The more accurately it predicts them, the better the algorithm. However, it is often difficult to improve the accuracy, which is affected by many factors.

On the value of quality evaluation itself: on the one hand, we can replace the human brain with algorithms, which improves production efficiency, because whenever something can replace human labor, productivity rises. On the other hand, it unifies and improves the measurement of video quality. Beyond replacing manpower, the research direction of quality evaluation has a deeper meaning: video quality, as we usually speak of it, is a rather subjective thing, so if we can unify quality with a quantitative algorithm, the significance is similar to Qin Shihuang unifying weights and measures. Of course, Qin Shihuang had the authority and we do not, but by unifying various standards and getting everyone to recognize the same standard, the operating efficiency of the whole society improves, not just that of our industry.

Because we serve big customers, their video transmission systems are very complicated. From satellite reception to data center transcoding and encryption to subsequent distribution, the various departments often operate independently; that is, they rarely sit down and communicate with each other about quality, and each has its own understanding of it. If there is a set of quantitative algorithms or scores that all of them generally accept, the efficiency of cooperation between them improves greatly.

The last value is system optimization: for example, optimization of the video encoder, of the video network, or even of the video displayed on the terminal, which one could talk about for a long time. My point is that if there is a new and effective video quality evaluation algorithm, there are many places it can optimize, such as the video codec. The encoder can be optimized internally or externally; that is a matter of detail. Internal optimization of the encoder places higher demands on the quality evaluation algorithm, but it can fundamentally improve encoding efficiency, which is equivalent to saving more bandwidth at the same quality. We can also optimize outside the encoder, which amounts to telling the operator how to set the encoder's parameters. People who use encoders often say: I set up my encoder, tune the parameters once, and the configuration stays unchanged; every video is processed with that one set of parameters. Now everyone knows to use so-called content-adaptive encoding to adjust the bit rate and resolution, letting the system allocate less bit rate to simpler videos and thereby save bandwidth. Quality evaluation is an important dimension in this optimization: how much I save depends on how far the bit rate can drop while the quality stays acceptable, and that is what gets optimized.
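The content-adaptive idea above can be sketched in a few lines: measure predicted quality at several bitrates per title, then pick the cheapest rung that still meets the quality target. All numbers, titles, and function names here are invented for illustration; a real per-title system would derive these points from actual encodes and a quality metric.

```python
def cheapest_rung(rate_quality_points, quality_target):
    """Return the lowest-bitrate (bitrate_kbps, quality) pair whose
    predicted quality meets the target; fall back to the best rung
    available if none does."""
    feasible = [p for p in rate_quality_points if p[1] >= quality_target]
    return min(feasible) if feasible else max(rate_quality_points)

# Invented rate-quality measurements for two titles of different complexity
cartoon = [(800, 78), (1500, 88), (3000, 93), (6000, 95)]
sports  = [(800, 55), (1500, 70), (3000, 84), (6000, 91)]
print(cheapest_rung(cartoon, 85))  # (1500, 88) -- simple content, low bitrate
print(cheapest_rung(sports, 85))   # (6000, 91) -- complex content needs more
```

The bandwidth saving comes precisely from the gap between the two titles: the simpler one reaches the target with a quarter of the bitrate.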

On the network and CDN side, many companies use not just one CDN but several, and the quality of the transmitted video changes as traffic switches between CDNs. Even within the network, if you control it, the network's encryption and error-correction coding modules can be adjusted, and all of these can be optimized toward different goals. Quality has therefore penetrated into every aspect of audio and video processing, transcoding, and compression, and there are many places where it can improve efficiency. That is the general introduction to quality evaluation.

Why has quality evaluation been studied for more than ten years? It was a relatively popular research direction before, and now people are gradually hoping to land it in real industry, but the research continues. Why is there so much to study? Let's take a general look at what affects image quality. At the top of the picture is image quality, which we want to evaluate, and the factors can be roughly divided into four categories. The first is the type and characteristics of the video, or what the video looks like. The reason video systems are so complex is that video is diverse: UGC, PGC, BGC, and so on. Videos may be movies, animations, landscapes, news, football, basketball, all kinds of content. The complexity and type of each video differ, so different types of video directly affect how video quality should be evaluated. Because a video will be processed differently along the way, its type directly affects the quality of watching it, and the impact of that intermediate processing is also very large.

Video distortion is measured relative to fidelity, if we consider the quality of the video source to be the highest. The actual situation is not necessarily so, because enhancement is also involved, especially for UGC: video service providers may find ways to enhance the quality of videos uploaded by users to achieve higher viewing rates. But no matter what, in the processing, compression, and transcoding in the middle, almost any parameter change affects the final image quality: for example, the bit rate and resolution of transcoding, or whether denoising, enhancement, or deblurring is applied during processing. Each processing step is equivalent to a filter and may have several or even a dozen different parameters. These parameters exist to adapt to different video types, so this is a very complicated matter, but it is also a key factor affecting final image quality.

The last two categories are the video consumption environment and the type of consumer. These affect the Quality of Experience of the image, that is, how it really looks and feels. The consumption environment includes what kind of device I watch on, how far my eyes are from the device, and what the surroundings are: am I in a small dark room, watching under the covers before bed, or in a movie theater? People feel differently about the quality of the same video in different environments.

Finally, consumers can be roughly divided into ordinary consumers, video enthusiasts, and algorithm experts, because different consumers have different sensitivity to video distortion. Ordinary consumers sit on the sofa in the living room and watch TV, so their tolerance for many distortions is relatively high. But video enthusiasts want to see 4K or 8K video; if the quality is poor, they would rather not watch it, or wait another month for a higher-definition version, and they will also look for ways to upgrade their TVs or viewing devices. Algorithm experts are the technical experts working in the video industry. A good video quality evaluation algorithm greatly helps the business of Internet video companies, because engineers who build video algorithms do not understand quality in exactly the same way as QA staff, and a score everyone accepts avoids the efficiency losses caused by inconsistent understanding. Algorithm experts exist in many companies, including large European and American ones. There are not many of them, sometimes only a few or at most a dozen, scattered across various groups; we call them Golden Eyes. Whenever a new software version or new configuration parameters need to be launched, these Golden Eyes check them: they compare carefully on the right device to see what the quality of the final video is. Their sensitivity to distortion is very high, and their authority is great. If they say your quality is worse than before, then the new version, whether software or configuration, basically cannot ship.

Those are the factors that affect image quality. If we want to design all of them into an algorithm, you can imagine how many aspects there are to consider: it is equivalent to solving a problem in a very high-dimensional space. That is why it is difficult.

A further challenge in video quality assessment lies in the quality of the collected subjective scores. Take the HBO logo, for example: is its image quality good or not? I believe most people will feel the quality is fine, because that is how it is supposed to look. But many people, especially image processing people, will say it is all noise; if an algorithm flags it, many will dismiss it as a corner case. Moreover, different people subjectively perceive quality differently. In the example below, do we prefer the picture with high contrast, or the one with less contrast but more detail? What do most people think? To answer such questions, we have to do subjective experiments. When we do them, we make some assumptions to limit the application scenario of the quality evaluation algorithm, and then find the corresponding people to subjectively score videos in a scientifically designed experiment, so as to unify video quality. Once we have collected enough subjective scores and enough data, we can develop objective evaluation algorithms on that data. The performance of an objective algorithm developed this way is relatively stable and its robustness is relatively good. If in practice we run into challenges such as someone disagreeing with the scores, we have reason to believe the result that agrees with most people is correct, because we have enough data. The person who disagrees is not wrong; he simply does not agree with the group we want to serve, that is, with most of the people we serve.

So on the one hand, the subjectivity of quality scores is particularly strong and it is difficult to satisfy everyone; on the other hand, when doing objective algorithm research, our understanding of the human visual system is insufficient, because what we hope to implement in software is the human visual system, and if our understanding of it is not deep enough, we have no way to fully simulate it. There are already many mechanisms that simulate the human visual system, such as multi-scale analysis, multi-channel decomposition, and various masking effects. These are algorithms designed from experience or knowledge, borrowing concepts from Visual Science and the like, and these mechanisms have been quite successful.

There are many factors affecting video quality, and quality evaluation is applied at different stages of video transmission with different content types, so it is very difficult to design a general video quality evaluation algorithm. This is why SSIM has become widely accepted: on the one hand, it was a disruptive piece of work compared to PSNR; on the other hand, it works in all kinds of scenarios, at least better than PSNR. If we design a new algorithm ourselves now, it may work well in one scene but not necessarily in others. This is also a lesson from building SSIMWAVE: we cannot expect one algorithm to solve all problems. We still need to look at customers' specific needs and try to cover different application scenarios as much as possible. For example, when we encode, decode, and transcode audio and video on the server side, one algorithm can work, but on the client and mobile side that algorithm needs corresponding adjustments, because the scenarios are different: rebuffering or a black screen will not appear during server-side transcoding, but these problems appear very easily on the client side. That is the basic background of video quality evaluation.

02 Video quality evaluation algorithm

Now let's move on to the evaluation algorithms.

We know that video quality evaluation means scoring videos, which can be done subjectively or objectively. So how is it done subjectively? Collecting benchmark scores subjectively has various inconveniences, but because it has to be done, there are now several international standards. BT.500 has been around for many years, and for a long time most people doing subjective experiments used BT.500. Now there are some newer methods, such as P.913. P.913, the document for subjective experiments, is more complete and keeps pace with the times. The main methods involve a lot of detail that we will not go into here, including how to prepare the experimental environment, how to collect video sources, and how to process and present the test videos; all of this is explained in detail in P.913, and the introduction is very good.

How we ask people to watch videos, and in what order, generally divides into two approaches. 1. Watch only one video at a time and score it on its own; this is called Single-stimulus: I give you a video, and you give me a score. 2. Watch two videos one after the other and compare them; this is Double-stimulus. The scores are generally either discrete, from 0 to 4 or 1 to 5, or continuous, any value from 0 to 100, usually an integer. These are things we need to think about when designing subjective experiments, because the approaches have different advantages and disadvantages. With Single-stimulus, I give you a video and you give me a score; this sounds simple and easy to operate, but it is harder for the subjective scorers, because everyone's judgment and understanding of video quality is different, and everyone's acceptance is different. Double-stimulus is relative, like a multiple-choice question, which is easier for people. However, the same person has to watch two videos each time, so the time doubles, which creates the problem of fatigue. These are the factors a subjective experiment must take into account, so I recommend that interested students read P.913.
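A common way to aggregate Double-stimulus ratings is the difference mean opinion score (DMOS). Here is a minimal sketch; the function name and the five viewers' ratings are invented for illustration.

```python
def dmos(ref_ratings, test_ratings):
    """Double-stimulus aggregation: each viewer rates both the reference
    and the processed clip, and we average the per-viewer differences,
    so individual rating bias largely cancels out."""
    diffs = [r - t for r, t in zip(ref_ratings, test_ratings)]
    return sum(diffs) / len(diffs)

# Invented ratings from five viewers on a 1-5 scale
ref_scores  = [5, 4, 5, 5, 4]   # the pristine clip
test_scores = [3, 3, 4, 3, 2]   # the compressed clip, same viewers
print(dmos(ref_scores, test_scores))  # 1.6 -- mean quality drop
```

Working with per-viewer differences is exactly why Double-stimulus tolerates strict and lenient raters better than Single-stimulus: a viewer who rates everything one point low shifts both columns equally.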

Within quality evaluation research, most people in universities and companies work on objective algorithms. We make the objective algorithm into software that automatically gives us a score, and that score should match the perception of the human eye. Generally speaking, algorithms are divided into three categories according to the availability of the video source: full reference, reduced reference, and no reference. The reference means a lossless or best-quality video. Full reference means all the information of the video source is available; for example, in a video compression scene, the output of the encoder is the compressed video, so I can look, in a full-reference way, at exactly how the quality of the output video changes compared with my input. Reduced reference means I do not know all the pixels of the video source, but I can compute some statistics from it. This suits communication systems, such as streaming media and client-side quality evaluation, where we cannot get the video source because its data volume is very large; but if you can get some statistics and compare them, you can still improve accuracy.
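The simplest full-reference metric is PSNR, which compares the distorted frame against the pristine one pixel by pixel. A minimal sketch (the tiny 4x4 "frame" is invented for illustration):

```python
import numpy as np

def psnr(ref, dist, max_val=255.0):
    """Full-reference PSNR between a reference frame and a distorted frame."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# A tiny invented "frame" with one corrupted pixel
ref = np.full((4, 4), 128, dtype=np.uint8)
dist = ref.copy()
dist[0, 0] = 138
print(round(psnr(ref, dist), 2))  # 40.17
```

Note that PSNR only needs the two pixel arrays: this is exactly what makes it full reference, and exactly why it is unusable on the client side, where the source is not available.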

No reference means we do not know what the source video looks like at all: you give me a video, I give you a score. It is called no reference, but in fact, when we watch a video with our own eyes and feel that it is good or bad, there are references in our brains. This is also an interesting thing we found when working with customers: an ordinary person's understanding of quality is correlated with the bandwidth in his area. If bandwidth in his area is generally low, his tolerance for quality will be relatively high; he can give a relatively blurry video a relatively high score because he is used to it. So although no-reference scoring is simple and easy to use, it also needs to be adjusted to the scene.

The goal of objective quality evaluation is still to predict subjective quality scores. Over the past ten years or so, subjective quality evaluation data sets have been built continuously, because many people, especially in academia, make the data sets they build public, including UGC data sets from Google and YouTube. Some of the data sets have video sources and some do not; in the latter case it is just a video with a corresponding subjective score, and the videos may have been processed in various ways. I divide the data sets into two categories: PGC and UGC. From a quality perspective, the application scenarios of PGC and UGC are quite different. When PGC is created, there are directors and cameras; the equipment, scenes, and shooting environment are very controllable, the approach is very professional, and the camera equipment is of good quality, so the quality of the video source is often very high. For PGC, quality problems therefore mainly come from subsequent video transmission or processing, while the quality of the source is usually high. UGC is different. UGC is User Generated Content: anyone can shoot a video with a mobile phone anytime, anywhere and upload it. Different tiers of phones have cameras of different quality, and the quality of the footage varies widely; the quality of the source itself is uncontrollable, so all quality levels appear. Since the quality may sometimes be relatively poor, we may want to apply enhancement on the server side, which further increases the complexity of quality evaluation.

The quality of PGC can be described like this: the quality produced by, say, a Hollywood studio is the highest, and from there it declines monotonically. It rarely needs enhancement, and it usually needs to be transcoded and compressed into different versions, each worse than the last. Because video compression only loses information, we can approximately assume the quality keeps getting worse, reaching its worst when watched on the customer's phone or TV; the difference between services lies in whose quality degrades less. UGC is different. Often the UGC source is not very good, but the enhancement algorithm on the server side is very powerful; for example, if motion blur can be corrected, the quality improvement can be large. In that case, the assumption of a monotonically declining quality trend is completely untenable, so this area requires specific scenarios and specific analysis.

I have listed some data sets here. They are basically public and can be downloaded by universities all over the world, but they are relatively large; interested students can download them themselves.

Looking only at the upper part of the picture, we have Acquisition, Compression, Transmission Over Network, and Reconstruction. This is a simplified video streaming pipeline: first shooting, then compression, then transmission; Reconstruction is equivalent to video decoding. In such a pipeline, a full-reference comparison can be made between the pre-compression and post-decompression versions. If we do not know the version before compression, then we use the decompressed version alone to do no-reference (No Reference) evaluation. Reduced Reference is a compromise, because No Reference is a very challenging research direction while Full Reference is relatively more controllable. So, as someone asked, is it possible to compute some statistics, as just mentioned? It does not have to be per frame; it could be one statistic per second, for example five numbers per second. Send those five statistics to the decoding end and do quality evaluation there; this is a compromise between data volume and accuracy. Of course, the most popular research areas now are Full Reference and No Reference: reduced reference has accumulated some practical applications, but the other two are more common now.
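The "a few statistics per second" idea can be sketched as follows. The three features chosen here (mean, standard deviation, horizontal-gradient energy) and the function names are purely illustrative; a real reduced-reference method would use perceptually motivated features.

```python
import numpy as np

def rr_features(frame):
    """A toy reduced-reference descriptor: three numbers per frame
    instead of the full pixel data."""
    f = frame.astype(np.float64)
    return np.array([f.mean(), f.std(), np.abs(np.diff(f, axis=1)).mean()])

def rr_distance(feats_a, feats_b):
    """At the receiver, compare the tiny descriptors; a larger distance
    suggests a larger quality change."""
    return float(np.linalg.norm(feats_a - feats_b))

src = np.tile(np.array([0, 255], dtype=np.uint8), (4, 2))  # high-contrast pattern
blurred = np.full((4, 4), 128, dtype=np.uint8)             # heavily smoothed version
print(rr_distance(rr_features(src), rr_features(src)))      # 0.0 -- identical
print(rr_distance(rr_features(src), rr_features(blurred)))  # large -- big quality change
```

The point of the compromise is visible in the sizes: three floats per frame (or per second) travel over the side channel instead of the full frame, yet they still reveal that the receiving side lost all of the source's contrast and detail.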

If we distinguish these algorithms from another angle, we can divide them into knowledge-driven and data-driven; any image processing can be distinguished this way. If I use knowledge, experience, or understanding of the human visual system to design modules that capture distortion, and assemble them into an algorithm, that is knowledge-driven. The other, data-driven, is the neural-network or deep-learning way: we collect videos, label them, and then do supervised or unsupervised learning to learn a network model for prediction. That is the difference between the two.

For objective algorithms, both the knowledge-driven and the data-driven approaches have their own advantages and disadvantages, and hybrid approaches exist as well. Personally, I think mixing the two is a good idea because they complement each other: where we understand existing mechanisms, especially those of the human visual system, we should use that knowledge, but we should also let the algorithm see as many unknown and varied cases as possible, because the wider the range of cases it sees, the more general it becomes. Another difference is interpretability. Knowledge-driven models are usually easier to explain, and customers will ask: why does your algorithm think my video is low quality? Is it too noisy, is it blocky, or is too much detail lost? Because different modules have different tasks, you can drill in and figure out why the algorithm judged the quality to be poor. Data-driven algorithms struggle here. Some models can be partially explained through mechanisms such as attention, but it is still hard to attribute a low score to, say, the bit rate or a specific distortion. A further difficulty with the data-driven approach is that it needs a large number of labeled video samples, and labeling is expensive: in subjective experiments we typically collect one score per 10-second clip, while the algorithm often has to produce a score per frame. At 24 frames per second, a 10-second clip has 240 frames, so during training all 240 frames share a single label. But quality also varies within those 10 seconds, so the label does not necessarily represent the quality of each frame, and this must be accounted for when training the model. Training results also depend heavily on what the data looks like, that is, on the complexity and other characteristics of the videos.
This means we must be careful when preparing the videos, to make sure all the types we want to cover are actually covered. The larger the model, the longer the training and the more computing power it needs, which depends on the available infrastructure. In practice, people have tried different combinations of knowledge and data for quality evaluation, and the most common one is to use knowledge-driven modules known to predict quality effectively, and then feed the information those modules capture as features into the model being trained, which also improves training efficiency.
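That hybrid recipe can be sketched in a few lines. The two features and the linear fusion below are illustrative stand-ins, not any real metric's design (VMAF, for example, uses VIF and detail-loss features fused by an SVR), and the training data is synthetic, standing in for subjectively scored clips:

```python
import numpy as np

def frame_features(ref, dist):
    """ref, dist: float luma frames in [0, 255]. Returns two toy features."""
    mse = np.mean((ref - dist) ** 2)
    psnr = 10.0 * np.log10(255.0 ** 2 / (mse + 1e-12))
    # Crude detail-loss proxy: similarity of mean gradient magnitudes.
    g_ref = np.abs(np.diff(ref, axis=1)).mean()
    g_dst = np.abs(np.diff(dist, axis=1)).mean()
    detail = (2 * g_ref * g_dst + 1e-6) / (g_ref ** 2 + g_dst ** 2 + 1e-6)
    return np.array([psnr, detail])

# Fuse the features with a model fit on (feature, MOS) pairs. VMAF uses an
# SVR; plain least squares keeps this sketch dependency-free.
rng = np.random.default_rng(0)
feats = np.column_stack([rng.uniform(20, 45, 100), rng.uniform(0.5, 1.0, 100)])
mos = 2.0 * feats[:, 0] + 10.0 * feats[:, 1] + rng.normal(0.0, 1.0, 100)
A = np.column_stack([feats, np.ones(len(feats))])      # intercept column
w, *_ = np.linalg.lstsq(A, mos, rcond=None)

# Score one synthetic frame pair with the fused model.
ref = rng.uniform(0, 255, (64, 64))
dist = ref + rng.normal(0, 5, (64, 64))
score = float(frame_features(ref, dist) @ w[:2] + w[2])
```

The knowledge-driven part (the feature extractors) stays interpretable, while the data-driven part (the fitted weights) adapts to whatever subjective data is available.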


SSIM and PSNR are knowledge-driven, since each can be written down as a mathematical formula. BVQA and NIQE, on the other hand, are no-reference models and are largely data-driven. A transitional method like VMAF, sitting in the middle, is actually a fusion: it first uses knowledge-driven models to extract features, then trains an SVM-based model on them, and achieves better results as a consequence. People doing video quality evaluation today still use classic algorithms such as SSIM and PSNR, and Netflix's VMAF and Apple's AVQT are also commonly used. SSIMPLUS is generally considered to perform better, but it is not open source, so if you need to test a video quality evaluation algorithm, you can give these a try.
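To show why SSIM and PSNR count as knowledge-driven, both fit in a few lines of NumPy. Note this SSIM uses global image statistics for brevity; the published algorithm computes the same quantity over local windows and averages:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(ref, dist, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole frame (illustrative simplification)."""
    x = ref.astype(np.float64)
    y = dist.astype(np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2   # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Every term has a perceptual interpretation (luminance, contrast, structure), which is exactly the interpretability advantage discussed above.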


So how do we evaluate whether an objective algorithm is good? In the end we usually look at a few indicators; I have listed only three here, though there are others, such as different ways of computing error. The common practice is to draw a scatter plot: the x-axis is the objective score computed by the algorithm, and the y-axis is the subjective score, the so-called golden score collected from subjective experiments. We check whether the correlation is high enough, how good the monotonicity is, and whether the mean squared error is small enough. The scatter plot shown here is just an example, and its fit is not very good. As a rule of thumb, the correlation should be at least 0.8 before an algorithm is worth considering for a real system; this is an empirical value. That is what I wanted to share about algorithms.
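A minimal sketch of these three indicators, using made-up toy numbers for five clips (in the literature a nonlinear logistic mapping is usually fitted before RMSE; a linear fit is used here for brevity):

```python
import numpy as np

def evaluate_metric(objective, subjective):
    """PLCC, SROCC, and RMSE between algorithm scores and subjective MOS."""
    o = np.asarray(objective, dtype=float)
    s = np.asarray(subjective, dtype=float)
    plcc = np.corrcoef(o, s)[0, 1]                    # Pearson linear correlation
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    srocc = np.corrcoef(rank(o), rank(s))[0, 1]       # Spearman rank correlation
    # RMSE only makes sense on a common scale, so first map objective
    # scores onto the subjective scale.
    a, b = np.polyfit(o, s, 1)
    rmse = np.sqrt(np.mean((a * o + b - s) ** 2))
    return plcc, srocc, rmse

obj = [60, 70, 75, 82, 90]          # algorithm scores (toy data)
mos = [2.1, 3.0, 3.2, 4.1, 4.6]     # subjective mean opinion scores (toy data)
plcc, srocc, rmse = evaluate_metric(obj, mos)
```

High PLCC/SROCC (above the empirical 0.8 threshold mentioned above) and low RMSE together indicate the metric tracks human judgment.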


This picture summarizes evaluation algorithms in academic research. There are the traditional categories of full-reference, reduced-reference, and no-reference, with different models depending on the development approach. There are also more and more new research directions, such as 3D, screen content (for example, video conferencing), VR, multi-view, and 360-degree video, especially the recently popular VR; people are very interested in evaluating the quality of these videos too. From the research perspective, as new video formats or new video coding standards appear, the corresponding algorithms all need to be updated. In fact, wherever video processing is involved, quality evaluation algorithms are essential, although the various algorithms differ and belong to different research directions.

03 End-to-end video quality monitoring practice

Now let's move on to the practical part. Having covered the concepts and algorithms of quality evaluation, I will walk through an end-to-end quality monitoring practice.


Let me first explain why we do this, starting with this picture. It is a simplified version of the video transmission chain: the video starts from the source on the left, passes through the encoder, transcoder, and packager, is distributed over the network, and is finally displayed on the consumer's device. So why monitor quality? From a broad perspective, what a video service provider cares about most is bandwidth, because bandwidth directly drives the cost of video delivery: the more video we transmit, the higher the cost. But if the business makes money through advertising, the volume of video must be guaranteed, otherwise the advertisements cannot be placed. This creates a tension: more video means higher cost, while less video means lower cost but lower advertising revenue, so a balance must be struck. At the same time, what is the user experience once the video goes out? If users feel the experience is too poor, they will not pay and will not watch at all, which means the business is failing. Therefore it is best to optimize each module on the transmission chain in reverse, with the user experience at the center. If we can achieve end-to-end intelligent monitoring, with monitoring points set at different stages, and effectively and efficiently locate problems in a live chain, finding them in time and accurately tracing them to their source, then this is a very valuable tool for video service providers. If a problem with a video is not found in advance and the customer discovers it first, each support call costs at least 8 yuan to handle. That does not sound like much, but for a provider with hundreds of thousands of customers, ten calls per customer per year is not a small cost.
So if the problem can be caught at the source and corrected early, that cost goes away. As it is, once a customer calls, the issue escalates layer by layer, with different teams taking different measures until the problem is located and solved. With an end-to-end monitoring system, ideally the problem is located within seconds and solved in a few minutes; the value this brings is huge, so once monitoring is in place the savings are substantial. From another perspective, once we have a good quality evaluation algorithm, we can make large-scale comparisons across the industry. This is what I mean by accurately benchmarking against competing products: we often need to take the same video and compare our quality against a competitor's quality, or our bandwidth against their bandwidth, and a reliable quality metric is what makes such comparisons meaningful.


What is the output of an end-to-end video quality monitoring system; in other words, why do we build it? As I just said, solving problems quickly with the user at the center is one aspect. Assuming we have such a system collecting data 24/7, that data is very valuable to a video service company like ours. We can do a systematic review: aggregate the viewing experience of each user, or even the whole market, and then break it down by time, region, and business line, giving us a clearer real-time picture of the overall video business. Take the United States as an example, with a business covering the East and West Coasts: how does quality compare between them? If quality on the East Coast is relatively poor but its conversion rate to paying customers is relatively high, then when allocating resources I would certainly want to raise the East Coast's quality in order to attract more users. So end-to-end quality monitoring can also drive data analysis for the video business. I divide the groups it serves into three layers: the management team, the operations team, and the technical team.

A data-driven approach lets the management team map out how to allocate resources going forward: should I balance across regions, or concentrate resources on certain live events, and how can I understand my business more deeply? From the business point of view, a video service boils down to bandwidth cost on one side (the investment) and quality or user experience on the other (the output). From this perspective we can compute ROI. But today quality data is often produced without being unified, which makes it hard to compute a unified ROI.

For the operations team, problems can be located quickly and then solved effectively, saving resources. For the technical team, it is a very good tool as well: when I roll out a new algorithm or encoder, I can check its online performance in real time. One of the technical team's pain points is that an algorithm performs very well in the offline lab but seems to make no difference once it goes online. With online monitoring, the new algorithm can run directly in production, for example in a small region, and its effectiveness can be verified there. So an end-to-end system helps every layer of the video business.


Many problems are common during transmission: frozen video, missing audio, black screens, audio-video desynchronization. Once end-to-end quality monitoring is established, we can keep adding modules that each capture a specific class of problem. For example, there may be insufficient network bandwidth; audio and video files or streams may be mislabeled or misconfigured, leading to missing streams; protocol errors may make decoding impossible; and network packet loss can cause mosaics, blocking, blurring, and other distortions. These are all relatively common system problems.
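Detectors for two of these problems, black screens and frozen video, can be sketched with plain NumPy on decoded luma frames. The thresholds here are illustrative assumptions, not production values:

```python
import numpy as np

def detect_black(frame, luma_thresh=16.0, frac=0.98):
    """Flag a black-screen frame: nearly all pixels below a luma threshold."""
    return bool(np.mean(frame < luma_thresh) >= frac)

def detect_frozen(prev, curr, eps=0.5):
    """Flag a frozen feed: consecutive frames essentially identical."""
    diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    return bool(diff.mean() < eps)
```

In a real monitoring module such checks would run over windows of frames (a single dark or repeated frame is normal in film content) before raising an alert.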


Correspondingly, the upper left is a black-screen picture, not an empty slot: the black screen means nothing can be seen. The blocking in the middle is often caused by packet loss on traditional cable TV. The upper right shows the banding effect often seen in the sky in HDR content, and the lower left is noise. The lower right is the most common case: banding, blocking, and blur all mixed together when the bit rate is too low. These are the distortions caused by compressing at an insufficient bit rate, and we see them often.


Once end-to-end quality monitoring is established, what else can the quality evaluation algorithm do? We need to deploy quality monitoring algorithms at different points; they may be the same or different, and there are many use cases along the end-to-end chain. On the production side there is what we call source validation: if the quality of a video online is not good, was the problem introduced during compression, or was the uploaded source itself poor? There is also UGC, and on the server side, encoding system optimization and per-title optimization are very common, as is the CDN service switching I mentioned earlier: when multiple CDNs are available, switching can be driven by whether each CDN's delivered quality is high or low. On the client side, smart streaming protocols like HLS or MPEG-DASH support adding quality-based metadata to optimize the streaming or segment-selection algorithm. These are all places where a good quality evaluation algorithm can help us further optimize the system.
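Per-title optimization, for instance, can be sketched as picking the cheapest bitrate rung whose predicted quality clears a target. The rate-quality curve below is an assumed illustration, not a measured one; in practice each title's curve comes from trial encodes scored by a quality metric:

```python
import math

def pick_rung(ladder, quality_of, target=80.0):
    """ladder: bitrates in kbps, ascending; quality_of: bitrate -> predicted
    quality score. Return the cheapest rung meeting the target, else the top."""
    for bitrate in ladder:
        if quality_of(bitrate) >= target:
            return bitrate
    return ladder[-1]

# Hypothetical saturating rate-quality curve for one title.
curve = lambda br: 100.0 * (1.0 - math.exp(-br / 2000.0))
best = pick_rung([1000, 2500, 4500, 8000], curve, target=80.0)
```

An easy title (a steep curve) gets a cheap rung, while a complex title gets a higher one, which is exactly the bandwidth saving per-title optimization is after.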


At each monitoring point we may have different requirements for the quality evaluation algorithm. At the video source, only a no-reference algorithm is possible, because there is no better version of the video to compare against. In practice we often only need to judge whether the quality is good enough, whether the video source can pass the gate; in that case the accuracy required of the algorithm does not have to be very high, and it does not much matter whether the score is 80 or 90. So the requirements, in both accuracy and complexity, differ across monitoring points. On the client side the algorithm can likewise only be no-reference, since we cannot get the source, and it must have low complexity and low energy consumption, because it often runs on mobile devices; it must also account for buffering and switching between renditions.
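As an illustration of a client-side metric that accounts for buffering and rendition switching, here is a hypothetical session-level QoE score. The penalty weights are assumptions for the sketch, not SSIMWAVE's actual formula:

```python
def session_qoe(frame_scores, rebuffer_ratio, n_switches,
                rebuffer_penalty=50.0, switch_penalty=2.0):
    """Session QoE: mean picture quality minus penalties for stalls and
    rendition switches. Weights are illustrative assumptions."""
    base = sum(frame_scores) / len(frame_scores)
    penalty = rebuffer_penalty * rebuffer_ratio + switch_penalty * n_switches
    return max(0.0, base - penalty)
```

For example, a session averaging 80 in picture quality but stalling 10% of the time with two rendition switches would come out noticeably lower, matching the intuition that stalls hurt experience beyond per-frame quality.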

04 Summary


Finally, a brief summary. Many factors affect video quality evaluation. It is still a developing research direction that is only now slowly landing in real application scenarios, and that is a process. From our point of view, people in different application scenarios are beginning to realize that this kind of thing is needed, but the understanding of exactly what benefit a better quality evaluation algorithm brings is still not very mature. We still recommend an end-to-end system. "End-to-end" sounds complicated, but we can plan for the complex while starting from the simple: first monitor the quality of the video source, then monitor the quality after transcoding, before delivery to the CDN. The mobile side may be harder, but we can use different devices; for example, one successful case with an existing customer is monitoring the TV box for them, that is, the HDMI output of the set-top box. With data collected from these several points, we can already extract a lot of useful information. Generally speaking, we admit that quality is not the most important thing, because whether creating a new video service or optimizing an existing one, the quality requirement in going from 0 to 1 is often not that high. But competition in the audio and video field is becoming fiercer and fiercer, and once a comparison or ranking of operators appears in the market, no one is willing to be at the bottom, because it directly reflects service quality; if your service quality is low, the company's reputation and the user experience suffer badly.
I think people do not have much motivation to be the best, but no one wants to be at the bottom, so this is also something we must consider when selling: how to position this product and how to find the pain points.

The above is all the sharing content of this time, thank you!



Origin blog.csdn.net/vn9PLgZvnPs1522s82g/article/details/128710705