Achieving both quality and efficiency: the innovative "top-level design" of media services

To build media services, one must have abstract thinking engraved in one's bones.

The video wave is surging, generative AI (AIGC) is iterating at breakneck speed, and experience requirements and application scenarios keep diversifying... Facing this transformation of "video productivity", can we see through the tangled surface to the "real needs" of the audio and video industry?

Is there an elegant media service design that satisfies all parties? How does it "land" and deliver value? What is the key to maintaining "sustained vitality"?

And with the capabilities of AIGC and large models, how will the "full intelligence" of media services evolve?

This article is based on an interview with Zou Juan, head of media services at Alibaba Cloud Video Cloud, planned by IMMENSE and LiveVideoStack.

Revisiting "Real Needs"

What are the real "needs" of the big video industry?

In the video field, everything ultimately comes down to the production and consumption of video. So the industry's real "needs" can likewise be discussed along two dimensions: video production and video consumption.

On the production side, producing video quickly and efficiently seizes the release window and attracts viewers, while better-quality, more innovative, and more complete video experiences retain them.

On the consumption side, "experience" matters most: subject matter that is novel, interesting, rich, and rewarding; video whose picture and sound please the senses; information that is "first-hand" and fresh...

The needs look diverse, but whether in production or consumption, they boil down to two keywords: "high timeliness" and "high quality".

Can "high timeliness" and "high quality" coexist at scale?

"High timeliness" demands higher productivity and production efficiency, meaning more video content produced in the same period of time, which in turn expands scale (in quantity, duration, industries, and scenarios).

At such scale, "high timeliness" and "high quality" appear incompatible, but with the arrival of cloud computing and artificial intelligence the situation changes entirely.

Cloud computing not only provides massive, highly concurrent, elastic video processing capacity; it can also organize and schedule diverse video services and scenarios optimally, staggering peaks or co-locating workloads, which realizes "scale" along both dimensions. On top of this, the cloud can faithfully reproduce the high-quality characteristics of a single video across many, quickly scaling out "high quality".

On this basis, as AI develops and deepens, intelligent capabilities are already more accurate and efficient than manual work in some scenarios, further enabling "high timeliness" and "high quality" at scale.

In the new era of digital intelligence, cloud and AI are moving toward deep integration. With the explosion of AIGC, AI is no longer a single-point capability in one link; everything is evolving toward "full intelligence".

"Top-level design" and the "engine"

Facing the "content production revolution", what is the next step for cloud vendors?

Cloud vendors are natural To B players. Across different industries, business scenarios, and customer needs, the required functions, performance, timeliness, and delivered results differ greatly.

Cloud vendors must therefore solve for openness, flexibility, and multi-scenario support.

To elaborate: the full video link runs from capture through production, processing, management, and distribution to consumption. Each link contains many atomic media capabilities, and the depth and combinations those capabilities require vary widely across scenarios and industries.

Hence a unified "top-level design", formed through summarization, refinement, and abstraction, is the cloud vendor's "magic weapon".

Tracing back to the source, how do we find a solution at the "top level"?

First, "break apart" the atomic media services, then "reorganize" them.

Concretely: first, split the atomic media services of the full video link at fine granularity and make each service deep; second, provide a flexible orchestration mechanism so these atomic services can be freely assembled according to the customer's vision, scenario, and business flow.
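The split-then-reassemble idea can be sketched in a few lines. This is a minimal illustration, not Alibaba Cloud's actual API: the service names, the registry, and the workflow runner are all hypothetical stand-ins for atomic media services and the orchestration mechanism described above.

```python
from typing import Callable, Dict, List

# Hypothetical atomic services; each does one fine-grained job.
def transcode(media: Dict) -> Dict:
    media["renditions"] = ["1080p", "720p"]
    return media

def add_subtitles(media: Dict) -> Dict:
    media["subtitled"] = True
    return media

def generate_snapshot(media: Dict) -> Dict:
    media["cover"] = media["media_id"] + "_cover.jpg"
    return media

# The "broken apart" capabilities live in a registry...
REGISTRY: Dict[str, Callable[[Dict], Dict]] = {
    "transcode": transcode,
    "subtitle": add_subtitles,
    "snapshot": generate_snapshot,
}

def run_workflow(media_id: str, steps: List[str]) -> Dict:
    """...and are "reorganized" per customer into a business flow."""
    media = {"media_id": media_id}
    for step in steps:
        media = REGISTRY[step](media)
    return media

# Two customers, two orchestrations of the same atomic services.
vod = run_workflow("m-001", ["transcode", "snapshot"])
live_clip = run_workflow("m-002", ["transcode", "subtitle"])
```

The point of the registry indirection is that adding a new atomic service never changes the runner; only new orchestrations reference it.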

Next comes the unified design of the underlying media technology.

The video processing pipeline consists of several main links: demuxing, decoding, pre-processing, encoding, and muxing. We need a "media engine" that faces algorithms below and scheduling above: a unified media processing framework that organizes these links, supports multiple algorithms, integrates plug-ins flexibly, and handles a wide range of formats.
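The five links above can be sketched as a toy pipeline. This is a schematic sketch only: the stages pass strings around instead of real packets and frames, and the plug-in hook for pre-processing operators is a hypothetical illustration of the framework's extensibility.

```python
from typing import Callable, List

# Stand-ins for the five pipeline links; no real codec work happens here.
def demux(src: str) -> List[str]:
    return [f"{src}:pkt{i}" for i in range(3)]            # container -> packets

def decode(packets: List[str]) -> List[str]:
    return [p.replace("pkt", "frame") for p in packets]   # packets -> raw frames

def encode(frames: List[str]) -> List[str]:
    return [f.replace("frame", "enc") for f in frames]    # frames -> encoded chunks

def mux(chunks: List[str], dst: str) -> str:
    return f"{dst}({len(chunks)} chunks)"                 # chunks -> container

def run_engine(src: str, dst: str,
               preprocessors: List[Callable[[str], str]]) -> str:
    """Unified framework: demux -> decode -> pre-process -> encode -> mux,
    with pre-processing operators plugged in per task."""
    frames = decode(demux(src))
    for op in preprocessors:          # plug-in operators, e.g. denoise, SR
        frames = [op(f) for f in frames]
    return mux(encode(frames), dst)

denoise = lambda f: f + "+denoised"   # hypothetical plug-in operator
out = run_engine("in.mp4", "out.mp4", [denoise])
```

A real engine would also negotiate formats between links; the fixed five-stage skeleton with swappable operators is the part that mirrors the "unified framework" idea.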

The "dismantling" and "reorganization" of atomic media services builds a flexibly orchestrated business flow at the "top layer" of media services, while the unified "media engine" is the cornerstone that lets media tasks achieve high timeliness, high performance, and rich functionality at the bottom "execution layer".

Finally, between the two, a unified "media distributed service framework and media metadata system" forms the connecting layer. It includes: a unified cross-product, cross-scenario media resource identifier (OneMediaID), a unified workflow, a unified message-processing mechanism for media business flows, a unified media task pipeline scheduling mechanism, and more.
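The cross-product identity idea can be illustrated with a small record type. The class, fields, and linking method here are hypothetical, sketching only the concept that every product references one shared media identity rather than its own key.

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class MediaRecord:
    """Hypothetical sketch: one identity (OneMediaID) shared across products."""
    one_media_id: str = field(
        default_factory=lambda: "media-" + uuid.uuid4().hex[:8])
    product_refs: dict = field(default_factory=dict)  # product -> local id

    def link(self, product: str, local_id: str) -> "MediaRecord":
        # Each product keeps its local id but resolves to the same resource.
        self.product_refs[product] = local_id
        return self

# VOD and live streaming both point at the same underlying media.
rec = MediaRecord().link("vod", "vod-123").link("live", "room-9")
```

With such a shared key, a workflow started in one product (say, live recording) can hand its output to another (say, editing) without re-registering the asset.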

The result is a single unified set of "media services".

Within it, is the media engine a well-deserved "engine"?

As mentioned, the "media engine" is the underlying core of the entire media service and the executor of all media processing and production tasks. It must handle not only traditional media processing tasks but also various AI tasks, truly facing algorithms below and scheduling above.

The "media engine" spans both "orchestration layer" and "kernel layer" technologies. "Orchestration" here does not mean orchestrating business flows, but orchestrating the links and operators within a single task.

Through a unified pipeline and strategy, the "media engine" can flexibly support multiple parameter combinations for different tasks, tuning each combination's execution toward the best overall trade-off across weighted dimensions such as image quality, performance, bit rate, and timeliness.
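Selecting a parameter combination by multi-dimensional weights can be sketched as a weighted score over candidates. The candidate presets, their scores, and the weight values below are all invented for illustration; only the shape of the trade-off (a weighted sum maximized per scenario) reflects the text.

```python
# Hypothetical candidate parameter combinations with made-up metric scores
# (all normalized to [0, 1]; higher is better).
candidates = [
    {"name": "fast",     "quality": 0.70, "speed": 0.95, "bitrate_saving": 0.50},
    {"name": "balanced", "quality": 0.85, "speed": 0.80, "bitrate_saving": 0.70},
    {"name": "best_q",   "quality": 0.97, "speed": 0.40, "bitrate_saving": 0.85},
]

def pick(weights: dict) -> dict:
    """Choose the combination with the best weighted overall score."""
    def score(c):
        return sum(weights[k] * c[k] for k in weights)
    return max(candidates, key=score)

# A latency-sensitive live scenario weights speed highest...
live = pick({"quality": 0.2, "speed": 0.6, "bitrate_saving": 0.2})
# ...while an archive transcode weights quality and bitrate saving.
vod = pick({"quality": 0.5, "speed": 0.1, "bitrate_saving": 0.4})
```

The same candidates win differently under different weights, which is exactly the "comprehensive best of multi-dimensional weights" the engine pursues per task.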

The "media engine" is also responsible for choosing the optimal execution strategy for each task.

For example: execute the task whole, or in parallel? Slice-level or frame-level parallelism? Are special components, or even special models, needed? Do the operators have dependencies on one another? ... We call this decision-making capability of the media engine the "media worker brain".

Under this brain's allocation, pursuing the optimal execution strategy for each task is itself a continuation of the pursuit of "high quality" and "high timeliness".
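The questions the "media worker brain" asks can be sketched as a small decision function. The thresholds and rules below are illustrative assumptions, not the production decision model; they only show how source properties and operator dependencies steer the strategy choice.

```python
from dataclasses import dataclass

@dataclass
class Task:
    duration_s: float       # source duration
    seekable: bool          # can the source be cut into independent slices?
    has_temporal_ops: bool  # do any operators depend on neighboring frames?

def plan(task: Task) -> str:
    """Toy 'media worker brain': pick an execution strategy per task."""
    if task.duration_s < 60:
        return "whole"                  # short task: parallel overhead not worth it
    if task.has_temporal_ops:
        return "frame-level-parallel"   # cross-frame deps: pipeline frames instead
    if task.seekable:
        return "slice-level-parallel"   # independent slices: split, process, merge
    return "whole"                      # fallback for non-seekable sources
```

For instance, a one-hour seekable file with no temporal operators gets slice-level parallelism, while the same file with a frame-interpolation operator falls back to frame-level parallelism.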

Sustained vitality: flexible, open, multi-service

What is the source of a platform's sustained vitality?

"Top-level design" is emphasized again and again because, as a To B cloud vendor, Alibaba Cloud Video Cloud must solve for multi-service support, flexibility, and openness.

We must account for the individuality of each customer's business, yet we cannot customize case by case. So we need abstract thinking "engraved in our bones": summarizing, refining, and abstracting at all times, whether designing products, modules, services, or APIs.

"Top-level design" thus keeps each business sector or module from growing wild inside its own "comfortable" system; everything is planned and weighed from the overall perspective.

Looking closely, the "top-level design" of media services starts from existing needs and customer scenarios. Media capabilities are sorted and summarized according to the five major modules of media services (media aggregation, media processing, media production, media management, media consumption), then broken down by "reusability" into fine-grained atomic media capabilities, with services of different scopes realized through one or more layers of common abstraction.

For example, in the media production module, the media service provides not only the atomic VideoDetext subtitle-removal service but also a more comprehensive editing and synthesis service.

At the same time, the relatively fixed parts must be separated from the changing parts: the system ships built-in media processes to lower customers' development effort, while for customers who want more flexibility, programmable scripts or strategies are provided for customization.
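The fixed/variable split can be sketched as a built-in template plus a customization hook. The template name, step names, and hook signature are hypothetical; the pattern shown (ship the fixed part, expose the changing part as a callable) is the design choice described above.

```python
from typing import Callable, List, Optional

# Hypothetical built-in processes: the relatively fixed part of the system.
BUILTIN_TEMPLATES = {
    "standard_vod": ["transcode", "snapshot", "publish"],
}

def build_process(template: str,
                  customize: Optional[Callable[[List[str]], List[str]]] = None
                  ) -> List[str]:
    """Start from a built-in template; let a user 'script' rewrite it."""
    steps = list(BUILTIN_TEMPLATES[template])  # copy the fixed part
    if customize:                              # the changing, per-customer part
        steps = customize(steps)
    return steps

# Most customers take the built-in process as-is:
default_flow = build_process("standard_vod")

# A customer wanting subtitles injects a step via a small strategy:
custom_flow = build_process(
    "standard_vod",
    customize=lambda s: s[:1] + ["subtitle"] + s[1:],
)
```

Customers who need nothing special never see the hook; customers who do can reshape the flow without the platform hard-coding their case.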

Openness is another concern of the top-level design of media services.

The openness of intelligent media services shows in the fact that, beyond the protocols and capabilities of Alibaba Cloud products, they also support international and domestic standard protocols as well as the protocols and capabilities of some third-party vendors.

For example, in low-latency transmission, besides its own RTS, intelligent media services also support LL-HLS, LHLS, DASH/CMAF, and so on;

For another, besides Alibaba Cloud OSS as the input and output of media processing services, AWS S3 and HTTP URLs are also supported;

And besides self-developed audio/video and AI algorithms, third-party AI operators that pass security verification can also be integrated.

We believe that only openness and cooperation can keep technology alive.

Can "top-level design" push "high timeliness" even higher?

When "top-level design" breaks through the barriers of multi-service, flexibility, and openness, it naturally brings even higher "timeliness".

Going deeper, this involves four technical dimensions:

First, at the engineering-architecture level, design and implement a "parallel" processing framework: split the whole video or timeline (Timeline) into splits, process them in "parallel", then "merge";

Second, apply "performance optimization" to single-slice tasks, including algorithm optimization, instruction-set optimization, engine-layer engineering optimization of algorithms, pipeline optimization, and joint optimization of algorithms and scheduling, so that tasks execute optimally given multi-dimensional conditions such as source-file adaptation, task parameter characteristics, model and configuration, and resource utilization levels;

Third, optimize the orchestration of media business flows at the "distributed service layer", so that process activities can be chained freely across a wider range, e.g. transcoding while recording, or transcoding while broadcasting. Different products and services can then be connected through the same process, accelerating cross-scenario and even cross-product flows;

Fourth, the boost from "AI capabilities". Whether at the algorithm layer, engine layer, or distributed service layer, AI's advantages come into full play when processing video at scale, further raising "high timeliness".
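The first dimension, split / parallel / merge, can be sketched concretely. This is a minimal illustration using a thread pool over fake slice work; the split length, worker count, and the string stand-in for transcoding are all assumptions, not the production framework.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def split_timeline(total_s: float, split_s: float) -> List[Tuple[float, float]]:
    """Cut a video/Timeline into fixed-length splits (last one may be shorter)."""
    splits, t = [], 0.0
    while t < total_s:
        splits.append((t, min(t + split_s, total_s)))
        t += split_s
    return splits

def process_split(span: Tuple[float, float]) -> str:
    start, end = span
    return f"enc[{start:.0f}-{end:.0f}]"   # stand-in for real slice transcoding

def parallel_transcode(total_s: float, split_s: float = 60.0) -> str:
    """Split -> process slices in parallel -> merge the results in order."""
    splits = split_timeline(total_s, split_s)
    with ThreadPoolExecutor(max_workers=8) as pool:
        parts = list(pool.map(process_split, splits))  # "parallel"
    return "+".join(parts)                              # "merge"
```

Because `pool.map` preserves input order, the merge step can simply concatenate; a real framework must additionally handle codec state at split boundaries.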

If all of the above pursues the "high timeliness" of media services without limit, then fully realizing "high quality" can lean even more on today's AI capabilities.

AI: the ever-evolving "accelerating force"

Can we still keep pace with AI iteration and AIGC evolution?

The development of large models and AIGC can only be described as changing by the day. Their iteration speed is unprecedented, vertical application models keep emerging, and the application scenarios of the audio and video industry grow ever broader and more diverse.

More importantly, large models and AIGC draw on larger volumes of data, more complex algorithms, and stronger computing power to greatly improve the accuracy and effect of audio and video processing, opening unlimited room for imagination.

Before this AIGC storm hit, our media services had already been laid out in advance, letting AI capabilities participate flexibly in all kinds of intelligent video scenarios and folding AI iteration and AIGC evolution into the intelligent "top-level design".

For the evolution of AIGC (taking content creation as the example), starting from the prelude we envision five stages:

➤ Stage one (prelude): AI handles preprocessing of the material and arranges it according to preset templates, realizing the first stage of fully intelligent video production.

➤ Stage two: beyond material preprocessing, AI also completes the editing work of the creative link (script design / timeline design), realizing intelligent batch mixed-cutting.

➤ Stage three: for finished videos with specific scenes and specific requirements, AI reverse-deconstructs the shot list from an existing film, then handles searching, screening (and partially generating), processing, arranging, and finally synthesizing the material.

➤ Stage four: for a specific scenario, AI understands the scenario's requirements itself, covering material search, screening (and partial generation), processing, arrangement, and final production and synthesis.

➤ Stage five: across diverse scenarios, drawing on massive, rich data, AI discovers creative angles on its own, truly possessing "creativity".

Put simply, AI is penetrating the business step by step, from capability to scenario: first individual cases, then general ones; first local links, then the whole; first execution, then creativity, completing AI's evolution from assisting the business to fully intelligent transformation of the business.

Clearly, AI was once only an assistant to creation; today it can already be the protagonist of creation.

Looking ahead, whether for the Metaverse or Web 3.0, the prosperity of the next-generation Internet requires massive amounts of digital content, placing higher demands on its quantity, form, and interactivity.

For example, large-model-based technologies such as image enhancement and real-scene matting already outperform traditional AI algorithms in effect; likewise, using Text-to-Video to generate a few seconds of empty shots, or Image-to-Video to turn a still image into continuous action, not only solves quality problems but achieves a breakthrough "out of nothing".

In the future, with AIGC, intelligent media services can greatly improve the results of "one-click film production": intelligent generation, intelligent timeline orchestration, and intelligent editing and packaging will break down the pain points of production efficiency and quality one by one. In the media asset field, AIGC can also generate video summaries and more, bringing new energy to media asset management. All-round exploration is underway.

Looking forward to AIGC in the era of large models.

July 28 afternoon

LiveVideoStackCon2023 Shanghai Station

Alibaba Cloud Video Cloud Session

Alibaba Cloud Intelligence Senior Technical Expert

"From Scale to Full Intelligence: Reorganization and Evolution of Media Services"

Let's explore the innovative "top-level design" of media services together!


Origin my.oschina.net/u/4713941/blog/10089648