Once a week, a roundup of practical, substantive content from the field of audio and video technology.
News submissions: [email protected].
Geenee AR Provides Virtual Try-on App for Brands and Retailers
ChatGPT powered code reviewer bot for open source projects
ChatGPT can review code: the author uses ChatGPT to build a code review bot for open source projects. The bot conducts code reviews and gives feedback on code quality, security, and best practices.
https://www.cncf.io/blog/2023/06/06/a-chatgpt-powered-code-reviewer-bot-for-open-source-projects/
Evaluation of TTS models using SQuId
The article covers how to evaluate the output quality of text-to-speech (TTS) systems. The authors introduce SQuId, an automated assessment framework that processes multiple acoustic and linguistic features at once and uses machine learning to generate objective quality metrics. They also propose a GAN (generative adversarial network) based data augmentation method to help improve TTS model performance.
https://ai.googleblog.com/2023/06/evaluating-speech-synthesis-in-many.html
Visual Captioning: Enhancing Videoconferencing with Dynamic Visuals Using Large Language Models
This post introduces a new visual captioning model, trained using a large language model, that automatically generates descriptions for images. In the future, the model may be used for accessibility assistance, image search, and automatic image description.
https://ai.googleblog.com/2023/06/visual-captions-using-large-language.html
The "hyperspectral camera" of Huawei mobile phones
Neuralangelo can generate sculptural 3D structures with intricate details and textures. Creative professionals can then import these 3D objects into design applications, where they can be further edited for use in applications such as art, video game development, robotics, and industrial digital twins.
Capability, Stability and Cost Reduction: Baidu Multimedia Technology Review
The multimedia technology ecosystem has entered a saturated market, and it has become the norm for customers to expect everything at once. Continuously optimizing capability, quality, stability, and cost is now a required course for every multimedia technology platform. This article takes Baidu Intelligent Video Cloud as an example and gives an overview of its key capabilities, such as RTC, edge computing, and video encoding, as well as its experience in optimizing user experience and cost.
How to choose the right microphone
Summary of audio and video issues: how to stay compatible with real-time audio and video encryption
Audio formats: an introduction to PCM
Federated Learning with Weak Supervision for Speech Recognition
Specifically, the approach uses a central server to coordinate model updates across individual clients. The server first extracts as much information as possible from the unlabeled data and combines it with a small amount of labeled data provided by the clients to train an initial model. It then sends the model to each client and adjusts the model parameters according to the accuracy and data distribution that the clients report back. Finally, the models of all clients are merged to form a global model.
https://www.amazon.science/blog/federated-learning-with-weak-supervision-for-speech-recognition
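The coordinate-and-merge loop described above can be sketched in a few lines. This is only a minimal illustration using plain federated averaging on a toy linear model, not the method from the linked post: the function names (local_update, federated_average, run_rounds, make_client), the learning rate, and the synthetic data are all assumptions made for the example, and the weak-supervision step is left out.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of SGD on one client's private (x, y) pairs."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error of the linear model
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Merge client models into a global one, weighted by sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_rounds(clients, rounds=50, init=(0.0, 0.0)):
    """Server loop: send the model out, collect local updates, merge them."""
    global_weights = init
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

def make_client(rng, n, lo, hi):
    """Hypothetical client data: y = 2x + 1 plus small noise, x drawn from [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
# Three clients with different data distributions; raw data never leaves a client.
clients = [
    make_client(rng, 20, -1.0, 0.0),
    make_client(rng, 30, 0.0, 1.0),
    make_client(rng, 10, -1.0, 1.0),
]
w, b = run_rounds(clients)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

Only model parameters travel between server and clients here. Weighting the merge by sample count is one common choice; a real system would add the weak-supervision signal, secure aggregation, and a speech model in place of the toy regression.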
The Road to Practice of Baidu Video Quality Evaluation
Compared with all previous VR/AR platforms, the emergence of Vision Pro ushers in a new era. From human-computer interaction to hardware specifications, operating system, ecosystem, and data privacy, Apple has redefined the standards for head-mounted devices.
Lu Qiming's move from an Internet company to a smart terminal solution company may be hard to understand. However, the economic environment and his own technical difficulties still led him to step into an unknown world without hesitation. As Jensen Huang (Huang Renxun) said a few days ago, "retreat" is not easy for smart people, yet strategic retreat, sacrifice, and deciding what to let go of are at the very heart of success.
LiveVideoStackCon 2023 Shanghai has entered its full-price ticket period
2023 SRT InterOp Plugfest Highlights
At the 2023 SRT InterOp Plugfest, Haivision and YouTube worked together to demonstrate the interoperability of SRT-based video transmission across different devices and platforms. The demonstrations showed how developers can use the SRT protocol to make video transmission more reliable and efficient, with advantages that other video streaming solutions cannot match.
https://www.haivision.com/blog/all/highlights-2023-srt-interop-plugfest-with-youtube/
Reinforcement Learning-Driven Low-Latency Video Transmission
LiveVideoStackCon 2022 Beijing invited Professor Zhou Anfu of Beijing University of Posts and Telecommunications to share research results on low-latency video transmission using reinforcement learning.
Deterministic Latency Transmission for Streaming Media: From QUIC to the Future
LiveVideoStackCon 2022 Beijing invited Ma Chuan of Tsinghua University to introduce the birth of the QUIC protocol, its current extensions, and its future direction.
How streamers should use predictive analytics to improve retention
Benefits of predictive analytics: understand user preferences, behaviors, and needs, and provide more personalized content and services; improve retention and increase revenue through in-depth analysis and modeling of data (including machine learning algorithms, data mining tools, and AI).
https://www.streamingmedia.com/Articles/Post/Blog/How-Streaming-Platforms-Can-Harness-Predictive-Analytics-for-Better-Retention-158980.aspx