Preface
The 2024 Zhongguancun Forum Annual Meeting's Hard Technology Investment and Development Forum was recently held at the Zhongguancun International Innovation Center. Some 180 investors, financiers, entrepreneurs, industry experts, and government leaders from around the world gathered for in-depth exchanges and discussions on "capital globalization and technology going overseas", "the successes and failures of hard technology investment", and "new global practices in hard technology".

Han Geng, Deputy Secretary-General of the Beijing Municipal People's Government, said in his speech that in the new era, hard science and technology innovation, as a core element of new productive forces, has become an indispensable force in China's modernization drive. Beijing has consistently ranked first in the country in its number of unicorn companies, more than 60% of which are hard technology unicorns. Looking ahead, Beijing has mapped out 20 sub-industries across 6 major science and technology fields and established 4 science and technology innovation funds of tens of billions of yuan each, focusing on strategic emerging industries such as artificial intelligence, robotics, and medical health.

Han Geng, Deputy Secretary-General of the Beijing Municipal People’s Government
Dr. Fu Zhengjia, Chief Architect of Alluxio, a new-generation AI data platform that has drawn wide attention worldwide, was invited to attend the forum. In the TED-style session themed "New Global Practices of Hard Technology", he joined outstanding innovation leaders of 2023-2024, including Wang Shaolan, President of Zhipu AI; He Huajie, President of Pathfinder Group; Bu Xiangwei, co-founder and co-CEO of Oriental Space; Zhao Yongjie, CMO of Origin Quantum; and Zhong Haizheng, founder of Zhijing Technology, to share each company's latest hard-core technologies and application practices.
Alluxio shares the topic:
"New Generation AI Data Platform"
Dr. Fu Zhengjia, Chief Architect of Alluxio
The accelerating evolution of AI vs. data management challenges
AI is evolving rapidly and attracting broad attention across industries, in both vertical applications and horizontal capabilities. Ray Kurzweil, the American investor and futurist, predicts that "artificial intelligence will reach the level of human intelligence in 2029, and by 2045, the capabilities of the biological machine intelligence created by smart technology and human civilization will be expanded a billionfold." NVIDIA CEO Jensen Huang believes we are witnessing a surge in demand to rebuild the world's data centers: a decade-long renewal of existing data centers that will ultimately drive a transition to accelerated computing. We are also seeing more and more companies preparing to use, or already using, AI to empower their businesses, training models and applying them in production to improve efficiency and create greater value.
At the same time, with the new generation of AI ushered in by ChatGPT, model structures have grown more complex, parameter counts larger, and computing power requirements higher, and this trend is increasingly pronounced. It is therefore generally accepted that the three pillars of AI development are computing power, algorithms, and data. Yet the importance of AI infrastructure is often overlooked: practice has shown that only with well-built AI infrastructure can these three core capabilities be put to full use.
Enterprises also face a series of challenges when building AI infrastructure:
At present, domestic enterprises commonly face GPU scarcity, high prices, and low utilization. Even when the GPU supply problem is eased, another problem follows: how to better manage and serve data for GPU computing, and how to achieve the data access efficiency GPUs require. When data I/O becomes the bottleneck, GPU utilization drops, because training must wait for data to be loaded onto the GPU (the sketch after this list of challenges makes this concrete);
Algorithm and business leaders often demand faster model-building cycles and iteration speeds. We also see a very clear growth trend in data, for example data collection for intelligent driving and driverless vehicles; after data from various industries is collected and annotated, many companies must prepare for ever-growing data sizes.
This growth has two dimensions. The first is growth in total data volume: how many billions of images, how many voice samples? In particular, beyond large language models there are also multimodal models, text-to-image, text-to-video, and many others, all of whose training requires data preparation, so the amount of data will keep increasing.
The second is growth in the size of each data item. A few years ago a face recognition image might be only 100KB or 200KB; now we see video frames and 4K high-definition images of 1MB, 4MB, or 8MB each. When these two dimensions are multiplied together, total data size grows roughly quadratically.
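A quick back-of-envelope illustration of that multiplication, using hypothetical numbers that are not from the talk:

```python
# Hypothetical numbers (not from the talk) illustrating the two growth
# dimensions multiplying together: 10x more items at 10x the size per item
# means 100x the total data volume.
count_then, size_then_kb = 1_000_000, 200        # ~1M images at ~200KB each
count_now,  size_now_kb  = 10_000_000, 2_000     # ~10M images at ~2MB each

total_then_tb = count_then * size_then_kb / 1e9  # ~0.2 TB
total_now_tb  = count_now  * size_now_kb  / 1e9  # ~20 TB
print(total_now_tb / total_then_tb)              # 100.0 -> quadratic-style growth
```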
Therefore, the entire training platform needs a better data storage and data I/O solution to deliver better training results.
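To make the first of these challenges concrete, here is a minimal, hypothetical PyTorch sketch of where the bottleneck appears; the class, paths, and loader parameters below are illustrative assumptions, not from the talk. If the storage behind `__getitem__` is slow, the GPU stalls on every batch no matter how many loader workers are used.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FileDataset(Dataset):
    """Illustrative dataset: every __getitem__ performs a storage read."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # If this read goes to slow remote storage, the GPU ends up
        # waiting on it: this is exactly where utilization is lost.
        with open(self.paths[idx], "rb") as f:
            return torch.frombuffer(bytearray(f.read()), dtype=torch.uint8)

# Workers and prefetching can hide some latency, but only up to a point;
# when the backing store is too slow, every step still stalls on data.
loader = DataLoader(
    FileDataset(["/data/sample0.bin"]),  # hypothetical sample paths
    batch_size=1,
    num_workers=4,
    prefetch_factor=2,
    pin_memory=True,
)
```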
Alluxio Solutions
There are solutions on the market that can meet these needs to some extent, but they bring many problems of their own. In particular, some solutions originally built for supercomputing centers, such as commercial storage, are very expensive, and they were not designed for the challenges of today's typical AI scenarios.
Alluxio therefore aims to solve the data platform and data I/O problems across the AI pipeline with a high-performance distributed data access platform. Alluxio sits between computing frameworks (training platform frameworks) such as PyTorch, TensorFlow, and Ray and the underlying data storage, coordinating and orchestrating between slow storage and the compute frameworks. We also call it a distributed data orchestration tool.
Through Alluxio, data is brought closer to compute nodes such as GPUs and CPUs, and hot and cold data are separated quickly and automatically, so that GPU training tasks can fetch data fast. At the same time, Alluxio can combine different types of underlying data sources into a cost-effective, high-ROI overall solution: low-cost cold storage plus an Alluxio hot cache.
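As a hedged sketch of what this looks like from the training side: Alluxio provides a FUSE-based POSIX interface, so assuming the cluster mounts Alluxio at a path like `/mnt/alluxio` (the path and dataset layout here are hypothetical), training code reads cached data as ordinary files.

```python
import glob
import torch

# Hypothetical mount point: Alluxio's FUSE-based POSIX interface can expose
# the data namespace as an ordinary filesystem path, with hot data served
# from cache near the GPUs and cold data fetched from the underlying store.
ALLUXIO_MOUNT = "/mnt/alluxio/training-set"      # illustrative path

paths = sorted(glob.glob(f"{ALLUXIO_MOUNT}/*.bin"))

def load_sample(path):
    # The first read of a file populates the hot cache; later epochs and
    # other jobs reading the same data then hit the cache instead of the
    # low-cost cold storage underneath.
    with open(path, "rb") as f:
        return torch.frombuffer(bytearray(f.read()), dtype=torch.uint8)
```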
The first scenario this solution addresses is when an enterprise's data is sensitive and cannot be placed on the cloud, so it must stay on premises, while local computing power is insufficient and GPUs must be borrowed from other data centers. In this case the enterprise needs a solution that supports flexible GPU deployment and flexible scheduling of data and compute. Alluxio handles such scenarios very well.
The second scenario is that after model training completes, the model must be distributed to online inference clusters. Large numbers of inference nodes need to update models frequently, and I/O bottlenecks can arise during inference deployment. Alluxio can solve the problems encountered during inference deployment very efficiently.
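A minimal sketch of that second scenario, assuming models are published to a shared path served through Alluxio (the path and file name are hypothetical): each inference node reads the new model through the cache rather than all nodes hitting the underlying store at once.

```python
import torch

# Hypothetical shared model path served through Alluxio: the trainer writes
# the checkpoint once to the underlying store, and each inference node reads
# it through the cache, so N nodes updating the model do not translate into
# N full reads against cold storage.
MODEL_PATH = "/mnt/alluxio/models/model-v2.pt"   # illustrative name

def refresh_model():
    # Load CPU-side first; the serving process decides device placement.
    return torch.load(MODEL_PATH, map_location="cpu")
```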
The value Alluxio brings
Overall, in a new-generation AI training platform, Alluxio not only accelerates the entire training process but also accelerates inference and model distribution when trained models are deployed to inference clusters. Compared with purchasing additional, very expensive hardware, this overall solution can be deployed and put to use quickly: enterprises only need standard low-cost hardware, truly reducing costs and improving efficiency.
Test results show this intuitively: a training task that took 85 minutes without Alluxio took only 17 minutes with it, a 5x efficiency gain, while the DataLoader's share of total training time dropped sharply from 82% to 1%. In practical terms, Alluxio raised the utilization of the GPUs that enterprises spend heavily on from 17% to 93%, greatly improving infrastructure ROI while accelerating the final business launch.
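A back-of-envelope check of how those figures hang together (our own arithmetic, not from the talk): if 82% of the 85-minute baseline run was spent in the DataLoader, actual GPU work was roughly 15 minutes, which lines up with both the 17-minute accelerated run and the reported utilization jump.

```python
# Back-of-envelope consistency check of the reported figures (our own
# arithmetic, not from the talk).
baseline_min, accelerated_min = 85, 17
print(baseline_min / accelerated_min)   # 5.0 -> the reported 5x speedup

dataloader_share = 0.82                 # I/O share of the baseline run
compute_min = baseline_min * (1 - dataloader_share)
print(compute_min)                      # ~15.3 min of actual GPU work
print(compute_min / baseline_min)       # ~0.18 -> near the reported 17% utilization
print(compute_min / accelerated_min)    # ~0.90 -> near the reported 93% utilization
```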
Today, Alluxio is widely adopted by enterprises and institutions across industries worldwide. We look forward to working with you to accelerate the evolution of AI and deliver return on investment to enterprises more efficiently.
This article is shared from the WeChat public account - Alluxio (Alluxio_China).