AI's hundred-model battle: the two poles are hot, the middle is empty

Who dares to enter the field rashly?

"We don't dare to make a move. China hasn't produced a large model with a decisive advantage, so I can't invest in upper-layer applications; I'm worried about betting on the wrong one," investor Jucy (pseudonym) told Lightcone Intelligence. Seeing many AI projects but investing in few has become the VC norm of late.

In the two months since ChatGPT ignited the AI explosion, China has been waiting for its own GPT-3.5.

AI has genuinely rattled white-collar workers. One game studio replaced 30% of its original-art illustrators, e-commerce teams are using AIGC to generate low-cost digital-human models, and junior programmers feel the anxiety of being outclassed from a higher dimension... With GPT abroad showing every sign of redoing all fields from scratch, technological disruption is rolling in with the smell of money.

So beyond the anxious workers, enterprises are eager to use large models to cut costs and raise efficiency, entrepreneurs are eager to build new products on top of them, the stock market is eager to ride the ChatGPT concept and fleece retail investors, and training institutions are scrambling to cash in on a first wave.

Against this backdrop, China's usually trend-chasing technology giants seem calmer than ever.

Sure enough, cycles make people grow, and companies too.

At long last, after much anticipation, the second week of April brought China an intensive wave of new-generation large model releases.

Four days after Tongyi Qianwen opened for testing, Zhang Yong made his first public appearance since taking over Alibaba Cloud, announcing that all Alibaba products would in future be connected to the Tongyi Qianwen model for a comprehensive overhaul;

At a technical exchange meeting on the 10th, SenseTime demonstrated the capabilities of its "SenseNova" large model: dialogue, AI painting, programming, and digital humans. Its stock opened up 9% the next day;

Huawei's Pangu large model made a low-key appearance on the 8th, then released new products on the 10th;

Star entrepreneur Wang Xiaochuan re-emerged publicly, joining hands with his old Sogou partner Ru Liyun to formally begin a new AI venture, Baichuan Intelligence, which will launch a large model in the second half of the year;

Haomo released DriveGPT Xuehu Hairuo, billed as the first large autonomous-driving model, introducing reinforcement learning from human feedback into the driving domain.

Even the game company Kunlun Wanwei joined the fun, announcing that "China's first domestically developed large language model to truly achieve emergent intelligence" would begin invitation-only testing on the 17th, though media later questioned whether it was chasing the hype to pump its stock price.

Lively and noisy, true and false intermixed, the large model scene is momentarily chaotic, a riot of blossoms gradually dazzling the eye. Why did China's large models suddenly spring up like mushrooms? And if you shouldn't reinvent the wheel, what else can you do?

Although they are crossing the river by feeling for OpenAI's stones, China's large models have also entered no-man's land.

01. Before emergence: following the same trend, then parting ways

If you want to pick a starting point for AI large models, 2019 should be the key year.

In February of that year, OpenAI launched GPT-2 across the ocean. Microsoft then generously invested 1 billion US dollars, turning OpenAI from a "non-profit" into a "capped-profit" organization.

About a month later, on this side of the Pacific, Baidu released ERNIE 1.0, China's first officially released pre-trained large model.

But there are actually many such "firsts": Huawei's Pangu, the industry's first 100-billion-parameter Chinese-language pre-trained model; Alibaba's M6, China's first 100-billion-parameter multimodal large model; Tencent's HunYuan, China's first low-cost, deployable trillion-parameter NLP model...

In short, add enough qualifiers and you can always be first in some category. During that period, from Silicon Valley to Beijing's Xierqi, and from Wudaokou to Shanghai's Lingang, every capable company, including Huawei, Alibaba, Tencent, and SenseTime, began researching AI large models.

But the first wave of Chinese AI large models did not truly "emerge" until two years later.

In 2021, Zhang Hongjiang, formerly dean of the Microsoft Asia Academy of Engineering and later personally invited by Lei Jun to Kingsoft to succeed Qiu Bojun as CEO, led the Zhiyuan Research Institute in releasing "Enlightenment 1.0", which included China's first Chinese-oriented NLP large model, the first Chinese general-purpose text-image multimodal large model, and the first ultra-large-scale pre-trained model with cognitive capabilities.

Zhiyuan was established in 2018, five months after OpenAI released GPT-1. As a research institution led by the Beijing municipal government and the Ministry of Science and Technology, pooling resources from academia and leading technology companies, Zhiyuan was in fact emblematic of China's early exploration of AI large models.

It can be said that "Enlightenment 1.0" was a template for all the Chinese AI large models that followed. Zhiyuan also built a large-scale pre-training technology system for China, and constructed and opened WuDaoCorpora, the world's largest Chinese corpus, laying the foundation for other companies to develop their own large models.

It was also after "Enlightenment 1.0" that China's large models began to proliferate.

In 2021, Huawei, building on Ascend AI and Pengcheng Lab, jointly released the Pengcheng-Pangu large model. In 2022, Alibaba released the "Tongyi" large model series, and Tencent released the HunYuan AI large model...

While Chinese AI large models were springing up like mushrooms after rain, foreign models were reaching the tipping point from quantitative to qualitative change.

In November 2022, OpenAI released ChatGPT, based on GPT-3.5, prying open the magic box of artificial intelligence; the AI 2.0 wave then swept the world.

In fact, taking the 2018 release of GPT-1 as the starting point, China's AI large models have consistently tracked developments abroad. So why did ChatGPT not appear in China?

This comes down to the two different development paths AI large models took at home and abroad.

Judging from today's representative foreign large model products, such as ChatGPT, Midjourney, Notion AI, and Stable Diffusion, all are built around C-end (consumer) users.

In China, by contrast, the main application scenarios for large models are all on the B side.

For example, typical applications of Alibaba's "Tongyi" large model include cross-modal e-commerce search, AI-assisted design, open-domain human-machine dialogue, legal document analysis, and medical text understanding; Tencent's HunYuan-NLP-1T model powers internal products such as advertising, search, and dialogue; and SenseTime's large models provide perception and understanding for common scene tasks such as autonomous driving and robotics.

An important reason for choosing To B is that the B side is easier to commercialize.

The nature of the To B business means China's large models have not needed enormous parameter counts. Even after ChatGPT arrived, an important topic of discussion among domestic companies was how to "make existing large models smaller" and apply them to specific industries.

Hence China has more large models following Google's BERT route: smaller parameter counts, higher efficiency, and a better fit for vertical scenarios.

So to some extent, from the day it was born, the Chinese large model has carried the burden of commercialization.

Foreign To C large models are different. ChatGPT, for instance, reached 100 million users in just two months. Its underlying pre-trained model, GPT-3.5, serves as a general-purpose large model, so "big" became a baseline requirement for its parameters.

To some extent, this pushed OpenAI to keep adding parameters to GPT, triggering ever more powerful "emergence" and ultimately producing a ChatGPT where brute force worked miracles.

Thus the two completely different paths, To B and To C, led the AI large models of China and the United States in two completely different directions.

02. Don’t reinvent the wheel, but everyone wants to be a wheel

So far, China has released 5 large model products, with 5 more on the way.

The model battle has begun.

Most domestic large models are roughly at GPT-2 level, yet they draw far more attention than GPT-2 did at launch. That creates an awkward situation: companies know they are not fully ready, but feel compelled to push their releases out anyway, as if being even slightly late means missing the entire market.

Indeed, both the market and the technology itself are pushing companies to bring large models to market faster.

Technically, the earlier you enter the market, the sooner you collect usage data to drive model optimization and iteration. From a market perspective, as foreign large models combine with industries to deliver higher efficiency, domestic enterprises have the same needs.

For example, Lightcone Intelligence has surveyed many SaaS companies and found that almost all of them have connected to GPT-3.5 and are currently testing Wenxin Yiyan.

For companies launching large models, seizing the market window now is especially important.

An investor in charge of AI at a leading institution told Lightcone Intelligence, "It is very dangerous for China to be excluded from the ChatGPT ecosystem."

He believes that although the application layer holds greater entrepreneurial opportunity, every application there depends on the existence of large models, just as desktop software in the PC internet era was built on Windows, and apps in the mobile internet era on Android or iOS. The model-as-a-service era likewise requires underlying large models at the "operating system" level.

Abroad, GPT-4 has clearly shown it can play that role, but China has no equivalent yet. So while the landscape of underlying large models remains unsettled, any shift in that landscape could render the applications built on top of them worthless.

This is also why many investors are unwilling to commit yet: they want to watch the market a little longer, waiting for an underlying large model that can clearly become the "operating system".

Hence whether it is Baidu or Alibaba, the first concern after launching a large model is how many companies they can sign up as partners.

For example, after confirming the launch plan for Wenxin Yiyan in February, Baidu began actively pushing enterprises across industries to connect to it. By the time Baidu released Wenxin Yiyan on March 16, more than 650 companies had announced they were joining its ecosystem. On April 7, after Alibaba officially announced Tongyi Qianwen, its first move was to open test invitations to enterprises.

Today's domestic large model race is a contest over "who becomes the underlying operating system". Each company launches its own model, opens internal testing, and courts enterprises, all toward one core goal: building its own ecosystem around the large model.

This is the key to whether a giant remains a giant in the next era. The ticket to the next AI era is not the large model itself, but the ecosystem built around it.

So even though everyone keeps saying not to reinvent the wheel or waste resources building identical large models, at present everyone is reinventing the wheel.

From Baidu to Alibaba, and from Huawei to SenseTime, the war over underlying large models has only just begun. After all, beyond technology giants like Tencent and ByteDance, entrepreneurial stars like Wang Xiaochuan, Wang Huiwen, and Kai-Fu Lee are also eyeing the prize.

Wang Xiaochuan and Wang Huiwen have both moved into the Sohu Internet Building, and Wudaokou seems to be regaining its former glory.

As many have put it, "This is a renaissance."

The most competitive players have not all taken the field yet, but the "hundred-model battle" over underlying models is imminent.

03. AI heat is "polarized", with a vacuum in the middle

Large models are making AI companies heavier and heavier.

On April 10, when SenseTime announced its "SenseNova" large model system, it highlighted another key point: relying on its large AI device SenseCore to realize a "large model + large computing power" R&D system.

To meet the massive data-training demands of large models, algorithm companies that could once travel light have begun building their own clouds and their own artificial intelligence data centers (AIDC).

Another case is Haomo, an autonomous driving company that built its own intelligent computing center to support large-model training.

One of the most important reasons these vertical AI giants and unicorns do so much themselves is that almost no high-performance off-the-shelf products on the market can satisfy them.

In recent years, large model parameter counts have grown exponentially, and data volumes will expand further as multimodality arrives, inevitably driving a sharp rise in demand for computing power. Over the past five years, the parameter counts of frontier AI models have grown by roughly an order of magnitude each year; over the past ten years, the computing power demanded by the best AI algorithms has increased more than a million-fold.

A SenseTime employee said that the server cabinets at SenseTime's Shanghai Lingang AIDC are designed for 10 kW to 25 kW of power and can each hold up to four Nvidia A100 servers, whereas ordinary server cabinets are mostly designed for around 5 kW; a single A100 server draws up to 4.5 kW.

The same is true of the technology giants. Each hopes to close the loop within its own ecosystem, in part because the domestic open-source ecosystem is not strong enough.

The large model industry chain can be roughly divided into three layers: data preparation, model construction, and model products. Abroad, this chain is relatively mature, and a large cohort of AI Infra (infrastructure) companies has formed; in China, that market is still largely blank.

In China, each giant has its own training stack.

Huawei's model, for example, adopts a three-tier architecture: at the bottom, a general-purpose large model with strong robustness and generalization; above that, industry-specific large models; and on top, models deployed for specific scenarios and workflows. The advantage of this architecture is that deploying the trained large model into a vertical industry requires no repeated training, and the cost is only 5% to 7% of the layer above.

Alibaba has created a unified AI base: CV, NLP, and text-to-image large models can all be trained on it. The energy Alibaba needed to train its M6 large model was only 1% of that of GPT-3.

Baidu and Tencent have corresponding layouts. Baidu has a Chinese knowledge graph covering more than 5 billion entities; Tencent's warm-start curriculum learning can cut the training cost of trillion-parameter models to one-eighth that of a cold start.

Overall, while each giant's focus differs, the common theme is cost reduction and efficiency, achieved largely through these do-it-all, closed-loop training systems.

This model certainly has advantages within a single giant, but from an industry perspective it also has problems.

The mature AI industry chain abroad has produced many AI Infra companies, some specializing in data labeling, data quality, or model architecture.

Their specialization lets them beat the giants' in-house efforts on efficiency, cost, and quality within a single link of the chain.

For example, the data quality company Anomalo, a supplier to Google Cloud and Notion, performs deep data observation and data quality inspection through ML-based automatic evaluation and generalized data quality checks.

These companies are like the Tier 1 suppliers of the automotive industry: through professional division of labor, large model companies need not reinvent the wheel; they simply integrate supplier resources to quickly assemble their own model stack, cutting costs.

But China is not yet mature in this respect, for two reasons. On the one hand, the main domestic players are the giants, each with its own training system, leaving almost no opening for external suppliers. On the other hand, China lacks a sufficiently large entrepreneurial ecosystem of small and medium enterprises, so AI suppliers struggle to find living space outside the giants.

Take Google as an example. Google is willing to share its training data results with its data quality suppliers to help them improve their data-processing capabilities; the improved suppliers in turn feed Google more high-quality data, forming a virtuous circle.

The absence of a domestic AI Infra ecosystem directly raises the threshold for large model entrepreneurship.

When Wang Huiwen stepped in to found Light Years Beyond, he proposed raising 50 million US dollars, a figure actually worked out for him by Li Zhifei: 20 million for computing power, 20 million for talent, and 10 million for data. This reflects a straightforward problem: if building a large model in China were compared to eating a hot meal, you would have to start by tilling the soil and growing the vegetables.

In the current AI 2.0 upsurge, one notable feature is "polarization": the heat is concentrated at the large-model layer and the application layer, while the middle layer, the AI Infra (infrastructure) layer, sits in a vacuum.

Not everyone needs to build wheels; being able to make a good screw matters too.

04. Conclusion: Giants & Innovators

The war of words between Wang Xiaochuan and Baidu has been a lively episode in the recent large model melee.

"Tall, rich, and handsome" Robin Li believes China will basically not produce an OpenAI, and that using the giants' models will do.

"Straight talker" Wang Xiaochuan retorted that certain industry figures (meaning Robin Li) have never judged the future correctly and have been living in a parallel universe.

Old grudges aside, this can broadly be read as a confrontation between giants and entrepreneurs: giants like to do everything themselves, while entrepreneurs like to break the rules.

And success in the technology industry seems to depend more on innovation. After all, from DeepMind, which built AlphaGo, to OpenAI, which released ChatGPT, none were hatched inside a giant.

This is the innovator's dilemma.

For technology giants, building their own wheels matters, but why not also find and incubate the next OpenAI?


Source: blog.csdn.net/youyi300200/article/details/130277044