OpenAI CEO Sam Altman: The era of giant AI models is coming to an end!


Compiled by | Su Mi

Source | CSDN (ID: CSDNnews)

"The era of giant AI models is coming to an end." When this sentence came out of the mouth of OpenAI CEO Sam Altman, the industry was in an uproar.

After all, the arrival of GPT-4 and conversational AI like ChatGPT has set off a frenzy around AIGC and large models, plain for all to see. It has also drawn major technology companies and startups alike into the AI race, each launching its own large-model applications and products.

Now, in an MIT video talk last week, Sam Altman warned that the research strategy that gave birth to ChatGPT has run its course, and it is unclear where future advances will come from.

What is the meaning behind these remarks?


Sam Altman: We're at the end of the giant model era

In recent years, OpenAI has made a series of impressive advances in language-related artificial intelligence by taking existing machine learning algorithms and scaling them up to previously unimaginable scales.

The latest GPT-4, launched this year, can be regarded as one of the most advanced models at OpenAI and across the industry. According to Wired, GPT-4 may have been trained on trillions of words of text using thousands of powerful computer chips, a process that cost more than $100 million.

On this point, Microsoft previously shared the inside story on its official blog:

Microsoft connected tens of thousands of Nvidia A100 chips together and redesigned its server architecture, which enabled OpenAI to train increasingly powerful AI models and also helped unlock AI features in tools such as Bing and Edge. The project has cost Microsoft hundreds of millions of dollars.

However, Sam Altman now says that further progress in AI technology will not come from making models bigger. "I think we're at the end of the era of giant models, and eventually we're going to make them better in other ways."

In fact, since OpenAI launched ChatGPT in November, Microsoft has used the underlying technology to add a chatbot to its Bing search engine, Google has launched a large model called Bard, Baidu has released "Wenxin Yiyan" (ERNIE Bot), and Alibaba has begun internal testing of "Tongyi Qianwen", among others.

Meanwhile, a host of well-funded startups, including Anthropic, AI21, Cohere, and Character.AI, are devoting enormous resources to building ever-bigger algorithms in an effort to catch up to OpenAI's technology.

The latest announcement from Sam Altman suggests that GPT-4 may be the last major development in OpenAI's strategy of making models bigger and feeding them more data.

In his latest remarks, he did not say what research strategy or technique might replace scaling. However, in the technical report on GPT-4, the OpenAI research team estimated that the returns from scaling up model size are diminishing. Sam Altman has also said there are physical limits to how many data centers OpenAI can build and how quickly it can build them.


Scaling up the model doesn't always work

In fact, looking back at the GPT series, each model's parameter count has been larger than the last:

  • GPT-2, released in 2019, has 1.5 billion parameters;

  • GPT-3, released in 2020, has 175 billion parameters;

  • GPT-3.5 reportedly has about 200 billion parameters;

  • Citing the competitive landscape and the safety implications of large models, OpenAI announced that it would no longer disclose GPT-4's parameter count; its enormous scale is not hard to guess.
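To get a feel for where these parameter counts come from, here is a rough back-of-the-envelope sketch. It assumes a plain decoder-only Transformer and uses the common 12·L·d² rule of thumb (about 4·d² attention weights plus 8·d² feed-forward weights per layer, ignoring embeddings and biases); it is an illustration, not OpenAI's actual architecture details. The layer counts and hidden sizes below are the published GPT-2 XL and GPT-3 configurations.

```python
def approx_transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a decoder-only Transformer.

    Each block has ~4*d^2 attention weights (Q, K, V, output projections)
    and ~8*d^2 feed-forward weights (two d x 4d matrices), so ~12*d^2
    per layer. Embeddings and biases are ignored.
    """
    return 12 * n_layers * d_model ** 2

# GPT-2 XL: 48 layers, d_model = 1600  -> ~1.47e9, close to the reported 1.5B
print(approx_transformer_params(48, 1600))
# GPT-3: 96 layers, d_model = 12288    -> ~1.74e11, close to the reported 175B
print(approx_transformer_params(96, 12288))
```

The estimate lands within a few percent of both published figures, which is why the rule of thumb is handy for sanity-checking parameter claims.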

However, more parameters do not necessarily make a better model, and fixating on parameter counts is not a good thing either. Many experts share this view.

Cohere co-founder Nick Frosst, who previously worked on artificial intelligence at Google, told Wired that Altman's notion that scaling won't work forever sounds right. He agrees that progress on Transformers, the core machine learning architecture behind GPT-4 and its competitors, goes beyond scaling. In his view, "There are many ways to make Transformers better and more useful, and many of them do not involve adding parameters to the model. New model designs or architectures, and further fine-tuning based on human feedback, are promising directions that researchers are already exploring."

In fact, on the question of parameter scale, Robin Li, founder, chairman, and CEO of Baidu, also told CSDN in an interview that hundreds of billions of parameters is a threshold, but beyond that, discussing a large model's parameter count is not meaningful:

Just three years ago, what we called a large model had parameters on the order of billions. Today, when we talk about large models, most people understand them to have parameters on the order of hundreds of billions. The speed of this evolution and technological iteration exceeds even the familiar pace of Moore's Law, which is remarkable.

Baidu's general-purpose large model must be on the order of hundreds of billions of parameters, because that is a threshold: below 100 billion, intelligence does not emerge, as past experiments have shown. Beyond that, however, announcing the specific parameter count is of little significance; it is not the case that a trillion-parameter model must be better than a 100-billion-parameter one. Before GPT-4 came out, much of the outside speculation was that it would have trillions of parameters; that direction was wrong. Large models do not advance by increasing parameter scale but in other ways, so there is no need to dwell on it.

In an early interview with CSDN, Jia Yangqing also said:

Take AlexNet, the convolutional neural network that won the ImageNet large-scale visual recognition challenge in 2012, as an example: the model has about 60 million parameters in total. Its rise gave many AI practitioners a fairly simple idea, namely that the bigger and deeper the model, or the more parameters it has, the better the results.

But by 2014, GoogLeNet, a deep neural network built on the Inception module, could achieve the same or even better results with only about 6 million parameters. So in the field of super-large models, many people chase publicity by promoting the impression that a bigger parameter count means better results. Over time, once users tire of sheer model size, they will find that details such as model structure and interpretability matter more.

This phenomenon is also a typical cycle of technological iteration in scientific research: a breakthrough technology attracts countless people to flock to it, and once everyone realizes the direction is too one-sided, they pull back to where they started.

Perhaps with this in mind, Altman also responded last week that OpenAI is not currently developing GPT-5 and has no plans to for some time. Finally, what do you think: is the era of large models that chase parameter counts coming to an end?

Reference link:

https://www.wired.com/story/openai-ceo-sam-altman-the-age-of-giant-ai-models-is-already-over/



Source: blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/130234038