LeCun throws cold water again: A language model that can only read books will never achieve "human-like intelligence"

Xi Xiaoyao Tech Talk
Source | Xinzhiyuan

The problem is not the learning algorithm of the language model, but the inherent limitations of language itself. Multimodality will lead the next AI explosion!

Since the release of ChatGPT last year, there has been a wave of enthusiasm for large language models both inside and outside the industry. A Google engineer even claimed that the company's internal language model had become sentient.

Recently, Yann LeCun (professor at New York University, chief AI scientist at Meta, and Turing Award winner) and Jacob Browning, a postdoctoral fellow in NYU's Department of Computer Science, published a long essay. They argue that the limitations of language itself cap how intelligent LLMs can become.

While language models are becoming more general and powerful, we are less and less able to understand the model's thought process.

Models achieve very high accuracy on all kinds of commonsense reasoning benchmarks, so why do they still talk nonsense and give dangerous advice?

That is, why are language models so smart, yet so limited?

The researchers believe the problem is not the AI algorithms at all, but the limitations of language. Once we abandon the assumption that "language is thinking," we find that although LLMs perform impressively, they will never reach a level of intelligence close to that of humans.

What exactly is a language model?

In 19th- and 20th-century philosophy and science, the mainstream view was that knowledge just is linguistic: knowing something simply means thinking of the right sentence and grasping how it connects to the other sentences in the web of true claims we already know.

By this logic, the ideal form of language is a purely formal, logical-mathematical one: arbitrary symbols connected by strict rules of inference. Natural language could serve as well, though it would need extra work to resolve its ambiguity and imprecision.

As the Austrian philosopher Wittgenstein put it, "The totality of true propositions is the whole of natural science."

Although debates over cognitive maps and mental imagery persisted, the foundation laid in the 20th century was a symbolic one: thinking was modeled as the manipulation of language-like symbols.

This view is still widely accepted: if an encyclopedia could contain everything that is known, then reading all the books would give you a comprehensive understanding of the world.

Early artificial intelligence research followed the same idea, defining symbolic operations that bound linguistic symbols together in different ways according to logical rules.

For researchers at the time, an AI's knowledge consisted of a huge database of true sentences linked to one another by hand-crafted logic. A system counted as intelligent if it spat out the right sentence at the right time, that is, if it manipulated its symbols in the appropriate way.

This idea is also the basis of the Turing test: If a machine can say what it knows at the right time, it means it knows what it is saying and when to apply its knowledge.

Opponents counter that being able to chat does not mean a machine understands what the conversation is about. Language does not exhaust knowledge; on the contrary, it is only a highly specific and very limited kind of knowledge representation.

All languages, whether programming languages, symbolic logic, or spoken language, are simply a particular type of representational schema for expressing discrete objects and properties and their relationships to each other at an extremely high level of abstraction.

Still, there is a huge gulf between reading sheet music and hearing the music performed, and an even bigger one between reading the score and having the skill to play it.

A linguistic representation is more like a compressed summary of specific information, and some things compress badly into words: irregular shapes, the motion of objects, the workings of a complex mechanism. Non-linguistic representations such as images, recordings, graphs, and maps can also convey such information in an accessible way.

The limitations of language

Language is a very low-bandwidth channel: isolated words or sentences convey little information out of context, and many sentences are ambiguous because of the abundance of homonyms and pronouns.

Chomsky suggested decades ago that language is not a clear and unambiguous communication tool.

But humans don't need a perfect communication tool: we usually rely on the context in which a sentence appears to infer its meaning.

Most of the time we are discussing what is right in front of us, such as a football game in progress, or speaking within a clear social role with clear goals, such as ordering food from a waiter.

When reading a short passage, the focus seems to be on general reading-comprehension strategies, but research shows that the amount of background knowledge a child has about the topic is actually a key factor in comprehension.

It is clear that these systems are doomed to a shallow understanding that will never approximate the full-bodied thinking we see in humans.

The inherent contextual nature of words and sentences is key to understanding how LLMs operate.

Neural networks typically represent knowledge as know-how: a highly context-sensitive ability to pick up on patterns, both concrete and abstract, and use them to process task-relevant inputs in a fine-grained way.

In an LLM, the whole process comes down to detecting patterns at multiple levels of existing text: how individual words connect within a passage, and how sentences connect to build a larger discourse.

As a result, an LLM's grasp of language is irreducibly contextual: it understands words not by their dictionary definitions but by the roles they play across many different sets of sentences.

Moreover, many words, such as "carburetor," "menu," "debugging," or "electron," are used almost exclusively in specific domains, so even in an isolated sentence they carry contextual meaning.
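To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint (neither is named in the article): the word "bank" gets a different internal vector depending on the sentence it sits in, which is exactly what "contextual meaning" means for these models.

```python
# Minimal sketch: the same surface word gets different vectors in different contexts.
# Assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, hidden_dim)
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

v1 = word_vector("She deposited the check at the bank.", "bank")
v2 = word_vector("They had a picnic on the river bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # well below 1.0: context shifts the meaning
```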

In short, training an LLM means learning the background of every sentence it sees, piecing together context from surrounding words and sentences, so that the model can take the endless variety of possible sentences and phrases as input and come up with a reasonable way to continue a conversation, an article, and so on.

A system trained on all text written by humans should be able to develop the general understanding required for conversation.

What LLMs learn is only superficial knowledge

Some people deny that LLMs have even a rudimentary "understanding" or anything deserving the name "intelligence." To these critics, the systems are just better mimics: an LLM's grasp of language is so shallow that it resembles a student bluffing in class, someone who doesn't really know what they are talking about and is simply parroting the professor or the reading without any awareness of it.

LLMs have this kind of superficial understanding of everything. Systems like GPT-3 are trained by hiding the upcoming words in a sentence or passage, forcing the model to guess the most likely next word, and then correcting its wrong guesses. The system eventually becomes adept at guessing the most likely word, which makes it an effective prediction engine.

In other words, GPT-3's training objective is nothing more than masking out words, asking the model to guess them, and correcting the guesses, repeated until it becomes a reliable prediction system.
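A toy sketch of that objective is below. This is not GPT-3's actual training code; a tiny embedding-plus-linear stand-in replaces the full transformer. The model emits a guess at every position, the guess is scored against the word that actually comes next, and the resulting loss is what gets corrected.

```python
# Toy sketch of the next-token-prediction objective (not GPT-3's actual training code).
# The embedding + linear layer below stands in for a full transformer stack.
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64                      # arbitrary toy sizes
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)            # maps hidden states to word scores

tokens = torch.randint(0, vocab_size, (1, 16))      # one fake sequence of 16 token ids
logits = lm_head(embed(tokens))                     # (1, 16, vocab_size): a guess at every position

# "Mask out the upcoming word": the guess at position i is scored against token i+1.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = tokens[:, 1:].reshape(-1)

loss = nn.functional.cross_entropy(pred, target)    # penalty for wrong guesses
loss.backward()                                     # "correcting wrong guesses" via gradients
```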

But this approach also pushes the model toward something like language understanding: for any question or puzzle, there are usually only a few correct answers and an infinite number of wrong ones.

Getting specific language skills right, such as interpreting a joke, a pun, or a logic puzzle, amounts to predicting the correct answer to a question, and that same predictive ability lets the machine summarize, rewrite, paraphrase, and perform other tasks that require language understanding.

This is not the kind of knowledge representation symbolic AI expected; it is context-dependent know-how: given what came before, output a plausible sentence.
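One way to see how a single prediction interface gets repurposed for such tasks is to steer it with a prompt. The sketch below assumes the Hugging Face transformers library and the small public gpt2 checkpoint (far weaker than the models discussed here, so expect rough output); the point is only that the "task" lives entirely in the prompt text, while the model just keeps predicting the next word.

```python
# Sketch: the same next-word predictor is pointed at a paraphrase task purely via the prompt.
# Assumes the Hugging Face `transformers` library and the small public `gpt2` checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Original: The meeting scheduled for Friday has been moved to Monday morning.\n"
    "Paraphrase:"
)
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])  # gpt2 is tiny, so the continuation will be crude
```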

Abandoning the view that all knowledge is linguistic permits us to realize how much of our knowledge is nonlinguistic.

However, the ability to verbally explain a concept is not the same as the ability to actually use it.

For example, a language system can explain how to execute an algorithm without being able to execute it, and it can explain which words are offensive yet fail to apply that knowledge in practice.

Closer analysis also shows that a language model's attention and memory span only a short window; it tends to focus on the previous sentence or two and the one that follows.

When it comes to complex conversational skills, such as active listening, recalling and revisiting earlier parts of the conversation, and sticking to a topic to make a point while avoiding distractions, the model's memory deficit is exposed: after just a few minutes of chatting you find it contradicting what it said earlier.

Push back too hard and the system starts over, adopts your new perspective, or simply agrees with whatever you say; the understanding needed to maintain a coherent worldview lies far beyond what a language model knows.
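Part of this follows from the mechanics of a fixed context window: once the transcript outgrows it, the earliest turns simply stop being part of the model's input. The toy sketch below uses plain Python, whitespace "tokens," and made-up dialogue (no real model or tokenizer) to show how an early fact falls out of view.

```python
# Toy illustration (plain Python, no real model): a fixed context window means the
# prompt only ever contains the most recent turns that fit.
MAX_CONTEXT_TOKENS = 32          # tiny window, counted as whitespace-separated "tokens"

def build_prompt(turns, max_tokens=MAX_CONTEXT_TOKENS):
    """Keep only the most recent turns that fit inside the window."""
    kept, used = [], 0
    for turn in reversed(turns):
        n = len(turn.split())
        if used + n > max_tokens:
            break                # everything earlier than this is forgotten
        kept.append(turn)
        used += n
    return "\n".join(reversed(kept))

conversation = [
    "User: My name is Ada and I am allergic to peanuts.",
    "Assistant: Noted, Ada. I will keep your peanut allergy in mind.",
    "User: " + "Tell me more about that. " * 4,      # filler that crowds the window
    "User: What snacks would you recommend for me?",
]
print(build_prompt(conversation))    # the allergy statement has already fallen out
```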

More than language

Books contain a great deal of information that can be unpacked and used, but information in other formats matters too. IKEA's assembly instructions consist of drawings with no text; researchers often look at the figures in a paper first and only skim the text once they grasp its structure; tourists navigate a city by following the red or green line on a map.

Humans learn a great deal in the process of exploring the world. A system trained only on language, even if it were trained from now until the end of the universe, will never come close to human intelligence.

Language is important because it conveys a great deal of information in a small format, and, especially since the invention of the printing press and the internet, it can be easily copied and applied at scale.

But compressing linguistic information doesn't come for free: it takes a lot of effort to decipher an obscure text.

Humanities courses demand hours of extracurricular reading precisely because unpacking texts is hard, which also explains why a machine trained on language can know so much and yet so little.

It has access to nearly all of humanity's written knowledge, but every sentence in those books packs in a great deal of information, and unpacking it remains difficult.

There are no ghosts in language models

Of course, flaws in language models don't mean machines are stupid, only that there are inherent limits to how intelligent they can be.

In many cases we don't actually need a human-like agent anyway. We don't give other people Turing tests or demand that they do multi-digit multiplication in their heads; most talk is just chatting.

Language may be a useful tool for exploring the world, but it is not the whole of intelligence. A deep, non-linguistic understanding is the ground on which language rests; it is what deepens our grasp of the world and lets us understand what others are saying.

This kind of nonverbal, context-sensitive, biologically grounded, embodied knowledge is what concerns AI researchers more than linguistics does.

Large language models have no stable body and no durable attention with which to perceive the world. What can be learned about the world from language alone is very limited, so the common sense they acquire is always shallow.

Origin blog.csdn.net/xixiaoyaoww/article/details/132622698