"First principles" of large models: the connection between technological innovation and social value

By the third quarter of 2023, more than 100 large models had been released in China, and the "Hundred Models War" had officially begun.

There are now many large models to choose from, and they are starting to look alike. Beyond comparing parameter counts, corporate backing, and leaderboard rankings, is there a better way to judge the value of a large model?

Aristotle held that every system has a first principle: a fundamental proposition or assumption that can be neither omitted nor violated.

Look past the surface and you will find that the "first principles" of different large models vary widely, and that this core difference drives each model toward a different development path.

For example, OpenAI's core is AGI: its large language models start from the goal of "artificial general intelligence", and it has only recently begun to move into industry.

Some general-purpose large models are built around "scientific research". They score well on leaderboards, but their developers give little thought to the supporting tools and computing infrastructure that industrial applications require, and they gradually fade from mainstream view.

Some industry-specific large models are built around "application". To become competent quickly in specific task scenarios, they receive "special training" on industry knowledge and proprietary data, but the base model's ability is mediocre and they stumble on "common sense questions".

Tools can be developed and computing power can be purchased, but the core value of a large model cannot easily be replaced.

Peeling back JD.com's large model layer by layer, we found one core: "industry".

Around the recent WAIC and JDD conferences, we had several conversations with Dr. He Xiaodong, dean of the JD Discovery Research Institute and president of JD Technology's Intelligent Service and Product Department, who shared JD.com's thinking on large models in depth.

If "industrial value" is the "first principle" of a large model, what differentiated development path will it take? Let us use JD.com as an example to look at the future of industrial large models.

The value starting point of the large model

Buffett once said that investing is like rolling a snowball: find a "long slope" and "very wet snow", and once the snowball gets going it grows bigger and bigger.

From an industry perspective, the core value of the large-model "snowball" is still largely self-proclaimed, and real problems remain, such as:

1. The usability of the technology is questionable.

The "last mile" of industrialization may seem trivial, but it determines whether a large model can actually be used.

JD.com has followed large models since 2017 with a strong focus on industrial deployment, and has learned some lessons along the way.

Dr. He said bluntly: "If you take leaderboard-topping technology to a business department and it runs into all sorts of small problems in use, it is simply not usable for users. After that, people stop believing you; whatever you say, they assume your technology is not good enough."

2. The connection to industry is insufficient.

When a large model is deployed, specific problems have to be solved one by one, and these problems cannot simply "emerge" in the laboratory. What the industry needs and what its constraints are must be answered through industrial practice and application.

Dr. He Xiaodong believes that a large model cannot be dreamed up by sitting in an office; it has to be distilled from industry.

However, AI research institutes that both reach up into academia and take root in industry are scarce.

3. The return on value is still unclear.

Introducing a large model increases an enterprise's costs and consumes substantial resources, so companies want products that have been proven reliable through repeated trials. At present, the value of many large models is only self-proclaimed, which is not convincing enough.

Dr. He mentioned that general generative language models have a content accuracy of about 83% to 85%. Consumer (To C) users find that acceptable, but for serious commercial use the accuracy must exceed 95% to meet enterprise requirements. "If the promotion is clearly 20% off but the marketing copy generated by the large model says 50% off, that is commercially unacceptable."
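To make the 95% bar concrete, here is a minimal sketch of how generated marketing copy could be checked against the real discount. It is only an illustration, not JD.com's method: the function names, the "N% off" pattern, and the sample data are all hypothetical, and a real review pipeline would need to check far more than one number.

```python
import re

# The "more than 95%" enterprise bar quoted by Dr. He.
ENTERPRISE_THRESHOLD = 0.95


def extract_discounts(copy_text: str) -> list[int]:
    """Return every 'N% off' figure mentioned in a piece of marketing copy."""
    pattern = r"(\d{1,3})\s*%\s*off"
    return [int(m) for m in re.findall(pattern, copy_text, flags=re.IGNORECASE)]


def copy_is_accurate(copy_text: str, actual_discount: int) -> bool:
    """A copy passes only if it states a discount and every stated figure matches."""
    found = extract_discounts(copy_text)
    return bool(found) and all(d == actual_discount for d in found)


def accuracy(samples: list[tuple[str, int]]) -> float:
    """Fraction of (copy, actual_discount) pairs that pass the check."""
    if not samples:
        return 0.0
    passed = sum(copy_is_accurate(text, actual) for text, actual in samples)
    return passed / len(samples)


def meets_enterprise_bar(samples: list[tuple[str, int]]) -> bool:
    """True only when measured accuracy is strictly above the 95% threshold."""
    return accuracy(samples) > ENTERPRISE_THRESHOLD


if __name__ == "__main__":
    # Hypothetical batch: 17 correct copies and 3 that claim the wrong discount.
    batch = [("Enjoy 20% off today", 20)] * 17 + [("Enjoy 50% off today", 20)] * 3
    print(f"accuracy = {accuracy(batch):.2f}, enterprise-ready = {meets_enterprise_bar(batch)}")
```

A batch like the hypothetical one above sits in the 83% to 85% range that is tolerable for consumers but fails the enterprise check.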

Nothing comes easy, and large models urgently need proof of their industrial value, both from themselves and from others.

The solution is actually simple: proceed step by step, hit every pitfall that must be hit, and solve each problem as it arises.

So at the beginning of the year, when large language models were surging ahead, JD.com did not follow the crowd. What was it doing? It was busy solving problems.

JD.com's long slope

With "industrial value" as the "first principle", JD.com's first concern in building a large model was not how many contracts it signed or when to hold a launch event, but shoring up the infrastructure.

The three elements of AI (data, computing power, and algorithms) all need to be upgraded to support the era of large models, forming the "long slope" along which the "value snowball" of large models can roll.

Let's start with data. Industry large models generally train a base model on public-domain data and then use industry-specific data for "special training", which is like taking general education courses in middle school and then learning professional skills at university. JD.com's approach is quite different: the Yanxi large model is trained on a mix of 70% general data and 30% native data from the digital-intelligence supply chain, putting know-how and data from retail, finance, health, and logistics into the base model. This is like taking many professional courses alongside general education, which gives the model a deeper understanding of industry.
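As a rough illustration of what a 70/30 training mix means in practice, the sketch below draws the source of each training example according to fixed weights. Only the 70% and 30% figures come from the article; the source names, the sampling scheme, and the function names are assumptions, and nothing here describes how the Yanxi model is actually trained.

```python
import random
from collections import Counter

# Mixing weights from the article; the source names are hypothetical labels.
MIX = {"general": 0.70, "supply_chain": 0.30}


def sample_sources(n_examples: int, mix: dict[str, float], seed: int = 0) -> list[str]:
    """Pick the data source for each training example according to the mix weights."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix weights must sum to 1")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n_examples)


def realized_mix(sources: list[str]) -> dict[str, float]:
    """Share of each source that actually appears in the sampled stream."""
    counts = Counter(sources)
    total = len(sources)
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    stream = sample_sources(10_000, MIX)
    print(realized_mix(stream))
```

Mixing at the example level means industry data is present throughout base-model training rather than added in a later stage, which is the contrast the paragraph draws with the usual "general first, special training later" recipe.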

Therefore, as soon as JD.com's large model launched, it was aimed at knowledge-intensive, task-oriented industrial scenarios and could already solve real industrial problems.

Next, algorithms. Algorithms are the core capability of large models and the key to widening the gap in product experience. Single-point algorithms are no longer enough to support large models; large language models combine a series of techniques such as pre-training, prompt learning, and reinforcement learning. A systematic technical stack and algorithmic innovation also make it easier to build a moat.

Then there is computing power. Many large models close access or limit the number of interactions shortly after launch, because computing power is insufficient or too expensive, so enterprises that want to use them cannot afford to. For large models to remain usable by industry, computing power must not become a weak point.

In 2021 JD.com built a state-of-the-art DGX cluster and launched Tianqin α in Chongqing, China's first ultra-large-scale computing cluster based on the SuperPOD architecture. Inference speed increased 6.2-fold and inference cost fell by 90%, which supports continued training and iteration and helps JD.com stay competitive.

By cultivating all three elements of AI, JD.com has built a "long slope" for large models from technology to industry. The value foundation of its large model is now solid.

Industry's thick snow

Looking back at the previous stage of industrial intelligence, many technical capabilities stayed on the surface and were hard to take deep into industry or replicate at scale. With its value foundation in place, the large model now needs to roll through the "thick snow" of industry so that the snowball keeps growing.

In the large-model technologies and practices that Dr. He Xiaodong shared, we can see several ways in which large models stick to the "thick snow" of industry:

The first is technical adhesion.

Many problems in deploying industrial AI are caused by technical bottlenecks, such as deep learning's lack of interpretability, insufficient generalization, and insufficient model accuracy. Through systematic breakthroughs in basic technologies, JD.com has made its large models highly available and delivered end-to-end product value.

Take digital humans. Building on more than 10 years of experience in intelligent dialogue and accumulated multimodal interaction technology, JD Cloud's multimodal digital human needs only a small amount of sample material: after a simple 5-minute shoot, it can automatically generate a digital human with rich voice and emotion. Small and medium-sized merchants and individuals can afford and use digital human services, and high barriers such as computing power, development cycles, and talent are no longer a problem.

The second is tool adhesion.

At present, many large-model vendors offer MaaS through API calls, but an easily overlooked problem is that calling APIs also requires certain skills and development work. Many users in traditional industries do not have even the ability to call AI APIs.

Rich, minimalist, out-of-the-box tools are indispensable for industrial deployment. JD.com's goal is to let users with no AI knowledge use large models directly, providing full-lifecycle management from data and models to application services, and extending from the infrastructure layer and model layer through MaaS to application-layer SaaS.

The Yanxi large-model open computing platform, with its industry knowledge base, has accumulated more than 100 training and inference optimization tools. The whole process from data preparation and model training to model deployment takes less than a week.

The third is practical adhesion.

Rather than paper results on some leaderboard, enterprises applying large models want to see real results in actual use, with a clearer sense of technical capability and business benefit. JD.com has already carried out extensive industrial practice, which gives it a clear advantage.

At present, the large model is being tempered at scale in JD.com's internal high-complexity scenarios such as retail, finance, health, and logistics, and integrated industry solutions are offered externally, which reduces the worries and costs of large-model deployment.

Take content generation. In retail, copywriting requirements and sensitive-information review thresholds differ by category. Drawing on JD.com's rich accumulated product data and its large models, the cost of each set of images needed for e-commerce operations, such as main product images, marketing posters, and product detail images, has been cut by 90% and the production cycle shortened from 7 days to half a day. There is no need to worry about content risk, because the large model has already been tempered inside JD.com's retail business and is safe and reliable.

Another example is health diagnosis and treatment, a scenario that demands very high professionalism and reliability of content. JD Health applies a large model to provide health assistants and assisted diagnosis and treatment, covering professional services for more than a thousand diseases, built on an accumulated total of over 30 million high-quality doctor-patient dialogues and a million-scale medical knowledge graph.

Sticking to the "thick snow" of industry and realizing the technical dividends of large models is the real opportunity that this AI boom brings to technology companies.

At JD.com, the industrial large model has gradually moved from a technical idea to a clear development direction and an executable action plan, and is opening up the next possibilities for industrial AI.

Embrace one and become the world

Exploring the Value of Large Models

Staying calm amid the bustle at the start of the year, JD.com was the first to offer a value formula for large models as the "Hundred Models War" approached: value of a large model = algorithm × computing power × data × (industry thickness)².
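The formula is a slogan rather than a measurable quantity, but writing it out shows what the squared term implies. In the sketch below, every input is a hypothetical unitless score (the article defines no units or scales), and the function name is illustrative only.

```python
def large_model_value(
    algorithm: float,
    computing_power: float,
    data: float,
    industry_thickness: float,
) -> float:
    """value = algorithm x computing power x data x (industry thickness)^2."""
    return algorithm * computing_power * data * industry_thickness ** 2


if __name__ == "__main__":
    baseline = large_model_value(1.0, 1.0, 1.0, 1.0)
    # Doubling any one of the three AI elements doubles the value...
    print(large_model_value(2.0, 1.0, 1.0, 1.0) / baseline)
    # ...while doubling industry thickness quadruples it.
    print(large_model_value(1.0, 1.0, 1.0, 2.0) / baseline)
```

Because industry thickness enters as a square, it is the one factor the formula rewards more than linearly, which matches the article's emphasis on industrial depth over raw parameters.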

At this stage, the wild growth of large models is ending and a new era of application is beginning. Shifting large models from "parameter-centric" to "application-centric" is the core issue today. Why did JD.com arrive later? That may be an "unsolved mystery" for many readers.

But viewed through "first principles", the differentiation of JD.com's large model is an "inevitability".

The "first principle" of JD.com's technology is "industrial value".

As Xu Ran, CEO of JD.com, said, every technology JD.com develops takes industrial attributes as its starting point and industrial value as its goal: technology originates from industrial needs, is tempered in industrial scenarios, and creates industrial value.

This "first principle" has shaped the path of JD.com's large model: a tight integration of industry, academia, research, and application.

Different starting point: unlike research-oriented large models that "turn a deaf ear to the outside world" and industry large models that "only sweep the snow from their own doorstep", JD.com approaches large models from the industrial side. This requires both tempering the base model's advanced technology and considering how to create value for the whole industry and society. This road is like "climbing Mount Everest from the north slope": harder, and more valuable.

Different travelers: once the road is visible, it needs climbers. JD.com's technologists, represented by Dr. He Xiaodong, keep everyday life in view: they focus on industrial scenarios and serve JD.com's own business needs, polishing and verifying technology in real scenarios and then decoupling it to empower partners. They also keep the stars and the sea in view: when a technology has only just appeared on the horizon, the JD Discovery Research Institute sees the revolutionary changes that may come in five years and begins a forward-looking layout. Large models are one such case; multimodal large models, AGI, and more are all directions JD.com is following.

Different roadmap: JD.com is not building large models aimlessly. For applying large models it has a clear "three-step" plan. The first step is to build a general large model based on internal practice; the second is to temper it at scale in JD.com's internal high-complexity scenarios such as retail, finance, health, and logistics, and to offer integrated industry solutions externally; the third is to open large-model capabilities to external serious business scenarios. JD Cloud has already built a general large model based on internal practice. By the end of this year, JD.com expects to have iterated solid industrial services through large-scale tempering in high-complexity scenarios, and it expects to open its large-model capabilities to external serious business scenarios in early 2024. JD.com is currently at the second step and has achieved rich practical results internally.

Taking "industrial value" as the "first principle" of large models has driven JD.com toward a differentiated development model, letting it take the lead in creating and accumulating industrial value and enter the application era earlier.

The "value snowball" of JD.com's large model keeps growing with the "Matthew effect", rolling faster toward thousands of industries and everyday life. The industrial value of large models will ultimately show in the smiles of ordinary people.

Origin blog.csdn.net/R5A81qHe857X8/article/details/131714203