Building a complete ordering robot, from theory to practice

97b185171575f70b89316e3b5edf5c99.jpeg

Dialogue systems are a classic NLP application scenario, and they have become very popular in recent years. Applications such as ChatGPT are in use around the world, and they have let everyone experience the capability and appeal of AI first-hand. NLP has developed rapidly as a result, and demand for related jobs has grown with it. Every algorithm engineer, data engineer, and programmer doing NLP-related work benefits from understanding how a typical dialogue system is built, both for career development and for technical strength. By explaining the core theory of dialogue systems and walking through a complete practical project (with a code implementation) of an ordering robot, the author hopes to help everyone better understand and master the core theory and practical application of NLP technology.

1. Scenario introduction

Let's first introduce the main application scenario of the ordering robot project. The dialogue process tree and the application interface are as follows:

214e3bf83c0c27c31ea1fff9327fc84d.png

The scenario in the figure above is this: the user completes an order by talking with the ordering robot. For example, the user can order dishes, check the dishes already ordered, delete dishes, and confirm the order, then enter a mobile phone number to confirm the result and complete the order.

This scenario covers the core parts of a dialogue system: intent recognition, slot extraction and filling, state tracking, dialogue strategy, database operations, and text generation. This dialogue system is only a minimal demonstration meant to show how the core framework is implemented; you can then redesign the interaction and the product, and reuse the code, according to your own scenarios.

2. Intent recognition

With the scenario in mind, let's go through the core theory of each part in turn. The first is intent recognition, which is part of natural language understanding (NLU): we must first know what the user said before any follow-up operation is possible. Please read the following two slides first.

a45644fc96cbbac0f9e1d4a3396ba3e3.png de0149123b065449b4101460e121c5c7.png

In general there are two schemes for intent recognition: text classification and text matching. Text classification relies on a classification model, which requires the categories to be specified in advance along with training data for each category. Text matching relies on a set of candidate questions prepared in advance: a text matching algorithm scores the user's question against each candidate, and the candidate with the highest score is the hit.

The advantage of text classification is that its efficiency is relatively stable and it is fast. Text matching makes it easier to add new questions and to correct answers. A drawback of text matching is that its efficiency depends on the number of candidate questions. In general we tend to adopt text matching for reasons of cost, flexibility and correctness. When the set of intents is relatively fixed, text classification is also worth considering.

Text matching works as follows. First, preprocess the user's question: word segmentation, stop word removal, punctuation removal, case conversion, part-of-speech tagging, and so on. Then compute its similarity to the questions in the FAQ library and return the answer of the best match.
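The preprocessing step can be sketched roughly as follows. This is a minimal illustration for English text, not the project's actual code: the stop-word list and the `preprocess` name are assumptions made for the example, and splitting on whitespace stands in for real word segmentation (Chinese text would need a dedicated segmenter).

```python
import string

# Hypothetical stop-word list for illustration; a real system would load a fuller one.
STOP_WORDS = {"a", "an", "the", "i", "to", "please", "of", "some"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()  # whitespace split stands in for word segmentation
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Please add the Kung Pao Chicken to my order!"))
# ['add', 'kung', 'pao', 'chicken', 'my', 'order']
```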

7e22db955c6008a0b967b23cf606272f.png

So what is a standard FAQ library? It is a collection of records, each consisting of a serial number, a question, an answer, and a list of similar questions.
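As a rough sketch of such a library and of "score every candidate, keep the best", the record below has the four fields just listed. The field names, the sample entries, and the use of `difflib.SequenceMatcher` as a stand-in similarity function are all assumptions for illustration; any of the matching algorithms discussed in this article could be plugged in instead.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class FaqEntry:
    entry_id: int                     # serial number
    question: str                     # standard question
    answer: str                       # answer returned on a hit
    similar_questions: list[str] = field(default_factory=list)


# Hypothetical sample records.
FAQ_LIBRARY = [
    FaqEntry(1, "what dishes do you have", "Here is today's menu.",
             ["show me the menu", "what can i order"]),
    FaqEntry(2, "what have i ordered", "Here are the dishes in your order.",
             ["check my order", "show my order"]),
]


def similarity(a: str, b: str) -> float:
    """Placeholder similarity in [0, 1]; swap in any matching algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(user_question: str, library: list[FaqEntry]):
    """Score the question against every candidate and keep the highest score."""
    best_entry, best_score = None, 0.0
    for entry in library:
        for candidate in [entry.question, *entry.similar_questions]:
            score = similarity(user_question, candidate)
            if score > best_score:
                best_entry, best_score = entry, score
    return best_entry, best_score


entry, score = best_match("show me the menu", FAQ_LIBRARY)
print(entry.entry_id, score)  # 1 1.0
```

In practice a minimum score threshold is usually added so that unrelated questions are not forced onto the nearest entry.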

ae10494fbdae1d042a59b9c04c64303a.png

Several common text matching algorithms are edit distance, Jaccard, BM25, word2vec, and deep learning models.

Edit distance

a22bad8a8df2738067da81543f60eac6.png

Advantages: strong interpretability, works across languages, no model training needed.

Disadvantages: it has no notion of semantic similarity between characters, it is strongly affected by irrelevant words and stop words, it is strongly affected by word order, and text length has a large impact on speed.
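For reference, the classic dynamic-programming form of edit distance (Levenshtein distance) can be sketched as below. Turning the distance into a similarity by dividing by the longer length is one common normalisation and is an assumption here; it may differ from the formula in the figure.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; normalising by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))  # 3
```

The nested loop makes the cost proportional to the product of the two text lengths, which is why text length has such an effect on speed.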

Jaccard distance

d2d1eaccadbf5524fb71c89b408c67d1.png

Advantages: word order does not affect the score (bag-of-words model), it is simple to implement and fast, it works across languages, and no training is required.

Disadvantages: word order does not affect the score (a double-edged sword), there is no similarity measure between words, it is affected by irrelevant words, and texts with different meanings can still receive a full score.
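A minimal sketch of Jaccard similarity over word sets is shown below (the Jaccard distance is one minus this value). Tokenising by whitespace and lowercasing are assumptions for the example. The first call illustrates the "full score for different meanings" problem.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Size of the intersection of the two word sets divided by the size of their union."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Same words in a different order: full score even though the meaning differs.
print(jaccard_similarity("dog bites man", "man bites dog"))              # 1.0
print(jaccard_similarity("order fried rice", "order fried noodles"))     # 0.5
```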

word2vec

6b4792f0a16a352489d02f3d2e64e79c.png

Advantages:
If two texts contain words with similar semantics, their similarity increases;
the training data is simple to obtain (a plain text corpus);
computation is fast, and vectors for the knowledge base can be pre-computed;
the text is converted into numbers, which makes more complex downstream models possible.

Disadvantages:
The quality of the word vectors determines the quality of the sentence vector;
polysemy is hard to handle;
it is strongly affected by stop words and text length (it is also a bag-of-words model);
switching languages, or even domains, requires retraining.
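To make the idea concrete, here is a toy sketch that averages word vectors into a sentence vector and compares sentences by cosine similarity. The three-dimensional vectors are hand-written and purely hypothetical; a real system would load vectors trained with word2vec, and averaging is only one simple (assumed) way to build the sentence vector.

```python
import math

# Hand-written toy vectors, purely hypothetical; real ones come from a trained model.
TOY_VECTORS = {
    "order":   [0.9, 0.1, 0.0],
    "buy":     [0.8, 0.2, 0.0],
    "noodles": [0.0, 0.9, 0.1],
    "rice":    [0.1, 0.8, 0.2],
    "cancel":  [-0.7, 0.1, 0.6],
}


def sentence_vector(tokens: list[str]) -> list[float]:
    """Average the vectors of known words (word order is ignored)."""
    known = [TOY_VECTORS[tok] for tok in tokens if tok in TOY_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no known words on one side
    return dot / (norm_u * norm_v)


query = sentence_vector(["order", "noodles"])
close = cosine_similarity(query, sentence_vector(["buy", "rice"]))
far = cosine_similarity(query, sentence_vector(["cancel", "rice"]))
print(close > far)  # True: different words, but similar semantics score higher
```

Because the knowledge-base questions are fixed, their sentence vectors can be computed once and cached, leaving only the user's question to be vectorised at query time.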

The algorithms above, as well as BM25 and deep learning, are analyzed and discussed in depth in our courses.

3. Information extraction

The first step in implementing a dialogue system is natural language understanding (NLU), which has two parts: intent recognition and information extraction. Intent recognition, covered above, lets the machine understand what the user said and what the user wants. Information extraction is the other necessary link in a dialogue system, and several commonly used schemes are introduced below.

Rule-based extraction

Usually we use regular expressions to match specific sentence patterns and vocabulary. In principle, if rules handle the task well, prefer them over a model, because rules are more controllable and more efficient. Precision and recall can also be calculated for rules. Note as well that the order of the rules sometimes affects the results, so pay attention to it when debugging.

A regular expression describes a string matching pattern. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract substrings that meet a certain condition. Python's re module provides the following functions:

  • re.search(pattern, string)

  • re.match(pattern, string)

  • re.findall(pattern, string)

  • re.sub(pattern, repl, string)

  • re.split(pattern, string)
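As an illustration of rule-based slot extraction for the ordering scenario, the sketch below pulls dish names, a quantity and a phone number out of an utterance. The menu, the slot names and the patterns are hypothetical; in particular, the phone pattern assumes 11-digit mobile numbers starting with 1.

```python
import re

# Hypothetical menu; a real system would load dish names from the database.
DISHES = ["kung pao chicken", "fried rice", "mapo tofu"]

DISH_PATTERN = re.compile("|".join(re.escape(dish) for dish in DISHES))
QUANTITY_PATTERN = re.compile(r"\b(\d+)\s+(?:portions?|servings?|bowls?|plates?)\b")
PHONE_PATTERN = re.compile(r"\b1\d{10}\b")  # assumption: 11 digits starting with 1


def extract_slots(utterance: str) -> dict:
    """Return only the slots that the rules managed to find."""
    text = utterance.lower()
    slots = {}
    dishes = DISH_PATTERN.findall(text)
    if dishes:
        slots["dishes"] = dishes
    quantity = QUANTITY_PATTERN.search(text)
    if quantity:
        slots["quantity"] = int(quantity.group(1))
    phone = PHONE_PATTERN.search(text)
    if phone:
        slots["phone"] = phone.group()
    return slots


print(extract_slots("I want 2 portions of Kung Pao Chicken and fried rice"))
# {'dishes': ['kung pao chicken', 'fried rice'], 'quantity': 2}
print(extract_slots("My number is 13800138000"))
# {'phone': '13800138000'}
```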

Extraction based on deep learning models

7a5e7b3a7fe6bfcf7a176d340d1cf336.png

A common practice is to have a neural network predict a classification label for each word, then combine it with a CRF or similar method to perform named entity recognition and complete the information extraction. Building sequence labeling models, the principle of CRF and how it is combined with neural networks, Viterbi decoding and related topics are explained in detail in our courses.
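The model itself is beyond the scope of a short example, but the last step, turning per-word labels into extracted entities, can be sketched. The BIO labelling scheme (B- begins an entity, I- continues it, O is outside) and the DISH label are assumptions here; the article does not say which scheme the project uses.

```python
def bio_to_entities(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group tokens labelled B-X / I-X into (entity_type, text) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:  # close the entity that was open
                entities.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["i", "want", "kung", "pao", "chicken", "and", "rice"]
tags = ["O", "O", "B-DISH", "I-DISH", "I-DISH", "O", "B-DISH"]
print(bio_to_entities(tokens, tags))
# [('DISH', 'kung pao chicken'), ('DISH', 'rice')]
```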

4. Dialogue state control

Having covered how to identify user intent (intent recognition) and how to extract the key information the user provides (information extraction), let's talk about dialogue state control. It consists of two parts: dialogue strategy and process control, and database control.

8c1e9e2fc5579052658c24206b37b360.png

Let's review the conversation process together. First, the user provides voice or text input, for example: "help me book a ticket to Beijing". The next step is intent recognition and semantic slot filling: here the domain is air tickets, the intent is to book a ticket, and the destination slot is Beijing. After that comes dialogue management, which fills the remaining semantic slots and achieves the user's goal. The rest of this section covers how this part is implemented.

51250922e9c85b69d4417ae5e1d69c6c.png

First, dialogue state tracking. To complete the user's goal of booking an airline ticket, some necessary information must be provided, and these pieces of necessary information are the slots. Slots are very important to a dialogue system, because the information in them is effectively the system's memory of the current dialogue. Over multiple rounds of interaction the slots are passed along continuously, which amounts to passing the memory along. The process of filling the slots is dialogue state tracking.
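A very small sketch of this idea: the state is a dictionary of slots that is carried from turn to turn and updated with whatever the current turn extracted. The slot names and function names are hypothetical, chosen to match the ticket-booking example.

```python
# Hypothetical required slots for the ticket-booking example.
REQUIRED_SLOTS = ("departure", "destination", "date")


def update_state(state: dict, new_slots: dict) -> dict:
    """Merge the slots extracted this turn into the state carried from earlier turns."""
    updated = dict(state)  # copy so earlier states are not modified
    for name, value in new_slots.items():
        if value is not None:
            updated[name] = value
    return updated


def missing_slots(state: dict) -> list[str]:
    """Slots that still have to be collected before the goal can be completed."""
    return [name for name in REQUIRED_SLOTS if name not in state]


state = {}
state = update_state(state, {"destination": "Beijing"})   # turn 1
print(missing_slots(state))  # ['departure', 'date']
state = update_state(state, {"departure": "Shanghai", "date": "tomorrow"})  # turn 2
print(missing_slots(state))  # []
```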

9e975fd632a26171b86e205d0edbdcb7.png

Next is the dialogue strategy. What is it? It is the decision, based on the dialogue state and the progress of slot filling, about what the robot should do next. For example, the system should judge from the situation whether to ask the user for more information, answer the user's question directly, or query the database. With the concept of dialogue strategy in place, let's talk about the dialogue process tree.
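A rule-based strategy of this kind can be sketched in a few lines. The intent names, the required-slot table and the action names below are hypothetical; the point is only the decision order: ask for a missing slot first, query the database when the intent needs it, otherwise answer directly.

```python
# Hypothetical configuration: which slots each intent needs, and which intents need the database.
REQUIRED_SLOTS = {
    "book_ticket": ["departure", "destination", "date"],
    "greet": [],
}
DB_INTENTS = {"book_ticket"}


def choose_action(intent: str, state: dict) -> tuple:
    """Return (action, argument) for the next system move."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in state:
            return ("ask_user", slot)      # collect the first missing slot
    if intent in DB_INTENTS:
        return ("query_database", None)    # all slots filled, look up results
    return ("answer_directly", None)


print(choose_action("book_ticket", {"destination": "Beijing"}))
# ('ask_user', 'departure')
```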

ec15066a459b32fe5a3e2aeda59226e1.png

When we implement dialogue strategies in real work, we generally build a dialogue process tree. This tree embodies everything we expect of the robot. Put simply, how the robot responds to the intent expressed by the user is laid out in the tree: how to answer, which slots need to be filled, how to word feedback and clarification, and so on. We generally record it in a custom file (JSON, XML, etc.), so that the program can handle the user's various intents according to the tree and jump between processing nodes.

Where is database control used? While interacting with people, the robot often needs to interact with a database at the same time, because much of the information a business scenario needs (such as product prices) is stored in a database or an in-memory store such as Redis, MongoDB, or MySQL.
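To give a feel for what such a file and its interpreter might look like, here is a small sketch. The JSON layout (node names, `intents`, `required_slots`, `ask`, `reply`, `next`) is invented for this example and is not the format used by the project.

```python
import json

# Hypothetical process tree; in practice this would live in its own JSON file.
TREE_JSON = """
{
  "root": {
    "intents": {"order_dish": "collect_dish", "check_order": "show_order"}
  },
  "collect_dish": {
    "required_slots": ["dish"],
    "ask": "Which dish would you like?",
    "reply": "Added {dish} to your order.",
    "next": "root"
  },
  "show_order": {
    "required_slots": [],
    "reply": "Here is your current order.",
    "next": "root"
  }
}
"""


def step(tree: dict, node_name: str, intent: str, slots: dict) -> tuple:
    """Process one user turn and return (reply, name of the node to continue from)."""
    node = tree[node_name]
    if "intents" in node:  # routing node: jump to the node that handles the intent
        target = node["intents"].get(intent)
        if target is None:
            return "Sorry, I did not understand that.", node_name
        node_name, node = target, tree[target]
    for slot in node["required_slots"]:
        if slot not in slots:
            return node["ask"], node_name  # stay here until the slot is filled
    return node["reply"].format(**slots), node["next"]


tree = json.loads(TREE_JSON)
print(step(tree, "root", "order_dish", {}))
# ('Which dish would you like?', 'collect_dish')
print(step(tree, "collect_dish", "order_dish", {"dish": "fried rice"}))
# ('Added fried rice to your order.', 'root')
```

Keeping the tree in a data file means that new intents or wording changes only require editing the file, not the program.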
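As a self-contained illustration, the sketch below looks up a dish price with Python's built-in `sqlite3` module and an in-memory database. SQLite is swapped in here only so the example runs without a server; the table layout, dish names and prices are hypothetical, and a production system would talk to one of the stores named above instead.

```python
import sqlite3


def create_menu_db() -> sqlite3.Connection:
    """Build a throwaway in-memory menu table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE menu (dish TEXT PRIMARY KEY, price REAL)")
    conn.executemany(
        "INSERT INTO menu VALUES (?, ?)",
        [("fried rice", 12.0), ("mapo tofu", 18.0)],
    )
    conn.commit()
    return conn


def lookup_price(conn: sqlite3.Connection, dish: str):
    """Return the price of a dish, or None if it is not on the menu."""
    row = conn.execute("SELECT price FROM menu WHERE dish = ?", (dish,)).fetchone()
    return row[0] if row else None


conn = create_menu_db()
print(lookup_price(conn, "mapo tofu"))  # 18.0
print(lookup_price(conn, "pizza"))      # None
```

The `?` placeholder passes the slot value as a parameter rather than pasting user text into the SQL string, which matters because slot values come straight from user input.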

5. Text generation

We have covered natural language understanding (NLU) and dialogue management (DM), so let's talk about how to generate the text used to interact with users, that is, natural language generation (NLG). Text generation generally uses template-based slot filling.

31b5999e6e66b3a8115c439a8a4f459e.png
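Template-based slot filling can be sketched in a few lines. The template names and wording below are hypothetical examples for the ordering scenario, not the project's actual templates.

```python
# Hypothetical reply templates keyed by system action; {name} marks a slot to fill.
TEMPLATES = {
    "confirm_add": "OK, {quantity} x {dish} added to your order.",
    "ask_phone": "Please tell me your mobile phone number to confirm the order.",
    "order_done": "Your order is confirmed. We will contact you at {phone}.",
}


def generate(action: str, **slots) -> str:
    """Pick the template for the action and fill in the slot values."""
    return TEMPLATES[action].format(**slots)


print(generate("confirm_add", quantity=2, dish="fried rice"))
# OK, 2 x fried rice added to your order.
```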

The template-based approach is relatively simple to implement, and slot information can be output as part of the template content. Its disadvantage is that the output is rather uniform; its advantage is that the output is controllable and highly accurate. The other approach is model-based generation, but its output is comparatively uncontrollable: if the text is incoherent or does not meet expectations, it is hard to quickly adjust the model to produce the desired result. Generative tasks, language models and their training, text generation implementation, the Encoder-Decoder structure, and the Attention mechanism are explained in detail, with code, in the course.

The theory and basic implementation behind a basic dialogue system have now been explained, and everyone should have a theoretical framework for how to implement a dialogue robot. Let's see how to implement it.

6. Complete implementation

6ff6828d4dcb184bc9986e2527fe44c9.png 786cc35915ee8938bdd45fd45a67af4c.png 5d73685bb5bc9460558c515d5e3c4d1b.png 45eb61bc01d7a46a9916d8b773190c2f.png

Due to space limitations, the full code and a detailed explanation cannot be shared here. If you are interested or need it, you can ask the teacher for it. The engineering code of the ordering robot can be reused in many scenarios, such as ordering food, booking tickets, and handling inquiries. We suggest running the engineering code yourself and following it from theory to practice, which will help you remember the material and get more out of it. I hope today's sharing is helpful to everyone. Finally, I wish you success in your work and studies.

Author introduction

Richard, Doctor of Engineering. He has 15 years of experience in communications and Internet software development and management, and more than 10 years of experience in machine learning and algorithm development and management. He has worked as a technical manager at large companies such as Tencent and Huawei, and has applied for and been granted more than 10 algorithm patents.

Arthur, Chief Data Scientist. He graduated from the Computer Department of Zhejiang University and has 10 years of R&D management experience in enterprise software services and large-scale telecom value-added services, plus 8 years of R&D and management experience in machine learning and deep learning. He has worked as a technical manager at a well-known large company, has published 2 books on deep learning, and has applied for and been granted multiple algorithm patents.

David, Master of Computer Science from Purdue University. He has 10 years of experience in machine learning and algorithm research and development. His work centers on natural language understanding and dialogue systems, and he has worked on and delivered many types of projects. He has applied for and been granted more than 10 algorithm patents, and has placed in the top three in multiple algorithm competitions.


Origin blog.csdn.net/qq_27590277/article/details/131336032