NLP Lesson 1 (I'm just starting to learn too)

  When I'm idle and bored, I ask myself: I have five years of programming experience, but apart from CRUD, what can I really do? If I quit one day and went out to interview, could I expect a salary any higher than graduates younger than me? What advantages do I have? I'm only a junior-college student from an electrical department who somehow ended up writing software, which is a drama in itself, and I gradually brainwashed myself into forgetting that I came out of a training institution. Having said all these complaints, to be clear: I'm not saying training institutions are bad, nor that people with formal degrees are born better by some fixed ratio; in the end it all comes down to learning. I taught myself Python for more than six months, and this holiday I signed up for a training class to learn NLP. My English is weak, and all I can say about that is that I feel helpless.

  Getting back to the topic, let me share with you what I learned during this period (2019-06-15 to 2019-07-01).

  Since we have chosen artificial intelligence, we should know what artificial intelligence is and what NLP is, so let's first define a few terms.

  Artificial intelligence, abbreviated AI, is a new technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in ways similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Since the birth of artificial intelligence, its theory and technology have grown increasingly mature and its application areas keep expanding; it is conceivable that the technological products AI brings in the future will be "containers" of human wisdom. Artificial intelligence can simulate the information processes of human consciousness and thinking. AI is not human intelligence, but it can think like a human and may eventually exceed human intelligence. It is a challenging science, and those engaged in it must understand computer science, psychology, and philosophy. It is also a very broad science composed of different fields such as machine learning and computer vision. In general, a major goal of AI research is to enable machines to handle complex work that normally requires human intelligence to complete, though what counts as "complex work" differs across eras and between people. [1] In December 2017, "artificial intelligence" appeared in the "2017 Top Ten Chinese Media Buzzwords." (excerpted from Baidu)

  For my part, I think artificial intelligence means letting our existing devices (such as computers and robots) help people do more things, and giving machines something of the human mind, so that they act with human reason and humanity. Examples are everywhere now: smart parking, voice calls, fully automated factory robots, and so on.

  NLP is short for Natural Language Processing. In my view, NLP is mainly about giving machines the ability to communicate in normal human language and some of the functions of a human mind. Examples include chatbots, news classification, and spam filtering.

  We know that Chinese grammar has a great many rules, so let's take something simple: generating sentences. A grammar is ordinarily written as plain text like this:

= Host "" " 
Host = Time noun verb adjective noun subject matters nouns 
time = noun morning, afternoon, yesterday noon, midnight, last year, tomorrow 
the subject noun = students, the people, old, women, gay, uncle 
adjective quickly = quickly, quietly, quietly 
verb = to fight, chase, pounding, shouting, staring 
affairs noun = snails, cheetahs, Otto, baseball, fighter, Pluto 
. "" "

We can see from the first line of the text that a sentence has the structure time_noun + subject_noun + adjective + verb + object_noun. For example: "Tonight we will fiercely work overtime." Or: "Tomorrow the leader will treat us to a big meal."

Following our time_noun + subject_noun + adjective + verb + object_noun format, any of the candidates in the text above can be combined into a sentence (never mind that some results won't read fluently). Let's look at the code implementation.

The idea: 1) take the grammar text; 2) split it into lines; 3) obtain the element format of a sentence; 4) randomly pick a value from each element's candidate set; 5) splice the picks into a sentence according to the format.

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random

host = """
host = time_noun subject_noun adjective verb object_noun
time_noun = morning, afternoon, yesterday noon, midnight, last year, tomorrow
subject_noun = the students, the people, the old man, the women, the comrade, the uncle
adjective = swiftly, quickly, quietly, silently
verb = hit, chased, pounded, shouted at, stared at
object_noun = the snail, the cheetah, the Otto, the baseball, the fighter jet, Pluto
"""


# Convert the grammar text into a dictionary: each left-hand symbol maps
# to a list of alternatives, and each alternative is a list of tokens.
def create_grammar(grammar_str, split='=', line_split='\n', code_split=','):
    grammar = {}
    for line in grammar_str.split(line_split):
        if not line.strip():
            continue
        k, v = line.split(split)
        grammar[k.strip()] = [s.split() for s in v.split(code_split)]
    return grammar


# randomly pick one alternative
choice = random.choice


# expand a target symbol into a sentence
def generate(gram, target):
    # a symbol not in the grammar is a terminal word: return it as-is
    if target not in gram:
        return target
    # pick one expansion at random, recursively expand each of its tokens,
    # and join the parts with spaces to form the final sentence
    expansion = choice(gram[target])
    return ' '.join(generate(gram, t) for t in expansion)


if __name__ == '__main__':
    grammar = create_grammar(host, '=')
    for i in range(10):
        print(generate(grammar, target='host'))
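Running the program prints ten randomly generated sentences. The exact output differs on every run, but with the grammar above you might see lines like "tomorrow the students quietly chased the snail": structurally well-formed, yet often nonsensical, which leads to the problem discussed next.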

After running it, we will find that many of the sentences are wrong and completely at odds with our speech habits, so picking purely at random is not a wise choice. Next we will deal with this unwise choice and try to make the program output sentences that follow the "fixed patterns" of human speech.
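One natural fix (my own sketch of the direction, not code from the lesson) is to generate several candidates and keep the one a scoring function likes best. Here generate_best and its score parameter are hypothetical names, with sentence length standing in until the N-gram probability introduced below is available:

def generate_best(gram, target, n=20, score=len):
    # generate n candidate sentences and return the highest-scoring one;
    # score=len is only a placeholder for a real language-model probability
    candidates = [generate(gram, target) for _ in range(n)]
    return max(candidates, key=score)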

  Now let's play a little game of the kind common on TV shows: given a hint, guess the word. Given "a golden can" and "reduces internal heat," most of us first think of JDB; say "classroom" and "black," and we think of "blackboard"; say "afternoon, 6:00," and we think of "overtime." Likewise, given the words that came before, a human will respond with the next word immediately. This is the N-gram I learned recently.

  The N-gram is a language model (Language Model, LM). A language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words) and its output is the probability of that sentence, i.e., the joint probability of those words.

  The N-gram rests on an assumption: the n-th word is related only to the preceding n-1 words and not to any other word (this is also the hidden Markov assumption). The probability of a whole sentence is then the product of each word's conditional probability, and each word's probability can be estimated by counting in a corpus; N-grams are typically counted from a text corpus. When N = 1 it is called a unigram; N = 2 is a bigram, where each word is assumed to depend only on the single word before it; N = 3 is a trigram, where each word is assumed to depend on the two words before it; and so on. In theory, the bigger n is the better; in practice, the trigram is used most; and as a rule, if a bigram can solve the problem, never use a trigram.
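To make the counting concrete, here is a minimal sketch (my own illustration, not from the course) of estimating bigram probabilities with collections.Counter; the toy corpus and the function name bigram_prob are assumptions for illustration:

from collections import Counter

# a tiny, already-tokenized toy corpus (an assumption for illustration)
corpus = "we like reading books we like writing code they like reading books".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # P(w2 | w1) = count(w1 w2) / count(w1)
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob('we', 'like'))       # count('we like') / count('we') = 2/2 = 1.0
print(bigram_prob('like', 'reading'))  # 2/3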

  Let's look at a few formulas:

  1-gram (each word conditioned on the one word before it; in the more common naming this is the bigram approximation):

P(w1, w2, w3, …, wn) = P(w1)P(w2|w1)P(w3|w1w2)P(w4|w1w2w3)…P(wn|w1w2…wn-1) ≈ P(w1)P(w2|w1)P(w3|w2)P(w4|w3)…P(wn|wn-1)

  2-gram (each word conditioned on the two words before it; in the more common naming this is the trigram approximation):

P(w1, w2, w3, …, wn) = P(w1)P(w2|w1)P(w3|w1w2)P(w4|w1w2w3)…P(wn|w1w2…wn-1) ≈ P(w1)P(w2|w1)P(w3|w1w2)P(w4|w2w3)…P(wn|wn-2wn-1)

  To explain briefly: P(w1) is the probability of w1 appearing in the corpus, i.e., count(w1) / total word count; P(w2|w1) is the probability of w2 appearing right after w1, i.e., count(w1 w2) / count(w1). Now let's look at a real 2-gram example.

  Take the sentence "we like reading books." After word segmentation it becomes "we / like / reading / books," so P(we, like, reading, books) = p(we) p(like|we) p(reading|we like) p(books|like reading), where:

  p(we) is the probability that the word "we" appears in the corpus;

  p(like|we) is the probability that "like" appears right after "we" (the number of occurrences of "we" is the denominator; the number of occurrences of "we like" is the numerator);

  p(reading|we like) is the probability that "reading" appears right after "we like" (occurrences of "we like" as the denominator; occurrences of "we like reading" as the numerator);

  p(books|like reading) is the probability that "books" appears right after "like reading" (occurrences of "like reading" as the denominator; occurrences of "like reading books" as the numerator).

  And so on: with a 2-gram, we only ever condition on the two preceding words.

 

  Multiplying all these numbers together then gives a value between 0 and 1: the closer it is to 0, the more likely the sentence is wrong; the closer it is to 1, the more likely the sentence is right.
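As a sketch of how this multiplication might look in code (my own illustration; the toy corpus and the function name sentence_prob are assumptions), using the author's 2-gram convention of conditioning each word on the two preceding ones:

from collections import Counter

corpus = "we like reading books we like writing code we like reading news".split()

counts1 = Counter(corpus)
counts2 = Counter(zip(corpus, corpus[1:]))
counts3 = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def sentence_prob(words):
    # P(w1) * P(w2|w1) * P(w3|w1 w2) * P(w4|w2 w3) * ...
    prob = counts1[words[0]] / total
    if len(words) > 1 and counts1[words[0]]:
        prob *= counts2[(words[0], words[1])] / counts1[words[0]]
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        if counts2[(w1, w2)] == 0:
            return 0.0  # unseen context; a real model would smooth here
        prob *= counts3[(w1, w2, w3)] / counts2[(w1, w2)]
    return prob

print(sentence_prob("we like reading books".split()))  # relatively high, about 0.083
print(sentence_prob("books like we reading".split()))  # 0.0 on this corpus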

 

  The code implementation is not finished yet. Work has been hectic lately and I haven't fully gotten the coding part done; I will fill it in within a few days.
