On the Self-Cultivation of a Web Crawler 6: Regular Expressions 2

## metacharacters

All of the metacharacters are: .   ^    $     *     +     ?     { }     [ ]     \     |     ( )

     (A second group consists of a backslash followed by an ordinary character; such a combination takes on a special meaning.)

       ※ dot (.): matches any character except a newline.

       ※ vertical bar (|): the equivalent of logical OR; A|B matches either A or B.
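The dot and the vertical bar can be tried out with a minimal sketch like the one below. The sample strings are illustrative assumptions, not the contents of the original screenshots.

```python
import re

# The dot matches any single character except a newline.
dot_match = re.search(r"Fish.", "FishC.com")
print(dot_match.group())                  # FishC

# Without the re.DOTALL flag, the dot refuses to match "\n".
print(re.search(r"Fish.", "Fish\n"))      # None

# The vertical bar works as a logical OR between the alternatives.
either = re.search(r"Fish(C|D)", "I love FishD")
print(either.group())                     # FishD
```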

       ※ caret (^): an anchor. It matches the starting position of the string; it pins down a location rather than consuming a character.
     (A pattern that begins with a caret cannot match text that appears later in the string, because the caret requires what follows it to sit at the very beginning of the string. For ^FishC, for example, the F must be the first character.)
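A short sketch of the caret as a start anchor, assuming two illustrative sample strings:

```python
import re

# "FishC" is present, but not at the start, so the anchored pattern fails.
print(re.search(r"^FishC", "I love FishC.com"))   # None

# Here the string starts with "FishC", so the anchor is satisfied.
start_match = re.search(r"^FishC", "FishC.com")
print(start_match.span())                         # (0, 5)
```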

       ※ dollar sign ($): matches the end of the input string. It is often used together with the caret (^).
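The end anchor can be sketched the same way. The strings are again assumptions chosen for illustration.

```python
import re

# "FishC" ends the string, so the pattern matches.
end_match = re.search(r"FishC$", "I love FishC")
print(end_match.span())                                  # (7, 12)

# ".com" follows "FishC", so the end anchor is not satisfied.
print(re.search(r"FishC$", "I love FishC.com"))          # None

# Combining ^ and $ forces the pattern to cover the whole string.
print(re.search(r"^FishC$", "FishC") is not None)        # True
print(re.search(r"^FishC$", "FishC.com") is not None)    # False
```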
       ※ parentheses (...): as in mathematics, they treat what they enclose as a single unit (a group), which can then be referenced.

       ※ backslash (\): the most widely used symbol in regular expressions. It can turn an ordinary character into a special one, and it can also strip a metacharacter of its special function.
     (e.g. \. no longer means "any character except a newline"; it matches a literal dot.)
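Both directions of the backslash can be seen in a small sketch. The `\d` example is an addition for illustration (it is a standard escape that matches a decimal digit), and the sample strings are assumptions.

```python
import re

# Unescaped, the dot matches any character, so the first hit is "F".
print(re.search(r".", "FishC.com").group())      # F

# Escaped, it matches only a literal dot.
literal_dot = re.search(r"\.", "FishC.com")
print(literal_dot.group(), literal_dot.span())   # . (5, 6)

# In the other direction, a backslash gives the ordinary letter d a
# special meaning: \d matches any decimal digit.
print(re.search(r"\d", "FishC2020").group())     # 2
```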
           ※ If the backslash is followed by digits, there are two possible interpretations:
            1, if the number is between 1 and 99, it is a backreference to the text matched by the group with that number
     (Analysis: r"(FishC)\1" does not match FishC.com. A backslash followed by a number from 1 to 99 refers to whatever the group with that number matched. Here group 1 is FishC, because it is enclosed in parentheses, so r"(FishC)\1" is equivalent to r"FishCFishC". Group numbers start at 1, because a number that begins with 0 is read as octal.)

            2, if the number starts with 0, or is three octal digits long, it is an octal number and stands for the character with that octal ASCII code

     (Analysis: the character 0 has the decimal ASCII code 48, which is 60 in octal, so writing \060 matches the character 0. Alternatively, write three digits directly: 141 is the octal form of ASCII 97, so \141 matches the letter a.)
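The two readings of a backslash followed by digits can be checked with a sketch like this one. The strings FishC2020 and "I am a fish" are assumptions made up for the demonstration.

```python
import re

# 1. Backreference: group 1 captures "FishC" and \1 demands the same text
#    again, so a single "FishC" is not enough.
print(re.search(r"(FishC)\1", "FishC.com"))     # None
doubled = re.search(r"(FishC)\1", "FishCFishC")
print(doubled.group(), doubled.group(1))        # FishCFishC FishC

# 2. Octal escape: "0" is ASCII 48, which is 60 in octal, hence \060.
print(oct(ord("0")))                            # 0o60
zero = re.search(r"\060", "FishC2020")
print(zero.group(), zero.span())                # 0 (6, 7)

# Three octal digits need no leading zero: 0o141 == 97 == ord("a").
letter_a = re.search(r"\141", "I am a fish")
print(letter_a.group(), letter_a.span())        # a (2, 3)
```

Note that \1 repeats the text the group actually matched; it does not re-run the group's pattern.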

       ※ square brackets ([]): used to create a character class
     (Explanation: a character class is a set of characters. Metacharacters placed inside the brackets lose their special function, just as if they had been escaped with a backslash.)
     (Analysis: inside a character class everything is treated as an ordinary character, apart from a few special characters. These exceptions are explained below with the help of the findall method.)
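As a rough illustration of how a metacharacter loses its special meaning inside a class, consider the sketch below. The sample strings are assumptions.

```python
import re

# Inside [], the dot is literal: the only hit is the real "." at index 5.
in_class = re.search(r"[.]", "FishC.com")
print(in_class.group(), in_class.span())                # . (5, 6)

# A class matches exactly one character out of the listed set.
one_of = re.search(r"Fish[CD]", "I love FishD")
print(one_of.group())                                   # FishD
```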

       ※ findall: finds all substrings of the string that the regular expression matches and returns them as a list.
            1, hyphen (-): used to express a range

            2, backslash (\): inside a character class [] a backslash does not stand for itself (a lone backslash there is an error). Instead it introduces an escape sequence, as in a Python string. For example, \n stands for a newline
            3, caret (^): inside a character class [] it means "anything except", that is, negation. Note that the caret must be the first character in the class.
     (Analysis: placed anywhere later in the class, the caret simply matches the caret character itself.)
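The three special characters inside a class can be sketched with findall as follows. The text values are assumptions chosen so that each rule has a visible effect.

```python
import re

text = "FishC.com\n"

# 1. A hyphen expresses a range: every lowercase letter.
lowercase = re.findall(r"[a-z]", text)
print(lowercase)        # ['i', 's', 'h', 'c', 'o', 'm']

# 2. A backslash starts an escape sequence: \n is a newline.
newlines = re.findall(r"[\n]", text)
print(newlines)         # ['\n']

# 3. A leading caret negates the class: everything except lowercase letters.
others = re.findall(r"[^a-z]", text)
print(others)           # ['F', 'C', '.', '\n']

# A caret anywhere else in the class is just a literal "^".
with_caret = re.findall(r"[a-z^]", "Fish^C")
print(with_caret)       # ['i', 's', 'h', '^']
```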

       ※ braces ({}): used for repetition. {M,N} (where M and N are non-negative integers and M <= N) matches the preceding RE between M and N times.
       {M,} means match at least M times,
       {,N} is equivalent to {0,N},
       {N} means match exactly N times

     (Note: when writing ordinary code we often add spaces to make it look tidier, but do not do that inside a regular expression, because a space there is parsed as part of the pattern.)
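A minimal sketch of the brace forms, including what happens when a space slips in. The sample strings are assumptions, and the last two lines rely on Python's `re` treating a malformed `{1, 2}` as literal text.

```python
import re

# {3} applies to the single preceding character: C repeated three times.
triple_c = re.search(r"FishC{3}", "FishCCC")
print(triple_c.group())                                 # FishCCC

# Parentheses make the whole word the unit that is repeated.
triple_word = re.search(r"(FishC){3}", "FishCFishCFishC")
print(triple_word.group())                              # FishCFishCFishC

# {1,2} takes as many repetitions as it is allowed to.
up_to_two = re.search(r"(FishC){1,2}", "FishCFishCFishC")
print(up_to_two.group())                                # FishCFishC

# With a space inside, "{1, 2}" is no longer a quantifier; it is read as
# literal text, so it only matches a string that contains those characters.
print(re.search(r"(FishC){1, 2}", "FishCFishCFishC"))   # None
spaced = re.search(r"(FishC){1, 2}", "FishC{1, 2}")
print(spaced.group())                                   # FishC{1, 2}
```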

## special characters

       ※ asterisk (*): matches the preceding subexpression zero or more times, equivalent to {0,}

       ※ plus sign (+): matches the preceding subexpression one or more times, equivalent to {1,}

       ※ question mark (?): matches the preceding subexpression zero or one time, equivalent to {0,1}
     (When they express the same condition, prefer *, + and ? over the brace form {}: first, the asterisk, plus sign and question mark are more concise; second, the regular expression engine optimizes these three symbols, so they can be more efficient than the brace form.)
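The equivalences above can be checked with a small sketch. The helper `matched` and the sample list are hypothetical names introduced only for this demonstration.

```python
import re

samples = ["Fish", "FishC", "FishCCC"]

# Each symbol is paired with its brace equivalent.
pairs = [(r"FishC*", r"FishC{0,}"),
         (r"FishC+", r"FishC{1,}"),
         (r"FishC?", r"FishC{0,1}")]

def matched(pattern, text):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group() if m else None

for short, braces in pairs:
    results = [matched(short, s) for s in samples]
    # The brace form gives exactly the same results as the symbol.
    assert results == [matched(braces, s) for s in samples]
    print(short, results)

# FishC* ['Fish', 'FishC', 'FishCCC']
# FishC+ [None, 'FishC', 'FishCCC']
# FishC? ['Fish', 'FishC', 'FishC']
```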

## greedy and non-greedy matching
     (One thing to note about the repetition operators above: regular expressions match in greedy mode by default.)
       ※ greedy means just that: as long as the pattern is still satisfied, it matches as much as possible
     (Analysis: suppose we only want to match the <html> tag with a pattern such as <.+>, yet the whole string comes back. The + repeats "any character", and we might expect it to stop at the first right angle bracket. Being greedy, it instead runs to the very end of the string, then backtracks one character at a time until the rest of the pattern can match, and stops at the first success. Because the last character of the string is itself a right angle bracket, the match covers the entire string. That is greed.)
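Greedy matching can be reproduced with a sketch like the following. The HTML snippet is an assumed sample, not necessarily the string from the original screenshot.

```python
import re

s = "<html><title>I love FishC.com</title></html>"

# .+ grabs as much as it can and only gives characters back until the
# final ">" fits, so the match runs to the last ">" in the string.
greedy = re.search(r"<.+>", s)
print(greedy.group() == s)              # True
print(greedy.span() == (0, len(s)))     # True
```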
     (In a case like this, non-greedy mode has to be enabled.)
       ※ non-greedy mode is enabled by adding a question mark after the repetition metacharacter. In that position the question mark does not mean "zero or one"; it switches the repetition to non-greedy mode.
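A sketch of the non-greedy form on an assumed HTML sample:

```python
import re

s = "<html><title>I love FishC.com</title></html>"

# .+? stops at the first ">" that lets the whole pattern match.
lazy = re.search(r"<.+?>", s)
print(lazy.group())     # <html>

# With findall, the non-greedy pattern returns each tag separately.
tags = re.findall(r"<.+?>", s)
print(tags)             # ['<html>', '<title>', '</title>', '</html>']
```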


Origin blog.csdn.net/w15977858408/article/details/104123179