What is a regular expression?
A rule (pattern) that describes which strings should match.
What can regular expressions do?
- Formulate a rule and use it to:
  - check whether a given string follows the rule
  - find the content that follows the rule inside a large block of text
- Typical application areas:
  - validating login and registration forms
  - web crawlers (scraping data from pages)
  - automation: log analysis
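The two jobs above map naturally onto `re.fullmatch` (does the whole string follow the rule?) and `re.findall` (pull every matching piece out of a longer text). This is a minimal sketch. The username rule (3 to 16 word characters), the function name `is_valid_username`, and the sample log line are all hypothetical.

```python
import re

# Hypothetical registration-form rule: 3 to 16 letters, digits or underscores
USERNAME_RULE = r'\w{3,16}'

def is_valid_username(name: str) -> bool:
    # fullmatch succeeds only if the ENTIRE string follows the rule
    return re.fullmatch(USERNAME_RULE, name) is not None

# Hypothetical log line: find every IPv4-looking address in it
log = '10.0.0.1 - GET /index; 192.168.1.20 - POST /login'
ips = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', log)

print(is_valid_username('alex_40'))  # True
print(is_valid_username('a b'))      # False: the space is not allowed
print(ips)                           # ['10.0.0.1', '192.168.1.20']
```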
Regular expression syntax:
- Metacharacters
- Character groups []: define which characters may appear at a single character position
- Quantifiers
- Special usages and phenomena
  - Putting ? after a quantifier cancels greedy matching, i.e. switches it to lazy (non-greedy) matching
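A character group always stands for exactly one character position. The sketch below uses made-up sample strings to show a plain set, a range, and a negated group.

```python
import re

# [aeiou]: one character that is any of the listed vowels
vowels = re.findall(r'[aeiou]', 'regular')
# [0-9a-fA-F]: ranges inside a group; here, one hexadecimal digit
hex_digits = re.findall(r'[0-9a-fA-F]', 'x1Fz9')
# [^0-9]: a leading ^ negates the group, i.e. any character except a digit
non_digits = re.findall(r'[^0-9]', 'a1b2')
# A group still matches only ONE character; add a quantifier for longer runs
runs = re.findall(r'[0-9]+', 'tel 010-8888')

print(vowels)      # ['e', 'u', 'a']
print(hex_digits)  # ['1', 'F', '9']
print(non_digits)  # ['a', 'b']
print(runs)        # ['010', '8888']
```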
Metacharacters:
Metacharacter | Matches |
---|---|
. | Any character except a newline |
\w | A letter, digit, or underscore |
\s | Any whitespace character |
\d | A digit |
\n | A newline |
\t | A tab |
\b | A word boundary (the start or end of a word) |
^ | The start of the string |
$ | The end of the string |
\W | Any character that is not a letter, digit, or underscore |
\D | Any non-digit character |
\S | Any non-whitespace character |
a\|b | Character a or character b |
() | The expression inside the parentheses; also defines a group |
[…] | Any one of the characters in the group |
[^…] | Any character except those in the group |
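Most rows of the table can be tried directly with `re.findall`. The sample strings below are invented for illustration.

```python
import re

text = 'Tom is 25\tand Ann_2 is 30'

digits = re.findall(r'\d', text)                  # \d: one digit at a time
words = re.findall(r'\w+', text)                  # \w: letters, digits, underscores
spaces = re.findall(r'\s', text)                  # \s: the spaces and the tab
no_newline = re.findall(r'.', 'a\nb')             # . skips the newline
whole_word = re.findall(r'\bis\b', 'this is it')  # \b: "is" only as a whole word
first = re.findall(r'^\w+', text)                 # ^: anchored to the start
last = re.findall(r'\d+$', text)                  # $: anchored to the end
either = re.findall(r'cat|dog', 'cat, dog, cow')  # |: one alternative or the other
m = re.search(r'(\d+)-(\d+)', 'call 010-8888')    # (): two groups

print(digits)       # ['2', '5', '2', '3', '0']
print(words)        # ['Tom', 'is', '25', 'and', 'Ann_2', 'is', '30']
print(spaces)       # [' ', ' ', '\t', ' ', ' ', ' ']
print(no_newline)   # ['a', 'b']
print(whole_word)   # ['is']
print(first, last)  # ['Tom'] ['30']
print(either)       # ['cat', 'dog']
print(m.group(1), m.group(2))  # 010 8888
```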
Quantifiers:
Quantifier | Meaning |
---|---|
* | Repeat zero or more times |
+ | Repeat one or more times |
? | Repeat zero or one time |
{n} | Repeat n times |
{n,} | Repeat n or more times |
{n,m} | Repeat n to m times |
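A quantifier applies only to the item directly before it. In the made-up sketch below, each quantifier repeats `b` while the `a` stays fixed. All the quantifiers here are greedy, so each one takes as many `b` characters as it is allowed.

```python
import re

s = 'a ab abb abbb'

star = re.findall(r'ab*', s)        # zero or more b
plus = re.findall(r'ab+', s)        # one or more b
opt = re.findall(r'ab?', s)         # zero or one b
exact = re.findall(r'ab{2}', s)     # exactly two b
at_least = re.findall(r'ab{2,}', s) # two or more b
between = re.findall(r'ab{1,2}', s) # one or two b

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(opt)       # ['a', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```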
[Figure: memory aid for the meanings of ?, * and +]
Examples of . ^ $:
Regex | String to match | Match result | Explanation |
---|---|---|---|
海. | 海燕海娇海东 (Haiyan Haijiao Haidong) | 海燕, 海娇, 海东 | Matches every 海 followed by any one character |
^海. | 海燕海娇海东 | 海燕 | Matches 海. only at the start of the string |
海.$ | 海燕海娇海东 | 海东 | Matches 海. only at the end of the string |
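These three rows can be reproduced with `re.findall`. This assumes the sample string is the Chinese 海燕海娇海东 (romanized as Haiyan Haijiao Haidong). In the pattern, 海 is a fixed character and `.` stands for the single character after it.

```python
import re

s = '海燕海娇海东'

all_hai = re.findall(r'海.', s)    # every 海 plus the character after it
at_start = re.findall(r'^海.', s)  # only at the start of the string
at_end = re.findall(r'海.$', s)    # only at the end of the string

print(all_hai)   # ['海燕', '海娇', '海东']
print(at_start)  # ['海燕']
print(at_end)    # ['海东']
```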
Commonly used non-greedy (lazy) quantifiers:
- *? Repeat zero or more times, but as few times as possible
- +? Repeat one or more times, but as few times as possible
- ?? Repeat zero or one time, but as few times as possible
- {n,m}? Repeat n to m times, but as few times as possible
- {n,}? Repeat n or more times, but as few times as possible
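The difference between greedy and lazy matching is easiest to see side by side. The sketch below uses an invented HTML fragment and digit string. In each pair, the only change is the trailing `?`.

```python
import re

html = '<b>bold</b>'
greedy = re.findall(r'<.*>', html)   # .* grabs as much as it can, up to the last '>'
lazy = re.findall(r'<.*?>', html)    # .*? stops at the first '>' that works

digits = '12345'
greedy_digits = re.findall(r'\d{2,4}', digits)  # takes 4 digits when it can
lazy_digits = re.findall(r'\d{2,4}?', digits)   # takes only the minimum, 2
one_each = re.findall(r'\d+?', digits)          # +? settles for a single digit

print(greedy)         # ['<b>bold</b>']
print(lazy)           # ['<b>', '</b>']
print(greedy_digits)  # ['1234']
print(lazy_digits)    # ['12', '34']
print(one_each)       # ['1', '2', '3', '4', '5']
```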
Common functions in the re module:
- Matching
  - findall: finds all matches and returns them as a list
  - search: returns a match object for the first match, or None if nothing matches; call .group() on the object to get the matched text
  - match: like search, but only matches at the beginning of the string
- Replacing
  - sub: returns the string with the matches replaced
  - subn: returns a tuple (replaced string, number of replacements made)
- Splitting
  - split: splits the string at every match and returns a list of the pieces
- Advanced
  - compile: compiles a regular expression into a pattern object that can be reused
  - finditer: returns an iterator that yields a match object for each match
Examples:
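```python
import re

# findall: return every match as a list
ret = re.findall(r'\d', 'hello123python456')
print(ret)  # ['1', '2', '3', '4', '5', '6']

# search: return a match object for the first match (or None)
ret = re.search(r'\d', 'hello123python456')
print(ret)          # a match object, not the matched text
print(ret.group())  # the text of the first match: 1

# split: cut the string wherever the pattern matches
ret = re.split(r'\d+', 'alex40taibai35codegod21')
print(ret)  # ['alex', 'taibai', 'codegod', '']

# sub vs. subn: both replace; subn also reports how many replacements were made
ret = re.sub(r'\d', 'H', 'codegod1ello')
print(ret)   # codegodHello
ret2 = re.subn(r'\d+', 'Joke', '123')
print(ret2)  # a tuple (new string, number of replacements): ('Joke', 1)

# compile: build a reusable pattern object
obj = re.compile(r'\d{3}')
ret = obj.search('abc123eeee')
print(ret.group())  # 123

# finditer: return an iterator of match objects
ret = re.finditer(r'\d', 'sf230f3f9r39')
print(ret)                       # something like <callable_iterator object at 0x...>
print(next(ret).group())         # first match: 2
print(next(ret).group())         # second match: 3
print([i.group() for i in ret])  # all remaining matches: ['0', '3', '9', '3', '9']
```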
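The examples above do not cover match. The short sketch below, using an invented string, contrasts it with search: match only tries the very beginning of the string, while search scans forward.

```python
import re

s = 'abc123'

m1 = re.match(r'\d+', s)     # match only tries position 0, where 'a' is not a digit
m2 = re.search(r'\d+', s)    # search scans forward until it finds a match
m3 = re.match(r'[a-z]+', s)  # a pattern that fits the start of the string

print(m1)          # None
print(m2.group())  # 123
print(m3.group())  # abc
```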
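Notes:
1. findall gives priority to groups

```python
import re

# With a group in the pattern, findall returns only the group's contents
ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['oldboy']

# (?:...) makes the group non-capturing, so findall returns the whole match
ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']
```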
2. split keeps captured groups

```python
import re

ret = re.split(r'\d+', 'eva3egon4yuan')
print(ret)   # ['eva', 'egon', 'yuan']

ret2 = re.split(r'(\d+)', 'eva3egon4yuan')
print(ret2)  # ['eva', '3', 'egon', '4', 'yuan']
```

Wrapping the pattern in () changes the result. Without parentheses, split drops the text it matched. With parentheses, the matched separators are kept in the result list.