Regular expressions and the re module in Python

What is a regular expression:

A regular expression is a rule (a pattern) for matching strings.

What regular expressions can do:

  • Once a rule has been written, it can be used to:

  • Check whether a given string conforms to the rule

  • Find every piece of content that conforms to the rule within a long string

  • Typical uses in programs:

  • Validating login and registration forms

  • Web crawlers

  • Automation development: log analysis
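
The two basic uses listed above (checking a string against a rule, and pulling matching content out of a longer string) could be sketched roughly as follows. The username rule and the helper names `is_valid_username` and `extract_numbers` are made up for illustration and are not from the original article.

```python
import re

# Hypothetical rule for illustration: 3 to 16 letters, digits, or underscores.
USERNAME_RULE = re.compile(r'\w{3,16}')

def is_valid_username(name):
    """Return True only if the whole string conforms to the rule."""
    return USERNAME_RULE.fullmatch(name) is not None

def extract_numbers(text):
    """Return every run of digits found in a longer piece of text."""
    return re.findall(r'\d+', text)

print(is_valid_username('code_god21'))          # True
print(is_valid_username('no spaces!'))          # False
print(extract_numbers('error 404 at line 17'))  # ['404', '17']
```

`fullmatch` is used for validation so that the rule must cover the entire input, while `findall` is used for extraction because it scans the whole text.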


Regular expression syntax:

  • Metacharacters

  • Character groups []: specify which characters may appear at a single position

  • Quantifiers

  • Special usages and phenomena

  • Use of ?: placing a ? after a quantifier turns off greedy matching, i.e. switches to lazy matching


Characters:

Metacharacter    Matches
.                Any character except a newline
\w               A letter, digit, or underscore
\s               Any whitespace character
\d               A digit
\n               A newline
\t               A tab
\b               A word boundary (the start or end of a word)
^                The beginning of the string
$                The end of the string
\W               Any character that is not a letter, digit, or underscore
\D               Any non-digit character
\S               Any non-whitespace character
a|b              Either a or b
()               The expression inside the parentheses; also defines a group
[…]              Any one character in the character group
[^…]             Any one character not in the character group
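
As a rough illustration of a few entries in the table above, the following sketch runs them through `re.findall`. The sample strings are invented for this example.

```python
import re

text = 'Tom_1 has 2 cats and 3 dogs'

words = re.findall(r'\w+', text)                       # runs of letters, digits, underscores
digits = re.findall(r'\d', text)                       # single digits
whole_cat = re.findall(r'\bcat\b', 'cat concat cats')  # "cat" only as a whole word
vowels = re.findall(r'[aeiou]', 'regex')               # character group
non_vowels = re.findall(r'[^aeiou]', 'regex')          # negated character group
pets = re.findall(r'cat|dog', 'cat dog bird')          # alternation

print(words)       # ['Tom_1', 'has', '2', 'cats', 'and', '3', 'dogs']
print(digits)      # ['1', '2', '3']
print(whole_cat)   # ['cat']
print(vowels)      # ['e', 'e']
print(non_vowels)  # ['r', 'g', 'x']
print(pets)        # ['cat', 'dog']
```

The patterns are written as raw strings (`r'...'`) so that backslashes such as `\b` reach the regex engine unchanged instead of being interpreted by Python first.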

Quantifiers:

Quantifier    Meaning
*             Repeat zero or more times
+             Repeat one or more times
?             Repeat zero or one time
{n}           Repeat exactly n times
{n,}          Repeat n or more times
{n,m}         Repeat n to m times
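
To make the quantifier table concrete, here is a small sketch that applies each quantifier to the letter b in the same sample string. The sample string is an assumption chosen for this example.

```python
import re

text = 'a ab abb abbb abbbb'

star = re.findall(r'ab*', text)         # zero or more b
plus = re.findall(r'ab+', text)         # one or more b
optional = re.findall(r'ab?', text)     # zero or one b
exactly2 = re.findall(r'ab{2}', text)   # exactly two b
at_least2 = re.findall(r'ab{2,}', text) # two or more b
one_to_3 = re.findall(r'ab{1,3}', text) # one to three b

print(star)       # ['a', 'ab', 'abb', 'abbb', 'abbbb']
print(plus)       # ['ab', 'abb', 'abbb', 'abbbb']
print(optional)   # ['a', 'ab', 'ab', 'ab', 'ab']
print(exactly2)   # ['abb', 'abb', 'abb']
print(at_least2)  # ['abb', 'abbb', 'abbbb']
print(one_to_3)   # ['ab', 'abb', 'abbb', 'abbb']
```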

[Figure: diagram for remembering the meaning of ?, * and +]

. ^ $

The sample string 海燕海娇海东 is the three names Haiyan, Haijiao, and Haidong written together.

Pattern    String to match    Match result      Explanation
海.        海燕海娇海东       海燕 海娇 海东    Matches every "海" followed by any one character
^海.       海燕海娇海东       海燕              Matches "海." only at the beginning of the string
海.$       海燕海娇海东       海东              Matches "海." only at the end of the string
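
The three rows above can be checked with a short sketch like the one below, assuming the string being matched is 海燕海娇海东 (the names Haiyan, Haijiao, and Haidong run together).

```python
import re

names = '海燕海娇海东'

anywhere = re.findall('海.', names)   # "海" plus any one character, anywhere
at_start = re.findall('^海.', names)  # only at the beginning of the string
at_end = re.findall('海.$', names)    # only at the end of the string

print(anywhere)  # ['海燕', '海娇', '海东']
print(at_start)  # ['海燕']
print(at_end)    # ['海东']
```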

Several commonly used non-greedy (lazy) matching patterns:

  • *? Repeat any number of times, but as few times as possible
  • +? Repeat 1 or more times, but as few times as possible
  • ?? Repeat 0 or 1 times, but as few times as possible
  • {n,m}? Repeat n to m times, but as few times as possible
  • {n,}? Repeat n or more times, but as few times as possible
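
The difference between greedy and lazy quantifiers is easiest to see side by side. The following is a minimal sketch; the HTML-like sample string is invented for illustration.

```python
import re

html = '<b>one</b><b>two</b>'

greedy = re.findall(r'<b>.*</b>', html)   # .* grabs as much as possible
lazy = re.findall(r'<b>.*?</b>', html)    # .*? stops at the first closing tag

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']

plus_greedy = re.search(r'\d+', '12345').group()
plus_lazy = re.search(r'\d+?', '12345').group()
range_lazy = re.search(r'\d{2,4}?', '12345').group()
optional_lazy = re.search(r'\d??', '12345').group()

print(plus_greedy)          # 12345
print(plus_lazy)            # 1
print(range_lazy)           # 12
print(repr(optional_lazy))  # '' (zero repetitions is the shortest valid match)
```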

Common methods of the re module:

  • Matching

  • findall: finds all matches and returns them as a list

  • search: scans the whole string; returns a match object if a match is found, otherwise None. Call group() on the returned object to get the first match

  • match: like search, but only matches at the beginning of the string

  • Replacing

  • sub: returns the string after replacement

  • subn: returns a tuple (string after replacement, number of replacements made)

  • Splitting

  • split: splits the string at every match and returns a list

  • Advanced methods:

  • compile: compiles a regular expression into a regular expression object

  • finditer: returns an iterator over the match objects
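
The list above gives no example for match, so here is a small sketch contrasting it with search, reusing the sample string from the examples in this article. It also shows the None check that is needed before calling group().

```python
import re

text = 'hello123python456'

found = re.search(r'\d+', text)        # scans the whole string
at_start = re.match(r'\d+', text)      # only tries position 0, so no match here
letters = re.match(r'[a-z]+', text)    # the string does start with letters

print(found.group())    # 123
print(at_start)         # None
print(letters.group())  # hello

# search and match return None when nothing matches, so check before calling group().
missing = re.search(r'\d+', 'no digits here')
result = missing.group() if missing is not None else 'no match'
print(result)           # no match
```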

Examples:

# findall method
import re
ret = re.findall(r'\d', 'hello123python456')
print(ret)  # ['1', '2', '3', '4', '5', '6']


# search
ret = re.search(r'\d', 'hello123python456')
print(ret)          # returns a match object
print(ret.group())  # returns the first match: 1


# split
ret = re.split(r'\d+', 'alex40taibai35codegod21')
print(ret)  # list obtained by splitting on the digits: ['alex', 'taibai', 'codegod', '']


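# sub and subn: the difference between the two replacement methods
ret = re.sub(r'\d', 'H', 'codegod1ello')
print(ret)   # codegodHello
ret2 = re.subn(r'\d+', 'Joke', '123')
print(ret2)  # returns a tuple (string after replacement, number of replacements): ('Joke', 1)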


# compile
obj = re.compile(r'\d{3}')  # compile the pattern once into a pattern object
ret = obj.search('abc123eeee')
print(ret.group())  # 123


# finditer
ret = re.finditer(r'\d', 'sf230f3f9r39')
print(ret)  # an iterator, e.g. <callable_iterator object at 0x00000206653E6130>
print(next(ret).group())  # first result: 2
print(next(ret).group())  # second result: 3
print([i.group() for i in ret])  # all remaining results: ['0', '3', '9', '3', '9']

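Notes:

  1. findall gives priority to groups
import re
ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
# findall returns the content of the group in preference to the whole match.
print(ret)  # ['oldboy']
# To get the whole match, make the group non-capturing with ?:
ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']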

  2. split gives priority to groups

ret = re.split(r"\d+", "eva3egon4yuan")
print(ret)   # ['eva', 'egon', 'yuan']
ret2 = re.split(r"(\d+)", 'eva3egon4yuan')
print(ret2)  # ['eva', '3', 'egon', '4', 'yuan']

Wrapping the pattern in () changes the result of split: without (), the matched separators are discarded; with (), they are kept in the returned list.
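
As a complement to the two notes above, the sketch below shows how a match object from search exposes both the whole match and the group, and what findall returns when a pattern has more than one group. The sample strings (including the addresses) are invented for illustration.

```python
import re

m = re.search(r'www\.(baidu|oldboy)\.com', 'visit www.oldboy.com today')
whole = m.group()    # same as m.group(0): the entire match
inner = m.group(1)   # content of the first group only

print(whole)  # www.oldboy.com
print(inner)  # oldboy

# With several groups, findall returns one tuple per match.
pairs = re.findall(r'(\w+)@(\w+)\.com', 'a@x.com, b@y.com')
print(pairs)  # [('a', 'x'), ('b', 'y')]
```

Using search or finditer is one way to keep access to the whole match while still reading individual groups, without switching the groups to non-capturing.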

Origin blog.csdn.net/weixin_44730235/article/details/105350806