Processing Strings with Regular Expressions (Python)

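.	Matches any single character except a newline
*	Matches zero or more repetitions of the preceding expression
+	Matches one or more repetitions of the preceding expression
.*	Matches a string of any length (not crossing newlines by default)
\s	Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v]
\S	Matches any non-whitespace character
\w	Matches letters, digits, and the underscore
\d	Matches any digit
{n}	Matches exactly n repetitions of the preceding expression, e.g. \d{4}
^	Matches the start of the string
$	Matches the end of the string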
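
The table above can be checked quickly in an interpreter. The following is a minimal sketch; the sample strings and the `samples`/`results` names are illustrative, and `re.fullmatch()` (not covered in these notes) is used only so that each pattern has to cover its whole sample string.

```python
import re

# (pattern, sample string) pairs; all sample values are made up for illustration.
samples = [
    (r'\d{4}', '2024'),        # exactly four digits
    (r'\w+', 'snake_case1'),   # letters, digits, underscore
    (r'\s+', ' \t\n'),         # whitespace characters
    (r'\S+', 'no-space'),      # non-whitespace characters
    (r'^a.*z$', 'a to z'),     # anchored at both ends
]

# re.fullmatch() succeeds only if the pattern covers the entire string.
results = {pattern: bool(re.fullmatch(pattern, text)) for pattern, text in samples}
for pattern, matched in results.items():
    print(pattern, matched)
```

Writing the patterns as raw strings (`r'...'`) keeps Python from interpreting backslashes before the regex engine sees them.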

The built-in re library

re.match()

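re.match(pattern, str, flags=0) tries to match the regular expression at the beginning of the string; it returns a Match object on success and None otherwise.
Note: the match must start at the first character of str.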


import re
content = 'Hello 1234567 World_This is mine'
result = re.match(r'^Hello\s\d{7}\s.*$', content)
print(len(content))
print(result.group())	# the matched text
print(result.span())	# the (start, end) range of the match
result = re.match(r'^Hello\s(\d+).*\s(\w+)$', content)
print(result.group(1), result.group(2))  # parentheses capture the target substrings

output:
32
Hello 1234567 World_This is mine
(0, 32)
1234567 mine
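
Because re.match() only succeeds at the start of the string, it returns None whenever the pattern first occurs later in the text. The sketch below illustrates this and shows one way to guard against None before calling .group(); the variable names `miss`, `hit`, and `number` are illustrative.

```python
import re

content = 'Hello 1234567 World_This is mine'

# Seven digits do occur in content, but not at position 0,
# so re.match() returns None rather than a Match object.
miss = re.match(r'\d{7}', content)
print(miss)

# Check for None before calling .group() to avoid an AttributeError.
hit = re.match(r'Hello\s(\d{7})', content)
number = hit.group(1) if hit else None
print(number)
```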

Greedy matching

Matches as many characters as possible.
Example: .*

import re
content = 'Hello 1234567 World_This is mine'
result = re.match(r'^H.*(\d+).*$', content)  # greedy .* leaves only the last digit for (\d+)
print(result.group(1))

output:
7

Non-greedy matching (preferred)

Matches as few characters as possible.
Example: .*?

import re
content = 'Hello 1234567 World_This is mine'
result = re.match(r'^H.*?(\d+).*$', content)  # lazy .*? stops as soon as (\d+) can match
print(result.group(1))

output:
1234567
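
One caveat worth keeping in mind with non-greedy matching: when .*? is the last thing in the pattern, "as few characters as possible" can mean no characters at all. A small sketch comparing the three cases, reusing the same content string (the names `middle`, `tail_lazy`, and `tail_greedy` are illustrative):

```python
import re

content = 'Hello 1234567 World_This is mine'

# Lazy quantifier in the middle: the digits are captured in full.
middle = re.match(r'^H.*?(\d+)', content).group(1)

# Lazy quantifier at the end: (.*?) is already satisfied by the empty string.
tail_lazy = re.match(r'^Hello\s\d+\s(.*?)', content).group(1)

# Greedy quantifier at the end: (.*) runs to the end of the string.
tail_greedy = re.match(r'^Hello\s\d+\s(.*)', content).group(1)

print(repr(middle), repr(tail_lazy), repr(tail_greedy))
```

So non-greedy is a good default in the middle of a pattern, while a capture at the very end usually needs the greedy form or a trailing anchor such as $.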

Match flags

By default . does not match a newline; if the string contains newlines, pass flags=re.S.

import re
content = 'Hello 1234567\nWorld_This is mine'
result = re.match(r'^.*?(\w+)$', content, re.S)  # re.S lets . match newlines as well
print(result.group(1))

output:
mine
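
To see what the flag changes, the same pattern can be run with and without re.S. This is a minimal sketch; the combination with re.I (case-insensitive matching) is an extra illustration that goes beyond the flag discussed above.

```python
import re

content = 'Hello 1234567\nWorld_This is mine'

# Without re.S, . cannot cross the newline, so the whole match fails.
without_flag = re.match(r'^.*?(\w+)$', content)

# With re.S, . also matches '\n' and the last word is captured.
with_flag = re.match(r'^.*?(\w+)$', content, re.S)

# Flags can be combined with |, here with re.I to ignore case.
both_flags = re.match(r'^hello.*mine$', content, re.S | re.I)

print(without_flag)
print(with_flag.group(1))
print(both_flags is not None)
```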

re.search()

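re.search(pattern, str, flags=0) scans the whole string and returns the first successful match, or None if there is none.
The only difference between re.search() and re.match() is that search does not have to start matching at the beginning of the string.
For convenience, prefer search over match wherever possible.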
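
The difference is easiest to see with a string whose interesting part does not start at position 0. A minimal sketch, assuming a made-up content string with some leading text:

```python
import re

# Illustrative input: the text of interest is preceded by other words.
content = 'Extra strings Hello 1234567 World_This is mine'
pattern = r'Hello.*?(\d+)'

# match() is anchored at the start of the string, so it fails here.
matched = re.match(pattern, content)
print(matched)

# search() scans forward and returns the first place the pattern fits.
found = re.search(pattern, content)
print(found.group(1))
print(found.span())
```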

re.findall()

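Scans the string and returns all non-overlapping matches as a list; if the pattern contains one capture group, the list holds the text of that group.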

import re
content = 'Noting 576 hello 1234567 World_This 321 is mine'
result = re.findall(r'\s(\d+)\s', content)  # capture digits surrounded by whitespace
print(result)

output:
['576', '1234567', '321']
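
What re.findall() returns depends on the number of capture groups in the pattern. The sketch below shows the common cases, plus one pitfall of the \s(\d+)\s pattern: each match consumes the surrounding spaces, so numbers separated by a single space can be skipped. The `dense` string and the \b (word boundary) alternative are illustrative additions.

```python
import re

content = 'Noting 576 hello 1234567 World_This 321 is mine'

# No group: the list holds the full text of each match.
no_group = re.findall(r'\d+', content)

# Two groups: the list holds one tuple per match.
two_groups = re.findall(r'(\w+)\s(\d+)', content)

# Pitfall: the space after '1' is consumed, so '2' has no leading \s left.
dense = ' 1 2 3 '
skipped = re.findall(r'\s(\d+)\s', dense)

# \b matches a position and consumes nothing, so every number is found.
complete = re.findall(r'\b\d+\b', dense)

print(no_group)
print(two_groups)
print(skipped, complete)
```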

re.sub()

re.sub(pattern, sub_str, str) replaces every substring that matches pattern with sub_str and returns the resulting string.

import re
content = 'Noting hello 1234567 World_This is mine'
result = re.sub(r'\d+', 'sub', content)  # replace every run of digits
print(result)

output:
Noting hello sub World_This is mine
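
Beyond a plain replacement string, re.sub() also accepts a count limit, backreferences such as \1 in the replacement, and a function that computes the replacement from each match. A short sketch of these three options; the input string and variable names are illustrative.

```python
import re

content = 'Noting 576 hello 1234567 World_This 321 is mine'

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', 'sub', content, count=1)

# \1 in the replacement refers to the text captured by group 1.
wrapped = re.sub(r'(\d+)', r'<\1>', content)

# A function receives each Match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), content)

print(first_only)
print(wrapped)
print(doubled)
```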

re.compile()

re.compile(pattern, flags=0) compiles a regular expression into a pattern object so that it can be reused.

import re
content = 'Noting hello 123\n4567 World_This is mine'
pattern = re.compile(r'\d.*\d', re.S)  # flags are fixed at compile time
result = pattern.search(content)  # same as re.search(pattern, content)
print(result.group())
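
Compiling pays off mainly when the same pattern is applied to many strings. A minimal sketch, assuming a hypothetical list of log-like lines and a hypothetical `date_pattern` for YYYY-MM-DD dates:

```python
import re

# Compile once; the pattern object exposes search(), findall(), sub(), etc.
date_pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Illustrative input lines, not taken from any real log.
lines = [
    'created 2021-03-04',
    'no date here',
    'updated 2022-11-30 by admin',
]

dates = []
for line in lines:
    m = date_pattern.search(line)
    if m:  # skip lines without a date
        dates.append(m.groups())

print(dates)

# The same compiled object can be reused for substitution.
masked = [date_pattern.sub('<date>', line) for line in lines]
print(masked)
```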
