Common Uses of Python Regular Expressions

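We start with a table of regular expression syntax. It is long, but it becomes familiar with use. After the table come some commonly used functions from Python 3's re module.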

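Symbol	Meaning
\w	Matches a letter, digit, or underscore. With re.ASCII this is exactly [A-Za-z0-9_]; by default str patterns also accept Unicode word characters
\W	Matches any character that is not a letter, digit, or underscore: [^A-Za-z0-9_]
\d	Matches any digit: [0-9]
\D	Matches any non-digit character: [^0-9]
\s	Matches any whitespace character such as space, newline, or tab: [ \t\n\r\f\v]
\S	Matches any non-whitespace character: [^ \t\n\r\f\v]
\A	Matches only at the start of the string. A string containing newlines still counts as one string, which is how it differs from ^ below
\Z	Matches only at the very end of the string. A string containing newlines still counts as one string. Unlike $ below, it does not match before a trailing newline
\z	Not supported by the re module in the Python version tested here (3.7), where it raises re.error. In Python, \Z already matches only at the absolute end of the string
^	Matches at the start of the string. With re.MULTILINE it also matches at the start of each line, which is how it differs from \A
$	Matches at the end of the string or just before a newline at the end of the string. With re.MULTILINE it also matches at the end of each line, which is how it differs from \Z
.	Matches any character except a newline. With the re.DOTALL flag it matches any character, including a newline
[xxx]	A set of characters, for example [A-Z] means A through Z
[^xxx]	Matches anything not in the brackets, for example [^0-9] matches any character that is not a digit
*	Matches 0 or more of the preceding expression (greedy)
+	Matches 1 or more of the preceding expression (greedy)
?	Matches 0 or 1 of the preceding expression. Placed after another quantifier (*?, +?, ??) it makes that quantifier non-greedy
{n}	Matches exactly n repetitions of the preceding expression
{m,n}	Matches from m to n repetitions of the preceding expression (greedy)
a|b	Matches a or b
()	Groups the enclosed expression and captures what it matches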
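The difference between ^ / $ and \A / \Z is easiest to see on a multi-line string. The following is a minimal sketch; the sample string and variable names are made up for illustration and are not from the original examples.

```python
import re

# Illustrative sample: two lines, with a trailing newline.
text = "first line\nsecond line\n"

# Without re.MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# $ may match just before a trailing newline; \Z matches only at the absolute end.
dollar_hit = re.search(r"line$", text) is not None
abs_end_hit = re.search(r"line\Z", text) is not None

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(dollar_hit)        # True
print(abs_end_hit)       # False
```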

1. `re.match()`

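This function tries to match at the beginning of the string, as shown below.

import re
text = "hello 123 world, hello new world"
result = re.match("hello", text)
print(result)
# Output: <re.Match object; span=(0, 5), match='hello'>
# span is the range covered by the match, here indices 0 to 5 (end exclusive).
# match is the matched text, 'hello'.
result = re.match("world", text)
print(result)
# Output: None  (text that is not at the start of the string is not matched)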

###################################################################
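res = re.match(r"hello(\s\d\d\d)", text)
print(res)
# Output: <re.Match object; span=(0, 9), match='hello 123'>
# The parentheses () mark a group within the pattern.
# \s is a whitespace character; here it matches the space.
# \d\d\d matches three digits.
# \d\d\d can also be written as \d{3}.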

res2 = re.match(r"(\w{5}\s)123(\s\w{5})", text)
print(res2)
# Output: <re.Match object; span=(0, 15), match='hello 123 world'>
print(res2.group(0))  # group(0) is the entire match, here 'hello 123 world'
print(res2.group(1))  # group(1) is what the first group matched: 'hello '
print(res2.group(2))  # group(2) is what the second group matched: ' world'

#####################################################################
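res = re.match(r"hello(.*)world", text)
print(res)
# Output: <re.Match object; span=(0, 32), match='hello 123 world, hello new world'>
# Here .* consumed everything in between, so the whole text was matched.
# Compare with the next example.

res = re.match(r"hello(.*?)world", text)
print(res)
# Output: <re.Match object; span=(0, 15), match='hello 123 world'>
# Here (.*?) only matched within the range 0-15. This is non-greedy matching:
# it matches as little as possible. The previous pattern matched the whole
# string because .* is greedy and matches as much as possible.
# One more example follows.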

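res = re.match(r"hello (.*)(\d+) world", text)
print(res.group(1))  # what the first group matched
# Output: 12
print(res.group(2))  # what the second group matched
# Output: 3
# The + in (\d+) needs one or more digits. The preceding (.*) is greedy and
# takes as much as it can, so it matches '12' and leaves only the single
# digit '3' for (\d+).
res = re.match(r"hello (.*?)(\d+) world", text)
print(res.group(1))
# Output: (an empty string)
print(res.group(2))
# Output: 123
# (.*?) is non-greedy and matches as little as it can.
# Since (\d+) can match all three digits by itself, (.*?) matches nothing.
# That is what the ? after a quantifier does.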

##################################################################
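text = '''hello 123 world
hello new world'''  # a string containing a newline, for testing
res = re.match(r"hello(.*)new world", text)
print(res)
# Output: None
# Surprising at first: (.*) does not match everything, because by default
# . does not match a newline. Adding the re.S flag fixes this, as shown next.

res = re.match(r"hello(.*)new world", text, re.S)
print(res)
# Output: <re.Match object; span=(0, 31), match='hello 123 world\nhello new world'>
# Now the whole text is matched.
# re.S (re.DOTALL) lets . match a newline as well.
# Similarly, re.I (re.IGNORECASE) makes matching case-insensitive.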
##################################################################
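Flags can be combined with the | operator. The sketch below uses a made-up mixed-case sample string (not from the original examples) to show re.I and re.S applied together.

```python
import re

# Illustrative sample: mixed case, with a newline in the middle.
text = "Hello 123 World\nhello NEW world"

# Without flags the match fails: "hello" does not equal "Hello",
# and . would not cross the newline anyway.
no_flags = re.match(r"hello(.*)new world", text)

# re.I ignores case and re.S lets . match the newline.
with_flags = re.match(r"hello(.*)new world", text, re.I | re.S)

print(no_flags)                    # None
print(repr(with_flags.group(1)))   # ' 123 World\nhello '
```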

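2. `re.search()`

If matching only at the start of the string makes re.match() feel limited, re.search() solves that. It scans the whole string and returns the first successful match, wherever it occurs.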

import re
text = "hello 123 world, hello new world"
res = re.search("world", text)
print(res)
# Output: <re.Match object; span=(10, 15), match='world'>
# The search can start anywhere, and it returns the position of the first 'world'.

###################################################################
res = re.search(r"[a-z] world", text)
print(res)
# Output: <re.Match object; span=(25, 32), match='w world'>
# [a-z] matched the single letter 'w'. To match several letters,
# use a repetition count: [a-z]{3} matches three letters.
res = re.search(r"[a-z]{3} world", text)
print(res)
# Output: <re.Match object; span=(23, 32), match='new world'>
# The result is 'new world'.

res = re.search(r"[0-9]{3} world", text)
print(res)
# Output: <re.Match object; span=(6, 15), match='123 world'>
# This matched '123 world'.

# The difference between re.search() and re.match(): match only matches at the
# start of the string, while search matches at any position. Otherwise they
# are used the same way.
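Both re.match() and re.search() return None when nothing matches, so the result should be checked before calling methods on it. Here is a small sketch; the helper name first_number is hypothetical and only used for illustration.

```python
import re

text = "hello 123 world, hello new world"

def first_number(s):
    """Return the first run of digits in s, or None if there is none."""
    m = re.search(r"\d+", s)
    return m.group(0) if m else None

print(first_number(text))          # 123
print(first_number("no digits"))   # None

# A Match object also exposes the position of the match.
m = re.search(r"new", text)
if m:
    print(m.start(), m.end())      # 23 26
```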

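3. `re.findall()`

re.search() removes the start-of-string restriction of re.match(), but it still returns only one result. re.findall() goes a step further: as the name suggests, it finds every non-overlapping match and returns them all.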

import re
text = "hello 123 world, hello new world"
res = re.findall("world", text)
print(res)
# Output: ['world', 'world']
# All matches are returned, as a list.

res = re.findall(r"[a-z0-9]{3}[\s]world", text)
print(res)
# Output: ['123 world', 'new world']
# Usage is much like the two functions above, except that every match is returned.
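One detail worth knowing: when the pattern contains groups, re.findall() returns the group contents rather than the whole match. The sketch below also shows re.finditer(), which yields Match objects when positions are needed. The patterns are illustrative variations on the examples above.

```python
import re

text = "hello 123 world, hello new world"

# One group: findall returns only what the group captured.
one_group = re.findall(r"([a-z0-9]{3})\sworld", text)

# Several groups: findall returns a list of tuples, one tuple per match.
two_groups = re.findall(r"(hello) (\w+)", text)

# finditer yields Match objects, so span() is available for each match.
spans = [(m.group(0), m.span()) for m in re.finditer(r"world", text)]

print(one_group)   # ['123', 'new']
print(two_groups)  # [('hello', '123'), ('hello', 'new')]
print(spans)       # [('world', (10, 15)), ('world', (27, 32))]
```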

4. `re.sub()`

This function performs substitution: it replaces whatever the regular expression matches with other text.

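import re
text = "hello 123 world, hello new world"
res = re.sub(r"[\d+]", "x", text)
print(res)
# Output: hello xxx world, hello new world
# Every digit in text was replaced by x. Inside [], the + is a literal
# plus sign, so [\d+] matches one character: a digit or '+'.

res = re.sub(r"[\d]{3}", "x", text)
print(res)
# Output: hello x world, hello new world
# Note the difference: [\d+] replaces each digit individually, while
# [\d]{3} replaces the run of three digits as one unit. The count must fit
# the text: {4} would find nothing here, since there are only three digits.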
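The replacement passed to re.sub() does not have to be a fixed string. As a rough sketch, using illustrative patterns that are not from the original examples, it can refer back to captured groups, be computed by a function, or be limited with count.

```python
import re

text = "hello 123 world, hello new world"

# \1 and \2 refer to the captured groups; here the two words are swapped.
swapped = re.sub(r"(hello) (\w+)", r"\2 \1", text)

# The replacement can be a function that receives each Match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), text)

# count limits how many matches are replaced.
once = re.sub(r"world", "there", text, count=1)

print(swapped)  # 123 hello world, new hello world
print(doubled)  # hello 246 world, hello new world
print(once)     # hello 123 there, hello new world
```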

5. `re.compile()`

Finally, a function that can seem less essential at first: it compiles a regular expression into a pattern object. For example:

import re
text = "hello 123 world, hello new world"
pattern = re.compile(r"[\d]{3}")
res = re.search(pattern, text)
print(res)
# Output: <re.Match object; span=(6, 9), match='123'>
# The pattern is pulled out into its own object so it need not be written
# out each time. Storing the pattern in a plain string achieves much the same:

string = r"[\d]{3}"
res = re.search(string, text)
print(res)
# Output: <re.Match object; span=(6, 9), match='123'>
# The result is the same. However, a compiled pattern can also be used
# through its own methods:

res = pattern.search(text)
print(res)
# Output: <re.Match object; span=(6, 9), match='123'>
# This form is a little shorter.
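Where a compiled pattern tends to pay off is reuse: the pattern and its flags are written once and then applied to many strings. A minimal sketch, assuming a small made-up list of input lines:

```python
import re

# Compile once; the names and sample lines below are illustrative.
three_digits = re.compile(r"\d{3}")
lines = ["hello 123 world", "no digits here", "code 4567"]

# The same pattern object is applied to every line.
hits = [three_digits.findall(line) for line in lines]

# Flags are given at compile time and apply to every later call.
greeting = re.compile(r"hello", re.I)
greetings = greeting.findall("Hello hello HELLO")

print(hits)       # [['123'], [], ['456']]
print(greetings)  # ['Hello', 'hello', 'HELLO']
```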

These are some common uses of regular expressions in Python 3. The examples above were tested with Python 3.7.4 on Windows 10.

锋霜利雪
