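I. Regular Expressions
1. Overview
- Concept
(1) Regular expression
(2) A text pattern that describes one or more strings to match when searching text
- Typical use cases
(1) Data validation
(2) Text scanning
(3) Text extraction
(4) Text replacement
(5) Text splitting
- Syntax
(1) Literals: ordinary characters match themselves; characters with a special meaning (^, $ …) must be escaped with a backslash
(2) Metacharacters
- Matching
(1) Single characters, predefined metacharacters: . any character except \n, \d a digit, \D a non-digit, \s a whitespace character, \S a non-whitespace character, \w a word character (letter, digit, or underscore), \W a non-word character
(2) Alternation: | separates alternatives, e.g. yes|no
(3) Quantifiers (how often a character, metacharacter, or character set repeats): ? 0 or 1 time, * 0 or more times, + 1 or more times, {n} exactly n times, {n,} at least n times, {,m} at most m times
(4) Greedy vs. non-greedy: a greedy quantifier matches as much as possible; a non-greedy one matches as little as possible (append ? to the quantifier, e.g. *?)
(5) Boundary matching: ^ start of line, $ end of line, \b word boundary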
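The quantifier, greediness, and boundary rules above can be tried out in a few lines. This is a minimal sketch; the sample strings are made up for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' it can reach.
greedy = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first '>' that completes a match.
lazy = re.findall(r'<.*?>', html)

# {4} requires exactly four digits; \b keeps the match on whole words,
# so the five-digit number is skipped.
years = re.findall(r'\b\d{4}\b', 'Released in 2018, build 12345, year 1999.')

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
print(years)   # ['2018', '1999']
```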
2. Regular expressions in Python
- Module
(1) import re
import re
text = 'Tom is 8 years old. Mike is 25 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '25']
re.findall(r'\d', text)
['8', '2', '5']
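- RegexObject
(1) Pattern object: represents a compiled regular expression
(2) Compiling: re.compile(r'pattern')
s = '\\author:Tom'
pattern = re.compile('\\author')  # the regex is \author; \a means the bell character
pattern.findall(s)
[]
pattern = re.compile('\\\\author')  # the regex is \\author: an escaped, literal backslash
pattern.findall(s)
['\\author']
pattern = re.compile(r'\\author')  # a raw string yields the same regex with less escaping
pattern.findall(s)
['\\author']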
(3) findall(): finds all non-overlapping matches and returns them as a list
text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '23', '87']
p_name = re.compile(r'[A-Z]\w+')
p_name.findall(text)
['Tom', 'Mike', 'Peter']
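(4) .match(string[, pos[, endpos]]): matches only at the starting position; returns a MatchObject
text = '<html><head></head><body></body></html>'
pattern = re.compile(r'<html>')  # assumed pattern, consistent with the output shown
pattern.match(text)
<_sre.SRE_Match object; span=(0, 6), match='<html>'>
text2 = ' <html><head></head><body></body></html>'
pattern.match(text2)  # no match at position 0, so the result is None
pattern.match(text2, 1)  # start matching at index 1
<_sre.SRE_Match object; span=(1, 7), match='<html>'>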
(5) .search(string[, pos[, endpos]]): searches at any position; returns a MatchObject
text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p1 = re.compile(r'\d')
p2 = re.compile(r'[A-Z]\w+')
p1.match(text)  # None: the text does not start with a digit
p2.match(text)
<_sre.SRE_Match object; span=(0, 3), match='Tom'>
p1.search(text)
<_sre.SRE_Match object; span=(7, 8), match='8'>
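The difference between anchored and unanchored matching matters most for the data-validation use case. The sketch below assumes a hypothetical rule (an order ID is 'ORD' plus exactly three digits) and uses re's fullmatch(), which is not covered in these notes, alongside match() and search().

```python
import re

# Hypothetical rule for this sketch: 'ORD' followed by exactly three digits.
ORDER_ID = re.compile(r'ORD\d{3}')

def is_valid_order_id(value):
    # fullmatch() succeeds only if the whole string matches the pattern.
    return ORDER_ID.fullmatch(value) is not None

def first_order_id(text):
    # search() scans the string and returns the first match, or None.
    m = ORDER_ID.search(text)
    return m.group() if m else None

print(is_valid_order_id('ORD001'))            # True
print(is_valid_order_id('ORD0012'))           # False
print(ORDER_ID.match('ORD0012') is not None)  # True: match() only anchors the start
print(first_order_id('ship ORD003 today'))    # ORD003
```

match() accepts 'ORD0012' because it only checks the beginning of the string, which is why validation usually needs fullmatch() or an explicit end anchor.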
(6) .finditer(): returns an iterator that yields a MatchObject for each match
text
'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p1
re.compile('\\d')
p1.findall(text)
['8', '2', '3', '8', '7']
it = p1.finditer(text)
for m in it:
...     print(m)
...
<_sre.SRE_Match object; span=(7, 8), match='8'>
<_sre.SRE_Match object; span=(28, 29), match='2'>
<_sre.SRE_Match object; span=(29, 30), match='3'>
<_sre.SRE_Match object; span=(51, 52), match='8'>
<_sre.SRE_Match object; span=(52, 53), match='7'>
- MatchObject
(1) Represents a matched pattern
(2) .group(): with 0 or no argument, returns the entire match
import re
text = 'Tom is 8 years old. Jerry is 23 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '23']
pattern = re.compile(r'(\d+).*?(\d+)')
m = pattern.search(text)
m
<_sre.SRE_Match object; span=(7, 31), match='8 years old. Jerry is 23'>
m.group()
'8 years old. Jerry is 23'
m.group(0)  # returns the entire match
'8 years old. Jerry is 23'
m.group(1)
'8'
m.group(2)
'23'
m.start(1)  # start index of the given group
7
m.end(1)
8
m.start(2)
29
m.end(2)
31
m.groups()
('8', '23')
pattern = re.compile(r'(\w+) (\w+)')
text = "Beautiful is better than ugly"
pattern.findall(text)
[('Beautiful', 'is'), ('better', 'than')]
it = pattern.finditer(text)
for m in it:
...     print(m.group())
...
Beautiful is
better than
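Combining finditer() with groups() and span() turns free text into structured records. The following is a small sketch that reuses the sample sentence from these notes; the pattern and the tuple layout are illustrative choices.

```python
import re

text = 'Tom is 8 years old. Jerry is 23 years old.'
pattern = re.compile(r'([A-Z]\w+) is (\d+)')

people = []
for m in pattern.finditer(text):
    name, age = m.groups()          # one string per capturing group
    people.append((name, int(age), m.span()))

print(people)  # [('Tom', 8, (0, 8)), ('Jerry', 23, (20, 31))]
```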
- Groups
(1) Use cases
import re
re.search(r'(ab)+c', 'ababc')  # extract information from the match; build a sub-pattern that a quantifier applies to
<_sre.SRE_Match object; span=(0, 5), match='ababc'>
re.search(r'Center|re','Center')
<_sre.SRE_Match object; span=(0, 6), match='Center'>
re.search(r'Center|re','Centre')
<_sre.SRE_Match object; span=(4, 6), match='re'>
re.search(r'Cent(er|re)', 'Centre')  # limit the scope of the alternation
<_sre.SRE_Match object; span=(0, 6), match='Centre'>
re.search(r'(\w+) \1', 'hello hello world')  # reuse text captured earlier in the pattern
<_sre.SRE_Match object; span=(0, 11), match='hello hello'>
(2) Declaration: (pattern), (?P<name>pattern)
text = "Tom:98"
pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
m = pattern.search(text)
m.group()
'Tom:98'
m.group(1)
'Tom'
m.group('name')
'Tom'
m.group('score')
'98'
(3) References: on the match object, m.group('name'); inside the pattern, (?P=name); in a replacement string, \g<name>
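The three ways of referring to a named group can be seen side by side in a short sketch. The sample strings and the group name word are made up for illustration.

```python
import re

# (?P=word) inside the pattern must repeat whatever the group 'word' captured.
doubled = re.compile(r'\b(?P<word>\w+) (?P=word)\b')
m = doubled.search('this is is a test')
print(m.group('word'), m.span())  # is (5, 10)

# \g<word> in the replacement string inserts the text of the named group.
fixed = doubled.sub(r'\g<word>', 'this is is a test')
print(fixed)  # this is a test

record = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
swapped = record.sub(r'\g<score>=\g<name>', 'Tom:98 Ann:87')
print(swapped)  # 98=Tom 87=Ann

# groupdict() collects all named groups of one match into a dict.
as_dict = record.search('Tom:98').groupdict()
print(as_dict)  # {'name': 'Tom', 'score': '98'}
```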
- Applications
(1) String operations:
.split(): splitting
text = "Beautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex."
p = re.compile(r'\n')
p.split(text)
['Beautiful is better than ugly.', 'Explicit is better than implicit.', 'Simple is better than complex.']
re.split(r'\n',text)
['Beautiful is better than ugly.', 'Explicit is better than implicit.', 'Simple is better than complex.']
re.split(r'\W','Good morning')
['Good', 'morning']
re.split(r'-','Good-morning')
['Good', 'morning']
re.split(r'(-)','Good-morning')
['Good', '-', 'morning']
.sub(): substitution
ords = 'ORD000\nORD001\nORD003'
re.sub(r'\d+','-',ords)
'ORD-\nORD-\nORD-'
text = "Beautiful is *better* than ugly."
re.sub(r'\*(.*?)\*','<strong></strong>',text)
'Beautiful is <strong></strong> than ugly.'
re.sub(r'\*(.*?)\*', r'<strong>\g<1></strong>', text)
'Beautiful is <strong>better</strong> than ugly.'
re.sub(r'\*(?P<html>.*?)\*', r'<strong>\g<html></strong>', text)
'Beautiful is <strong>better</strong> than ugly.'
ords
'ORD000\nORD001\nORD003'
re.sub(r'([A-Z]+)(\d+)', r'\g<2>-\g<1>', ords)
'000-ORD\n001-ORD\n003-ORD'
re.subn(r'([A-Z]+)(\d+)', r'\g<2>-\g<1>', ords)  # subn also returns the number of substitutions
('000-ORD\n001-ORD\n003-ORD', 3)
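Besides a replacement string, re.sub() also accepts a function that receives each match object and returns the replacement text, and a count argument that limits how many matches are replaced. The sketch below reuses the ords sample; the helper name bump is hypothetical.

```python
import re

ords = 'ORD000\nORD001\nORD003'

def bump(match):
    # Keep the prefix, add 1 to the number, and preserve the zero padding.
    prefix, digits = match.group(1), match.group(2)
    return prefix + str(int(digits) + 1).zfill(len(digits))

bumped = re.sub(r'([A-Z]+)(\d+)', bump, ords)
print(repr(bumped))   # 'ORD001\nORD002\nORD004'

# count=1 replaces only the first match.
limited = re.sub(r'\d+', '-', ords, count=1)
print(repr(limited))  # 'ORD-\nORD001\nORD003'
```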
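(2) Compile flags: change the default behavior of a regex. re.I ignores case, re.M enables multi-line matching (^ and $ match at every line), re.S (re.DOTALL) lets . match every character, including \n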
text = 'Python python PYTHON'
re.search(r'python',text)
<_sre.SRE_Match object; span=(7, 13), match='python'>
re.findall(r'python',text)
['python']
re.findall(r'python',text,re.I)
['Python', 'python', 'PYTHON']
re.findall(r'^<html>','\n<html>')
[]
re.findall(r'^<html>','\n<html>',re.M)
['<html>']
re.findall(r'\d(.)','1\ne',re.DOTALL)
['\n']
- Module-level operations
(1) re.purge(): clears the regular expression cache
(2) re.escape(): escapes the special characters in a string
re.findall(r'^','^python')
['']
re.findall(re.escape('^'),'^python')
['^']
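re.escape() is most useful when the text to search for comes from outside the program and may contain metacharacters. A minimal sketch follows; the helper count_literal and the sample string are hypothetical.

```python
import re

def count_literal(needle, haystack):
    # re.escape() backslash-escapes regex metacharacters in needle,
    # so the text is searched for literally.
    return len(re.findall(re.escape(needle), haystack))

price_list = 'a+b costs $5.00, a+b+c costs $7.50'

print(count_literal('a+b', price_list))    # 2
print(re.findall('a+b', price_list))       # []: unescaped, + is a quantifier
print(count_literal('$5.00', price_list))  # 1
```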
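II. System Tools
1. Concepts
- Command-line tools
- Shell scripts
- System administration
2. System modules
- sys
(1) Provides a set of functions and attributes that expose the Python runtime and the platform it runs on
- os
(1) Provides a portable, cross-platform programming interface to the operating system
(2) os.path provides portable tools for working with files and directories
3. sys
- Platform and version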
import sys
dir(sys)
sys.platform
'darwin'
sys.version
'3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]'
sys.path
- Inspecting exception details
import traceback
try:
...     raise KeyError
... except:
...     print(sys.exc_info())  # details of the exception currently being handled
...     traceback.print_tb(sys.exc_info()[2])  # print the traceback
...
(<class 'KeyError'>, KeyError(), <traceback object at 0x110a87e88>)
File "<input>", line 2, in <module>
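The same information can be captured as text instead of being printed, which is convenient for logging. This is a sketch under the assumption that a string report is wanted; describe_error and lookup are hypothetical names.

```python
import sys
import traceback

def describe_error(func):
    """Call func() and return (exception type name, message, formatted traceback)."""
    try:
        func()
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # format_exception() returns the text the interpreter would print, as a list of lines.
        report = ''.join(traceback.format_exception(exc_type, exc_value, exc_tb))
        return exc_type.__name__, str(exc_value), report
    return None

def lookup():
    # Deliberately fails with a KeyError.
    return {}['missing']

name, message, report = describe_error(lookup)
print(name)     # KeyError
print(message)  # 'missing'
print(report)
```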
- Command-line arguments
import sys

def add(a, b):
    return a + b

a = 0
b = 0
# sys.argv[0] is the script name; the arguments start at index 1
if len(sys.argv) > 1:
    a = int(sys.argv[1])
if len(sys.argv) > 2:
    b = int(sys.argv[2])
print(add(a, b))
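- Standard streams
import sys
sys.stdout.write('Hello')  # standard output stream, similar to print; write() returns the number of characters written
Hello5
print('input information:');sys.stdin.readline()[:]  # standard input stream, similar to input()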
input information:
Python
print('input information:');x = sys.stdin.readline()[:]
input information:
Python
x
'Python\n'
sys.stderr.write('Error')
5
sys.stderr.flush()
Error
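Because the standard streams are ordinary file-like objects, they can be swapped for in-memory buffers, for example to check what a function writes. The sketch below uses contextlib and io from the standard library, neither of which appears in these notes; the report helper is hypothetical.

```python
import contextlib
import io
import sys

def report(message, error=False):
    # Route normal output to stdout and diagnostics to stderr.
    stream = sys.stderr if error else sys.stdout
    stream.write(message + '\n')

out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report('done')
    report('disk full', error=True)

captured_out = out.getvalue()  # 'done\n'
captured_err = err.getvalue()  # 'disk full\n'

# A StringIO stands in for sys.stdin here; readline() keeps the trailing newline.
fake_stdin = io.StringIO('Python\n')
line = fake_stdin.readline()   # 'Python\n'
```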
4. os
- Shell variables
- Administration tools
import os
os.getcwd()  # get the current working directory
'/Users/yizhou/PycharmProjects/Python基础学习'
os.listdir()
['.idea', 'data.txt', 'data3.txt', 'pickle.db', 'pickle_db', 'Python基础学习', 'shelve_student.db', 'venv']
os.chdir('Python基础学习')  # change the working directory
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习'
os.getpid()  # get the current process ID
509
os.getppid()
491
- Running shell commands
(1) os.system(): runs a shell command from a Python script
(2) os.popen(): runs a command and connects to its input/output streams
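A minimal sketch of both calls, assuming a shell in which echo is available (true for POSIX shells and for the Windows command prompt):

```python
import os

# os.system() runs the command in a subshell; its output goes straight to
# the terminal and only a status code comes back (0 usually means success).
status = os.system('echo hello from os.system')

# os.popen() returns a file-like object connected to the command's output.
with os.popen('echo hello from os.popen') as stream:
    output = stream.read()

print(status)
print(output.strip())
```

For anything beyond quick scripts, the standard library documentation points to the subprocess module as the more flexible alternative.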
- File handling
os.chdir('系统模块')
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块'
os.mkdir('test')
os.listdir()
['add number.py', 'test']
os.chdir('test')
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块/test'
os.listdir()
[]
open('info.txt','w',encoding='utf8').write('Hello')
5
os.listdir()
['info.txt']
os.rename('info.txt','detail.txt')
os.listdir()
['detail.txt']
os.remove('detail.txt')
os.chdir('..')  # go back to the parent directory
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块'
os.rmdir('test')
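The same create, rename, and remove sequence can be written as a script that does not depend on the current working directory. The sketch below assumes a throwaway directory from tempfile is acceptable; the file names mirror the session above.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as root:
    work = os.path.join(root, 'test')
    os.mkdir(work)

    src = os.path.join(work, 'info.txt')
    dst = os.path.join(work, 'detail.txt')
    # A with block closes the file promptly, unlike a bare open(...).write(...).
    with open(src, 'w', encoding='utf8') as f:
        f.write('Hello')

    os.rename(src, dst)
    after_rename = os.listdir(work)      # ['detail.txt']

    os.remove(dst)
    os.rmdir(work)                       # rmdir only removes empty directories
    still_exists = os.path.exists(work)  # False
```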
- Portability tools
os.sep
'/'
os.pathsep
':'
os.curdir  # symbol for the current directory
'.'
os.pardir  # symbol for the parent directory
'..'
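These constants keep path-handling code free of hard-coded separators. A small sketch with made-up directory names:

```python
import os

# os.pathsep separates entries in PATH-style variables (':' on POSIX, ';' on Windows).
search_path = os.pathsep.join(['/usr/local/bin', '/usr/bin'])
entries = search_path.split(os.pathsep)

# os.sep separates the components of a single path.
relative = os.sep.join([os.pardir, 'data', 'info.txt'])
parts = relative.split(os.sep)

print(entries)  # ['/usr/local/bin', '/usr/bin']
print(parts)    # ['..', 'data', 'info.txt']
```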
- Paths: os.path
os.path.isdir(r'/Users/yizhou/PycharmProjects')  # is it a directory?
True
os.path.isfile(r'/Users/yizhou/PycharmProjects')  # is it a file?
False
os.path.exists(r'/Users/yizhou/PycharmProjectss')
False
os.path.getsize(r'/Users/yizhou/PycharmProjects')
306
os.path.split(r'/Users/yizhou/PycharmProjects/Python基础学习/data.txt')
('/Users/yizhou/PycharmProjects/Python基础学习', 'data.txt')
name = r'/Users/yizhou/PycharmProjects/Python基础学习/data.txt'
os.path.dirname(name)
'/Users/yizhou/PycharmProjects/Python基础学习'
os.path.basename(name)
'data.txt'
os.path.splitext(name)
('/Users/yizhou/PycharmProjects/Python基础学习/data', '.txt')
os.path.splitext(name)[1]
'.txt'
os.path.join(r'/Users/yizhou','product.csv')
'/Users/yizhou/product.csv'
name
'/Users/yizhou/PycharmProjects/Python基础学习/data.txt'
os.path.split(name)
('/Users/yizhou/PycharmProjects/Python基础学习', 'data.txt')
name.split(os.sep)
['', 'Users', 'yizhou', 'PycharmProjects', 'Python基础学习', 'data.txt']
p = '/Users/yizhou/dd\\files/data.csv'
p
'/Users/yizhou/dd\\files/data.csv'
os.path.normpath(p)  # normalize the path
'/Users/yizhou/dd\\files/data.csv'
os.path.abspath('..')  # convert to an absolute path
'/Users/yizhou/PycharmProjects/Python基础学习'
os.path.abspath('.')
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习'
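The os.path helpers shown above compose well into small utilities. The sketch below uses two hypothetical helpers, describe_path and with_extension, and a made-up relative path.

```python
import os

def describe_path(path):
    """Split a path into the pieces the os.path helpers expose."""
    directory, filename = os.path.split(path)
    stem, extension = os.path.splitext(filename)
    return {'directory': directory, 'filename': filename,
            'stem': stem, 'extension': extension}

def with_extension(path, new_extension):
    # Swap the extension while leaving the directory and stem untouched.
    root, _ = os.path.splitext(path)
    return root + new_extension

sample = os.path.join('reports', '2018', 'data.txt')
info = describe_path(sample)
renamed = with_extension(sample, '.csv')

print(info['filename'], info['stem'], info['extension'])  # data.txt data .txt
print(renamed)
```

Building the sample with os.path.join() instead of a literal string keeps the example valid on platforms whose separator is not '/'.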