Python Basics (11): Regular Expressions

I. Regular Expressions

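1. Overview

  1. Concept
    (1) Regular expression (regex)
    (2) A text pattern that describes one or more strings to match when searching text
  2. Typical uses
    (1) Data validation
    (2) Text scanning
    (3) Text extraction
    (4) Text replacement
    (5) Text splitting
  3. Syntax
    (1) Literals: ordinary characters match themselves; special characters such as (, ^, $ … must be escaped with a backslash
    (2) Metacharacters
  4. Matching
    (1) Single characters, predefined metacharacters: . any character except \n, \d a digit, \D any non-digit, \s a whitespace character, \S a non-whitespace character, \w a word character (letter, digit, or underscore), \W a non-word character
    (2) Alternation: | as in yes|no
    (3) Quantifiers (how a character, metacharacter, or character set repeats): ? 0 or 1 time, * 0 or more times, + 1 or more times, and explicit counts with {}: {n} exactly n times, {n,} at least n times, {,m} at most m times, {n,m} between n and m times
    (4) Greedy vs. non-greedy: greedy matches as much as possible; non-greedy matches as little as possible (append ? to the quantifier, e.g. *?)
    (5) Boundaries: ^ start of the string (start of each line with re.M), $ end of the string (end of each line with re.M), \b word boundary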
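
The quantifier, greedy/non-greedy, and boundary rules above can be tried out directly with the re module. This is only a minimal sketch; the sample strings are made up for illustration.

```python
import re

html = '<b>bold</b><i>italic</i>'   # made-up sample text

# Greedy: .* takes as much as possible, so the match runs to the last '>'
greedy = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first '>' that completes a match
lazy = re.findall(r'<.*?>', html)

# {n,m} quantifier: runs of 2 to 3 digits
digits = re.findall(r'\d{2,3}', '7 42 123 98765')

# \b word boundary: 'cat' only as a whole word, not inside 'concat'
words = re.findall(r'\bcat\b', 'cat concat cat.')

print(greedy)   # ['<b>bold</b><i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(digits)   # ['42', '123', '987', '65']
print(words)    # ['cat', 'cat']
```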

2. Regular Expressions in Python

1. Module
(1) import re

import re
text = 'Tom is 8 years old. Mike is 25 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '25']
re.findall(r'\d',text)
['8', '2', '5']
  1. RegexObject
    (1) Pattern object: represents a compiled regular expression
    (2) Compiling: re.compile(r'pattern')
s = '\\author:Tom'
pattern = re.compile('\\author')
pattern.findall(s)
[]
pattern = re.compile('\\\\author')
pattern.findall(s)
['\\author']
pattern = re.compile(r'\\author')
pattern.findall(s)
['\\author']

(3) findall(): finds all non-overlapping matches and returns them as a list

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '23', '87']
p_name = re.compile(r'[A-Z]\w+')
p_name.findall(text)
['Tom', 'Mike', 'Peter']

(4) .match(string[,pos[,endpos]]): matches only at the starting position (index 0, or pos if given); returns a MatchObject, or None when there is no match

text = '<html><head></head><body></body></html>'
pattern = re.compile(r'<html>') # assumed pattern, chosen to be consistent with the output shown
pattern.match(text)
<_sre.SRE_Match object; span=(0, 6), match='<html>'>
text2 = ' <html><head></head><body></body></html>'
pattern.match(text2) # returns None: the string starts with a space
pattern.match(text2,1) # start matching at index 1
<_sre.SRE_Match object; span=(1, 7), match='<html>'>

(5) .search(string[,pos[,endpos]]): searches for the first match anywhere in the string; returns a MatchObject, or None when there is no match

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p1 = re.compile(r'\d')
p2 = re.compile(r'[A-Z]\w+')
p1.match(text)
p2.match(text)
<_sre.SRE_Match object; span=(0, 3), match='Tom'>
p1.search(text)
<_sre.SRE_Match object; span=(7, 8), match='8'>

(6) .finditer(): returns an iterator that yields a MatchObject for each match

text
'Tom is 8 years old. Mike is 23 years old. Peter is 87 years old.'
p1
re.compile('\\d')
p1.findall(text)
['8', '2', '3', '8', '7']
it = p1.finditer(text)
for m in it:
...    print(m)
...    
<_sre.SRE_Match object; span=(7, 8), match='8'>
<_sre.SRE_Match object; span=(28, 29), match='2'>
<_sre.SRE_Match object; span=(29, 30), match='3'>
<_sre.SRE_Match object; span=(51, 52), match='8'>
<_sre.SRE_Match object; span=(52, 53), match='7'>

  1. MatchObject
    (1) Represents the result of a successful match
    (2) .group(): with no argument or 0, returns the entire match
import re
text = 'Tom is 8 years old. Jerry is 23 years old.'
pattern = re.compile(r'\d+')
pattern.findall(text)
['8', '23']
pattern = re.compile(r'(\d+).*?(\d+)')
m = pattern.search(text)
m
<_sre.SRE_Match object; span=(7, 31), match='8 years old. Jerry is 23'>
m.group()
'8 years old. Jerry is 23'
m.group(0) # returns the entire match
'8 years old. Jerry is 23'
m.group(1)
'8'
m.group(2)
'23'
m.start(1) # start index of the given group
7
m.end(1)
8
m.start(2)
29
m.end(2)
31
m.groups()
('8', '23')

pattern = re.compile(r'(\w+) (\w+)')
text = "Beautiful is better than ugly"
pattern.findall(text)
[('Beautiful', 'is'), ('better', 'than')]
it = pattern.finditer(text)
for m in it:
...    print(m.group())
...    
Beautiful is
better than
  1. Groups
    (1) Use cases
import re
re.search(r'(ab)+c','ababc') # extract information from a match; build a sub-pattern that a quantifier applies to
<_sre.SRE_Match object; span=(0, 5), match='ababc'>
re.search(r'Center|re','Center')
<_sre.SRE_Match object; span=(0, 6), match='Center'>
re.search(r'Center|re','Centre')
<_sre.SRE_Match object; span=(4, 6), match='re'>
re.search(r'Cent(er|re)','Centre') # limit the scope of the alternation
<_sre.SRE_Match object; span=(0, 6), match='Centre'>
re.search(r'(\w+) \1','hello hello world') # reuse text captured earlier in the pattern (backreference)
<_sre.SRE_Match object; span=(0, 11), match='hello hello'>

(2) Declaration: (pattern), (?P<name>pattern)

text = "Tom:98"
pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
m = pattern.search(text)
m.group()
'Tom:98'
m.group(1)
'Tom'
m.group('name')
'Tom'
m.group('score')
'98'

(3) References: on the match object m.group('name'); inside the pattern (?P=name); in a replacement string \g<name>
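
The two reference forms that work on the pattern side and the replacement side can be sketched as follows. This is an illustrative example only; the variable names are not from the original notes.

```python
import re

# (?P=word) inside the pattern must match the same text that group 'word' captured
p_repeat = re.compile(r'(?P<word>\w+) (?P=word)')
m = p_repeat.search('hello hello world')
repeated = m.group('word')

# \g<name> inside the replacement string inserts the text captured by that group
p_score = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
swapped = p_score.sub(r'\g<score>:\g<name>', 'Tom:98')

print(m.group())   # hello hello
print(repeated)    # hello
print(swapped)     # 98:Tom
```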

  1. Applications
    (1) String operations:
    .split() splits a string
text = "Beautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex."
p = re.compile(r'\n')
p.split(text)
['Beautiful is better than ugly.', 'Explicit is better than implicit.', 'Simple is better than complex.']
re.split(r'\n',text)
['Beautiful is better than ugly.', 'Explicit is better than implicit.', 'Simple is better than complex.']
re.split(r'\W','Good morning')
['Good', 'morning']
re.split(r'-','Good-morning')
['Good', 'morning']
re.split(r'(-)','Good-morning')
['Good', '-', 'morning']

.sub() replaces matches

ords = 'ORD000\nORD001\nORD003'
re.sub(r'\d+','-',ords)
'ORD-\nORD-\nORD-'
text = "Beautiful is *better* than ugly."
re.sub(r'\*(.*?)\*','<strong></strong>',text)
'Beautiful is <strong></strong> than ugly.'
re.sub(r'\*(.*?)\*',r'<strong>\g<1></strong>',text)
'Beautiful is <strong>better</strong> than ugly.'
re.sub(r'\*(?P<html>.*?)\*',r'<strong>\g<html></strong>',text)
'Beautiful is <strong>better</strong> than ugly.'
ords
'ORD000\nORD001\nORD003'
re.sub(r'([A-Z]+)(\d+)',r'\g<2>-\g<1>',ords)
'000-ORD\n001-ORD\n003-ORD'
re.subn(r'([A-Z]+)(\d+)',r'\g<2>-\g<1>',ords) # subn also returns the number of replacements
('000-ORD\n001-ORD\n003-ORD', 3)

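(2) Compilation flags: change the default matching behavior. re.I (re.IGNORECASE) ignores case; re.M (re.MULTILINE) makes ^ and $ match at the start and end of each line; re.S (re.DOTALL) makes . match every character, including \n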

text = 'Python python PYTHON'
re.search(r'python',text)
<_sre.SRE_Match object; span=(7, 13), match='python'>
re.findall(r'python',text)
['python']
re.findall(r'python',text,re.I)
['Python', 'python', 'PYTHON']
re.findall(r'^<html>','\n<html>')
[]
re.findall(r'^<html>','\n<html>',re.M)
['<html>']
re.findall(r'\d(.)','1\ne',re.DOTALL)
['\n']
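
Flags can also be combined with the | operator. The sketch below uses a made-up three-line string and shows re.M, re.I | re.M, and re.S side by side.

```python
import re

text = 'first line\nSECOND line\nthird line'   # made-up sample text

# re.M: ^ matches at the start of every line, not only at the start of the string
starts = re.findall(r'^\w+', text, re.M)

# Flags combine with |: case-insensitive and multi-line at the same time
second = re.findall(r'^second', text, re.I | re.M)

# Without re.S, . stops at '\n', so the pattern cannot span lines
same_line = re.search(r'first.*third', text)
# With re.S, . also matches '\n'
across = re.search(r'first.*third', text, re.S)

print(starts)           # ['first', 'SECOND', 'third']
print(second)           # ['SECOND']
print(same_line)        # None
print(across.group())
```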
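  1. Module-level functions
    (1) re.purge() clears the regular expression cache
    (2) re.escape() escapes the special characters in a string so they match literally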
re.findall(r'^','^python')
['']
re.findall(re.escape('^'),'^python')
['^']
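
re.escape() is mostly useful when the text to search for comes from outside the program and may contain metacharacters. A small sketch, with a hypothetical search term:

```python
import re

price_text = 'Total: $9.99 (was $19.99)'   # made-up sample text
literal = '$9.99'                          # hypothetical user-supplied search term

# Unescaped, '$' is an end-of-string anchor and '.' matches any character
unescaped = re.findall(literal, price_text)
# Escaped, every special character is matched literally
escaped = re.findall(re.escape(literal), price_text)

print(unescaped)   # []
print(escaped)     # ['$9.99']
```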

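II. System Tools

1. Concepts

  1. Command-line tools
  2. Shell scripts
  3. System administration

2. System modules

  1. sys
    (1) Provides a set of functions and variables that expose the Python runtime and the platform it is running on
  2. os
    (1) Provides a portable, cross-platform programming interface to the operating system
    (2) os.path provides portable tools for working with file and directory paths

3. sys

  1. Platform and version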
import sys
dir(sys)

sys.platform
'darwin'
sys.version
'3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]'
sys.path
  1. Inspecting exception details
import traceback
try:
...    raise KeyError
...except Exception:
...    print(sys.exc_info()) # details of the exception currently being handled
...    traceback.print_tb(sys.exc_info()[2]) # print the traceback
...    
(<class 'KeyError'>, KeyError(), <traceback object at 0x110a87e88>)
  File "<input>", line 2, in <module>
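  1. Command-line arguments
import sys

def add(a, b):
    return a + b

# sys.argv[0] is the script name; the arguments start at index 1.
# Default to 0 when an argument is missing instead of raising IndexError.
a = int(sys.argv[1]) if len(sys.argv) > 1 else 0
b = int(sys.argv[2]) if len(sys.argv) > 2 else 0

print(add(a, b))

#print(sys.argv)
#print(sys.argv[0])
#print(sys.argv[1])
#print(sys.argv[2])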
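  1. Standard streams
import sys
sys.stdout.write('Hello') # standard output stream, similar to print (no newline added)
Hello5
print('input information:');sys.stdin.readline()[:] # standard input stream, similar to input (keeps the trailing newline)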
input information:
Python
print('input information:');x = sys.stdin.readline()[:]
input information:
Python
x
'Python\n'
sys.stderr.write('Error')
5
sys.stderr.flush()
Error
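
In the transcript above, the 5 printed after Hello is the return value of write() echoed by the interactive interpreter: write() returns the number of characters written. A minimal sketch of the same streams used from a script:

```python
import sys

# write() adds no newline and returns the number of characters written
count = sys.stdout.write('Hello\n')

# print() can target any stream through its file argument
print('Error: something went wrong', file=sys.stderr)

print(count)   # 6
```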

4. os

  1. Shell variables
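
Shell (environment) variables are available through os.environ, which behaves like a dictionary. The sketch below is illustrative only; the variable names NO_SUCH_VARIABLE_FOR_DEMO and DEMO_MODE are hypothetical.

```python
import os

# os.environ maps the environment variables of the current process
path_dirs = os.environ.get('PATH', '').split(os.pathsep)

# .get() with a default avoids a KeyError for variables that are not set
missing = os.environ.get('NO_SUCH_VARIABLE_FOR_DEMO', 'not set')

# Assignments are visible to this process and to child processes it starts
os.environ['DEMO_MODE'] = '1'   # hypothetical variable name

print(missing)                  # not set
print(os.environ['DEMO_MODE'])  # 1
```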
  2. Administration tools
import os
os.getcwd() # get the current working directory
'/Users/yizhou/PycharmProjects/Python基础学习'
os.listdir()
['.idea', 'data.txt', 'data3.txt', 'pickle.db', 'pickle_db', 'Python基础学习', 'shelve_student.db', 'venv']
os.chdir('Python基础学习') # change the working directory
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习'
os.getpid() # ID of the current process
509
os.getppid() # ID of the parent process
491
  1. Running shell commands
    (1) os.system() runs a shell command from a Python script
    (2) os.popen() runs a command and connects to its input/output streams
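
A minimal sketch of the two calls, using echo because it is available in most shells. For new code the standard library's subprocess module is generally the recommended alternative.

```python
import os

# os.system runs the command in a subshell and returns its exit status;
# the command's own output goes straight to the terminal
status = os.system('echo hello')

# os.popen runs the command and returns a file-like object connected to its output
with os.popen('echo hello') as stream:
    output = stream.read().strip()

print(status)   # 0 on success
print(output)   # hello
```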

  2. File handling

os.chdir('系统模块')
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块'
os.mkdir('test')
os.listdir()
['add number.py', 'test']
os.chdir('test')
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块/test'
os.listdir()
[]
open('info.txt','w',encoding='utf8').write('Hello')
5
os.listdir()
['info.txt']
os.rename('info.txt','detail.txt')
os.listdir()
['detail.txt']
os.remove('detail.txt')
os.chdir('..') # go back to the parent directory
os.getcwd()
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习/系统模块'
os.rmdir('test')
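  1. Portability tools
os.sep # path component separator
'/'
os.pathsep # separator between entries in a search path such as PATH
':'
os.curdir # symbol for the current directory
'.'
os.pardir # symbol for the parent directory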
'..'
  1. Paths: os.path
os.path.isdir(r'/Users/yizhou/PycharmProjects') # is it a directory?
True
os.path.isfile(r'/Users/yizhou/PycharmProjects') # is it a regular file?
False
os.path.exists(r'/Users/yizhou/PycharmProjectss')
False
os.path.getsize(r'/Users/yizhou/PycharmProjects')
306
os.path.split(r'/Users/yizhou/PycharmProjects/Python基础学习/data.txt')
('/Users/yizhou/PycharmProjects/Python基础学习', 'data.txt')
name = r'/Users/yizhou/PycharmProjects/Python基础学习/data.txt'
os.path.dirname(name)
'/Users/yizhou/PycharmProjects/Python基础学习'
os.path.basename(name)
'data.txt'
os.path.splitext(name)
('/Users/yizhou/PycharmProjects/Python基础学习/data', '.txt')
os.path.splitext(name)[1]
'.txt'
os.path.join(r'/Users/yizhou','product.csv')
'/Users/yizhou/product.csv'
name
'/Users/yizhou/PycharmProjects/Python基础学习/data.txt'
os.path.split(name)
('/Users/yizhou/PycharmProjects/Python基础学习', 'data.txt')
name.split(os.sep)
['', 'Users', 'yizhou', 'PycharmProjects', 'Python基础学习', 'data.txt']
p = '/Users/yizhou/dd\\files/data.csv'
p
'/Users/yizhou/dd\\files/data.csv'
os.path.normpath(p) # normalize the path (on POSIX systems backslashes are not separators, so they are left as-is)
'/Users/yizhou/dd\\files/data.csv'
os.path.abspath('..') # convert to an absolute path
'/Users/yizhou/PycharmProjects/Python基础学习'
os.path.abspath('.')
'/Users/yizhou/PycharmProjects/Python基础学习/Python基础学习'
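
The os.path functions above combine naturally. As a rough sketch, the following changes a file extension without hard-coding a path separator; the directory and file names are hypothetical.

```python
import os

# Build a path with the separator of the current platform
path = os.path.join('reports', '2020', 'sales.csv')   # hypothetical names

folder, filename = os.path.split(path)
stem, ext = os.path.splitext(filename)

# Swap the extension while keeping the directory
renamed = os.path.join(folder, stem + '.txt')

print(filename)   # sales.csv
print(ext)        # .csv
print(renamed)
```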
Reposted from blog.csdn.net/mangogogo321/article/details/104936979