Strings and Text 06: Reformatting Strings to a Fixed Column Width / Handling HTML and XML in Strings / String Operations on Byte Strings

Reformatting a long string to a fixed column width

The textwrap module

s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
the eyes, not around the eyes, don't look around the eyes, \
look into my eyes, you're under."

# Unformatted
print(s)
#Look into my eyes, look into my eyes, the eyes, the eyes, the eyes, not around the eyes, don't look around the eyes, look into my eyes, you're under.

import textwrap
# Wrap to a given line width
print(textwrap.fill(s, 70))
'''
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
'''

# Wrap to a given line width and indent the first line
print(textwrap.fill(s, 40, initial_indent='    '))
'''
    Look into my eyes, look into my
eyes, the eyes, the eyes, the eyes, not
around the eyes, don't look around the
eyes, look into my eyes, you're under.
'''

# Wrap to a given line width and indent every line except the first
print(textwrap.fill(s, 40, subsequent_indent='    '))
'''
Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes, not
    around the eyes, don't look around
    the eyes, look into my eyes, you're
    under.
'''

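The textwrap module is handy for printing strings, especially when you want the output to fit the terminal width.
You can call os.get_terminal_size() to get the terminal's size; its columns attribute gives the width.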
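
One way to wire the terminal width into textwrap is sketched below. It uses shutil.get_terminal_size() rather than os.get_terminal_size(), because the os version can raise OSError when output is not attached to a terminal, while the shutil version accepts a fallback size. The helper name fill_to_terminal and the 80x24 fallback are assumptions made for this illustration.

```python
import shutil
import textwrap

def fill_to_terminal(text, width=None):
    """Wrap text to the terminal width (hypothetical helper for illustration)."""
    if width is None:
        # Falls back to 80 columns when the width cannot be determined,
        # for example when the output is piped to a file.
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return textwrap.fill(text, width)

sample = "Look into my eyes, look into my eyes, the eyes, the eyes, the eyes."
print(fill_to_terminal(sample))
```

Passing an explicit width keeps the helper easy to test; leaving it out uses whatever width the current terminal reports.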

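Handling HTML and XML in strings

Using html.escape()

This function replaces special characters such as '<' and '>' in a text string with their HTML entities ('&' is replaced as well, and so are quotes by default). For example: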

mstr = 'This is a text tag "<tag>text</tag>".'

print(mstr)                           # This is a text tag "<tag>text</tag>".

import html
# Using escape()
print(html.escape(mstr))              # This is a text tag &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.
# Using escape(), but leaving the double quotes in the string untouched
print(html.escape(mstr, quote=False)) # This is a text tag "&lt;tag&gt;text&lt;/tag&gt;".
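
The example above only exercises '<', '>' and double quotes. A minimal sketch with an ampersand and an apostrophe shows the rest of what html.escape() replaces; the sample string is made up for this illustration.

```python
import html

raw = "Tom & Jerry's <b>show</b>"

# With the default quote=True, both " and ' are escaped along with &, < and >.
escaped = html.escape(raw)
print(escaped)      # Tom &amp; Jerry&#x27;s &lt;b&gt;show&lt;/b&gt;

# With quote=False, only &, < and > are replaced.
no_quotes = html.escape(raw, quote=False)
print(no_quotes)    # Tom &amp; Jerry's &lt;b&gt;show&lt;/b&gt;

# html.unescape() reverses the replacement.
print(html.unescape(escaped) == raw)   # True
```

Keeping quote=True is the safer default when the escaped text ends up inside an HTML attribute value.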

Using the errors='xmlcharrefreplace' argument

When producing ASCII output, this replaces each non-ASCII character with its numeric character reference:

s = 'Spicy Jalapeño'
print(s.encode('ascii', errors='xmlcharrefreplace'))
# b'Spicy Jalape&#241;o'
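
Note that encode() returns bytes, not str. If the result has to go back into text, it can be decoded as ASCII. The sketch below also checks the round trip with html.unescape(); the variable names are illustrative only.

```python
import html

s = 'Spicy Jalapeño'

# encode() yields bytes; every non-ASCII character becomes &#NNN;
encoded = s.encode('ascii', errors='xmlcharrefreplace')

# Decode back to str to embed the result in an ASCII-only document.
ascii_text = encoded.decode('ascii')
print(ascii_text)           # Spicy Jalape&#241;o

# The numeric reference converts back to the original character.
restored = html.unescape(ascii_text)
print(restored == s)        # True
```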

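When processing HTML or XML, try a proper HTML or XML parser first; these tools normally take care of replacing such entities for you. Sometimes, though, you receive raw text that contains entities and have to replace them by hand. The utility functions that ship with the HTML and XML tooling can do this, for example: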

html.unescape() and xml.sax.saxutils.unescape()

Example 1: HTML text
s = 'Spicy &quot;Jalape&#241;o&quot;.'

import html

# HTMLParser().unescape() was deprecated in Python 3.4 and removed in 3.9;
# html.unescape() is its replacement.
print(html.unescape(s))   # Spicy "Jalapeño".

Example 2: XML text
t = 'The prompt is &gt;&gt;&gt;'

from xml.sax.saxutils import unescape
print(unescape(t))     # 'The prompt is >>>'
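
It may help to know the limits of xml.sax.saxutils.unescape(): by default it only handles &amp;, &lt; and &gt;. Other entities have to be passed in through its optional entities dictionary, and numeric character references are left alone. The sample string below is made up for this illustration.

```python
from xml.sax.saxutils import unescape

t = 'Spicy &quot;Jalape&#241;o&quot; &amp; &lt;tag&gt;'

# Default behaviour: only &amp;, &lt; and &gt; are replaced.
basic = unescape(t)
print(basic)         # Spicy &quot;Jalape&#241;o&quot; & <tag>

# Extra entities are supplied as a mapping from entity to replacement.
with_quotes = unescape(t, {'&quot;': '"'})
print(with_quotes)   # Spicy "Jalape&#241;o" & <tag>
```

For text that mixes named and numeric references, html.unescape() is usually the more convenient choice.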

String operations on byte strings

Byte strings support slicing, searching, splitting, and replacing:

data = b'Hello World'

# Slicing
print(data[0:5])     # b'Hello'
# Searching
print(data.startswith(b'Hello'))    # True
# Splitting
print(data.split())    # [b'Hello', b'World']
# Replacing
print(data.replace(b'Hello', b'Hello Cruel'))    # b'Hello Cruel World'

Regular expressions can be matched against byte strings too, but the pattern itself must also be a byte string. For example:

data = b'FOO:BAR,SPAM'

import re
print(re.split(b'[:,]',data))  #[b'FOO', b'BAR', b'SPAM']
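
To illustrate why the pattern has to be bytes, the sketch below shows that mixing a str pattern with bytes data raises TypeError, and that the resulting fields can be decoded afterwards if text is needed. The flag name mismatch_rejected and the ASCII decoding step are assumptions made for this illustration.

```python
import re

data = b'FOO:BAR,SPAM'

# A str pattern cannot be applied to bytes data.
mismatch_rejected = False
try:
    re.split('[:,]', data)
except TypeError as exc:
    mismatch_rejected = True
    print('TypeError:', exc)

# With a bytes pattern the split works; decode each field to get text.
fields = [part.decode('ascii') for part in re.split(b'[:,]', data)]
print(fields)    # ['FOO', 'BAR', 'SPAM']
```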

Note:

Indexing a byte string returns an integer rather than a single character

a = 'Hello World'
print(a[0])  # 'H'
print(a[1])  # 'e'

b =  b'Hello World'
print(b[0])  # 72
print(b[1])  # 101
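
When a one-byte bytes object or a character is wanted instead of the integer, a few common options are sketched below; the variable names are illustrative only.

```python
b = b'Hello World'

first = b[0]               # indexing gives an int
first_slice = b[0:1]       # a length-1 slice stays a bytes object
first_bytes = bytes([first])   # rebuild bytes from the int
first_char = chr(first)    # convert the int to a str character

print(first, first_slice, first_bytes, first_char)   # 72 b'H' b'H' H

# Iterating over bytes also yields ints.
codes = list(b[:5])
print(codes)               # [72, 101, 108, 108, 111]
```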

Converting between bytes and strings

# Bytes to string
s = b'Hello World'
print(s)                   # b'Hello World'
print(s.decode('ascii'))   # Hello World

# String to bytes
s = 'Hello World'
print(s)                   # Hello World
print(s.encode())          # b'Hello World' (encode() defaults to UTF-8)
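
The examples above use pure ASCII text, where the choice of encoding makes no visible difference. A minimal sketch with a non-ASCII character shows where it starts to matter; the sample string and the flag name decoded_as_ascii are illustrative choices.

```python
text = 'Spicy Jalapeño'

# encode() with no argument uses UTF-8; 'ñ' takes two bytes.
utf8_bytes = text.encode()
print(utf8_bytes)                    # b'Spicy Jalape\xc3\xb1o'
print(len(text), len(utf8_bytes))    # 14 15

# Decoding with the wrong codec fails instead of silently guessing.
try:
    utf8_bytes.decode('ascii')
    decoded_as_ascii = True
except UnicodeDecodeError:
    decoded_as_ascii = False

# Decoding with the codec used for encoding restores the text.
print(utf8_bytes.decode('utf-8') == text)   # True
```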
