Python Module: textwrap - Formatting Text Paragraphs

Purpose: Format paragraphs of text by adjusting where line breaks occur.

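The textwrap module can be used to format text for output in situations where pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors and word processors.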


Example Data

The examples in this section use the module textwrap_example.py, which contains a multi-line string named sample_text.

# textwrap_example.py
sample_text = '''
    The textwrap module can be used to format text for output in
    situations where pretty-printing is desired.  It offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors.
    '''

Filling Paragraphs (Fill)

The fill() function takes text as input and returns the formatted text.

# textwrap_fill.py

import textwrap
from textwrap_example import sample_text

print(textwrap.fill(sample_text, width=50))

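The results are less than ideal. The text is now left-justified, but the first line retains its indent, and the whitespace from the start of each subsequent line is embedded in the middle of the paragraph.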

$ python3 textwrap_fill.py

     The textwrap module can be used to format
text for output in     situations where pretty-
printing is desired.  It offers     programmatic
functionality similar to the paragraph wrapping
or filling features found in many text editors.
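
The embedded gaps can be reproduced in isolation. The following is a minimal sketch rather than part of the original example: the two-line snippet string is made up for illustration, and the sketch relies on fill() replacing each newline with a single space by default while leaving runs of ordinary spaces untouched.

```python
import textwrap

# A tiny stand-in for the indented sample text (made up for this sketch).
snippet = "first line\n    second line"

# By default the newline is turned into a single space, while the four
# indentation spaces that follow it are kept, giving a five-space gap.
result = textwrap.fill(snippet, width=50)
print(repr(result))
```

The five-space gap matches the gaps visible in the output above, for example between "in" and "situations".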

Removing Existing Indentation (Dedent)

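In the previous example, leftover indentation whitespace ended up mixed into the middle of the output, so it did not look clean. dedent() removes the whitespace common to the start of every line in the sample string, which gives a much tidier result. The sample string carries that extra leading whitespace on purpose, to illustrate this feature.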

# textwrap_dedent.py

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text)
print('Dedented:')
print(dedented_text)

The result starts to look better:

$ python3 textwrap_dedent.py

Dedented:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.

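dedent() is the opposite of "indent": its result is the text with the leading whitespace common to all lines removed. If one line is indented more than the others, the extra whitespace on that line is kept.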

For example, using an underscore (_) to stand for a space, the input:

_Line one.
__Line two.
_Line Three.

produces the output:

Line one.
_Line two.
Line Three.
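
The same behavior can be checked directly. This is a small sketch using real spaces instead of underscores; the three-line string is illustrative and does not come from the sample module.

```python
import textwrap

# One leading space on lines one and three, two on line two.
text = " Line one.\n  Line two.\n Line three."

# Only the single space shared by all three lines is removed.
result = textwrap.dedent(text)
print(result)
```

The second line keeps one leading space, because only whitespace common to every line counts as shared indentation.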

Combining Dedent and Fill

Next, the dedented text can be passed to fill() with a few different width values:

# textwrap_fill_width.py

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text).strip()
for width in [45, 60]:
    print('{} Columns: \n'.format(width))
    print(textwrap.fill(dedented_text, width=width))
    print()

This produces the following output:

$ python3 textwrap_fill_width.py

45 Columns:

The textwrap module can be used to format
text for output in situations where pretty-
printing is desired.  It offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many
text editors.

60 Columns:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired.  It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.

Indenting Blocks

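Use the indent() function to add a consistent prefix to every line of a multi-line string. The following example prefixes each line of the sample text with > so that it is formatted like a quoted passage in an email reply.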

# textwrap_indent.py

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text)
wrapped = textwrap.fill(dedented_text, width=50)
wrapped += '\n\nSecond paragraph after a blank line.'
final = textwrap.indent(wrapped, '> ')

print('Quoted block:\n')
print(final)

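The block is split on newlines, the prefix > is added to every line that contains text, and the lines are joined into a new string that is returned. Note that the blank line between the two paragraphs is left without a prefix.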

$ python3 textwrap_indent.py

Quoted block:

>  The textwrap module can be used to format text
> for output in situations where pretty-printing is
> desired.  It offers programmatic functionality
> similar to the paragraph wrapping or filling
> features found in many text editors.

> Second paragraph after a blank line.

To control which lines receive the prefix, pass a callable as the predicate argument to indent(). indent() calls it once for each line of text and adds the prefix only to lines for which it returns a true value.

# textwrap_indent_predicate.py

import textwrap
from textwrap_example import sample_text

def should_indent(line):
    print('Indent {!r}?'.format(line))
    return len(line.strip()) % 2 == 0

dedented_text = textwrap.dedent(sample_text)
wrapped = textwrap.fill(dedented_text, width=50)
final = textwrap.indent(wrapped, 'EVEN ', predicate=should_indent)

print('\nQuoted block:\n')
print(final)

As a result, the prefix is added only to lines whose length, after stripping surrounding whitespace, is even:

$ python3 textwrap_indent_predicate.py

Indent ' The textwrap module can be used to format text\n'?
Indent 'for output in situations where pretty-printing is\n'?
Indent 'desired.  It offers programmatic functionality\n'?
Indent 'similar to the paragraph wrapping or filling\n'?
Indent 'features found in many text editors.'?

Quoted block:

EVEN  The textwrap module can be used to format text
for output in situations where pretty-printing is
EVEN desired.  It offers programmatic functionality
EVEN similar to the paragraph wrapping or filling
EVEN features found in many text editors.
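
A predicate is also one way to quote blank lines, which indent() leaves without a prefix when no predicate is given. The sketch below is illustrative only: the two-paragraph string is made up, and the always-true lambda is just one possible predicate.

```python
import textwrap

text = "First paragraph.\n\nSecond paragraph."

# Without a predicate, lines that consist only of whitespace get no prefix.
default_quote = textwrap.indent(text, "> ")

# A predicate that accepts every line also prefixes the blank line.
full_quote = textwrap.indent(text, "> ", predicate=lambda line: True)

print(default_quote)
print()
print(full_quote)
```

The second form leaves a trailing space after > on the blank line, which may or may not be wanted depending on where the quoted text ends up.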

Hanging Indents

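The prefix can also be set through fill(): along with the output width, the indent of the first line can be controlled independently of the indent of the remaining lines.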

# textwrap_hanging_indent.py

import textwrap
from textwrap_example import sample_text

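# strip() removes the leading newline left by dedent(); without it the
# first output line would start with a stray space.
dedented_text = textwrap.dedent(sample_text).strip()
print(textwrap.fill(dedented_text,
                    initial_indent='',
                    subsequent_indent=' ' * 4,
                    width=50))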

This makes it easy to produce a hanging indent, where the first line is indented less than the other lines.

$ python3 textwrap_hanging_indent.py

The textwrap module can be used to format text for
    output in situations where pretty-printing is
    desired.  It offers programmatic functionality
    similar to the paragraph wrapping or filling
    features found in many text editors.

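The indent values can include non-whitespace characters too. For example, the hanging indent can be prefixed with an asterisk (*) to produce bullet points.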
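
A bullet point can be sketched as follows. The item text and the width of 40 are arbitrary choices for illustration; the asterisk goes in initial_indent and two spaces in subsequent_indent so that continuation lines align under the text of the first line.

```python
import textwrap

item = ("The textwrap module can be used to format text for output in "
        "situations where pretty-printing is desired.")

# The first line starts with "* "; wrapped lines start with two spaces
# so that they line up with the text after the bullet.
bullet = textwrap.fill(item,
                       initial_indent="* ",
                       subsequent_indent="  ",
                       width=40)
print(bullet)
```

The width limit includes the indent, so each output line, bullet and all, is at most 40 characters long.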


Truncating Long Text

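shorten() truncates longer text to produce a summary or preview. All whitespace, such as tabs, newlines, and runs of multiple spaces, is collapsed to a single space. The text is then truncated to a length less than or equal to the requested width, at a word boundary so that no partial words are included.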

# textwrap_shorten.py

import textwrap
from textwrap_example import sample_text

dedented_text = textwrap.dedent(sample_text)
original = textwrap.fill(dedented_text, width=50)

print('Original:\n')
print(original)

shortened = textwrap.shorten(original, 100)
shortened_wrapped = textwrap.fill(shortened, width=50)

print('\nShortened:\n')
print(shortened_wrapped)

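If non-whitespace text is removed from the original string as part of the truncation, it is replaced with a placeholder. The default placeholder is [...], and it can be changed by passing a placeholder argument to shorten().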

$ python3 textwrap_shorten.py

Original:

 The textwrap module can be used to format text
for output in situations where pretty-printing is
desired.  It offers programmatic functionality
similar to the paragraph wrapping or filling
features found in many text editors.

Shortened:

The textwrap module can be used to format text for
output in situations where pretty-printing [...]
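
A custom placeholder can be sketched like this. The input string, the width of 30, and the "..." placeholder are arbitrary values chosen for illustration.

```python
import textwrap

text = "The textwrap module can be used to format text for output."

# The placeholder counts toward the width limit.
short = textwrap.shorten(text, width=30, placeholder="...")
print(short)

# Text that already fits is only whitespace-normalized, not truncated.
unchanged = textwrap.shorten("short   text", width=30)
print(unchanged)
```

Because the placeholder is included in the width, a longer placeholder leaves less room for the original text.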

Reposted from blog.csdn.net/gggavin/article/details/78937682