Python makes manipulating PDF files fun

Hello everyone, I'm Bi Jiasuo (Picasso with a lock!)

Do you find working with PDF files tedious?

Today I will show you how to use Python to make manipulating PDF files more interesting.

 Table of Contents
 Tools
 Extract Text from PDF
 Rotate and Overlay Pages
 Encrypt PDF Files
 Create PDF Files
 Summary

PDF stands for Portable Document Format, and such files usually use `.pdf` as their extension. In day-to-day development work, the two tasks you are most likely to run into are reading text content from a PDF and generating a PDF document from existing content.

Tools:

Python 3.7

PyCharm

PDF

PyPDF2

reportlab

Extracting text from PDFs
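PyPDF2 cannot extract images, diagrams, or other media from PDF documents, but it can extract text and return it as a Python string.

import PyPDF2

# Open the PDF document (legacy PyPDF2 API, before version 3.0)
reader = PyPDF2.PdfFileReader('test.pdf')
# Get the first page; page indexes start at 0
page = reader.getPage(0)
# Extract the page's text as a Python string and print it
print(page.extractText())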
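
To pull the text from every page rather than only the first one, the same calls can be wrapped in a loop. This is a minimal sketch that assumes the legacy PyPDF2 API used in this article (`PdfFileReader`, `numPages`, `getPage`, `extractText`, all removed in PyPDF2 3.0). The helper names `extract_all_text` and `print_pdf_text` are made up for illustration.

```python
def extract_all_text(reader):
    """Return the text of every page, joined by newlines.

    `reader` is expected to behave like a legacy PyPDF2.PdfFileReader.
    """
    texts = []
    for page_num in range(reader.numPages):
        page = reader.getPage(page_num)
        texts.append(page.extractText())
    return '\n'.join(texts)


def print_pdf_text(path):
    """Open the PDF at `path` and print the text of all its pages."""
    import PyPDF2  # imported here so the helper above has no dependencies

    reader = PyPDF2.PdfFileReader(path)
    print(extract_all_text(reader))
```

Calling `print_pdf_text('test.pdf')` would then print the whole document instead of a single page.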


Rotate and overlay pages

The code above reads a PDF document by creating a PdfFileReader object. The object's getPage method returns the requested page of the document as a PageObject. A page can be rotated clockwise or counterclockwise with the PageObject's rotateClockwise and rotateCounterClockwise methods, and a new blank page can be added with the addBlankPage method of a PdfFileWriter object. The code is as follows.

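import PyPDF2

from PyPDF2.pdf import PageObject

# Create a Reader object for reading the PDF file
reader = PyPDF2.PdfFileReader('resources/xxx.pdf')
# Create a Writer object for writing the PDF file
writer = PyPDF2.PdfFileWriter()
# Loop over every page of the PDF file
for page_num in range(reader.numPages):
    # Get the Page object for the given page index
    current_page = reader.getPage(page_num)  # type: PageObject
    if page_num % 2 == 0:
        # Odd-numbered pages (index 0, 2, ...): rotate 90 degrees clockwise
        current_page.rotateClockwise(90)
    else:
        # Even-numbered pages: rotate 90 degrees counterclockwise
        current_page.rotateCounterClockwise(90)
    writer.addPage(current_page)
# Finally, add a blank page and rotate it 90 degrees
page = writer.addBlankPage()  # type: PageObject
page.rotateClockwise(90)
# Write the PDF to a file with the Writer object's write method
# (this path is the same as the input, so the original file is overwritten)
with open('resources/xxx.pdf', 'wb') as file:
    writer.write(file)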
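
If only some pages need turning, the rotation can be driven by a small lookup table instead of the odd/even rule. The following is a rough sketch under the same legacy PyPDF2 API assumption; `copy_with_rotation`, `rotate_file`, and the `angles` mapping are hypothetical names introduced here, and the output is written to a separate file so the input stays untouched.

```python
def copy_with_rotation(reader, writer, angles):
    """Copy every page from reader to writer, rotating the chosen ones.

    `angles` maps a zero-based page index to degrees: positive values
    rotate clockwise, negative values counterclockwise. Only multiples
    of 90 are accepted.
    """
    for page_num in range(reader.numPages):
        page = reader.getPage(page_num)
        angle = angles.get(page_num, 0)
        if angle % 90 != 0:
            raise ValueError('rotation must be a multiple of 90 degrees')
        if angle > 0:
            page.rotateClockwise(angle)
        elif angle < 0:
            page.rotateCounterClockwise(-angle)
        writer.addPage(page)


def rotate_file(src, dst, angles):
    """Read `src`, rotate the pages listed in `angles`, and write `dst`."""
    import PyPDF2  # legacy API, as in the rest of this article

    reader = PyPDF2.PdfFileReader(src)
    writer = PyPDF2.PdfFileWriter()
    copy_with_rotation(reader, writer, angles)
    with open(dst, 'wb') as file:
        writer.write(file)
```

For example, `rotate_file('in.pdf', 'out.pdf', {0: 90, 2: -90})` would turn the first page clockwise and the third page counterclockwise, leaving the rest as they are (both file names are placeholders).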


Encrypt PDF files

The PdfFileWriter object in PyPDF2 can encrypt PDF documents. If you need to set the same access password on a whole series of PDF documents, handling them with a Python program is very convenient.

import PyPDF2

reader = PyPDF2.PdfFileReader('resources/XGBoost.pdf')
writer = PyPDF2.PdfFileWriter()
for page_num in range(reader.numPages):
    writer.addPage(reader.getPage(page_num))
# Encrypt the PDF with the encrypt method; its argument is the password to set
writer.encrypt('foobared')
with open('resources/XGBoost-encrypted.pdf', 'wb') as file:
    writer.write(file)
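
To apply one password to a whole folder of PDFs, the single-file steps can be put in a function and called in a loop. This is only a sketch under the legacy PyPDF2 API assumption; the function names, the folder argument, and the `-encrypted` naming rule (borrowed from the file name in the example) are illustrative choices, not part of PyPDF2.

```python
from pathlib import Path


def encrypted_path(src):
    """Return the output path: 'a.pdf' becomes 'a-encrypted.pdf'."""
    src = Path(src)
    return src.with_name(src.stem + '-encrypted' + src.suffix)


def encrypt_pdf(src, password):
    """Write a password-protected copy next to `src` and return its path."""
    import PyPDF2  # legacy API, as in the rest of this article

    reader = PyPDF2.PdfFileReader(str(src))
    writer = PyPDF2.PdfFileWriter()
    for page_num in range(reader.numPages):
        writer.addPage(reader.getPage(page_num))
    writer.encrypt(password)
    dst = encrypted_path(src)
    with open(dst, 'wb') as file:
        writer.write(file)
    return dst


def encrypt_folder(folder, password):
    """Encrypt every PDF in `folder`, skipping copies made by earlier runs."""
    results = []
    for src in sorted(Path(folder).glob('*.pdf')):
        if src.stem.endswith('-encrypted'):
            continue
        results.append(encrypt_pdf(src, password))
    return results
```

Calling `encrypt_folder('resources', 'foobared')` would leave the originals in place and create an encrypted copy beside each one.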



Create PDF files

Creating PDF documents requires the third-party library reportlab, which can be installed with the command pip install reportlab.

from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfgen import canvas

pdf_canvas = canvas.Canvas('resources/python创建.pdf', pagesize=A4)
width, height = A4

# Draw an image
image = canvas.ImageReader('resources/xxx.jpg')
pdf_canvas.drawImage(image, 20, height - 395, 250, 375)

# Finish the current page and start a new one
pdf_canvas.showPage()

# Register the font files
pdfmetrics.registerFont(TTFont('Font1', 'resources/fonts/Vera.ttf'))
pdfmetrics.registerFont(TTFont('Font2', 'resources/fonts/青呱石头体.ttf'))

# Draw text
pdf_canvas.setFont('Font2', 40)
pdf_canvas.setFillColorRGB(0.9, 0.5, 0.3, 1)
pdf_canvas.drawString(width // 2 - 120, height // 2, '你好,世界!')
pdf_canvas.setFont('Font1', 40)
pdf_canvas.setFillColorRGB(0, 1, 0, 0.5)
pdf_canvas.rotate(18)
pdf_canvas.drawString(250, 250, 'hello, world!')

# Save the PDF file
pdf_canvas.save()
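
The `height - 395` in the drawImage call comes from reportlab's coordinate system: the origin is the bottom-left corner of the page, and the position passed to drawImage is the bottom-left corner of the image. A 375-point-tall image placed 20 points below the top edge therefore gets y = height - 20 - 375. The helpers below are a small sketch of that arithmetic; `y_from_top` and `draw_image_from_top` are hypothetical names, not reportlab functions.

```python
def y_from_top(page_height, top_margin, item_height):
    """Convert a distance from the top edge into reportlab's y coordinate.

    reportlab measures y upwards from the bottom edge of the page, and
    the returned value is the lower edge of the drawn item.
    """
    return page_height - top_margin - item_height


def draw_image_from_top(pdf_canvas, image, page_height, left, top, width, height):
    """Draw `image` with its top-left corner `left` and `top` points
    away from the page's top-left corner."""
    y = y_from_top(page_height, top, height)
    pdf_canvas.drawImage(image, left, y, width, height)
```

With these, the image call above could be written as `draw_image_from_top(pdf_canvas, image, height, 20, 20, 250, 375)`.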


Summary

That covers the main ways of working with PDF documents in Python. Thank you for your support.

I'm Bi Jiasuo, and I hope you'll follow me.

Origin blog.csdn.net/weixin_69999177/article/details/125106816