Converting PDF files to Word files: installing, using, and troubleshooting the Python pdf2docx library

#Inspiration#

I recently came across an online article titled "That's awesome! It only takes 2 lines of code to easily convert PDF to Word!" I already knew about the pdf2docx library, so I tried it out. The results were good enough that I decided to write them up. That article contains some clutter and errors, so treat it as a reference only.

A note on installing third-party libraries: if you create a new virtual environment, you must install the library into that environment. Otherwise you will get an import error, because the library is not present in the virtual environment your project runs in. Likewise, a library installed into the default environment (such as Anaconda's base environment) is not available in a newly created virtual environment unless your project uses the default interpreter.
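To check which environment a script is really running in, a quick diagnostic like the one below can help. It is a minimal sketch using only the standard library; report_environment is a hypothetical helper name, not part of pdf2docx. If module_found is False, the library was installed into a different environment than the one the project interpreter belongs to.

```python
import importlib.util
import sys


def report_environment(module_name: str = "pdf2docx") -> dict:
    """Show which interpreter is running and whether module_name is importable from it."""
    spec = importlib.util.find_spec(module_name)  # None if the module cannot be found
    return {
        "interpreter": sys.executable,  # the python executable actually in use
        "environment": sys.prefix,      # root folder of the active environment
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


for key, value in report_environment().items():
    print(f"{key}: {value}")
```

Run it from PyCharm with the project interpreter selected; the printed interpreter path should point inside the virtual environment you created.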

Environment setup: I use Python 3.10 installed through Anaconda, with PyCharm as the editor, and I used PyCharm to create a new virtual environment in Anaconda, as shown below.

Installing the pdf2docx library: click the Anaconda icon in the screenshot above to use the Conda package manager, click the + button, then type pdf2docx into the search box of the dialog that pops up and install it.
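If you prefer a command to the PyCharm dialog, pdf2docx is also published on PyPI, so pip can install it. The sketch below (pip_install_command and install are hypothetical helper names) runs pip through sys.executable, so the package lands in the environment of the interpreter running the script rather than in whichever pip happens to be first on PATH.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter that runs this script."""
    # "python -m pip" with sys.executable targets this exact environment
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str) -> None:
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(package))
```

Calling install("pdf2docx") from a script run with the project interpreter installs the library into that project's environment.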

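Troubleshooting an error: do not give your script the same name as the library, or the import will fail. For example, I initially named my file pdf2docx.py and kept getting this error: ImportError: cannot import name 'Converter' from partially initialized module 'pdf2docx' (most likely due to a circular import). Python searches the script's own folder before the installed packages, so "from pdf2docx import Converter" imported my own pdf2docx.py (the script importing itself) instead of the library, and that file has no Converter.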
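A small check for this kind of name clash could look like the sketch below. find_shadowing_file is a hypothetical helper: it asks Python where an import would actually resolve and reports it if that location is a file (or package folder) directly inside your project folder instead of the installed library.

```python
import importlib.util
from pathlib import Path
from typing import Optional, Union


def find_shadowing_file(module_name: str, project_dir: Union[str, Path]) -> Optional[Path]:
    """Return the local file that `import module_name` would pick up, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        return None  # not importable, or built-in/frozen with no file on disk
    origin = Path(spec.origin).resolve()
    project = Path(project_dir).resolve()
    # A module file, or a package's __init__.py, sitting directly in the project folder
    is_local_module = origin.parent == project
    is_local_package = origin.name == "__init__.py" and origin.parent.parent == project
    if is_local_module or is_local_package:
        return origin
    return None
```

For example, find_shadowing_file("pdf2docx", Path(__file__).parent) in a script named pdf2word.py would return the path of a stray pdf2docx.py in the same folder, and None once the installed library is what gets imported.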

Fixing the error: after I renamed the file to pdf2word.py, the error disappeared and the PDF file was converted to a .docx file successfully.

Implementation code: in the code below, start=0 is the page where conversion begins (page indexes are zero-based, so 0 is the first page) and end=123 is where conversion stops. If you omit both arguments, the whole document is converted, from the first page to the last.

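```python
from pdf2docx import Converter

# Raw strings (r'...') stop Windows backslashes from being read as escape sequences
pdf_file = r'D:\pythonProject\pythonExploring\P020231128362131003708.pdf'
docx_file = r'D:\pythonProject\pythonExploring\P020231128362131003708.docx'

conv = Converter(pdf_file)
try:
    # Convert from page index 0 up to end=123 and write the result to docx_file
    conv.convert(docx_file, start=0, end=123)
finally:
    conv.close()  # release the PDF even if the conversion raises
```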
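The same steps can be wrapped in a reusable function. This is a sketch under one stated assumption: that pdf2docx treats start and end like a Python slice (start is a zero-based index, end is exclusive, end=None means "to the last page"). The helper names docx_path_for, page_range and pdf_to_docx are hypothetical; only Converter, convert and close come from the code above.

```python
from pathlib import Path
from typing import Optional, Tuple


def docx_path_for(pdf_path: str) -> Path:
    """Put the .docx next to the PDF, keeping the same base name."""
    return Path(pdf_path).with_suffix(".docx")


def page_range(first_page: int, last_page: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Turn 1-based, inclusive page numbers into (start, end) for Converter.convert().

    ASSUMPTION: start is a 0-based index and end is exclusive, like a Python slice.
    """
    if first_page < 1:
        raise ValueError("page numbers start at 1")
    if last_page is not None and last_page < first_page:
        raise ValueError("last_page must not be before first_page")
    # Under slice semantics, a 1-based inclusive last page equals a 0-based exclusive end
    return first_page - 1, last_page


def pdf_to_docx(pdf_path: str, first_page: int = 1, last_page: Optional[int] = None) -> Path:
    """Convert a page range of pdf_path and return the path of the written .docx."""
    from pdf2docx import Converter  # imported lazily so the helpers above work without it

    docx_path = docx_path_for(pdf_path)
    start, end = page_range(first_page, last_page)
    conv = Converter(pdf_path)
    try:
        conv.convert(str(docx_path), start=start, end=end)
    finally:
        conv.close()  # always release the PDF, even if conversion fails
    return docx_path
```

For a hypothetical report.pdf, pdf_to_docx("report.pdf", first_page=2, last_page=3) would, under the slice assumption, write report.docx containing pages 2 and 3; with no page arguments it converts the whole file.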

Conversion result: comparing the original PDF with the converted .docx file, the conversion is highly accurate.

[Screenshot: the original PDF]

[Screenshot: the converted Word document]

Reference articles:

Use python to convert pdf files to word files | pdf2docx installation + quick use (CSDN blog)
6 lines of python code uses the pdf2docx module Converter object to convert pdf to docx files (CSDN blog)

Source: blog.csdn.net/syluxhch/article/details/134809931