ImageMagick & PyPDF2 Crashing Python When Used Together

Question

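I have a PDF file of about 20-25 pages. The goal of the tool is to split the PDF file into pages (using PyPDF2), save each PDF page in a directory (using PyPDF2), convert the PDF pages to images (using ImageMagick), and then extract the data with tesseract (using PIL and PyOCR). The tool will eventually get a GUI through tkinter, so users can repeat the same operation multiple times by clicking a button. In my heavy testing, I noticed that if the whole process is repeated around 6-7 times, the tool/Python script crashes and Windows shows it as "Not Responding". I did some debugging, but unfortunately no error is thrown. Memory and CPU usage are fine, so there is no issue there either. By observation, I was able to narrow the problem down: before the tesseract part is even reached, PyPDF2 and ImageMagick fail when they run together. I was able to reproduce the problem by reducing it to the following Python code: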

from wand.image import Image as Img
from PIL import Image as PIL
import pyocr
import pyocr.builders
import io, sys, os 
from PyPDF2 import PdfFileWriter, PdfFileReader


def splitPDF (pdfPath):
    #Read the PDF file that needs to be parsed.
    pdfNumPages =0
    with open(pdfPath, "rb") as pdfFile:
        inputpdf = PdfFileReader(pdfFile)

        #Iterate on every page of the PDF.
        for i in range(inputpdf.numPages):
            #Create the PDF Writer Object
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            with open("tempPdf%s.pdf" %i, "wb") as outputStream:
                output.write(outputStream)

        #Get the number of pages that have been split.
        pdfNumPages = inputpdf.numPages

    return pdfNumPages

pdfPath = "Test.pdf"
for run in range(1, 20):
    print("Run %s\n--------" % run)
    # Split the PDF into pages and get the number of pages.
    pdfNumPages = splitPDF(pdfPath)
    print(pdfNumPages)
    for page in range(pdfNumPages):
        # Convert the split PDF page to an image to run tesseract on it.
        with Img(filename="tempPdf%s.pdf" % page, resolution=300) as pdfImg:
            print("Processing Page %s" % page)

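I have used the with statement to handle opening and closing the files correctly, so there should be no memory leak there. I have tried running the splitting part and the image conversion part separately, and each works fine on its own. However, when the code is combined, it fails after about 5-6 iterations. I have used try/except blocks, but no error is caught. Also, I am using the latest version of all the libraries. Any help or guidance is appreciated.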

Thanks.

Answer 1

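For future reference, the problem was caused by the 32-bit version of ImageMagick, as mentioned in one of the comments (thanks to emcconville). Uninstalling the 32-bit versions of Python and ImageMagick and installing the 64-bit versions of both fixed the issue. Hope this helps.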
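
To tell whether an installation is affected, it may help to check first whether the running Python interpreter is a 32-bit or a 64-bit build. The sketch below uses only the standard library. The helper name interpreter_bits is made up for illustration and is not part of PyPDF2, wand, or ImageMagick. It reports the interpreter only. Because wand loads the ImageMagick library into the Python process, the two have to be the same architecture, so a 32-bit interpreter implies a 32-bit ImageMagick is in use.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running Python interpreter, in bits.

    Hypothetical helper for illustration; it inspects only the interpreter,
    not the ImageMagick installation.
    """
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = interpreter_bits()
    print("Python %s is a %d-bit build" % (platform.python_version(), bits))
    # Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2**32))
    if bits == 32:
        print("Consider installing 64-bit Python and 64-bit ImageMagick.")
```

If this reports a 32-bit build, the fix described above (reinstalling both Python and ImageMagick as 64-bit) is the one to try.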
