Extracting text then saving to plain text file - TypeError: an integer is required (got type str)

Question

I am converting PDF files to text and got this code from a previous post:

Extracting text from a PDF file using PDFMiner in python?

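When I print(text), it does exactly what I want, but I then need to save the result to a text file, and that is when I get the error above.

The code follows the first answer to the linked question exactly. Then I run: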

text = convert_pdf_to_txt("GMCA ECON.pdf")

file = open('GMCAECON.txt', 'w', 'utf-8')
file.write(text)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-ebc6b7708d93> in <module>
----> 1 file = open('GMCAECON.txt', 'w', 'utf-8')
  2 file.write(text)

TypeError: an integer is required (got type str)

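I'm afraid this is probably something really simple, but I can't figure it out. I want it to write the text to a text file with the same name, so that I can then do further analysis on it. Thanks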

Answer 1

The problem is your third argument. The third positional argument that open accepts is buffering, not encoding.

Call open like this:

open('GMCAECON.txt', 'w', encoding='utf-8')

and your problem should go away.
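To make the difference concrete, here is a minimal sketch that reproduces the failure and then the fix. The sample string and the temporary directory are assumptions for illustration only; in the question, text comes from convert_pdf_to_txt.

```python
import os
import tempfile

# Hypothetical stand-in for the text returned by convert_pdf_to_txt().
text = "Résumé of the extracted report\n"

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "GMCAECON.txt")

# Passed positionally, 'utf-8' lands in the `buffering` slot, which must be an int.
try:
    open(path, "w", "utf-8")
    positional_failed = False
except TypeError as exc:
    positional_failed = True
    print("TypeError:", exc)

# Passed by keyword, the same value reaches the `encoding` parameter.
f = open(path, "w", encoding="utf-8")
f.write(text)
f.close()

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

The exact wording of the TypeError message can differ between Python versions, but the exception type is the same.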

Answer 2

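When you run file = open('GMCAECON.txt', 'w', 'utf-8'), you are passing positional arguments to open(). You intend the third argument to be encoding, but the third positional parameter that open() expects is buffering, which must be an integer. You need to pass encoding as a keyword argument, e.g. file = open('GMCAECON.txt', 'w', encoding='utf-8')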

Note that it is better to use the with context manager, which closes the file for you when the block exits:

with open('GMCAECON.txt', 'w', encoding='utf-8') as f:
    f.write(text)
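Since the question asks for a text file with the same name as the PDF, one possible way to combine that with the context manager is sketched below. The helper name save_text, the sample string, and the temporary directory are all hypothetical and not part of the original code.

```python
import tempfile
from pathlib import Path


def save_text(pdf_name, text, out_dir):
    """Write text as UTF-8 to out_dir, reusing the PDF's name with a .txt suffix."""
    out_path = Path(out_dir) / Path(pdf_name).with_suffix(".txt").name
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text)
    # The with block has already closed the file by this point.
    return out_path, f.closed


# Hypothetical stand-in for the output of convert_pdf_to_txt("GMCA ECON.pdf").
text = "sample extracted text\n"

out_path, closed = save_text("GMCA ECON.pdf", text, tempfile.mkdtemp())
saved = out_path.read_text(encoding="utf-8")
```

Deriving the output name from the input keeps the two files paired, which may help if several PDFs are converted in a loop.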

python

pdf

text

pdfminer