Error: cannot import name 'PDFDocument' from 'pdfminer.pdfparser'

Question

I need to extract text from PDF files, and I had successfully extracted text passages and tables using pdfminer.six. But now I get an error related to the line

from pdfminer.pdfparser import PDFParser, PDFDocument

ImportError: cannot import name 'PDFDocument' from 'pdfminer.pdfparser' (C:\Users[username]\Anaconda3\lib\site-packages\pdfminer\pdfparser.py)

I am using Anaconda Jupyter with Python 3.7.3. The package is pdfminer.six-20181108.

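The code I am using is based on this: How to read pdf file using pdfminer3k?

Following the suggestions at https://github.com/pdfminer/pdfminer.six/issues/196 I have uninstalled and reinstalled Anaconda, pdfminer.six, and other packages several times. A week ago it suddenly worked, but now I get the error again.

Because I work on Win10, I also tried Linux Ubuntu, as described here: https://medium.com/hugo-ferreiras-blog/using-windows-subsystem-for-linux-for-data-science-9a8e68d7610c

Same error.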

Then, based on the web page linked below, I thought it was worth trying to split the PDFParser, PDFDocument import, changing

from pdfminer.pdfparser import PDFParser, PDFDocument

to

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

https://loctv.wordpress.com/2017/02/07/fix-importerror-cannot-import-name-pdfdocument-when-using-slate/ But this produces new errors later in the code.

The beginning of my code looks like this:

```
path = [name and path of file]
fp = open(path, 'rb')
from pdfminer.pdfparser import PDFParser, PDFDocument
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LAParams, LTTextBox, LTTextLine
```

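I expect to be able to run the code and extract text from the PDF file, but the code stops with the error about importing PDFDocument from pdfminer.pdfparser.

Any advice on what I should do is much appreciated! Could it be related to how pdfminer.six is installed?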

Answer 1

I got help from Notodden Serit. Change this:

from pdfminer.pdfparser import PDFParser, PDFDocument

to:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

and pass the parser when creating the document, changing:

doc = PDFDocument()

to:

doc = PDFDocument(parser)

Then change:

for page in doc.get_pages():

to:

for page in PDFPage.create_pages(doc):
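
Putting the three changes together, a minimal sketch of the extraction loop might look like the following. It assumes pdfminer.six is installed and reuses the imports from the question. The function name `extract_text_boxes` is hypothetical and only for illustration. If the original code follows the pdfminer3k pattern, calls such as `doc.set_parser(parser)` or `doc.initialize('')` would presumably also have to be removed, since the parser is now handed to the constructor; that is an assumption about code not shown in the question.

```python
def extract_text_boxes(path):
    """Return the text of every text box in the PDF at `path`, in page order."""
    # Imports live inside the function so that a broken installation raises
    # ImportError when the function is called, not when it is defined.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox

    texts = []
    with open(path, "rb") as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)  # the parser is passed to the constructor
        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.create_pages(doc):  # replaces doc.get_pages()
            interpreter.process_page(page)
            layout = device.get_result()
            # Keep only layout elements that are text boxes.
            for element in layout:
                if isinstance(element, LTTextBox):
                    texts.append(element.get_text())
    return texts
```

It could then be called as `extract_text_boxes("example.pdf")`, where the file name is a placeholder.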

Answer 2

According to the pdfminer documentation, PDFDocument is imported from pdfminer.pdfdocument:

from pdfminer.pdfdocument import PDFDocument
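
Since the question involves repeated reinstalls, it may help to check which module in the currently installed package actually defines the class. The sketch below uses only the standard library. The helper name `find_exporting_module` is hypothetical, and the two candidate module names are simply the ones mentioned in this thread.

```python
import importlib


def find_exporting_module(name, candidates):
    """Return the first module in `candidates` that defines `name`, else None."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is missing in this installation; try the next candidate.
            continue
        if hasattr(module, name):
            return module_name
    return None


# Candidate locations taken from the question and the answers above.
print(find_exporting_module(
    "PDFDocument", ["pdfminer.pdfdocument", "pdfminer.pdfparser"]))
```

If this prints `None`, neither module could be imported or neither defines the class, which would point to an installation problem and not to the import line itself.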

python-3.x

pdfminer