How to get pyPdf to work with os or glob

Question

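My goal is to use Python to read a directory containing multiple PDF files and return the number of pages in each file. I am trying to use the pyPdf library, but it fails.

If I do this: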

from pyPdf import PdfFileReader

# Raw string, so backslash sequences such as "\f" are not treated as escapes
testFile = r"C:\path\file.pdf"
pdfFile = PdfFileReader(file(testFile, 'rb'))
print pdfFile.getNumPages()

I get a result.

If I do this instead, it fails:

import os
from pyPdf import PdfFileReader

pdfList = []
for root, dirs, files in os.walk(r"C:\path"):
    for file in files:
        pdfList.append(os.path.join(root, file))

for item in pdfList:
    targetPdf = PdfFileReader(file(item, 'rb'))
    numPages = targetPdf.getNumPages()
    print item, numPages

This always results in:

TypeError: 'str' object is not callable

If I try to recreate the pyPdf object manually, I get the same result.

What am I doing wrong?

Answer 1

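The problem is caused by using the name file as a variable. You use file as the loop variable in the first for loop, so by the time the loop finishes, file refers to a string (the last file name) rather than the built-in file() function. You then use it as a function call in the statement targetPdf = PdfFileReader(file(item,'rb')), which raises the TypeError.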
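The mechanism can be reproduced in isolation, without pyPdf. The sketch below is illustrative only: `reproduce_shadowing_error` is a hypothetical helper, and `open` is used as a stand-in for `file()`, which exists as a built-in only in Python 2.

```python
def reproduce_shadowing_error():
    """Rebind a callable name to a string, then try to call it."""
    # Stand-in for Python 2's built-in file(); an assumption for illustration.
    file = open

    # The loop variable reuses the name, so `file` now holds a string.
    for file in ["a.pdf", "b.pdf"]:
        pass

    try:
        # Same shape as file(item, 'rb') in the question: calling a str.
        file("a.pdf", "rb")
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce_shadowing_error())
```

The name keeps its last loop value after the loop ends, which is why the failure only shows up later, in the second loop.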

Try changing the variable name in the first for loop from file to fileName. Hope that helps.
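A minimal sketch of the renamed loop, under a few assumptions: the helper names `collect_pdf_paths` and `count_pages` are hypothetical, the `.pdf` extension filter is an addition that the original code did not have, and `open()` replaces `file()` so the call no longer depends on that name at all. `count_pages` still needs pyPdf to be installed.

```python
import os


def collect_pdf_paths(top):
    """Walk `top` and return the full path of every .pdf file under it."""
    pdfList = []
    for root, dirs, files in os.walk(top):
        # Loop variable renamed from `file` to `fileName`, so the built-in
        # file() is never rebound to a string.
        for fileName in files:
            # Extension filter is an assumption; the question kept every file.
            if fileName.lower().endswith(".pdf"):
                pdfList.append(os.path.join(root, fileName))
    return pdfList


def count_pages(path):
    """Return the page count of one PDF via pyPdf's PdfFileReader."""
    # Imported lazily so collect_pdf_paths() works without pyPdf installed.
    from pyPdf import PdfFileReader
    # open() works even if the name `file` has been shadowed elsewhere.
    with open(path, "rb") as handle:
        return PdfFileReader(handle).getNumPages()


if __name__ == "__main__":
    for item in collect_pdf_paths(r"C:\path"):
        print("%s %d" % (item, count_pages(item)))
```

Keeping the directory walk separate from the PDF parsing also makes it easier to check that the right paths are being collected before any file is opened.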

