Is there a way to close the file PdfFileReader opens?
I open a lot of PDFs and want to delete each one after parsing it, but the files stay open until the program finishes running. How do I close a PDF that I opened with PyPDF2?
Code:
def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = PyPDF2.PdfFileReader(file(path, "rb"))
    # Check for number of pages, prevents out of bounds errors
    max = 0
    if pdf.numPages > 3:
        max = 3
    else:
        max = (pdf.numPages - 1)
    # Iterate pages
    for i in range(0, max):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    #pdf.close()
    return content
When you do this:
pdf = PyPDF2.PdfFileReader(file(path, "rb"))
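you are passing a reference to a file handle, but you have no control over when the file gets closed.
Rather than passing the handle anonymously like that, create a context for it. I would write: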
with open(path, "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
    # Check for number of pages, prevents out of bounds errors
    ... do your processing
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
# now the file is closed by exiting the block, you can delete it
os.remove(path)
# and return the contents
return content
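
The open, process, delete sequence above can be sketched as one small helper. This is only an illustration: `parse_then_delete`, `parse`, and `demo` are hypothetical names, and the demo uses a placeholder file rather than a real PDF so that it runs without PyPDF2 installed.

```python
import os
import tempfile


def parse_then_delete(path, parse):
    """Run parse(f) while the file is open, then delete the file.

    `parse` is a hypothetical callable that receives the open binary
    file object, for example one that builds PyPDF2.PdfFileReader(f)
    and extracts the text it needs before returning.
    """
    with open(path, "rb") as f:
        result = parse(f)
    # The with block has exited, so the handle is closed and the
    # file can now be removed.
    os.remove(path)
    return result


def demo():
    """Exercise parse_then_delete on a throwaway file (not a real PDF)."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as f:
        f.write(b"%PDF-1.4 placeholder")
    # Stand-in for real parsing: read the first 8 bytes.
    header = parse_then_delete(path, lambda f: f.read(8))
    return header, os.path.exists(path)
```

Doing all the parsing inside `parse` keeps every read within the `with` block, so nothing touches the file after it has been closed and deleted.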
Yes, you are passing a stream to PdfFileReader, and you can close it. The with syntax is the better fit here:
def getPDFContent(path):
    with open(path, "rb") as f:
        content = ""
        # Load PDF into pyPDF
        pdf = PyPDF2.PdfFileReader(f)
        # Check for number of pages, prevents out of bounds errors
        max = 0
        if pdf.numPages > 3:
            max = 3
        else:
            max = (pdf.numPages - 1)
        # Iterate pages
        for i in range(0, max):
            # Extract text from page and add to content
            content += pdf.getPage(i).extractText() + "\n"
        # Collapse whitespace
        content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content
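
As a side note, the page-limit branch and the whitespace cleanup in that function can be written as two small pure helpers. The names `pages_to_read` and `collapse_whitespace` are made up for this sketch. It also assumes the intent is "read at most the first three pages"; the code above uses `numPages - 1` for short documents, so `range(0, max)` there stops one page early, and this sketch deliberately differs on that point.

```python
def pages_to_read(num_pages, limit=3):
    """Return how many leading pages to read: at most `limit`."""
    # min() never exceeds num_pages, so range(pages_to_read(n))
    # cannot index past the last page.
    return min(num_pages, limit)


def collapse_whitespace(text):
    """Turn non-breaking spaces into spaces and squeeze whitespace runs."""
    # split() with no arguments drops leading/trailing whitespace and
    # splits on any run of whitespace, so no separate strip() is needed.
    return " ".join(text.replace("\xa0", " ").split())
```

Inside the `with` block the loop would then read `for i in range(pages_to_read(pdf.numPages)):`.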
Just open and close the file yourself:
f = open(path, "rb")
pdf = PyPDF2.PdfFileReader(f)
f.close()
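PyPDF2 calls .read() on the stream you pass in, right in the constructor, so after the initial object construction you can often discard the file. Be aware that some operations may still read from the stream later; if you get an error about a closed file, keep the file open until you are done with the reader.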
A context manager also works:
with open(path, "rb") as f:
    pdf = PyPDF2.PdfFileReader(f)
    do_other_stuff_with_pdf(pdf)
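
To see what "closed" means in practice, here is a minimal sketch using only the standard library. `read_after_close` and `demo_closed` are hypothetical names, and the file is a placeholder, not a real PDF. It shows that a handle opened in a `with` block is closed once the block exits and that any later read raises `ValueError`. Whether a particular PyPDF2 version still needs the stream after construction is an assumption to check for your version, so finishing page extraction before the block ends is the cautious choice.

```python
import os
import tempfile


def read_after_close(path):
    """Read inside a with block, then try to read again after it exits."""
    with open(path, "rb") as f:
        first_bytes = f.read(4)  # works: the file is still open here
    try:
        f.read(1)  # the handle was closed when the block exited
        error = None
    except ValueError as exc:
        error = str(exc)
    return f.closed, first_bytes, error


def demo_closed():
    """Run read_after_close on a throwaway file and clean up afterwards."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as f:
        f.write(b"%PDF-1.4 placeholder")
    try:
        return read_after_close(path)
    finally:
        os.remove(path)  # succeeds because no handle is left open
```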