PyPDF2 script to split each page of every PDF in a folder
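I have a script that splits a PDF into individual single-page PDF files, and it works perfectly. I am trying to rewrite it so that I can create a SPLITPDF_Folder and put the script in that folder. That way, in the future I can drop any PDF I want to split into that folder and run the script.
The script as currently written has to be modified every time I need to split a different PDF.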
    # Split each page of PDF
    from PyPDF2 import PdfFileWriter, PdfFileReader

    # import required pdf files
    inputpdf = PdfFileReader(open("filename.pdf", "rb"))
    num_pages = inputpdf.numPages

    # loop through all pages
    for i in range(num_pages):
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open(f"document-page{i+1}.pdf", 'wb') as outputStream:
            output.write(outputStream)

    # print out a success statement
    print("Your PDF has been split")
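I am trying to rewrite the script so that I can drop any PDF file into the folder and run the script, but I don't know how to point it at os.listdir() or how to work that into the code. Below is what I have, but it doesn't work.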
    ## Split each page of PDF
    # import required modules
    from PyPDF2 import PdfFileWriter, PdfFileReader

    # import pdf file in the folder
    inputpdf = PdfFileWriter()
    for filename in os.listdir('.'):
        if filename.endswith('pdf'):
            num_pages = inputpdf.numPages
            # loop through all pages
            for i in range(num_pages):
                output = PdfFileWriter()
                output.addPage(inputpdf.getPage(i))
                with open(f"document-page{i+1}.pdf", 'wb') as outputStream:
                    output.write(outputStream)

    # print out a success statement
    print("Your PDF has been split")
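I think you should try the glob library: the paths it returns already include the directory you searched. Since splitting a PDF is a repetitive task, I created a function called splitPDF that opens file_name and splits the PDF. Please check this solution: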
    ## Split each page of PDF
    # import required modules
    import os
    from glob import glob
    from PyPDF2 import PdfFileWriter, PdfFileReader

    # Functions
    def splitPDF(file_name, output_dir=None):
        # default to the directory that contains the source PDF
        if output_dir is None:
            output_dir = os.path.dirname(file_name)
        # name each output after the original file, without its extension
        base_name = os.path.splitext(os.path.basename(file_name))[0]
        with open(file_name, "rb") as inputStream:
            inputpdf = PdfFileReader(inputStream)
            num_pages = inputpdf.numPages
            # loop through all pages, writing each one to its own file
            for i in range(num_pages):
                output = PdfFileWriter()
                output.addPage(inputpdf.getPage(i))
                output_name = os.path.join(output_dir, f"{base_name}{i+1}.pdf")
                with open(output_name, 'wb') as outputStream:
                    output.write(outputStream)
        print(f"PDF: {file_name} has been split")

    # Loop over all the files in base_dir
    base_dir = "<the directory where you want to find pdfs>"
    output_dir = "<the directory where you want to store your split pdfs>"
    pdfs = os.path.join(base_dir, "*.pdf")
    files = glob(pdfs)

    # Split each file found in base_dir
    for file in files:
        splitPDF(file, output_dir)
        # splitPDF(file)  # use this instead to write the pages next to the source PDF
I also changed output_name so that each output file is named after the original file, followed by the page number.
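To try the naming on its own, without touching any PDFs, it can be pulled into a small helper. This is only a sketch: page_output_path is a hypothetical name that is not part of the answer's code, and it uses os.path.splitext rather than str.replace so that only the final extension is removed. It also assumes that, when no output directory is given, the pages should go next to the source file.

```python
import os


def page_output_path(file_name, page_number, output_dir=None):
    """Build the output path for one page of file_name (hypothetical helper)."""
    # splitext removes only the last extension, so a name such as
    # "scan.pdf.old.pdf" keeps the ".pdf" in the middle of its stem.
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Assumption: with no output_dir, write next to the source file.
    if output_dir is None:
        output_dir = os.path.dirname(file_name)
    return os.path.join(output_dir, f"{stem}{page_number}.pdf")


source = os.path.join("in", "report.pdf")
print(page_output_path(source, 1))          # page 1, next to the source
print(page_output_path(source, 2, "out"))   # page 2, in a separate folder
```

Keeping the path logic separate from the PyPDF2 calls makes it easy to change the pattern later, for example to add a separator between the name and the page number.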
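Edit:
I added the output_dir parameter so that you can put the files in a different folder. Its default value is None, so if you don't pass it to the function, the files are placed in the same directory as the source PDF (base_dir).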
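For the "drop a PDF into the folder and run the script" workflow from the question, the folder lookup could also be written with pathlib. The sketch below is an assumption-laden illustration, not the answer's method: find_pdfs and ensure_output_dir are hypothetical helper names, and "split_pages" is a made-up subfolder name. It uses only the standard library and demonstrates the lookup on a temporary folder.

```python
import tempfile
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside folder, sorted by name."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        # match ".pdf" and ".PDF"; skip directories that merely end in ".pdf"
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def ensure_output_dir(folder, name="split_pages"):
    """Create (if needed) and return a subfolder for the split pages."""
    out = Path(folder) / name
    out.mkdir(exist_ok=True)
    return out


# Demonstration on a throwaway folder with a few empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.PDF", "a.pdf", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    out_dir = ensure_output_dir(tmp)
    print([p.name for p in find_pdfs(tmp)])
    print(out_dir.name)
```

In a real script the folder could be Path(__file__).resolve().parent, so the script always works on the folder it lives in. Writing the pages to a subfolder matters here: find_pdfs only looks at direct children, so pages produced by an earlier run are not split again on the next run.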