Create multiple text files corresponding to their PDF file names from a directory in Python

Question

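I have just started practicing file conversion with Python. Please help me with this problem.

I am trying to convert .PDF files to .TXT files. I can do it for a single file with the following code: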

import pdfplumber

pdfPath = r'C:\Users\xyz\pdffiles\abc.pdf'

txtPath = r'C:\Users\xyz\txtfiles\abc.txt'

with pdfplumber.open(pdfPath) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        with open( txtPath, encoding='utf-8', mode='a') as f:
            f.write(text)    
print("Operation Success!")

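The code above works. However, I want to automatically process all the PDF files in my "..\pdffiles" directory and, using a loop, create a corresponding text file with the SAME NAME as each PDF in the "..\txtfiles" directory. Can someone help me complete the code?

Any suggestions are much appreciated!! Have a nice day!!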

Answer 1

You can use the listdir function from the os module:

https://docs.python.org/3/library/os.html#os.listdir

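Call this function with the path of the folder that contains your PDF files (pdffiles). It returns a list of the names of all entries in that folder, so if the folder holds anything other than PDFs you will need to filter by extension.

Loop over that list, strip the .pdf extension from each file name, and use the remaining name for the .txt file.

For example: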

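import os

# Raw strings keep the backslashes literal; '\U' in a normal string is a syntax error
pdf_dir = r'C:\Users\xyz\pdffiles'
txt_dir = r'C:\Users\xyz\txtfiles'
for file in os.listdir(pdf_dir):
    # splitext also handles names that contain more than one dot
    name, ext = os.path.splitext(file)
    if ext.lower() != '.pdf':
        continue  # skip anything that is not a PDF
    txt_path = os.path.join(txt_dir, name + '.txt')
    pdf_path = os.path.join(pdf_dir, file)
    # Code to read pdf and write to text file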
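
As a small illustration of the "strip the extension" step, here is a minimal sketch using os.path.splitext from the standard library. The helper name txt_name_for is made up for this example and is not part of any library.

```python
import os


def txt_name_for(pdf_name):
    """Return the .txt file name matching a PDF file name (hypothetical helper)."""
    # splitext splits only at the last dot, so inner dots in the name survive
    name, _ext = os.path.splitext(pdf_name)
    return name + ".txt"


print(txt_name_for("abc.pdf"))        # abc.txt
print(txt_name_for("report.v2.pdf"))  # report.v2.txt
```

Splitting on '.' and unpacking into two variables would raise a ValueError for a name like report.v2.pdf, which is why splitext is the safer choice here.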

Answer 2

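import os
import pdfplumber

path_to_your_files = "/path/to/your/pdffiles"
path_to_txt_files = "/path/to/your/txtfiles"  # output folder, must already exist
for filename in os.listdir(path_to_your_files):
    # Skip anything that is not a PDF
    if not filename.lower().endswith(".pdf"):
        continue
    absolute_file_path = os.path.join(path_to_your_files, filename)
    # Same base name as the PDF, with a .txt extension
    txt_file_path = os.path.join(
        path_to_txt_files, os.path.splitext(filename)[0] + ".txt"
    )
    # Open the output once per PDF; mode "w" avoids duplicated text on re-runs
    with pdfplumber.open(absolute_file_path) as pdf, open(
        txt_file_path, encoding="utf-8", mode="w"
    ) as f:
        for page in pdf.pages:
            # Guard against pages where no text is extracted
            text = page.extract_text() or ""
            f.write(text)
    print("Operation Success!")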
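
If you prefer pathlib, the same loop could be sketched roughly as below. This is one possible arrangement, not the only way to do it: the function names convert_folder and extract_with_pdfplumber are invented for this example, and the text extractor is passed in as a parameter so the folder logic can be tried out without any real PDFs.

```python
from pathlib import Path


def extract_with_pdfplumber(pdf_path):
    """Return the text of every page in one PDF, joined by newlines."""
    import pdfplumber  # imported lazily so the folder logic runs without it

    with pdfplumber.open(pdf_path) as pdf:
        # Fall back to "" for pages where no text is extracted
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def convert_folder(pdf_dir, txt_dir, extract=extract_with_pdfplumber):
    """Write one .txt per PDF in pdf_dir into txt_dir, keeping the base name."""
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for pdf_path in sorted(pdf_dir.iterdir()):
        # Case-insensitive check so "B.PDF" is picked up as well
        if not pdf_path.is_file() or pdf_path.suffix.lower() != ".pdf":
            continue
        txt_path = txt_dir / (pdf_path.stem + ".txt")
        # write_text overwrites, so re-running does not duplicate content
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

With the folders from the question, the call would look like convert_folder(r'C:\Users\xyz\pdffiles', r'C:\Users\xyz\txtfiles'); it returns the list of text files it wrote.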

python

automation

loops

file-conversion

pdftotext