How to print the next line in Python with text extracted using pdfplumber

Question

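How can I print the next line when working with text extracted from a PDF by pdfplumber's extract_text function?

I tried line.next(), but it doesn't work.

The actual job name is on the line after "Job Name", as in the example below.

Job Name

Albany Shopping Centre Development

My code is below.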

import re
import pdfplumber

jobName_re = re.compile(r'(Job Name)')
siteAddress_re = re.compile(r'(Wellington\s)(.+)')
file = 'invoices.pdf'

lines = []

with pdfplumber.open(file) as myPdf:
    for page in myPdf.pages:
        text = page.extract_text()
        for line in text.split('\n'):
            jobName = jobName_re.search(line)
            siteAddress = siteAddress_re.search(line)
            if jobName:
                print('The next line that follows Job Name is', line.next())
            elif siteAddress:
                print(siteAddress.group(1))

Answer 1

You have several options.

Option 1

You can switch to iterating over the lines with an integer index:

lines = text.split('\n')
for i in range(len(lines)):
    line = lines[i]

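Then you can access the following line as lines[i+1]. Check that i+1 < len(lines) first, so that a heading on the last line does not raise an IndexError.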
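A minimal sketch of the index-based approach, run on a made-up sample string instead of a real PDF page. The sample text and the helper name `lines_after_heading` are assumptions for illustration only. It uses `enumerate`, which yields the index alongside each line, and checks the bound before looking ahead:

```python
import re

job_name_re = re.compile(r'Job Name')

def lines_after_heading(text, heading_re):
    """Return the line that follows each line matching heading_re."""
    lines = text.split('\n')
    found = []
    for i, line in enumerate(lines):
        # Only look ahead when a next line actually exists.
        if heading_re.search(line) and i + 1 < len(lines):
            found.append(lines[i + 1])
    return found

# Hypothetical stand-in for the string returned by page.extract_text().
sample_text = (
    "Invoice 123\n"
    "Job Name\n"
    "Albany Shopping Centre Development\n"
    "Wellington 6011"
)
print(lines_after_heading(sample_text, job_name_re))
```

In the original loop, `text` would come from `page.extract_text()` for each page.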

Option 2

Set a flag when you see the Job Name heading, then pick up the job name on the next pass through the loop. Something like this:

        last_was_job_heading = False
        for line in text.split('\n'):
            siteAddress = siteAddress_re.search(line)
            if last_was_job_heading:
                print('The next line that follows Job Name is', line)
            elif siteAddress:
                print(siteAddress.group(1))
            last_was_job_heading = jobName_re.search(line)
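For reference, here is a self-contained variant of the flag approach that can be run without a PDF. The sample text and the function name `scan` are assumptions, and it collects results in a list instead of printing them. It also returns group(2) of the address pattern (the text after "Wellington"), where the code above prints group(1):

```python
import re

job_name_re = re.compile(r'Job Name')
site_address_re = re.compile(r'(Wellington\s)(.+)')

def scan(text):
    """Collect (kind, value) pairs in the order they are found."""
    results = []
    last_was_job_heading = False
    for line in text.split('\n'):
        site_address = site_address_re.search(line)
        if last_was_job_heading:
            # The previous line was the heading, so this line is the job name.
            results.append(('job_name', line))
        elif site_address:
            results.append(('site_address', site_address.group(2)))
        # Remember for the next iteration whether this line was the heading.
        last_was_job_heading = bool(job_name_re.search(line))
    return results

# Hypothetical stand-in for the string returned by page.extract_text().
sample_text = "Job Name\nAlbany Shopping Centre Development\nWellington 6011"
print(scan(sample_text))
```

Because the flag is updated at the end of every iteration, it is only true for the single line directly after a heading.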

Option 3

Don't split the text into lines at all. Instead, use a smarter regular expression that parses several lines at once.
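One possible shape for such a pattern, as a sketch and not the only way to write it: match the heading line, skip the newline and any blank lines or indentation, then capture the next non-empty line. The sample text is made up, and the pattern assumes the job name fits on a single line:

```python
import re

# Heading line, newline, optional blank lines or indentation,
# then capture everything up to the end of the next non-empty line.
job_name_re = re.compile(r'Job Name[^\n]*\n\s*([^\n]+)')

# Hypothetical stand-in for the string returned by page.extract_text().
sample_text = (
    "Invoice 123\n"
    "Job Name\n"
    "\n"
    "Albany Shopping Centre Development\n"
    "Wellington 6011"
)

# With a single capture group, findall returns the captured strings.
job_names = job_name_re.findall(sample_text)
print(job_names)
```

Applied to the full text of a page, this finds every job name in one call without tracking state between lines.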

Option 4

Use a parsing library of some kind instead of regular expressions. For a case this simple, that is probably overkill.
