How to store all the tables from a PDF file to an Excel sheet using Python?

Question

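1. I am able to get all the tables from the PDF file, but when I try to store them, only the last table is saved in my Excel sheet.

2. How can I handle these overwritten values?

3. After the for loop finishes, only the last table is saved to Excel.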

import PyPDF2
import tabula
from tabula import read_pdf
import pandas as pd 
from xlwt import Workbook 



pdfFileObj = open('LAB.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
print(pdfReader.numPages)          #Total number of pages 
pageObj = pdfReader.getPage(5)



#LAB is my pdf file
x = tabula.read_pdf("LAB.pdf", pages='all', multiple_tables=True)
for i in x:    #x values in list []
    print("printing all the table from the sheet", i)
    df = pd.DataFrame(i)
df.to_excel('tables.xlsx', header=True, index = True)

Answer 1

You can append each pandas DataFrame to a single DataFrame and write that once, after the loop:

df = pd.DataFrame()
for i in x:    #x values in list []
    print("printing all the table from the sheet", i)
    df_table = pd.DataFrame(i)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, df_table])

df.to_excel('tables.xlsx', header=True, index = True)
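Once the tables are stacked into one DataFrame it is no longer obvious which rows came from which table. A minimal sketch of one way to keep that information, assuming `x` is the list of DataFrames returned by `tabula.read_pdf`; the helper name `combine_tables` and the labels `table0`, `table1`, ... are made up for illustration:

```python
import pandas as pd


def combine_tables(tables):
    """Stack a list of tables into one DataFrame, labelling each row by source table.

    `tables` is assumed to be a non-empty list of DataFrames (pd.concat raises
    ValueError on an empty list). The labels are hypothetical names.
    """
    frames = [pd.DataFrame(t) for t in tables]
    labels = ["table{}".format(n) for n in range(len(frames))]
    # keys= adds an outer index level holding the label of the source table;
    # columns missing from a table are filled with NaN.
    return pd.concat(frames, keys=labels, names=["table", "row"])
```

Calling `combine_tables(x).to_excel('tables.xlsx')` would then write everything to one sheet, with the first index column showing the source table.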

To store each table in a separate Excel file instead, call df.to_excel() inside the for loop:

for i in range(len(x)):    #x values in list []
    print("printing all the table from the sheet", x[i])
    df = pd.DataFrame(x[i])
    df.to_excel('tables{}.xlsx'.format(i), header=True, index = True)
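If the goal is a single workbook rather than many files, another option is to give each table its own sheet with `pd.ExcelWriter`. This is a rough sketch rather than part of the original answer; the helper names `sheet_name_for` and `write_tables_to_sheets` and the sheet names `Table1`, `Table2`, ... are assumptions for illustration, and writing `.xlsx` needs an Excel engine such as openpyxl installed:

```python
import pandas as pd


def sheet_name_for(index):
    # Hypothetical naming scheme: Table1, Table2, ...
    # Excel limits sheet names to 31 characters, so short names are safest.
    return "Table{}".format(index + 1)


def write_tables_to_sheets(tables, path):
    """Write each table to its own sheet of one workbook at `path`.

    `tables` is assumed to be a non-empty list of DataFrames, such as the
    list returned by tabula.read_pdf(..., multiple_tables=True).
    """
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path) as writer:
        for index, table in enumerate(tables):
            pd.DataFrame(table).to_excel(
                writer,
                sheet_name=sheet_name_for(index),
                header=True,
                index=True,
            )
```

With the question's variables this would be called as `write_tables_to_sheets(x, 'tables.xlsx')`.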
