How to loop over a list of lists obtained from python-docx, where each list is a table, and write each table to a separate worksheet

Question

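I am using python-docx to extract two tables from a document. I iterate over the tables and build a list of lists. Each inner list represents one table and holds one dictionary per row. Each dictionary maps the column headers of the table to the cell contents of that row. I am having trouble creating a DataFrame for each table and writing each table to a separate Excel sheet.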

from docx import Document
import pandas as pd

document = Document('Sampletable1.docx')
tables = document.tables
print(len(tables))
big_data = []

for table in tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if i == 0:
            # The first row holds the column headers.
            keys = tuple(text)
            continue
        # Pair each header with the cell in the same column.
        dic = dict(zip(keys, text))
        data.append(dic)
    big_data.append(data)
print(big_data)

The output of the code above is:

2

[[{'Asset': 'Growth investments', 'Target investment mix': '66.50%', 'Actual investment mix': '66.30%', 'Variance': '- 0.20%'}, {'Asset': 'Defensive investments', 'Target investment mix': '33.50%', 'Actual investment mix': '33.70%', 'Variance': '0.20%' }], [{'Owner': 'REST Super', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER TransPension', 'Product': 'TTR Pension', 'Type': 'New', 'Status': 'New', 'Customer 2': 'Customer 1'}, {'Owner': 'TWUSUPER', 'Product': 'Superannuation', 'Type': 'Existing', 'Status': 'Existing'}]]

How do I access the lists above?

I also tried to create a pandas DataFrame:

#write the data into a data frame
for thing in big_data:
    #print(thing)
    df = pd.DataFrame(thing)
    print(df)
    writer = pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    writer.save()

I get the first table in Excel, but not the second one. I expect the tables to be in the same Excel workbook (dftable3.xlsx) but on different worksheets (Sheet1, Sheet2).

I have attached images of the tables.

Thanks in advance.

Answer 1

How do I access the above lists??

You are already doing that, by iterating over them or printing them. Consider using the pretty-print library:

import pprint
pprint.pprint(big_data)
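
To make the nesting concrete, here is a minimal sketch of indexing into the structure and walking it. The sample below is a trimmed, hand-written stand-in for big_data (only two columns per table), and the names first_table, second_row, asset, and cells are made up for illustration.

```python
# Trimmed stand-in for big_data: a list of tables,
# each table a list of row dicts keyed by column header.
big_data = [
    [
        {"Asset": "Growth investments", "Variance": "- 0.20%"},
        {"Asset": "Defensive investments", "Variance": "0.20%"},
    ],
    [
        {"Owner": "REST Super", "Product": "Superannuation"},
    ],
]

first_table = big_data[0]      # the list of row dicts for table 1
second_row = first_table[1]    # one row dict
asset = second_row["Asset"]    # a single cell, looked up by column header

# Walk every table, row, and cell, remembering where each value came from.
cells = []
for table_no, table in enumerate(big_data, start=1):
    for row_no, row in enumerate(table, start=1):
        for header, value in row.items():
            cells.append((table_no, row_no, header, value))

print(asset)
print(len(cells))
```

Indexing goes table first, then row, then column header, so big_data[0][1]["Asset"] is the same lookup written in one step.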

I am expecting ... different worksheets(Sheet1,Sheet2)

Well, that's unlikely, given that you pass the constant 'Sheet1' argument. Here is one way to do it:

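# Create the writer once, outside the loop, so every sheet goes into the same workbook.
with pd.ExcelWriter('dftable3.xlsx', engine='xlsxwriter') as writer:
    # start=1 yields Sheet1, Sheet2, ... as the question asks.
    for i, thing in enumerate(big_data, start=1):
        df = pd.DataFrame(thing)
        df.to_excel(writer, sheet_name=f'Sheet{i}')
# Leaving the with-block saves the file (writer.save() was removed in pandas 2.0).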

Note the scope of writer: it has to outlive each of the individual df objects.
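
One detail worth checking before writing the sheets: in the output shown in the question, the last row of the second table has no 'Customer 2' key. The sketch below, which uses two rows trimmed from that output, suggests how pd.DataFrame treats such a row. The names rows and df_filled, and the choice to fill gaps with an empty string, are assumptions for illustration rather than part of the original answer.

```python
import pandas as pd

# Two rows trimmed from the question's second table; the second row
# lacks the 'Customer 2' key, as in the printed output.
rows = [
    {"Owner": "TWUSUPER TransPension", "Status": "New", "Customer 2": "Customer 1"},
    {"Owner": "TWUSUPER", "Status": "Existing"},
]

# Columns are the union of all keys; a missing key becomes NaN in that row.
df = pd.DataFrame(rows)
print(df.columns.tolist())
print(df["Customer 2"].isna().tolist())

# Optional: replace the gaps with empty strings before exporting.
df_filled = df.fillna("")
```

Both NaN and an empty string show up as a blank cell in the written worksheet, so filling matters mainly if the DataFrame is processed further before export.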

python-3.x

pandas

python-docx

pandas.excelwriter