Extracting tables from Word into Excel: how to keep the tables separate?
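I am trying to export the tables in a .docx file to Excel. The tables differ in size (number of columns and rows).
I can get the tables converted to Excel, but the code joins the two tables together into one block.
Is there a way to keep the tables separate, either on the same sheet or on different worksheets (either is fine)?
The code is as follows: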
# Requires python-docx (install with: pip install python-docx)
import pandas as pd
from docx import Document

path = r"PATH\Practice_Tables.docx"
doc = Document(path)

# Collect the rows of every table into one list
# (DataFrame.append was removed in pandas 2.0)
rows = []
for table in doc.tables:
    for row in table.rows:
        row_text = [c.text for c in row.cells]
        rows.append(row_text)
df = pd.DataFrame(rows)
print(df)
                  0                     1               2
0    Table header 1        Table header 2  Table header 3
1              r1c1                  r1c2            r1c3
2              r2c1                  r2c2            r2c3
3  Practice Table 2  Practice table col 2             NaN
4             Row 2                Row 2a             NaN
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('PracticeTables3.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
Update
Error when running the single-sheet version:
NameError Traceback (most recent call last)
<ipython-input-7-d2e9fa27f104> in <module>
9 print(df)
10
---> 11 df.to_excel(writer, sheet_name='Sheet1', startrow=startrow)
12 startrow += len(df)+2
13
NameError: name 'startrow' is not defined
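Your current code adds every table to a single DataFrame.
You need to create a separate DataFrame for each table, write it to the Excel file, and then move on to the next table.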
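To see the difference without a Word file, the sketch below replaces `doc.tables` with plain nested lists holding the cell text from the question's output. This is a hypothetical stand-in: real python-docx tables expose their rows through `table.rows` and `row.cells`. Appending every row to one shared list reproduces the merged result, while starting a new list for each table keeps the tables apart.

```python
# Hypothetical stand-in for doc.tables: each inner list is one table's rows,
# using the cell text shown in the question's printed output.
tables = [
    [["Table header 1", "Table header 2", "Table header 3"],
     ["r1c1", "r1c2", "r1c3"],
     ["r2c1", "r2c2", "r2c3"]],
    [["Practice Table 2", "Practice table col 2"],
     ["Row 2", "Row 2a"]],
]

# Question's approach: one shared list, so all rows end up in a single block.
merged = []
for table in tables:
    for row in table:
        merged.append(row)

# Answer's approach: a fresh list per table, so each table stays separate.
separate = []
for table in tables:
    rows = [list(row) for row in table]
    separate.append(rows)

print(len(merged))    # 5 rows in one block
print(len(separate))  # 2 independent tables
```

Each entry of `separate` corresponds to the rows you would pass to `pd.DataFrame(...)` for one table before writing it out.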
Separate sheets
This writes each table to its own sheet in the Excel file.
import pandas as pd
from docx import Document

path = 'Practice_Tables.docx'
doc = Document(path)

# The context manager saves the workbook on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('PracticeTables3.xlsx', engine='xlsxwriter') as writer:
    for sheet_no, table in enumerate(doc.tables, start=1):
        # Build a new DataFrame for each table instead of appending to one shared frame
        rows = [[c.text for c in row.cells] for row in table.rows]
        df = pd.DataFrame(rows)
        print(df)
        df.to_excel(writer, sheet_name=f'Sheet{sheet_no}')
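Single sheet
This code writes the tables to the same sheet, with a blank row between them. The `startrow = 0` line must run before the loop; the `NameError: name 'startrow' is not defined` in the update above means it was skipped.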
import pandas as pd
from docx import Document

path = 'Practice_Tables.docx'
doc = Document(path)

startrow = 0  # first table starts at the top of the sheet
with pd.ExcelWriter('PracticeTables3.xlsx', engine='xlsxwriter') as writer:
    for table in doc.tables:
        rows = [[c.text for c in row.cells] for row in table.rows]
        df = pd.DataFrame(rows)
        print(df)
        df.to_excel(writer, sheet_name='Sheet1', startrow=startrow)
        # Move past the header row, the data rows and one blank separator row
        startrow += len(df) + 2
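The `startrow += len(df) + 2` bookkeeping above can be checked without Word or Excel. The helper below, `stacked_start_rows`, is a hypothetical name introduced for illustration. It assumes `to_excel` writes a single header row above the data (the default `header=True` with ordinary, non-MultiIndex columns) and one blank row between tables, which is exactly the `+ 2` in the loop.

```python
def stacked_start_rows(row_counts, header_rows=1, gap=1):
    """Return the startrow for each table when stacking them on one sheet.

    row_counts: number of data rows in each DataFrame (len(df)).
    header_rows: rows written above the data (assumed 1 for header=True).
    gap: blank rows left between consecutive tables.
    """
    starts = []
    startrow = 0
    for n in row_counts:
        starts.append(startrow)
        startrow += header_rows + n + gap
    return starts


# The two practice tables have 3 and 2 rows (the Word header row is read as data).
print(stacked_start_rows([3, 2]))  # [0, 5]
```

If you call `to_excel(..., header=False)`, pass `header_rows=0`; to leave more space between tables, raise `gap`.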