根据值合并行(pandas 到 excel - xlsxwriter)
Merge rows based on value (pandas to excel - xlsxwriter)
我正在尝试使用 xlsxwriter 将 Pandas 数据帧输出到 excel 文件中。但是我正在尝试应用一些基于规则的格式;专门尝试合并具有相同值的单元格,但无法想出如何编写循环。 (这里是 Python 的新手!)
请参阅下面的输出与预期输出:
(如您所见,根据上图,当它们具有相同的值时,我试图合并“名称”列下的单元格)。
这是我目前的情况:
#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)
#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)
for row_index in range(1,len(car_report)):
if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name']
#find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
#from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)
非常感谢任何帮助!
谢谢!
您的逻辑几乎是正确的,但是我通过稍微不同的方法解决了您的问题:
1) 对列进行排序,确保所有值都分组在一起。
2) 重置索引(使用 reset_index() 并可能传递 arg drop=True)。
3) 然后我们必须捕获值是新的行。为此,创建一个列表并添加第一行 1,因为我们肯定会从那里开始。
4) 然后开始遍历该列表的行并检查一些条件:
4a) 如果只有一行有一个值,merge_range 方法会报错,因为它不能合并一个单元格。在这种情况下,我们需要用 write 方法替换 merge_range。
4b) 使用此算法,您将在尝试写入列表的最后一个值时遇到索引错误(因为它正在将它与下一个索引位置中的值进行比较,并且因为它是列表中没有下一个索引位置)。所以我们需要特别提到,如果我们得到一个索引错误(这意味着我们正在检查最后一个值)我们想要合并或写入直到数据帧的最后一行。
4c) 最后我没有考虑该列是否包含空白或空单元格。在这种情况下需要调整代码。
最后的代码可能看起来有点混乱,你必须记住 pandas 的第一行是 0 索引(headers 是分开的)而 xlsxwriter headers 是0 索引,第一行索引 1.
这里是一个工作示例,可以准确实现您想要做的事情:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
startCells.append(row)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
lastRow = len(df)
for row in startCells:
try:
endRow = startCells[startCells.index(row)+1]-1
if row == endRow:
worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
else:
worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
except IndexError:
if row == lastRow:
worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
else:
worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)
writer.save()
输出:
替代方法:
可以使用 unique() 函数查找分配给每个唯一值(本例中的汽车名称)的索引。使用以上测试数据,
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
for car in df['Name'].unique():
# find indices and add one to account for header
u=df.loc[df['Name']==car].index.values + 1
if len(u) <2:
pass # do not merge cells if there is only one car name
else:
# merge cells using the first and last indices
worksheet.merge_range(u[0], 0, u[-1], 0, df.loc[u[0],'Name'], merge_format)
writer.save()
我正在尝试使用 xlsxwriter 将 Pandas 数据帧输出到 excel 文件中。但是我正在尝试应用一些基于规则的格式;专门尝试合并具有相同值的单元格,但无法想出如何编写循环。 (这里是 Python 的新手!)
请参阅下面的输出与预期输出:
(如您所见,根据上图,当它们具有相同的值时,我试图合并“名称”列下的单元格)。
这是我目前的情况:
#This is the logic you use to merge cells in xlsxwriter (just an example)
worksheet.merge_range('A3:A4','value you want in merged cells', merge_format)
#Merge Car type Loop thought process...
#1.Loop through data frame where row n Name = row n -1 Name
#2.Get the length of the rows that have the same Name
#3.Based off the length run the merge_range function from xlsxwriter, worksheet.merge_range('range_found_from_loop','Name', merge_format)
for row_index in range(1,len(car_report)):
if car_report.loc[row_index, 'Name'] == car_report.loc[row_index-1, 'Name']
#find starting point based off index, then get range by adding number of rows to starting point. for example lets say rows 0-2 are similar I would get 'A0:A2' which I can then put in the code below
#from there apply worksheet.merge_range('A0:A2','[input value]', merge_format)
非常感谢任何帮助!
谢谢!
您的逻辑几乎是正确的,但是我通过稍微不同的方法解决了您的问题:
1) 对列进行排序,确保所有值都分组在一起。
2) 重置索引(使用 reset_index() 并可能传递 arg drop=True)。
3) 然后我们必须捕获值是新的行。为此,创建一个列表并添加第一行 1,因为我们肯定会从那里开始。
4) 然后开始遍历该列表的行并检查一些条件:
4a) 如果只有一行有一个值,merge_range 方法会报错,因为它不能合并一个单元格。在这种情况下,我们需要用 write 方法替换 merge_range。
4b) 使用此算法,您将在尝试写入列表的最后一个值时遇到索引错误(因为它正在将它与下一个索引位置中的值进行比较,并且因为它是列表中没有下一个索引位置)。所以我们需要特别提到,如果我们得到一个索引错误(这意味着我们正在检查最后一个值)我们想要合并或写入直到数据帧的最后一行。
4c) 最后我没有考虑该列是否包含空白或空单元格。在这种情况下需要调整代码。
最后的代码可能看起来有点混乱,你必须记住 pandas 的第一行是 0 索引(headers 是分开的)而 xlsxwriter headers 是0 索引,第一行索引 1.
这里是一个工作示例,可以准确实现您想要做的事情:
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
# Create the list where we 'll capture the cells that appear for 1st time,
# add the 1st row and we start checking from 2nd row until end of df
startCells = [1]
for row in range(2,len(df)+1):
if (df.loc[row-1,'Name'] != df.loc[row-2,'Name']):
startCells.append(row)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
lastRow = len(df)
for row in startCells:
try:
endRow = startCells[startCells.index(row)+1]-1
if row == endRow:
worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
else:
worksheet.merge_range(row, 0, endRow, 0, df.loc[row-1,'Name'], merge_format)
except IndexError:
if row == lastRow:
worksheet.write(row, 0, df.loc[row-1,'Name'], merge_format)
else:
worksheet.merge_range(row, 0, lastRow, 0, df.loc[row-1,'Name'], merge_format)
writer.save()
输出:
替代方法: 可以使用 unique() 函数查找分配给每个唯一值(本例中的汽车名称)的索引。使用以上测试数据,
import pandas as pd
# Create a test df
df = pd.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})
for car in df['Name'].unique():
# find indices and add one to account for header
u=df.loc[df['Name']==car].index.values + 1
if len(u) <2:
pass # do not merge cells if there is only one car name
else:
# merge cells using the first and last indices
worksheet.merge_range(u[0], 0, u[-1], 0, df.loc[u[0],'Name'], merge_format)
writer.save()