Calculate values with similar header names using XlsxWriter in Python
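I need to total the values whose column headers share a similar name. My columns contain Bill and Non Bill fields. I need to sum all the Bill columns and all the Non Bill columns separately, and put the results in two new columns, Bill Amt Total and Non Bill Amt Total, using XlsxWriter in Python.
Input file: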
Name | Bill - Php | Non Bill - Php | Bill - JS | Non Bill -JS
Alex | 30 | | 10 |
Ram | 10 | 20 | |
Stephen | | | 20 |
Robert | | 10 | | 10
Mohan | | 20 | 10 |
Output file:
Name | Bill - Php | Non Bill - Php | Bill - JS | Non Bill -JS | Bill Total Amt | Non Bill Total Amt
Alex | 30 | | 10 | | 40 |
Ram | 10 | 20 | | | 10 | 20
Stephen | | | 20 | | 20 |
Robert | | 10 | | 10 | | 20
Mohan | | 20 | 10 | | 10 | 20
Just select the columns by string prefix (str.startswith()), then sum them horizontally with df[selected_columns].sum(axis=1).
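To see why an anchored prefix match matters here, a minimal sketch on a small in-memory frame follows. The two-row data and the variable names are illustrative assumptions, not taken from the question. `startswith("Bill")` skips the Non Bill columns, whereas a plain substring test would pick up both groups.

```python
import pandas as pd

nan = float("nan")

# Illustrative frame with the same header layout as the question
df = pd.DataFrame({
    "Name": ["Alex", "Ram"],
    "Bill - Php": [30, 10],
    "Non Bill - Php": [nan, 20],
    "Bill - JS": [10, nan],
    "Non Bill -JS": [nan, nan],
})

# Anchored prefix match: only headers that begin with "Bill"
bill_cols = [c for c in df.columns if c.startswith("Bill")]

# Substring match: also catches the "Non Bill" headers (not what we want)
loose_cols = [c for c in df.columns if "Bill" in c]

# Row-wise sum over the selected columns; NaN cells are skipped
bill_total = df[bill_cols].sum(axis=1)

print(bill_cols)
print(loose_cols)
print(bill_total.tolist())
```

Because the match is anchored at the start of the header, the order in which the two prefixes are processed does not affect which columns are selected.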
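Also, pandas has built-in Excel save/load support, so you don't need to import xlsxwriter or openpyxl yourself. At least one of them still has to be installed: pandas writes .xlsx files through either package and reads them through openpyxl.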
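Since the question asks for XlsxWriter specifically: the `engine` argument of `pd.ExcelWriter` selects which installed package does the writing. The following is a minimal sketch, not a definitive recipe. The helper name `pick_excel_engine`, the one-row frame, and the in-memory buffer standing in for a file path are all illustrative assumptions.

```python
import importlib.util
import io

import pandas as pd


def pick_excel_engine():
    """Return the first installed .xlsx writer engine, or None (hypothetical helper)."""
    for name in ("xlsxwriter", "openpyxl"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


df = pd.DataFrame({"Name": ["Alex"], "Bill - Php": [30]})
engine = pick_excel_engine()
buffer = io.BytesIO()  # stands in for a real output path

if engine is not None:
    # pandas hands the actual file writing to the chosen engine
    with pd.ExcelWriter(buffer, engine=engine) as writer:
        df.to_excel(writer, index=False, sheet_name="Sheet1")

print(engine, len(buffer.getvalue()))
```

Passing `engine="xlsxwriter"` directly works the same way when that package is known to be installed.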
Data
The data is reproduced here and saved in Excel format.
import pandas as pd
import io
import numpy as np
df = pd.read_csv(io.StringIO("""
Name | Bill - Php | Non Bill - Php | Bill - JS | Non Bill -JS
Alex | 30 | | 10 |
Ram | 10 | 20 | |
Stephen | | | 20 |
Robert | | 10 | | 10
Mohan | | 20 | 10 |
"""), sep=r"\|\s*", engine='python')
# cleanup
df.columns = [c.strip() for c in df.columns]
df["Name"] = df["Name"].str.strip()
# save .xlsx
df.to_excel("/mnt/ramdisk/data.xlsx", index=False)
Solution
# load .xlsx
df = pd.read_excel("/mnt/ramdisk/data.xlsx")
for prefix in ("Bill", "Non Bill"):
    # select the columns to be summed
    cols_to_sum = [c for c in df.columns if c.startswith(prefix)]
    # new column name
    col = f"{prefix} Amt Total"
    # sum the selected columns horizontally
    df[col] = df[cols_to_sum].sum(axis=1)
    # (optional) replace 0 with nan
    df[col] = df[col].replace({0.0: np.nan})
# save a new file
df.to_excel("/mnt/ramdisk/out.xlsx", index=False)
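One caveat with the optional replace step: it also blanks rows whose amounts genuinely add up to 0. A possible alternative, sketched here on illustrative data that is not from the question, is `sum(axis=1, min_count=1)`, which returns NaN only when a row has no values at all in the selected columns.

```python
import numpy as np
import pandas as pd

# Illustrative rows: one with values, one empty, one that is explicitly 0
df = pd.DataFrame({
    "Bill - Php": [30.0, np.nan, 0.0],
    "Bill - JS": [10.0, np.nan, 0.0],
})

# Default sum treats an all-NaN row as 0
plain = df.sum(axis=1)

# min_count=1 requires at least one non-NaN value, else the result is NaN
strict = df.sum(axis=1, min_count=1)

print(plain.tolist())
print(strict.tolist())
```

With `min_count=1` the empty row becomes NaN while the explicit zero row stays 0, so the separate replace step is not needed.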
Check the added columns:
print(df.iloc[:,-2:])
# Out[219]:
#    Bill Amt Total  Non Bill Amt Total
# 0            40.0                 NaN
# 1            10.0                20.0
# 2            20.0                 NaN
# 3             NaN                20.0
# 4            10.0                20.0
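If more groups than Bill and Non Bill may appear, the group name can be derived from each header instead of being hard-coded. This is a sketch under the assumption that every amount column is named "<group> - <suffix>", so the text before the first hyphen identifies the group. The sample frame and the names `group_of`, `totals`, and `result` are illustrative.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({
    "Name": ["Alex", "Ram", "Robert"],
    "Bill - Php": [30, 10, nan],
    "Non Bill - Php": [nan, 20, 10],
    "Bill - JS": [10, nan, nan],
    "Non Bill -JS": [nan, nan, 10],
})

amounts = df.drop(columns="Name")

# Group key: the part of each header before the first hyphen, e.g. "Non Bill"
group_of = {c: c.split("-")[0].strip() for c in amounts.columns}

# Transpose so headers become the index, sum per group, transpose back
totals = amounts.T.groupby(group_of).sum(min_count=1).T
totals = totals.add_suffix(" Amt Total")

# Attach the per-group totals to the original rows
result = df.join(totals)
print(result)
```

The trade-off is that this relies on a consistent header naming scheme, whereas the explicit prefix loop works with any headers as long as the prefixes are known.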