Sum the values of columns with similar header names using XlsxWriter in Python

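I need to total the values in columns whose header names are similar. My sheet has Bill and Non Bill columns. I need to sum all the Bill columns and all the Non Bill columns separately, and write each sum to its own new column, Bill Total Amt and Non Bill Total Amt, using XlsxWriter in Python.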

Input file:

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS
Alex    |   30          |                   |      10    |
Ram     |   10          |          20       |            |
Stephen |               |                   |      20    |
Robert  |               |          10       |            |      10
Mohan   |               |          20       |      10    |

Output file:

Name    | Bill - Php    | Non Bill - Php    | Bill - JS  | Non Bill -JS | Bill Total Amt | Non Bill Total Amt
Alex    |   30          |                   |      10    |              |    40          |       
Ram     |   10          |          20       |            |              |    10          |   20
Stephen |               |                   |      20    |              |    20          |
Robert  |               |          10       |            |      10      |                |   20
Mohan   |               |          20       |      10    |              |     10         |   20

Just select the columns by string prefix (str.startswith()), then sum them horizontally with df[selected_columns].sum(axis=1).

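In addition, pandas can save and load Excel files itself through DataFrame.to_excel() and pd.read_excel(), so you don't need to import xlsxwriter or openpyxl in your own code. (pandas still relies on an Excel engine such as openpyxl being installed.)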
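The selection step can be sketched on a toy frame to show which headers each prefix picks up. The values and the names bill_cols, non_bill_cols, bill_total and non_bill_total are made up for illustration; only the header names mirror the question.

```python
import pandas as pd

# Toy frame with hypothetical values; only the header names mirror the question.
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Bill - Php": [30, None],
    "Non Bill - Php": [None, 20],
    "Bill - JS": [10, None],
    "Non Bill -JS": [None, 10],
})

# str.startswith() is anchored at the start of the name, so "Bill" does not
# match "Non Bill - Php" even though that name contains "Bill".
bill_cols = [c for c in df.columns if c.startswith("Bill")]
non_bill_cols = [c for c in df.columns if c.startswith("Non Bill")]

print(bill_cols)      # ['Bill - Php', 'Bill - JS']
print(non_bill_cols)  # ['Non Bill - Php', 'Non Bill -JS']

# axis=1 sums across the selected columns within each row; NaN is skipped,
# so a row with no values at all sums to 0.0.
bill_total = df[bill_cols].sum(axis=1)
non_bill_total = df[non_bill_cols].sum(axis=1)

print(bill_total.tolist())      # [40.0, 0.0]
print(non_bill_total.tolist())  # [0.0, 30.0]
```

Because the match is on the prefix, a header such as "Total Bill" would be missed by both lists, so this only works while the headers keep the "Bill ..." / "Non Bill ..." naming.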

Data

The sample data is reproduced here and saved in Excel format.

import pandas as pd
import io
import numpy as np

df = pd.read_csv(io.StringIO("""
Name    | Bill - Php  | Non Bill - Php  | Bill - JS  | Non Bill -JS
Alex    |   30        |                 |      10    |
Ram     |   10        |          20     |            |
Stephen |             |                 |      20    |
Robert  |             |          10     |            |      10
Mohan   |             |          20     |      10    |
"""), sep=r"\|\s*", engine='python')

# cleanup
df.columns = [c.strip() for c in df.columns]
df["Name"] = df["Name"].str.strip()

# save .xlsx
df.to_excel("/mnt/ramdisk/data.xlsx", index=False)

Solution

# load .xlsx
df = pd.read_excel("/mnt/ramdisk/data.xlsx")

for prefix in ("Bill", "Non Bill"):
    # select the columns to be summed
    cols_to_sum = [c for c in df.columns if c.startswith(prefix)]
    # new column name
    col = f"{prefix} Amt Total"
    # sum the selected columns horizontally
    df[col] = df[cols_to_sum].sum(axis=1)
    # (optional) replace 0 with nan
    df[col] = df[col].replace({0.0: np.nan})

# save a new file
df.to_excel("/mnt/ramdisk/out.xlsx", index=False)

Inspect the added columns:

print(df.iloc[:,-2:])
# Out[219]: 
#    Bill Amt Total  Non Bill Amt Total
# 0            40.0                 NaN
# 1            10.0                20.0
# 2            20.0                 NaN
# 3             NaN                20.0
# 4            10.0                20.0
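One caveat with the optional replace step: it also blanks out rows whose values genuinely add up to zero. A possible alternative, not part of the original answer, is the min_count argument of DataFrame.sum, which returns NaN only when a row has no values at all. The sketch below uses made-up data to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first has no Bill values at all, the second has
# Bill values that are present but equal to zero.
df = pd.DataFrame({
    "Bill - Php": [np.nan, 0.0],
    "Bill - JS": [np.nan, 0.0],
})

# min_count=1 requires at least one non-missing value per row;
# otherwise the result is NaN instead of 0.
total = df.sum(axis=1, min_count=1)

# The replace-based approach cannot tell the two rows apart.
replaced = df.sum(axis=1).replace({0.0: np.nan})

print(total.tolist())     # [nan, 0.0]
print(replaced.tolist())  # [nan, nan]
```

If real zeros never occur in the data, the two approaches give the same result and the choice is a matter of taste.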