Group By - Total Hours and Hours by Category in Python / Pandas

I need to calculate the total hours per week, along with the hours per status, using a group-by in Python / Pandas.

Id  Week                     Status  Hours
1   01/10/2022 - 01/16/2022  On          5
2   01/10/2022 - 01/16/2022  Off         2
3   01/17/2022 - 01/23/2022  Off         6
4   01/17/2022 - 01/23/2022  On          1
5   01/17/2022 - 01/23/2022  On          5
6   01/03/2022 - 01/09/2022  On         10
7   01/10/2022 - 01/16/2022  Off         9
8   01/03/2022 - 01/09/2022  On          3
9   01/24/2022 - 01/30/2022  Off         4
10  01/24/2022 - 01/30/2022  On          7
Code to reproduce the data:

import pandas as pd

test_data = {'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
             'Week': ['01/10/2022 - 01/16/2022', '01/10/2022 - 01/16/2022',
                      '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022',
                      '01/17/2022 - 01/23/2022', '01/03/2022 - 01/09/2022',
                      '01/10/2022 - 01/16/2022', '01/03/2022 - 01/09/2022',
                      '01/24/2022 - 01/30/2022', '01/24/2022 - 01/30/2022'],
             'Status': ['On', 'Off', 'Off', 'On', 'On', 'On', 'Off', 'On', 'Off', 'On'],
             'Hours': [5, 2, 6, 1, 5, 10, 9, 3, 4, 7]}

test_df = pd.DataFrame(data=test_data)

I can get the total hours per week:

test_df.groupby(by=['Week'], as_index=False).agg({"Hours": "sum"})

But I can't figure out how to also group by status so that it produces 2 additional columns (On Status Hours and Off Status Hours).

If I just add the Status column to the groupby, it creates additional rows instead (I understand why):

test_df.groupby(by=['Week', 'Status'], as_index=False).agg({"Hours": "sum"})

Desired output:

Week                     Total Hours  On Status Hours  Off Status Hours
01/03/2022 - 01/09/2022           13               13                 0
01/10/2022 - 01/16/2022           16                5                11
01/17/2022 - 01/23/2022           12                6                 6
01/24/2022 - 01/30/2022           11                7                 4

You can use pd.pivot_table to get your result:

x = pd.pivot_table(
    test_df,
    index="Week",
    columns="Status",
    values="Hours",
    aggfunc="sum",
    fill_value=0,
).add_suffix(" Status Hours")
x["Total Hours"] = x.sum(axis=1)
print(x)

Prints:

Status                   Off Status Hours  On Status Hours  Total Hours
Week                                                                   
01/03/2022 - 01/09/2022                 0               13           13
01/10/2022 - 01/16/2022                11                5           16
01/17/2022 - 01/23/2022                 6                6           12
01/24/2022 - 01/30/2022                 4                7           11
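
The pivoted frame keeps Week as the index and lists the columns alphabetically, so it does not quite match the layout asked for in the question. A minimal sketch of one way to finish the job, assuming the same test_df as in the question: select the columns in the wanted order, drop the leftover "Status" label on the column axis, and turn Week back into a regular column. The name result is only illustrative.

```python
import pandas as pd

# Same sample data as in the question (the Id column is not needed here).
test_df = pd.DataFrame({
    "Week": ["01/10/2022 - 01/16/2022", "01/10/2022 - 01/16/2022",
             "01/17/2022 - 01/23/2022", "01/17/2022 - 01/23/2022",
             "01/17/2022 - 01/23/2022", "01/03/2022 - 01/09/2022",
             "01/10/2022 - 01/16/2022", "01/03/2022 - 01/09/2022",
             "01/24/2022 - 01/30/2022", "01/24/2022 - 01/30/2022"],
    "Status": ["On", "Off", "Off", "On", "On", "On", "Off", "On", "Off", "On"],
    "Hours": [5, 2, 6, 1, 5, 10, 9, 3, 4, 7],
})

# Pivot exactly as in the answer: one column per status, missing combos -> 0.
x = pd.pivot_table(
    test_df,
    index="Week",
    columns="Status",
    values="Hours",
    aggfunc="sum",
    fill_value=0,
).add_suffix(" Status Hours")
x["Total Hours"] = x.sum(axis=1)

# Reorder the columns, remove the "Status" axis label, restore Week as a column.
result = (
    x[["Total Hours", "On Status Hours", "Off Status Hours"]]
    .rename_axis(None, axis=1)
    .reset_index()
)
print(result)
```

Note that selecting the columns by name will raise a KeyError if one of the statuses never appears in the data, so this only suits data where both "On" and "Off" are present.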

You can use groupby followed by unstack:

(test_df
 .groupby(['Week', 'Status'])['Hours']
 .sum()
 .unstack(1, fill_value=0)
 .add_suffix(' Status Hours')
 .assign(**{'Total Hours': lambda d: d.sum(1)})
 )

Output:

Status                   Off Status Hours  On Status Hours  Total Hours
Week                                                                   
01/03/2022 - 01/09/2022                 0               13           13
01/10/2022 - 01/16/2022                11                5           16
01/17/2022 - 01/23/2022                 6                6           12
01/24/2022 - 01/30/2022                 4                7           11
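
If you would rather stay with the groupby / agg call from the question, a possible variation is to split Hours into two helper columns first and then sum all three with named aggregation. This is only a sketch: the helper names on_hours and off_hours are made up for illustration, and it assumes Status only ever holds "On" or "Off" (anything that is not "On" is counted as off).

```python
import pandas as pd

# Same sample data as in the question (the Id column is not needed here).
test_df = pd.DataFrame({
    "Week": ["01/10/2022 - 01/16/2022", "01/10/2022 - 01/16/2022",
             "01/17/2022 - 01/23/2022", "01/17/2022 - 01/23/2022",
             "01/17/2022 - 01/23/2022", "01/03/2022 - 01/09/2022",
             "01/10/2022 - 01/16/2022", "01/03/2022 - 01/09/2022",
             "01/24/2022 - 01/30/2022", "01/24/2022 - 01/30/2022"],
    "Status": ["On", "Off", "Off", "On", "On", "On", "Off", "On", "Off", "On"],
    "Hours": [5, 2, 6, 1, 5, 10, 9, 3, 4, 7],
})

# Boolean mask: True where the row has status "On".
is_on = test_df["Status"].eq("On")

wide = (
    test_df
    # Hypothetical helper columns: keep Hours where the status matches, else 0.
    .assign(on_hours=test_df["Hours"].where(is_on, 0),
            off_hours=test_df["Hours"].where(~is_on, 0))
    .groupby("Week", as_index=False)
    # Named aggregation; the dict is unpacked because the names contain spaces.
    .agg(**{
        "Total Hours": ("Hours", "sum"),
        "On Status Hours": ("on_hours", "sum"),
        "Off Status Hours": ("off_hours", "sum"),
    })
)
print(wide)
```

This gives Week as a regular column and the three totals in the order requested, without any reshaping step.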