Python: Groupby with conditions in pandas dataframe?

Question

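I have a dataframe like the one shown below.

I need to group by country and product, and the Value column should contain count(id) where status is closed. I also need to return the remaining columns. The expected output format is shown below.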

Sample input

id        status    ticket_time           product      country     last_load_time       metric_id   name
1260057   open      2021-10-04 01:20:00   Broadband    Grenada     2021-12-09 09:57:27  MTR013      repair
2998178   open      2021-10-02 00:00:00   Fixed Voice  Bahamas     2021-12-09 09:57:27  MTR013      repair
3762949   closed    2021-10-01 00:00:00   Fixed Voice  St Lucia    2021-12-09 09:57:27  MTR013      repair
3766608   closed    2021-10-04 00:00:00   Broadband    St Lucia    2021-12-09 09:57:27  MTR013      repair
3767125   closed    2021-10-04 00:00:00   TV           Antigua     2021-12-09 09:57:27  MTR013      repair
6050009   closed    2021-10-01 00:00:00   TV           Jamaica     2021-12-09 09:57:27  MTR013      repair
6050608   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6050972   open      2021-10-01 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6052253   closed    2021-10-02 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair
6053697   open      2021-10-03 00:00:00   Broadband    Jamaica     2021-12-09 09:57:27  MTR013      repair  

**EXPECTED OUTPUT FORMAT** SAMPLE

country  product    load_time          metric_id     name          ticket_time        Value(count(id)with status closed)
Antigua   TV      2021-12-09 09:57:27   MTR013     pending_repair   2021-10-01         1
....      ...     ....                  ...        ...              ...                2

I tried the code below:

df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value = pd.NamedAgg(column='id', aggfunc="size"))
df.reset_index(inplace=True)

But it returns only three columns: country, product, and Value.

I need the remaining columns listed in the expected output format above. I also tried:

df1 = new_df[new_df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')

df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)

Output

id    ticket_time    product    country     load_time          metric_id    name        Value
3762949 2021-10-01  Fixed Voice St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  23
3766608 2021-10-04  Broadband   St Lucia    2021-12-09 09:57:27 MTR013  pending_repair  87

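The second approach, which uses transform, returns the id column, which I don't want. The Value column should be based on count(id) where status is closed. I tried both approaches above but could not get the expected output. Is there any way to solve this?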

Answer 1

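When you group, you are usually aggregating data by some category, so you don't keep all the individual records. You are left with only the columns you grouped by and the columns holding the aggregated data (count, mean, etc.). The transform function, however, returns a result with the same length as the input, so the other columns stay available. I think this is what you are looking for, based on your expected output.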

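df_closed = df[df['status'] == 'closed'].copy()  # Keep only closed tickets; copy so the column assignment below does not warn

df_closed = df_closed.reset_index(drop=True)  # Reset the index (reindex() with no arguments would leave it unchanged)

# Count closed tickets per (country, product) group; the count is repeated on every row of the group
df_closed['count_closed'] = df_closed.groupby(['country', 'product'])['status'].transform('size')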
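
To get from the transform result to the shape in the expected output, one option is to drop the id and status columns and keep one row per (country, product) pair. The following is a minimal sketch rather than a definitive solution: the frame is rebuilt by hand from the sample rows in the question, the names `key`, `closed`, `result`, and `result_agg` are illustrative, and keeping the first row's `ticket_time` in each group is an assumption, since the question does not say which one to keep.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "id": [1260057, 2998178, 3762949, 3766608, 3767125,
           6050009, 6050608, 6050972, 6052253, 6053697],
    "status": ["open", "open", "closed", "closed", "closed",
               "closed", "open", "open", "closed", "open"],
    "ticket_time": pd.to_datetime([
        "2021-10-04 01:20:00", "2021-10-02 00:00:00", "2021-10-01 00:00:00",
        "2021-10-04 00:00:00", "2021-10-04 00:00:00", "2021-10-01 00:00:00",
        "2021-10-01 00:00:00", "2021-10-01 00:00:00", "2021-10-02 00:00:00",
        "2021-10-03 00:00:00",
    ]),
    "product": ["Broadband", "Fixed Voice", "Fixed Voice", "Broadband", "TV",
                "TV", "Broadband", "Broadband", "Broadband", "Broadband"],
    "country": ["Grenada", "Bahamas", "St Lucia", "St Lucia", "Antigua",
                "Jamaica", "Jamaica", "Jamaica", "Jamaica", "Jamaica"],
    "last_load_time": pd.to_datetime(["2021-12-09 09:57:27"] * 10),
    "metric_id": ["MTR013"] * 10,
    "name": ["repair"] * 10,
})

key = ["country", "product"]

# Option 1: transform keeps every row, so the other columns survive.
closed = df[df["status"] == "closed"].copy()
closed["Value"] = closed.groupby(key)["id"].transform("size")
result = (
    closed.drop(columns=["id", "status"])  # columns not wanted in the output
    .drop_duplicates(subset=key)           # keep the first row of each group (assumption)
    .reset_index(drop=True)
)

# Option 2: named aggregation, stating explicitly how each extra column is reduced.
result_agg = (
    df[df["status"] == "closed"]
    .groupby(key, as_index=False)
    .agg(
        last_load_time=("last_load_time", "first"),
        metric_id=("metric_id", "first"),
        name=("name", "first"),
        ticket_time=("ticket_time", "first"),
        Value=("id", "count"),
    )
)

print(result)
print(result_agg)
```

Both options give one row per (country, product) pair among the closed tickets. Option 2 makes the choice of which `ticket_time` to keep explicit, so swapping `"first"` for `"min"` or `"max"` is a one-word change.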

