Pandas:如何根据 pandas 数据帧中的条件进行分组?

Pandas: How to do groupby with conditions in pandas dataframe?

我应该做 groupby(country and product)Value column 应该包含 count(id) where status已关闭,我需要return所有剩余的列。

Sample input

id        status    ticket_time           product      country    name
126       open      2021-10-04 01:20:00   Broad        A          metric
299       open      2021-10-02 00:00:00   Fixed        B          metric
376       closed    2021-10-01 00:00:00   Fixed        C          metric
370       closed    2021-10-04 00:00:00   Broad        C          metric
372       closed    2021-10-04 00:00:00   TV           D          metric
605       closed    2021-10-01 00:00:00   TV           D          metric

输出格式示例

country  product   name      ticket_time            Value(count(id)where status closed)
D        TV        metric    YYYY-MM-DD HH:MM:SS          2
C        Broad     metric    YYYY-MM-DD HH:MM:SS          1
C        Fixed     metric    YYYY-MM-DD HH:MM:SS          1
....      ...      ....                  ...               ...       

我试过下面的代码:


df1 = df[df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')

df = df1.drop_duplicates(['country', 'product']).drop('status',axis=1).drop(['id'], axis = 1)

有没有更好的方法来解决这个问题?

不要使用 groupby+transformgroupby+agg:

(df.loc[df['status'].eq('closed')]
   .groupby(['country', 'product'], as_index=False)
   .agg({'name': 'first', 'ticket_time': 'first', 'status': 'size'})
   .rename(columns={'status': 'Value count(size)'})
)

输出:

  country product    name          ticket_time  Value count(size)
0       C   Broad  metric  2021-10-04 00:00:00                  1
1       C   Fixed  metric  2021-10-01 00:00:00                  1
2       D      TV  metric  2021-10-04 00:00:00                  2