Pandas: How to do groupby with conditions in pandas dataframe?
I need to group by country and product. The Value column should contain count(id) where status is closed, and I need to return all the remaining columns.
Sample input
id status ticket_time product country name
126 open 2021-10-04 01:20:00 Broad A metric
299 open 2021-10-02 00:00:00 Fixed B metric
376 closed 2021-10-01 00:00:00 Fixed C metric
370 closed 2021-10-04 00:00:00 Broad C metric
372 closed 2021-10-04 00:00:00 TV D metric
605 closed 2021-10-01 00:00:00 TV D metric
Sample output format
country product name ticket_time Value(count(id)where status closed)
D TV metric YYYY-MM-DD HH:MM:SS 2
C Broad metric YYYY-MM-DD HH:MM:SS 1
C Fixed metric YYYY-MM-DD HH:MM:SS 1
.... ... .... ... ...
I tried the code below:
df1 = df[df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')
df = df1.drop_duplicates(['country', 'product']).drop('status',axis=1).drop(['id'], axis = 1)
Is there a better way to solve this?
Don't use groupby + transform; use groupby + agg instead:
(df.loc[df['status'].eq('closed')]
.groupby(['country', 'product'], as_index=False)
.agg({'name': 'first', 'ticket_time': 'first', 'status': 'size'})
.rename(columns={'status': 'Value count(size)'})
)
Output:
country product name ticket_time Value count(size)
0 C Broad metric 2021-10-04 00:00:00 1
1 C Fixed metric 2021-10-01 00:00:00 1
2 D TV metric 2021-10-04 00:00:00 2
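For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample input in the question. The named-aggregation form of agg and the output column name Value are illustrative choices, not part of the answer above. Counting id rather than taking the size of status matches the "count(id)" wording in the question, and the two agree whenever id has no missing values.

```python
import pandas as pd

# Sample data rebuilt from the question's input table (an assumption about dtypes:
# ticket_time is parsed as datetime, everything else is left as given).
df = pd.DataFrame({
    "id": [126, 299, 376, 370, 372, 605],
    "status": ["open", "open", "closed", "closed", "closed", "closed"],
    "ticket_time": pd.to_datetime([
        "2021-10-04 01:20:00", "2021-10-02 00:00:00", "2021-10-01 00:00:00",
        "2021-10-04 00:00:00", "2021-10-04 00:00:00", "2021-10-01 00:00:00",
    ]),
    "product": ["Broad", "Fixed", "Fixed", "Broad", "TV", "TV"],
    "country": ["A", "B", "C", "C", "D", "D"],
    "name": ["metric"] * 6,
})

# Keep only closed tickets, then aggregate once per (country, product) group.
result = (
    df.loc[df["status"].eq("closed")]
      .groupby(["country", "product"], as_index=False)
      .agg(
          name=("name", "first"),                # first name seen in the group
          ticket_time=("ticket_time", "first"),  # first ticket_time seen in the group
          Value=("id", "count"),                 # number of non-null ids in the group
      )
)

print(result)
```

Note that 'first' keeps whichever row comes first within each group, so the ticket_time shown for a group depends on the row order of the input. If a specific timestamp is wanted, 'min' or 'max' could be used instead.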