Python: Groupby with conditions in pandas dataframe?
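I have a dataframe like the one shown below.
I need to group by country and product. The Value column should contain count(id) where status is closed, and I also need the remaining columns returned. The expected output format is shown below.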
Sample input
id status ticket_time product country last_load_time metric_id name
1260057 open 2021-10-04 01:20:00 Broadband Grenada 2021-12-09 09:57:27 MTR013 repair
2998178 open 2021-10-02 00:00:00 Fixed Voice Bahamas 2021-12-09 09:57:27 MTR013 repair
3762949 closed 2021-10-01 00:00:00 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 repair
3766608 closed 2021-10-04 00:00:00 Broadband St Lucia 2021-12-09 09:57:27 MTR013 repair
3767125 closed 2021-10-04 00:00:00 TV Antigua 2021-12-09 09:57:27 MTR013 repair
6050009 closed 2021-10-01 00:00:00 TV Jamaica 2021-12-09 09:57:27 MTR013 repair
6050608 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6050972 open 2021-10-01 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6052253 closed 2021-10-02 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
6053697 open 2021-10-03 00:00:00 Broadband Jamaica 2021-12-09 09:57:27 MTR013 repair
**EXPECTED OUTPUT FORMAT** SAMPLE
country product load_time metric_id name ticket_time Value(count(id)with status closed)
Antigua TV 2021-12-09 09:57:27 MTR013 pending_repair 2021-10-01 1
.... ... .... ... ... ... 2
I tried the code below:
df = new_df[new_df['status'] == 'closed'].groupby(['country', 'product']).agg(Value = pd.NamedAgg(column='id', aggfunc="size"))
df.reset_index(inplace=True)
But it returns only three columns: country, product, and Value.
I need the remaining columns listed in the expected output format above.
I also tried:
df1 = new_df[new_df['status'] == 'closed']
df1['Value'] = df1.groupby(['country', 'product'])['status'].transform('size')
df = df1.drop_duplicates(['country', 'product']).drop('status', axis=1)
Output
id ticket_time product country load_time metric_id name Value
3762949 2021-10-01 Fixed Voice St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 23
3766608 2021-10-04 Broadband St Lucia 2021-12-09 09:57:27 MTR013 pending_repair 87
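The second approach, using transform, returns the id column, which I don't want. The Value column should be based on count(id) where status is closed. I tried both approaches above but couldn't get the expected output. Is there a way to solve this?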
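When you group by, you are usually aggregating data by some category, so you don't keep all the individual records; you are left with only the columns you grouped by and the columns of aggregated data (count, mean, etc.). The transform function, however, returns a result with the same length as the input, so every row and every column is kept. I think this is what you are looking for, based on your expected output.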
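df_closed = df[df['status'] == 'closed'].copy()  # keep only closed tickets; .copy() avoids SettingWithCopyWarning
df_closed = df_closed.reset_index(drop=True)  # resets the index (reindex() with no arguments leaves it unchanged)
# count the closed rows per (country, product) group and broadcast the count to every row
df_closed['count_closed'] = df_closed.groupby(['country', 'product'])['status'].transform('size')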