Aggregating a dataframe by a string column in pandas
I have a dataframe that looks like this:
dfB
name value country
benzene spice Australia
benzene spice Australia
benzene spice Australia
benzene herbs Australia
benzene herbs Americas
benzene anise Poland
methyl herbs
methyl herbs Americas
methyl spice Americas
alcohol spice Germany
alcohol spice Germany
I want to create a different dataframe that aggregates the country column, like this:
dfB
name value country count
benzene spice Australia 3
benzene herbs Australia 1
benzene herbs Americas 1
benzene anise Poland 1
methyl herbs 1
methyl herbs Americas 1
methyl spice Americas 1
alcohol spice Germany 2
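The idea is to aggregate the country column and, for each unique combination of name and value, produce a count per country. Blanks or NaN should also be treated separately.
I tried using groupby: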
grouped = dfB.groupby(["name", "value", "country"]).agg({"country": "count"})
But it does not seem to create the dataframe the way I intended. How can I do this?
Use value_counts, or groupby if you want to keep the original row order:
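# Option 1: value_counts (dropna=False keeps the NaN group)
out = (dfB.value_counts(["name", "value", "country"], sort=False, dropna=False)
          .rename('count').reset_index())
out.loc[out['country'].isna(), 'count'] = 1  # force the count for a missing country to 1

# Option 2: groupby + size (sort=False keeps groups in order of first appearance)
out1 = (dfB.groupby(["name", "value", "country"], sort=False, dropna=False)
           .size().reset_index(name='count'))
out1.loc[out1['country'].isna(), 'count'] = 1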
>>> out
name value country count
0 alcohol spice Germany 2
1 benzene anise Poland 1
2 benzene herbs Americas 1
3 benzene herbs Australia 1
4 benzene spice Australia 3
5 methyl herbs Americas 1
6 methyl herbs NaN 1
7 methyl spice Americas 1
>>> out1
name value country count
0 benzene spice Australia 3
1 benzene herbs Australia 1
2 benzene herbs Americas 1
3 benzene anise Poland 1
4 methyl herbs NaN 1
5 methyl herbs Americas 1
6 methyl spice Americas 1
7 alcohol spice Germany 2
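For anyone who wants to reproduce this, here is a minimal self-contained sketch of the groupby variant. It assumes the blank country cell in the question is a real missing value (np.nan) rather than an empty string, and that pandas is at least 1.1, where groupby accepts dropna. The variable name counts is just for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
# Assumption: the blank country for the first "methyl herbs" row is NaN.
dfB = pd.DataFrame({
    "name": ["benzene"] * 6 + ["methyl"] * 3 + ["alcohol"] * 2,
    "value": ["spice", "spice", "spice", "herbs", "herbs", "anise",
              "herbs", "herbs", "spice", "spice", "spice"],
    "country": ["Australia", "Australia", "Australia", "Australia",
                "Americas", "Poland", np.nan, "Americas", "Americas",
                "Germany", "Germany"],
})

# size() counts rows per group; dropna=False keeps the NaN country as its own group.
counts = (
    dfB.groupby(["name", "value", "country"], sort=False, dropna=False)
       .size()
       .reset_index(name="count")
)
print(counts)
```

If the blanks are actually empty strings, they already form their own group and dropna=False is not needed for them; converting them first with something like dfB["country"].replace("", np.nan) would merge them with real NaN values instead.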