Grouping by with aggregating using different columns in Pandas
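I have a data frame in pandas that contains an ID and a delivery day (e.g., one of the 7 days of the week):
I want to use the pandas groupby() function to create 7 different columns, one per day (e.g., delivery_day_1, delivery_day_2, etc.), each holding the number of occurrences of that day in the data frame, grouped by ID. How can this be done?
Thanks.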
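I think you need groupby + size + unstack, or crosstab, to reshape first.
Then, if necessary, add the missing weekdays with reindex (written as reindex_axis in older pandas versions, which has since been removed) and finally rename the columns with add_prefix.
Sample: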
import pandas as pd
df = pd.DataFrame({'subscription_id':[1,2,3,1], 'delivery_weekday':[1,1,2,1]})
print (df)
delivery_weekday subscription_id
0 1 1
1 1 2
2 2 3
3 1 1
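# reindex(columns=...) replaces the removed reindex_axis and adds absent weekdays as zero columns;
# the result goes into df1 so the original df stays available for the crosstab variant
df1 = df.groupby(['subscription_id','delivery_weekday']) \
        .size() \
        .unstack(fill_value=0) \
        .reindex(columns=range(1,8), fill_value=0) \
        .add_prefix('delivery_day_')
print (df1)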
delivery_weekday delivery_day_1 delivery_day_2 delivery_day_3 \
subscription_id
1 2 0 0
2 1 0 0
3 0 1 0
delivery_weekday delivery_day_4 delivery_day_5 delivery_day_6 \
subscription_id
1 0 0 0
2 0 0 0
3 0 0 0
delivery_weekday delivery_day_7
subscription_id
1 0
2 0
3 0
# Same result with crosstab, computed from the original (unreshaped) df
df2 = pd.crosstab(df['subscription_id'],df['delivery_weekday']) \
        .reindex(columns=range(1,8), fill_value=0) \
        .add_prefix('delivery_day_')
print (df2)
delivery_weekday delivery_day_1 delivery_day_2 delivery_day_3 \
subscription_id
1 2 0 0
2 1 0 0
3 0 1 0
delivery_weekday delivery_day_4 delivery_day_5 delivery_day_6 \
subscription_id
1 0 0 0
2 0 0 0
3 0 0 0
delivery_weekday delivery_day_7
subscription_id
1 0
2 0
3 0
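For reuse, the two approaches can be wrapped up and compared side by side. This is only a minimal sketch: the helper name weekday_counts and its parameters are made up for illustration, and it assumes weekdays are encoded as the integers 1 to 7, as in the sample.

```python
import pandas as pd


def weekday_counts(df, id_col="subscription_id", day_col="delivery_weekday"):
    """Count rows per ID and weekday, with one column for each weekday 1..7.

    Hypothetical helper, not part of pandas; assumes weekdays are ints 1..7.
    """
    counts = pd.crosstab(df[id_col], df[day_col])
    # Weekdays that never occur in the data are added as all-zero columns.
    counts = counts.reindex(columns=range(1, 8), fill_value=0)
    return counts.add_prefix("delivery_day_")


df = pd.DataFrame({"subscription_id": [1, 2, 3, 1],
                   "delivery_weekday": [1, 1, 2, 1]})

via_crosstab = weekday_counts(df)

# The groupby + size + unstack route, kept in its own variable for comparison.
via_groupby = (
    df.groupby(["subscription_id", "delivery_weekday"])
      .size()
      .unstack(fill_value=0)
      .reindex(columns=range(1, 8), fill_value=0)
      .add_prefix("delivery_day_")
)

print(via_crosstab)
print(via_groupby)
```

Keeping the reshaped results in separate variables leaves the original df untouched, so both variants can run against the same input.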