如何遍历 pandas Dataframe 中的组,对每个组进行操作,然后将值分配给原始 Dataframe?
How do you iterate through groups in a pandas Dataframe, operate on each group, then assign values to the original dataframe?
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]
yearGroups = yearCount.groupby('order_date')
for year in yearGroups:
yearCount['antiYearCount'] =year.groupby('antibiotic'['antibiotic'].transform(pd.Series.value_counts)
在这种情况下,yearCount
是一个包含 'order_date', 'antibiotic', 'antiYearCount'
的数据帧。我已清理 'order_date'
以仅包含订单年份。我想按 'order_date'
中的年份对 yearCount
进行分组,计算每个 'antibiotic'
在每个“年份组”中出现的次数,然后将该值分配给 yearCount
的 'antiYearCount'
变量。
我认为您需要将新列 order_date
添加到 groupby
然后也可以使用 size
而不是 pd.Series.value_counts
以获得相同的输出:
df = pd.DataFrame({'antibiotic':list('accbbb'),
'antiYearCount':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})
print (df)
C D E antiYearCount antibiotic order_date
0 7 1 5 4 a 2012-01-01
1 8 3 3 5 c 2012-01-01
2 9 5 6 4 c 2012-01-01
3 4 7 9 5 b 2012-01-02
4 2 1 2 5 b 2012-01-02
5 3 0 4 4 b 2012-01-02
#copy for remove warning
#
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
.transform('size')
print (yearCount)
antibiotic order_date antiYearCount
0 a 2012-01-01 1
1 c 2012-01-01 2
2 c 2012-01-01 2
3 b 2012-01-02 3
4 b 2012-01-02 3
5 b 2012-01-02 3
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
.transform(pd.Series.value_counts)
print (yearCount)
antibiotic order_date antiYearCount
0 a 2012-01-01 1
1 c 2012-01-01 2
2 c 2012-01-01 2
3 b 2012-01-02 3
4 b 2012-01-02 3
5 b 2012-01-02 3
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]
yearGroups = yearCount.groupby('order_date')
for year in yearGroups:
yearCount['antiYearCount'] =year.groupby('antibiotic'['antibiotic'].transform(pd.Series.value_counts)
在这种情况下,yearCount
是一个包含 'order_date', 'antibiotic', 'antiYearCount'
的数据帧。我已清理 'order_date'
以仅包含订单年份。我想按 'order_date'
中的年份对 yearCount
进行分组,计算每个 'antibiotic'
在每个“年份组”中出现的次数,然后将该值分配给 yearCount
的 'antiYearCount'
变量。
我认为您需要将新列 order_date
添加到 groupby
然后也可以使用 size
而不是 pd.Series.value_counts
以获得相同的输出:
df = pd.DataFrame({'antibiotic':list('accbbb'),
'antiYearCount':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})
print (df)
C D E antiYearCount antibiotic order_date
0 7 1 5 4 a 2012-01-01
1 8 3 3 5 c 2012-01-01
2 9 5 6 4 c 2012-01-01
3 4 7 9 5 b 2012-01-02
4 2 1 2 5 b 2012-01-02
5 3 0 4 4 b 2012-01-02
#copy for remove warning
#
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
.transform('size')
print (yearCount)
antibiotic order_date antiYearCount
0 a 2012-01-01 1
1 c 2012-01-01 2
2 c 2012-01-01 2
3 b 2012-01-02 3
4 b 2012-01-02 3
5 b 2012-01-02 3
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
.transform(pd.Series.value_counts)
print (yearCount)
antibiotic order_date antiYearCount
0 a 2012-01-01 1
1 c 2012-01-01 2
2 c 2012-01-01 2
3 b 2012-01-02 3
4 b 2012-01-02 3
5 b 2012-01-02 3