How do you iterate through groups in a pandas DataFrame, operate on each group, then assign values to the original DataFrame?

    yearCount = df[['antibiotic', 'order_date', 'antiYearCount']]

    yearGroups = yearCount.groupby('order_date')

    for year in yearGroups:
        yearCount['antiYearCount'] = year.groupby('antibiotic')['antibiotic'].transform(pd.Series.value_counts)

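In this case, yearCount is a DataFrame containing 'order_date', 'antibiotic', and 'antiYearCount'. I have cleaned 'order_date' so that it contains only the year of the order. I want to group yearCount by the year in 'order_date', count how many times each 'antibiotic' appears within each "year group", and then assign that value to the 'antiYearCount' column of yearCount.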

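I think you need to add the order_date column to the groupby, so that you group by both 'order_date' and 'antibiotic'. You can then also use size instead of pd.Series.value_counts to get the same output: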

import pandas as pd

df = pd.DataFrame({'antibiotic':list('accbbb'),
                   'antiYearCount':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'order_date': pd.to_datetime(['2012-01-01']*3+['2012-01-02']*3)})

print (df)
   C  D  E  antiYearCount antibiotic order_date
0  7  1  5              4          a 2012-01-01
1  8  3  3              5          c 2012-01-01
2  9  5  6              4          c 2012-01-01
3  4  7  9              5          b 2012-01-02
4  2  1  2              5          b 2012-01-02
5  3  0  4              4          b 2012-01-02

# take a copy of the column subset to avoid SettingWithCopyWarning on assignment
yearCount = df[['antibiotic', 'order_date', 'antiYearCount']].copy()
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform('size')
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
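If 'order_date' still holds full timestamps rather than just the year, the same pattern should work with the year extracted on the fly. The following is only a sketch under that assumption: the `orders` frame, its values, and the `order_year` name are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: order_date holds full timestamps across two years.
orders = pd.DataFrame({
    'antibiotic': list('aabab'),
    'order_date': pd.to_datetime(['2012-01-01', '2012-06-15', '2012-03-02',
                                  '2013-02-01', '2013-02-01']),
})

# Extract the calendar year as a grouping key (order_year is an illustrative name).
year = orders['order_date'].dt.year.rename('order_year')

# Group by year and antibiotic, then broadcast each group's size back to its rows.
orders['antiYearCount'] = (
    orders.groupby([year, 'antibiotic'])['antibiotic'].transform('size')
)
print(orders)
```

Because transform returns a result aligned with the original index, the counts can be assigned straight back as a column without any loop.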

# the variant from the question, with pd.Series.value_counts passed to transform
yearCount['antiYearCount'] = yearCount.groupby(['order_date','antibiotic'])['antibiotic'] \
                                      .transform(pd.Series.value_counts)
print (yearCount)
  antibiotic order_date  antiYearCount
0          a 2012-01-01              1
1          c 2012-01-01              2
2          c 2012-01-01              2
3          b 2012-01-02              3
4          b 2012-01-02              3
5          b 2012-01-02              3
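For completeness, the literal loop asked about in the title can also be written out. This is a minimal sketch, not a recommendation over transform: it relies on the fact that iterating a groupby yields (key, sub-DataFrame) pairs, and that each sub-DataFrame keeps the original row labels, so `.loc` can write back to the right rows. The sample data reuses the 'antibiotic' and 'order_date' values from above.

```python
import pandas as pd

df = pd.DataFrame({
    'antibiotic': list('accbbb'),
    'order_date': pd.to_datetime(['2012-01-01'] * 3 + ['2012-01-02'] * 3),
})
df['antiYearCount'] = 0

# Iterating a groupby yields (key, sub-DataFrame) pairs, not bare groups.
for order_date, group in df.groupby('order_date'):
    # Count how often each antibiotic occurs within this group.
    counts = group['antibiotic'].value_counts()
    # group.index holds the original row labels, so .loc targets the right rows.
    df.loc[group.index, 'antiYearCount'] = group['antibiotic'].map(counts)

print(df)
```

The loop in the question fails partly because `year` is a (key, group) tuple rather than a DataFrame, and partly because each iteration overwrites the whole column. The transform version is shorter and usually faster, so the loop is mainly useful when the per-group operation cannot be expressed as a transform.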