Pandas Groupby Multiple Conditions KeyError

Question

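I have a DataFrame named df_out with the column names shown below, but for some reason I can't use the groupby function with the column headers, because it keeps giving me KeyError: 'year'. I have researched this and tried stripping whitespace, resetting the index, allowing for whitespace before my groupby call, and so on, but I can't get past this KeyError. The columns of df_out look like this: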

df_out.columns
Out[185]: 
Index(['year', 'month', 'BARTON CHAPEL', 'BARTON I', 'BIG HORN I',
       'BLUE CREEK', 'BUFFALO RIDGE I', 'CAYUGA RIDGE', 'COLORADO GREEN',
       'DESERT WIND', 'DRY LAKE I', 'EL CABO', 'GROTON', 'NEW HARVEST',
       'PENASCAL I', 'RUGBY', 'TULE'],
      dtype='object', name='plant_name')

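However, when I use df_out.head(), 'plant_name' appears as the header of the leading column, which looks different from the column list above, so this may be the source of the error or related to it. Here is the output: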

df_out.head()
Out[187]: 
plant_name  year  month  BARTON CHAPEL  BARTON I  BIG HORN I  BLUE CREEK  \
0           1991      1       6.432285  7.324126    5.170067    6.736384   
1           1991      2       7.121324  6.973586    4.922693    7.473527   
2           1991      3       8.125793  8.681317    5.796599    8.401855   
3           1991      4       7.454972  8.037764    7.272292    7.961625   
4           1991      5       7.012809  6.530013    6.626949    6.009825   

plant_name  BUFFALO RIDGE I  CAYUGA RIDGE  COLORADO GREEN  DESERT WIND  \
0                  7.163790      7.145323        5.783629     5.682003   
1                  7.595744      7.724717        6.245952     6.269524   
2                  8.111411      9.626075        7.918871     6.657648   
3                  8.807458      8.618806        7.011444     5.848736   
4                  7.734852      6.267097        7.410013     5.099610   

plant_name  DRY LAKE I    EL CABO    GROTON  NEW HARVEST  PENASCAL I  \
0             4.721089  10.747285  7.456640     6.921801    6.296425   
1             5.095923   8.891057  7.239762     7.449122    6.484241   
2             8.409637  12.238508  8.274046     8.824758    8.444960   
3             7.893694  10.837139  6.381736     8.840431    7.282444   
4             8.496976   8.636882  6.856747     7.469825    7.999530   

plant_name     RUGBY       TULE  
0           7.028360   4.110605  
1           6.394687   5.257128  
2           6.859462  10.789516  
3           7.590153   7.425153  
4           7.556546   8.085255

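The groupby statement that raises the KeyError looks like this. I'm trying to compute the mean for each year and month over a subset of the columns in df_out, given by the list 'west':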

west=['BIG HORN I','DRY LAKE I', 'TULE']
westavg = df_out[df_out.columns[df_out.columns.isin(west)]].groupby(['year','month']).mean()

Thanks very much,

Answer 1

Your code can be broken down as:

westavg =  (df_out[df_out.columns[df_out.columns.isin(west)]]
                 .groupby(['year','month']).mean()
           )

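This doesn't work because 'year' and 'month' are not columns of df_out[df_out.columns[df_out.columns.isin(west)]]. That selection keeps only the columns listed in west, so the grouping keys are gone before groupby runs.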

Try:

west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(['year','month'])[west_cols].mean()
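
To make the difference concrete, here is a minimal sketch on a small made-up frame. The values and the reduced column set are hypothetical stand-ins for df_out, not the asker's data. It shows the original selection dropping 'year' and 'month' (hence the KeyError), and the group-first version working:

```python
import pandas as pd

# Hypothetical stand-in for df_out (made-up values, fewer plants).
df_out = pd.DataFrame({
    "year": [1991, 1991, 1991, 1992],
    "month": [1, 1, 2, 1],
    "BARTON I": [7.0, 8.0, 6.0, 5.0],
    "BIG HORN I": [5.0, 7.0, 4.0, 6.0],
    "DRY LAKE I": [4.0, 6.0, 5.0, 8.0],
    "TULE": [4.0, 8.0, 5.0, 7.0],
})
# 'plant_name' is only the name of the columns Index, not a column itself.
df_out.columns.name = "plant_name"

west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Original approach: selecting the west columns first discards the keys.
subset = df_out[df_out.columns[df_out.columns.isin(west)]]
try:
    subset.groupby(["year", "month"]).mean()
    raised = False
except KeyError:
    raised = True  # 'year' is not in subset.columns

# Fix: group on the full frame, then pick the columns to aggregate.
west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(["year", "month"])[west_cols].mean()
print(westavg)
```

The result is indexed by (year, month) and has one column per plant in west; call .reset_index() on it if year and month should be ordinary columns again.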

Answer 2

OK, with the help of Quang Hoang's answer, I understood the problem and came up with this working answer, which I find easier to follow using .intersection:

westavg = df_out[df_out.columns.intersection(west)].mean(axis=1)

# Gives me the mean of each row across the subset of columns defined by the list 'west'.
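
Note that this computes one value per row (the mean across the west plants) rather than one value per (year, month) group per plant. A small sketch on made-up data shows what it returns; the frame and the west_mean column name are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for df_out (made-up values).
df_out = pd.DataFrame({
    "year": [1991, 1991],
    "month": [1, 2],
    "BARTON I": [7.0, 6.0],
    "BIG HORN I": [5.0, 4.0],
    "DRY LAKE I": [4.0, 5.0],
    "TULE": [6.0, 9.0],
})
west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Row-wise mean across the west columns only.
row_mean = df_out[df_out.columns.intersection(west)].mean(axis=1)

# Keep year and month next to the averaged value.
result = df_out[["year", "month"]].assign(west_mean=row_mean)
print(result)
```

If each (year, month) pair occurs exactly once, as in the head() output in the question, this row-wise mean is already the regional average for that month.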

pandas

keyerror