Pandas: Avoid Multidimensional Key Error When Comparing 2 DataFrames

I'm stuck on a multidimensional key ValueError. I have a DataFrame that looks like this:

    year      RMSE  index  cyear  Corr_to_CY
0   2000  0.279795      5   1997    0.997975
1   2011  0.299011      2   1994    0.997792
2   2003  0.368341      1   1993    0.977143
3   2013  0.377902     23   2015    0.824441
4   1999   0.41495     10   2002    0.804633
5   1997  0.435813      8   2000    0.752724
6   2018  0.491003     24   2016    0.703359
7   2002  0.505771      3   1995    0.684926
8   2009  0.529308     17   2009    0.580481
9   2015  0.584146     27   2019    0.556555
10  2004  0.620946     26   2018    0.500790
11  2016  0.659388     22   2014    0.443543
12  1993  0.700942     19   2011    0.431615
13  2006  0.748086     11   2003    0.375111
14  2007  0.766675     21   2013    0.323143
15  2020  0.827913     12   2004    0.149202
16  2014  0.884109      7   1999    0.002438
17  2012  0.900184      0   1992   -0.351615
18  1995  0.919482     28   2020   -0.448915
19  1992  0.930512     20   2012   -0.563762
20  2001  0.967834     18   2010   -0.613170
21  2019   1.00497      9   2001   -0.677590
22  2005   1.00885     13   2005   -0.695690
23  2010  1.159125     14   2006   -0.843122
24  2017  1.173262     15   2007   -0.931034
25  1994  1.179737      6   1998   -0.939697
26  2008  1.212915     25   2017   -0.981626
27  1996  1.308853     16   2008   -0.985893
28  1998  1.396771      4   1996   -0.999990

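I selected the rows where the 'Corr_to_CY' column is >= 0.70 and put the corresponding 'cyear' values into a new df called 'cyears'. I need to use it as a lookup to find the year and RMSE values for the rows whose 'year' appears in the cyears df. This is my best attempt, and it raises a ValueError: Cannot index with multidimensional key. Do I need to change the lookup df "cyears" into something else (a Series, a list, etc.) for this to work? Thanks. Here is the code that produces the error: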

cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
cyears = cyears.to_frame()
result = comp.loc[comp['year'] == cyears,'RMSE']

ValueError: Cannot index with multidimensional key

You can use the isin method:

import pandas as pd

# Sample creation
import io
comp = pd.read_csv(io.StringIO('year,RMSE,index,cyear,Corr_to_CY\n2000,0.279795,5,1997,0.997975\n2011,0.299011,2,1994,0.997792\n2003,0.368341,1,1993,0.977143\n2013,0.377902,23,2015,0.824441\n1999,0.41495,10,2002,0.804633\n1997,0.435813,8,2000,0.752724\n2018,0.491003,24,2016,0.703359\n2002,0.505771,3,1995,0.684926\n2009,0.529308,17,2009,0.580481\n2015,0.584146,27,2019,0.556555\n2004,0.620946,26,2018,0.500790\n2016,0.659388,22,2014,0.443543\n1993,0.700942,19,2011,0.431615\n2006,0.748086,11,2003,0.375111\n2007,0.766675,21,2013,0.323143\n2020,0.827913,12,2004,0.149202\n2014,0.884109,7,1999,0.002438\n2012,0.900184,0,1992,-0.351615\n1995,0.919482,28,2020,-0.448915\n1992,0.930512,20,2012,-0.563762\n2001,0.967834,18,2010,-0.613170\n2019,1.00497,9,2001,-0.677590\n2005,1.00885,13,2005,-0.695690\n2010,1.159125,14,2006,-0.843122\n2017,1.173262,15,2007,-0.931034\n1994,1.179737,6,1998,-0.939697\n2008,1.212915,25,2017,-0.981626\n1996,1.308853,16,2008,-0.985893\n1998,1.396771,4,1996,-0.999990\n'))

# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
result = comp.loc[comp['year'].isin(cyears),'RMSE']
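To make the difference from the failing attempt concrete: comparing a Series with a DataFrame using == appears to give a two-dimensional result, which .loc rejects as a row selector, whereas isin returns a one-dimensional boolean mask. Below is a minimal sketch on a hypothetical three-row frame; the names comp_small, cyears_small and mask, and the rounded values, are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical miniature version of `comp`, for illustration only
comp_small = pd.DataFrame({
    "year": [2000, 1997, 2015],
    "RMSE": [0.28, 0.44, 0.58],
    "cyear": [1997, 2000, 2019],
    "Corr_to_CY": [0.99, 0.75, 0.55],
})

# Series of comparison years whose correlation clears the threshold
cyears_small = comp_small.loc[comp_small["Corr_to_CY"] >= 0.7, "cyear"]

# isin builds a 1-D boolean mask: True where `year` occurs anywhere in cyears_small
mask = comp_small["year"].isin(cyears_small)

# A 1-D mask is a valid row selector for .loc
result_small = comp_small.loc[mask, "RMSE"]
print(mask.tolist())
print(result_small)
```

Note that isin tests membership and ignores index alignment, so a row matches if its year occurs in any row of cyears_small, not only in the same row.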

If you want to keep cyears as a pandas DataFrame rather than a Series, try the following:

# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7, ['cyear']]
result = comp.loc[comp['year'].isin(cyears.cyear),'RMSE']
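Since the question asks for both the year and the RMSE, one possible variation is to pass a list of column names to .loc instead of the single 'RMSE' label. This is a sketch on a hypothetical four-row frame; comp_demo, cyears_df and year_rmse are illustrative names, and the values are made up.

```python
import pandas as pd

# Hypothetical four-row stand-in for `comp` (values made up for illustration)
comp_demo = pd.DataFrame({
    "year": [2000, 1997, 2015, 2019],
    "RMSE": [0.28, 0.44, 0.58, 1.00],
    "cyear": [1997, 2000, 2019, 2001],
    "Corr_to_CY": [0.99, 0.75, 0.55, -0.68],
})

# Double brackets keep `cyears_df` as a one-column DataFrame
cyears_df = comp_demo.loc[comp_demo["Corr_to_CY"] >= 0.7, ["cyear"]]

# Pass the single column (a Series) to isin, and ask .loc for both columns
year_rmse = comp_demo.loc[
    comp_demo["year"].isin(cyears_df["cyear"]), ["year", "RMSE"]
]
print(year_rmse)
```

cyears_df["cyear"] and cyears_df.cyear select the same column; the bracket form also works when a column name is not a valid Python identifier.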