Pandas 避免比较 2 个数据帧时出现多维键错误
Pandas Avoid Multidimensional Key Error Comparing 2 Dataframes
我陷入多维键值错误。我有一个看起来像这样的数据框:
year RMSE index cyear Corr_to_CY
0 2000 0.279795 5 1997 0.997975
1 2011 0.299011 2 1994 0.997792
2 2003 0.368341 1 1993 0.977143
3 2013 0.377902 23 2015 0.824441
4 1999 0.41495 10 2002 0.804633
5 1997 0.435813 8 2000 0.752724
6 2018 0.491003 24 2016 0.703359
7 2002 0.505771 3 1995 0.684926
8 2009 0.529308 17 2009 0.580481
9 2015 0.584146 27 2019 0.556555
10 2004 0.620946 26 2018 0.500790
11 2016 0.659388 22 2014 0.443543
12 1993 0.700942 19 2011 0.431615
13 2006 0.748086 11 2003 0.375111
14 2007 0.766675 21 2013 0.323143
15 2020 0.827913 12 2004 0.149202
16 2014 0.884109 7 1999 0.002438
17 2012 0.900184 0 1992 -0.351615
18 1995 0.919482 28 2020 -0.448915
19 1992 0.930512 20 2012 -0.563762
20 2001 0.967834 18 2010 -0.613170
21 2019 1.00497 9 2001 -0.677590
22 2005 1.00885 13 2005 -0.695690
23 2010 1.159125 14 2006 -0.843122
24 2017 1.173262 15 2007 -0.931034
25 1994 1.179737 6 1998 -0.939697
26 2008 1.212915 25 2017 -0.981626
27 1996 1.308853 16 2008 -0.985893
28 1998 1.396771 4 1996 -0.999990
我已将 'Corr_to_CY' 列值 >= 0.70 和 'cyear' 列的 return 值的条件选择到名为 'cyears' 的新 df 中。我需要使用它作为索引来查找 'year' 列在 cyears df 中的年份和 RMSE 值。这是我最好的尝试,我得到了值错误:无法使用多维键进行索引。我是否需要将索引 df "cyears" 更改为其他内容 - 系列、列表等才能正常工作?谢谢,这是我产生错误的代码:
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
cyears = cyears.to_frame()
result = comp.loc[comp['year'] == cyears,'RMSE']
ValueError: Cannot index with multidimensional key
您可以使用isin
方法:
import pandas as pd
# Sample creation
import io
comp = pd.read_csv(io.StringIO('year,RMSE,index,cyear,Corr_to_CY\n2000,0.279795,5,1997,0.997975\n2011,0.299011,2,1994,0.997792\n2003,0.368341,1,1993,0.977143\n2013,0.377902,23,2015,0.824441\n1999,0.41495,10,2002,0.804633\n1997,0.435813,8,2000,0.752724\n2018,0.491003,24,2016,0.703359\n2002,0.505771,3,1995,0.684926\n2009,0.529308,17,2009,0.580481\n2015,0.584146,27,2019,0.556555\n2004,0.620946,26,2018,0.500790\n2016,0.659388,22,2014,0.443543\n1993,0.700942,19,2011,0.431615\n2006,0.748086,11,2003,0.375111\n2007,0.766675,21,2013,0.323143\n2020,0.827913,12,2004,0.149202\n2014,0.884109,7,1999,0.002438\n2012,0.900184,0,1992,-0.351615\n1995,0.919482,28,2020,-0.448915\n1992,0.930512,20,2012,-0.563762\n2001,0.967834,18,2010,-0.613170\n2019,1.00497,9,2001,-0.677590\n2005,1.00885,13,2005,-0.695690\n2010,1.159125,14,2006,-0.843122\n2017,1.173262,15,2007,-0.931034\n1994,1.179737,6,1998,-0.939697\n2008,1.212915,25,2017,-0.981626\n1996,1.308853,16,2008,-0.985893\n1998,1.396771,4,1996,-0.999990\n'))
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
result = comp.loc[comp['year'].isin(cyears),'RMSE']
如果您想将 cyears
保留为 pandas DataFrame 而不是 Series,请尝试以下操作:
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7, ['cyear']]
result = comp.loc[comp['year'].isin(cyears.cyear),'RMSE']
我陷入多维键值错误。我有一个看起来像这样的数据框:
year RMSE index cyear Corr_to_CY
0 2000 0.279795 5 1997 0.997975
1 2011 0.299011 2 1994 0.997792
2 2003 0.368341 1 1993 0.977143
3 2013 0.377902 23 2015 0.824441
4 1999 0.41495 10 2002 0.804633
5 1997 0.435813 8 2000 0.752724
6 2018 0.491003 24 2016 0.703359
7 2002 0.505771 3 1995 0.684926
8 2009 0.529308 17 2009 0.580481
9 2015 0.584146 27 2019 0.556555
10 2004 0.620946 26 2018 0.500790
11 2016 0.659388 22 2014 0.443543
12 1993 0.700942 19 2011 0.431615
13 2006 0.748086 11 2003 0.375111
14 2007 0.766675 21 2013 0.323143
15 2020 0.827913 12 2004 0.149202
16 2014 0.884109 7 1999 0.002438
17 2012 0.900184 0 1992 -0.351615
18 1995 0.919482 28 2020 -0.448915
19 1992 0.930512 20 2012 -0.563762
20 2001 0.967834 18 2010 -0.613170
21 2019 1.00497 9 2001 -0.677590
22 2005 1.00885 13 2005 -0.695690
23 2010 1.159125 14 2006 -0.843122
24 2017 1.173262 15 2007 -0.931034
25 1994 1.179737 6 1998 -0.939697
26 2008 1.212915 25 2017 -0.981626
27 1996 1.308853 16 2008 -0.985893
28 1998 1.396771 4 1996 -0.999990
我已将 'Corr_to_CY' 列值 >= 0.70 和 'cyear' 列的 return 值的条件选择到名为 'cyears' 的新 df 中。我需要使用它作为索引来查找 'year' 列在 cyears df 中的年份和 RMSE 值。这是我最好的尝试,我得到了值错误:无法使用多维键进行索引。我是否需要将索引 df "cyears" 更改为其他内容 - 系列、列表等才能正常工作?谢谢,这是我产生错误的代码:
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
cyears = cyears.to_frame()
result = comp.loc[comp['year'] == cyears,'RMSE']
ValueError: Cannot index with multidimensional key
您可以使用isin
方法:
import pandas as pd
# Sample creation
import io
comp = pd.read_csv(io.StringIO('year,RMSE,index,cyear,Corr_to_CY\n2000,0.279795,5,1997,0.997975\n2011,0.299011,2,1994,0.997792\n2003,0.368341,1,1993,0.977143\n2013,0.377902,23,2015,0.824441\n1999,0.41495,10,2002,0.804633\n1997,0.435813,8,2000,0.752724\n2018,0.491003,24,2016,0.703359\n2002,0.505771,3,1995,0.684926\n2009,0.529308,17,2009,0.580481\n2015,0.584146,27,2019,0.556555\n2004,0.620946,26,2018,0.500790\n2016,0.659388,22,2014,0.443543\n1993,0.700942,19,2011,0.431615\n2006,0.748086,11,2003,0.375111\n2007,0.766675,21,2013,0.323143\n2020,0.827913,12,2004,0.149202\n2014,0.884109,7,1999,0.002438\n2012,0.900184,0,1992,-0.351615\n1995,0.919482,28,2020,-0.448915\n1992,0.930512,20,2012,-0.563762\n2001,0.967834,18,2010,-0.613170\n2019,1.00497,9,2001,-0.677590\n2005,1.00885,13,2005,-0.695690\n2010,1.159125,14,2006,-0.843122\n2017,1.173262,15,2007,-0.931034\n1994,1.179737,6,1998,-0.939697\n2008,1.212915,25,2017,-0.981626\n1996,1.308853,16,2008,-0.985893\n1998,1.396771,4,1996,-0.999990\n'))
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7,'cyear']
result = comp.loc[comp['year'].isin(cyears),'RMSE']
如果您想将 cyears
保留为 pandas DataFrame 而不是 Series,请尝试以下操作:
# Operations
cyears = comp.loc[comp['Corr_to_CY']>= 0.7, ['cyear']]
result = comp.loc[comp['year'].isin(cyears.cyear),'RMSE']