Pandas DataFrame conditional flow with multiple columns
I have a data frame as follows:
import numpy as np
import pandas as pd

fix = pd.DataFrame()
fix['Home'] = ['A','B','C','D','E']
fix['Away'] = ['F','G','H','I','J']
fix['GD = -2'] = [0.2,0.3,0.5,0.1,0.6]
fix['GD = -1'] = [0.25,0.1,0.55,0.35,0.43]
fix['GD = 0'] = [0.1,0.2,0.23,0.5,0.4]
fix['GD = 2'] = [0.1,0.5,0.2,0.12,0.18]
fix['GD = 1'] = [0.24,0.5,0.33,0.31,0.13]
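I want to create a new column containing the winning team based on GD (i.e. a positive GD means the home team wins, a negative GD means the away team wins, and GD = 0 means a draw).
So I wrote the following code to work out the new column.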
GDPlus = fix['GD = 1'] or fix['GD = 2']
GDMins = fix['GD = -1'] or fix['GD = -2']
fix['Winning_Team'] = np.select([GDPlus, GDMins], [fix.Home, fix.Away], default='Draw')
It gives me the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anyone tell me how to do this?
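If you want the new column to be based on the maximum value in each row:
#get max value per row (numeric_only=True skips the Home/Away string columns)
smax = fix.max(axis=1, numeric_only=True)
#compare with eq (==) and check for at least one True per row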
GDPlus = fix[['GD = 1','GD = 2']].eq(smax, axis=0).any(axis=1)
GDMins = fix[['GD = -1','GD = -2']].eq(smax, axis=0).any(axis=1)
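As a minimal sketch of what the two lines above compute (the small frame df and its values are made up for illustration and are not from the question): eq(smax, axis=0) aligns smax on the row index, so every cell is compared against the maximum of its own row, and any(axis=1) then collapses each row to a single boolean.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the mechanics
df = pd.DataFrame({
    'GD = 1': [0.2, 0.7],
    'GD = 2': [0.9, 0.1],
    'GD = -1': [0.3, 0.8],
})

# Row-wise maximum: one value per row
smax = df.max(axis=1)

# axis=0 matches smax to rows, so each cell is compared with its own row's max
hits = df[['GD = 1', 'GD = 2']].eq(smax, axis=0)

# True where at least one of the selected columns holds the row maximum
GDPlus = hits.any(axis=1)
print(GDPlus.tolist())  # [True, False]
```

In the second row the maximum sits in 'GD = -1', so neither positive-GD column matches and the mask is False.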
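Your solution should be changed to compare with eq (==) and to combine the two masks with | instead of or, since or cannot operate element-wise on a Series: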
GDPlus = fix['GD = 1'].eq(smax) | fix['GD = 2'].eq(smax)
GDMins = fix['GD = -1'].eq(smax) | fix['GD = -2'].eq(smax)
#alternative solution
#GDPlus = (fix['GD = 1'] == smax) | (fix['GD = 2'] == smax)
#GDMins = (fix['GD = -1'] == smax) | (fix['GD = -2'] == smax)
fix['Winning_Team'] = np.select([GDPlus, GDMins], [fix.Home, fix.Away], default='Draw')
print(fix)
  Home Away  GD = -2  GD = -1  GD = 0  GD = 2  GD = 1 Winning_Team
0    A    F      0.2     0.25    0.10    0.10    0.24            F
1    B    G      0.3     0.10    0.20    0.50    0.50            B
2    C    H      0.5     0.55    0.23    0.20    0.33            H
3    D    I      0.1     0.35    0.50    0.12    0.31         Draw
4    E    J      0.6     0.43    0.40    0.18    0.13            J
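For reference, the ValueError in the question comes from or itself: Python's or first calls bool() on its left operand, and a Series with more than one element has no single truth value. The | operator works element by element instead. A minimal sketch with two made-up boolean Series (a and b are illustrative names, not from the question):

```python
import pandas as pd

# Hypothetical boolean masks, standing in for GDPlus-style conditions
a = pd.Series([True, False, False])
b = pd.Series([False, False, True])

# `or` needs a single True/False for `a`, which a multi-element Series cannot give
try:
    a or b
    raised = False
except ValueError:
    raised = True

# `|` combines the two masks element by element
combined = a | b

print(raised)             # True
print(combined.tolist())  # [True, False, True]
```

The same reasoning applies to and versus &. When the operands are comparisons written with ==, each one needs its own parentheses, as in the commented alternative above, because | binds more tightly than ==.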