Update function based on categorical values python

   MatchId  ExpectedGoals_Team1  ExpectedGoals_Team2  Timestamp            Stages  Home           Away
0  698085   0.8585339288573895   1.4819072820614578   2016-08-13 11:30:00  0       [92, 112]      [94]
1  698086   1.097064295289673    1.0923520385902274   2016-09-12 14:00:00  0       []             [164]
2  698087   1.2752442136224664   0.8687263006179976   2016-11-25 14:00:00  1       [90]           [147]
3  698088   1.0571269856980154   1.4323522262211752   2016-02-16 14:00:00  2       [10, 66, 101]  [50, 118]
4  698089   1.2680212913301165   0.918961072480616    2016-05-10 14:00:00  2       [21]           [134, 167]

This is the function that computes the result. It needs to be updated based on the categorical column 'Stages'.
import numpy as np

# One-hot targets: no goal, home goal, away goal
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196
m = 1

def squared_diff(row):
    ssd = []
    Home = row.Home
    Away = row.Away
    # Predicted probabilities: [no goal, home goal, away goal]
    y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    # Compare the prediction with the observed outcome in every time slot
    for k in range(total_timeslot):
        if k in Home:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)

sum(df.apply(squared_diff, axis=1)) 
For m=1, Out[400]: 7636.305551658377

I want to test the cost function by assigning an arbitrary value of m to each category in Stages. Let m1 = 2, m2 = 3.

This is how I attempted it.

def stages(row):
    Stages = row.Stages
    if Stages == 0:
        return np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])
    elif Stages == 1:
        return np.array([1 - (row.ExpectedGoals_Team1*m1 + row.ExpectedGoals_Team2*m1), row.ExpectedGoals_Team1*m1, row.ExpectedGoals_Team2*m1])
    else:
        return np.array([1 - (row.ExpectedGoals_Team1*m2 + row.ExpectedGoals_Team2*m2), row.ExpectedGoals_Team1*m2, row.ExpectedGoals_Team2*m2])

df.apply(squared_diff, Stages, axis=1)

TypeError: apply() got multiple values for argument 'axis'

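df.apply(squared_diff, Stages, axis=1) raises an error because the second positional parameter of apply is axis. The call is therefore read as axis=Stages, and then axis=1 is passed again as a keyword argument.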
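As a minimal sketch of that collision, the snippet below reproduces the error on a toy frame and shows how extra positional arguments can be forwarded through the args keyword of DataFrame.apply instead. The frame `toy` and the function `scaled_sum` are hypothetical names used only for illustration; they are not part of the question.

```python
import pandas as pd

# Hypothetical toy frame, only for illustrating the argument collision
toy = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

def scaled_sum(row, factor):
    # `factor` is an extra argument that apply forwards via args=
    return factor * (row.a + row.b)

# A second positional argument is bound to `axis`, so also passing
# axis=1 by keyword gives two values for the same parameter.
try:
    toy.apply(scaled_sum, 2, axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Extra arguments for the function go in the `args` tuple instead.
result = toy.apply(scaled_sum, axis=1, args=(2,))

print(error_message)
print(result.tolist())
```

Note that this only forwards one fixed value to every row. For a value that depends on each row's Stages, a per-row column is usually the simpler route.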

To solve this, you can first store the required m for each row in a separate column:

df['m'] = df.Stages.apply(lambda x: 1 if x == 0 else 2 if x == 1 else 3)

Then, in your squared_diff function, replace this line

y = np.array([1 - (row.ExpectedGoals_Team1*m + row.ExpectedGoals_Team2*m), row.ExpectedGoals_Team1*m, row.ExpectedGoals_Team2*m])

with

y = np.array([1 - (row.ExpectedGoals_Team1*row.m + row.ExpectedGoals_Team2*row.m), row.ExpectedGoals_Team1*row.m, row.ExpectedGoals_Team2*row.m])
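Put together, the approach might look like the sketch below. It uses three of the rows from the question (one per stage) and, as a swapped-in alternative to the nested lambda, a dictionary with Series.map; the name `stage_to_m` is an assumption introduced here, with the values m = 1, m1 = 2, m2 = 3 taken from the question.

```python
import numpy as np
import pandas as pd

# One-hot targets: no goal, home goal, away goal
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
total_timeslot = 196

# Hypothetical lookup table: Stages value -> multiplier (m, m1, m2)
stage_to_m = {0: 1, 1: 2, 2: 3}

# Three rows copied from the question, one for each stage
df = pd.DataFrame({
    "ExpectedGoals_Team1": [0.8585339288573895, 1.2752442136224664, 1.0571269856980154],
    "ExpectedGoals_Team2": [1.4819072820614578, 0.8687263006179976, 1.4323522262211752],
    "Stages": [0, 1, 2],
    "Home": [[92, 112], [90], [10, 66, 101]],
    "Away": [[94], [147], [50, 118]],
})

# Store the multiplier that belongs to each row's stage
df["m"] = df.Stages.map(stage_to_m)

def squared_diff(row):
    # Scale the expected goals by the row's own multiplier
    g1 = row.ExpectedGoals_Team1 * row.m
    g2 = row.ExpectedGoals_Team2 * row.m
    y = np.array([1 - (g1 + g2), g1, g2])
    ssd = []
    for k in range(total_timeslot):
        if k in row.Home:
            ssd.append(sum((x2 - y) ** 2))
        elif k in row.Away:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return sum(ssd)

total_cost = df.apply(squared_diff, axis=1).sum()
print(total_cost)
```

The dictionary keeps the stage-to-multiplier mapping in one place, so trying other values of m only means editing `stage_to_m`. A stage missing from the dictionary would produce NaN in the m column, which is worth checking before computing the cost.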