How to abstract over two similar functions

I have the following data definition for football games:

Game = namedtuple('Game', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                           'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                           'HomeCorners', 'AwayCorners', 'HomeGoals',
                           'AwayGoals', 'HomeXG', 'AwayXG'])

Here are some examples:

[Game(Date=datetime.date(2018, 10, 21), Home='Everton', Away='Crystal Palace', HomeShots='21', AwayShots='6', HomeBT='22', AwayBT='13', HomeCrosses='21', AwayCrosses='14', HomeCorners='10', AwayCorners='5', HomeGoals='2', AwayGoals='0', HomeXG='1.93', AwayXG='1.5'),
 Game(Date=datetime.date(2019, 2, 27), Home='Man City', Away='West Ham', HomeShots='20', AwayShots='2', HomeBT='51', AwayBT='6', HomeCrosses='34', AwayCrosses='5', HomeCorners='12', AwayCorners='2', HomeGoals='1', AwayGoals='0', HomeXG='3.68', AwayXG='0.4'),
 Game(Date=datetime.date(2019, 2, 9), Home='Fulham', Away='Man Utd', HomeShots='12', AwayShots='15', HomeBT='19', AwayBT='38', HomeCrosses='20', AwayCrosses='12', HomeCorners='5', AwayCorners='4', HomeGoals='0', AwayGoals='3', HomeXG='2.19', AwayXG='2.13'),
 Game(Date=datetime.date(2019, 3, 9), Home='Southampton', Away='Tottenham', HomeShots='12', AwayShots='15', HomeBT='13', AwayBT='17', HomeCrosses='15', AwayCrosses='15', HomeCorners='1', AwayCorners='10', HomeGoals='2', AwayGoals='1', HomeXG='2.08', AwayXG='1.27'),
 Game(Date=datetime.date(2018, 9, 22), Home='Man Utd', Away='Wolverhampton', HomeShots='16', AwayShots='11', HomeBT='17', AwayBT='17', HomeCrosses='26', AwayCrosses='13', HomeCorners='5', AwayCorners='4', HomeGoals='1', AwayGoals='1', HomeXG='0.62', AwayXG='1.12')]

I also have two almost identical functions that calculate the home and away statistics for a given team.

def calculate_home_stats(team, games):
    """
    Calculates home stats for the given team.
    """
    home_stats = defaultdict(float)

    home_stats['HomeShotsFor'] = sum(int(game.HomeShots) for game in games if game.Home == team)
    home_stats['HomeShotsAgainst'] = sum(int(game.AwayShots) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesFor'] = sum(int(game.HomeBT) for game in games if game.Home == team)
    home_stats['HomeBoxTouchesAgainst'] = sum(int(game.AwayBT) for game in games if game.Home == team)
    home_stats['HomeCrossesFor'] = sum(int(game.HomeCrosses) for game in games if game.Home == team)
    home_stats['HomeCrossesAgainst'] = sum(int(game.AwayCrosses) for game in games if game.Home == team)
    home_stats['HomeCornersFor'] = sum(int(game.HomeCorners) for game in games if game.Home == team)
    home_stats['HomeCornersAgainst'] = sum(int(game.AwayCorners) for game in games if game.Home == team)
    home_stats['HomeGoalsFor'] = sum(int(game.HomeGoals) for game in games if game.Home == team)
    home_stats['HomeGoalsAgainst'] = sum(int(game.AwayGoals) for game in games if game.Home == team)
    home_stats['HomeXGoalsFor'] = sum(float(game.HomeXG) for game in games if game.Home == team)
    home_stats['HomeXGoalsAgainst'] = sum(float(game.AwayXG) for game in games if game.Home == team)
    home_stats['HomeGames'] = sum(1 for game in games if game.Home == team)

    return home_stats


def calculate_away_stats(team, games):
    """
    Calculates away stats for the given team.
    """
    away_stats = defaultdict(float)

    away_stats['AwayShotsFor'] = sum(int(game.AwayShots) for game in games if game.Away == team)
    away_stats['AwayShotsAgainst'] = sum(int(game.HomeShots) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesFor'] = sum(int(game.AwayBT) for game in games if game.Away == team)
    away_stats['AwayBoxTouchesAgainst'] = sum(int(game.HomeBT) for game in games if game.Away == team)
    away_stats['AwayCrossesFor'] = sum(int(game.AwayCrosses) for game in games if game.Away == team)
    away_stats['AwayCrossesAgainst'] = sum(int(game.HomeCrosses) for game in games if game.Away == team)
    away_stats['AwayCornersFor'] = sum(int(game.AwayCorners) for game in games if game.Away == team)
    away_stats['AwayCornersAgainst'] = sum(int(game.HomeCorners) for game in games if game.Away == team)
    away_stats['AwayGoalsFor'] = sum(int(game.AwayGoals) for game in games if game.Away == team)
    away_stats['AwayGoalsAgainst'] = sum(int(game.HomeGoals) for game in games if game.Away == team)
    away_stats['AwayXGoalsFor'] = sum(float(game.AwayXG) for game in games if game.Away == team)
    away_stats['AwayXGoalsAgainst'] = sum(float(game.HomeXG) for game in games if game.Away == team)
    away_stats['AwayGames'] = sum(1 for game in games if game.Away == team)

    return away_stats

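I'd like to know whether there is a way to abstract over these two functions and merge them into one, without creating a wall of if/else statements to determine whether the team is playing at home or away and which fields should be summed.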

I'd suggest not using namedtuples, and instead using plain tuples together with a dictionary, for example:

game=(datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26', '21', '22', '4', '5', '1', '3', '1.73', '2.87')

Along with a pair of mapping dictionaries:

numtostr={0: 'Date', 1: 'Home', 2: 'Away', 3: 'HomeShots', 4: 'AwayShots', 5: 'HomeBT', 6: 'AwayBT', 7: 'HomeCrosses', 8: 'AwayCrosses', 9: 'HomeCorners', 10: 'AwayCorners', 11: 'HomeGoals', 12: 'AwayGoals', 13: 'HomeXG', 14: 'AwayXG'}
strtonum={'Date': 0, 'Home': 1, 'Away': 2, 'HomeShots': 3, 'AwayShots': 4, 'HomeBT': 5, 'AwayBT': 6, 'HomeCrosses': 7, 'AwayCrosses': 8, 'HomeCorners': 9, 'AwayCorners': 10, 'HomeGoals': 11, 'AwayGoals': 12, 'HomeXG': 13, 'AwayXG': 14}

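Make similar mapping dictionaries for the home and away stats (for example, {0: 'HomeShotsFor', 1: 'HomeShotsAgainst', ...} for home_stats). To illustrate how the mapping dictionaries work: if you want a game's HomeCrosses, you can write either

game[7]

or

game[strtonum['HomeCrosses']]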
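
Rather than typing both dictionaries by hand, they could be derived from a single list of field names, which keeps the two in sync. A minimal sketch, assuming the 15-field layout from the question (the name FIELDS is introduced here purely for illustration):

```python
import datetime

# Field order taken from the question's Game definition. FIELDS is a
# hypothetical helper name, not part of the original code.
FIELDS = ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots', 'HomeBT', 'AwayBT',
          'HomeCrosses', 'AwayCrosses', 'HomeCorners', 'AwayCorners',
          'HomeGoals', 'AwayGoals', 'HomeXG', 'AwayXG']

numtostr = dict(enumerate(FIELDS))                              # index -> name
strtonum = {name: index for index, name in enumerate(FIELDS)}   # name -> index

game = (datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26',
        '21', '22', '4', '5', '1', '3', '1.73', '2.87')

# Looks up the same element as game[7].
print(game[strtonum['HomeCrosses']])
```

With this approach, adding or reordering a field only requires touching the FIELDS list.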

Then the functions:

def calculate_home_stats(team, games):
    home_stats = [0.0] * 13
    for game in games:
        if game[1] == team:  # index 1 holds the home team
            for index in range(12):
                # Sum everything except Date, Home and Away, which are the first
                # 3 indices. The values are strings, so convert them first.
                home_stats[index] += float(game[index + 3])
            home_stats[12] += 1  # number of home games
    return home_stats

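def calculate_away_stats(team, games):
    away_stats = [0.0] * 13
    for game in games:
        if game[2] == team:  # index 2 holds the away team
            for index in range(12):
                # Positions follow the tuple order, so for an away team the even
                # indices (the Home* columns) are the "against" totals.
                away_stats[index] += float(game[index + 3])
            away_stats[12] += 1  # number of away games
    return away_stats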

If you really want to merge the two functions into one, you can do this:

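def calculate_stats(team, games, homeaway):
    team_index = {'Home': 1, 'Away': 2}[homeaway]  # column holding the team name
    stats = [0.0] * 13
    for game in games:
        if game[team_index] == team:
            for index in range(12):
                stats[index] += float(game[index + 3])  # values are strings
            stats[12] += 1  # number of games counted
    return stats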

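As with my functions above, the only thing you need to change is the index used to check for home or away, instead of a redundant chain of if/else statements that would need many changes.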
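
To show how the positional totals map back to readable names, here is a small self-contained usage sketch of the combined function. It assumes the stat values are stored as strings, as in the question, and therefore converts them with float(). The STAT_NAMES list and the labelled dictionary are illustrative additions, not part of the answer above.

```python
import datetime

# Stat names in the order of tuple indices 3..14 (hypothetical helper list).
STAT_NAMES = ['HomeShots', 'AwayShots', 'HomeBT', 'AwayBT', 'HomeCrosses',
              'AwayCrosses', 'HomeCorners', 'AwayCorners', 'HomeGoals',
              'AwayGoals', 'HomeXG', 'AwayXG']


def calculate_stats(team, games, homeaway):
    """Sum the 12 stat columns over the games `team` played at `homeaway`."""
    team_index = {'Home': 1, 'Away': 2}[homeaway]
    stats = [0.0] * 13
    for game in games:
        if game[team_index] == team:
            for index in range(12):
                stats[index] += float(game[index + 3])  # values are strings
            stats[12] += 1  # number of games counted
    return stats


games = [
    (datetime.date(2019, 5, 12), 'Burnley', 'Arsenal', '12', '17', '26', '26',
     '21', '22', '4', '5', '1', '3', '1.73', '2.87'),
]

totals = calculate_stats('Arsenal', games, 'Away')
# Attach a name to each position; the last slot is the game count.
labelled = dict(zip(STAT_NAMES + ['Games'], totals))
print(labelled['AwayShots'], labelled['HomeShots'], labelled['Games'])
```

Note that the labels keep the column names from the tuple: for a team playing away, the Away* entries are its own totals and the Home* entries are the opponents' totals.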

A cleaner data structure lets you write simpler code. In this case, your data already contains duplication (for example, you have both HomeShots and AwayShots).

There are many possible answers for how to structure the data here. I'll present one solution that doesn't change much from your original structure:

from collections import namedtuple
from datetime import datetime

Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])
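
If the data is already stored in the flat 15-field form from the question, it could be converted to this nested layout in one pass. The following is a rough sketch: FlatGame, side_stats and to_nested are names made up for illustration, and the int/float conversions assume the string-valued fields shown in the question.

```python
import datetime
from collections import namedtuple

# The question's original flat record, renamed FlatGame here (hypothetical
# name) so that it does not clash with the nested Game below.
FlatGame = namedtuple('FlatGame', ['Date', 'Home', 'Away', 'HomeShots', 'AwayShots',
                                   'HomeBT', 'AwayBT', 'HomeCrosses', 'AwayCrosses',
                                   'HomeCorners', 'AwayCorners', 'HomeGoals',
                                   'AwayGoals', 'HomeXG', 'AwayXG'])
Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])


def side_stats(flat, side):
    """Build a Statistics tuple from the 'Home' or 'Away' columns of a flat record."""
    def get(name):
        return getattr(flat, side + name)
    return Statistics(shots=int(get('Shots')), BT=int(get('BT')),
                      crosses=int(get('Crosses')), corners=int(get('Corners')),
                      goals=int(get('Goals')), XG=float(get('XG')))


def to_nested(flat):
    """Convert one flat record into the nested Game layout."""
    return Game(home=flat.Home, away=flat.Away, date=flat.Date,
                home_stats=side_stats(flat, 'Home'),
                away_stats=side_stats(flat, 'Away'))


flat = FlatGame(datetime.date(2018, 10, 21), 'Everton', 'Crystal Palace', '21', '6',
                '22', '13', '21', '14', '10', '5', '2', '0', '1.93', '1.5')
nested = to_nested(flat)
print(nested.home_stats)
print(nested.away_stats)
```

Converting the strings to numbers once, at load time, also means the summing code no longer needs int() or float() calls.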

You can use it like this (I haven't included all the statistics here, just a few as examples):

def calculate_stats(games, team_name, home_stats_only=False, away_stats_only=False):

    home_stats = [g.home_stats._asdict() for g in games if g.home == team_name]
    away_stats = [g.away_stats._asdict() for g in games if g.away == team_name]

    if away_stats_only:
        input_stats = away_stats
    elif home_stats_only:
        input_stats = home_stats
    else:
        input_stats = home_stats + away_stats

    def sum_on_field(field_name):
        return sum(stats[field_name] for stats in input_stats)

    return {f:sum_on_field(f) for f in Statistics._fields}

It can then be used to get the combined, home-only, or away-only statistics:

example_game_1 = Game(
    home='Burnley', 
    away='Arsenal',
    date=datetime.now(),
    home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
    away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87),
)

example_game_2 = Game(
    home='Arsenal',
    away='Pessac',
    date=datetime.now(),
    home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
    away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2),
)

print(calculate_stats([example_game_1, example_game_2], 'Arsenal'))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', home_stats_only=True))
print(calculate_stats([example_game_1, example_game_2], 'Arsenal', away_stats_only=True))

This prints:

{'shots': 18, 'BT': 27, 'crosses': 23, 'corners': 6, 'goals': 4, 'XG': 3.87}
{'shots': 1, 'BT': 1, 'crosses': 1, 'corners': 1, 'goals': 1, 'XG': 1}
{'shots': 17, 'BT': 26, 'crosses': 22, 'corners': 5, 'goals': 3, 'XG': 2.87}
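
The function above only sums a team's own numbers. The "Against" totals from the question could be obtained from the same structure by reading the opponent's Statistics instead. A sketch under that assumption (calculate_stats_against is a hypothetical name, and it lumps home and away games together):

```python
from collections import namedtuple
from datetime import datetime

Statistics = namedtuple('Statistics', ['shots', 'BT', 'crosses', 'corners', 'goals', 'XG'])
Game = namedtuple('Game', ['home', 'away', 'date', 'home_stats', 'away_stats'])


def calculate_stats_against(games, team_name):
    """Sum the opponents' statistics over every game `team_name` played."""
    opponent_stats = (
        [g.away_stats for g in games if g.home == team_name]    # team at home
        + [g.home_stats for g in games if g.away == team_name]  # team away
    )
    return {f: sum(getattr(s, f) for s in opponent_stats) for f in Statistics._fields}


games = [
    Game(home='Burnley', away='Arsenal', date=datetime.now(),
         home_stats=Statistics(shots=12, BT=26, crosses=21, corners=4, goals=1, XG=1.73),
         away_stats=Statistics(shots=17, BT=26, crosses=22, corners=5, goals=3, XG=2.87)),
    Game(home='Arsenal', away='Pessac', date=datetime.now(),
         home_stats=Statistics(shots=1, BT=1, crosses=1, corners=1, goals=1, XG=1),
         away_stats=Statistics(shots=2, BT=2, crosses=2, corners=2, goals=2, XG=2)),
]

result = calculate_stats_against(games, 'Arsenal')
print(result)
```

Because the home and away numbers live in the same Statistics type, switching between "for" and "against" is only a matter of which side's tuple is selected.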

When working with this kind of data, it is usually best to use specialized tools such as pandas. It can also be very convenient to use interactive tools such as JupyterLab.