Python - loop through file to call most recent row that contained a variable in current row

I am trying to improve my Python skills and my basic coding in general. I have a csv file whose first 7 rows (including the header) look like this:

HomeTeam     AwayTeam      HomeTeamWin     AwayTeamWin
AV           MU            1               0
BR           QPR           1               0
C            E             0               1
MU           BR            1               0
QPR          C             0               1
E            AV            0               1

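I am trying to implement the following code to generate an output file that shows whether the home team is coming off a win, based on the result of their most recent game. I am stuck at the part marked ******
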
#start loop
for row in file:
    #create empty list to put value we will find into
    observation_list=[]
    #define variable a as being row[0], i.e. the cell 
    #in the current row that contains the 'hometeam'
    a=row[0]
    #*****stuck here*******#
    #call the last row to contain variable a i.e. where toprow = the most recent row
    #above the current row to have contained variable a i.e. the value from row[0]
    for toprow in file:
    #*****stuck here*******#
        if a in (toprow[0], toprow[1]):
            #implement the following if statement
            #where toprow[0] is the 1st column containing the value
            #of the hometeam from the toprow
            if (toprow[0]==a):      
            #implement the following to generate an output file showing
            #1 or 0 for home team coming off a win
                b=toprow[2]
                observation_list.append(b)
                with open(Output_file, "ab") as resultFile:
                     writer = csv.writer(resultFile, lineterminator='\n')
                     writer.writerow(observation_list)  
            elif (toprow[1]==a):
            #implement the following if statement
            #where toprow[1] is the 2nd column containing the value
            #of the awayteam from the toprow
                b=toprow[3]
                observation_list.append(b)
            #implement the following to generate an output file showing
            #1 or 0 for home team coming off a win
                with open(Output_file, "ab") as resultFile:
                     writer = csv.writer(resultFile, lineterminator='\n')
                     writer.writerow(observation_list)

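From what I have done and read so far, I can see two problems:

Problem 1: How do I get the second for loop (marked ****) to iterate over the previously read rows until it reaches the most recent row that contains the value held in 'a'?

Problem 2: How do I start the code block from row 3? The reason for this is to avoid A. reading the header and, more importantly, B. trying to read a non-existent/negative row, i.e. row1 - 1 = row0, and row0 does not exist!?

Note that the desired output file is as follows: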

-blank-      #first cell would be empty as there is no data to fill it
-blank-      #second cell would be empty as there is no data to fill it
-blank-      #third cell would be empty as there is no data to fill it
0            #fourth cell contains 0 as MU lost their most recent game
0            #fifth cell contains 0 as QPR lost their most recent game
1            #sixth cell contains 1 as E won their most recent game

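It is a good idea to write down in words the steps you think are needed to solve the problem. For this problem I think:

  1. Skip the first line of the file
  2. Read a line and split it into its parts
  3. If this is the home team's first game, print a blank; if not, print the result of their previous game.
  4. Repeat until the file is exhausted.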

While reading the file, store the result of each team's most recent game so it can be looked up later. Dictionaries are made for this - {team1 : result_of_last_game, team2 : result_of_last_game, ...}. When looking up a team's first game, there won't be a previous game, and a plain dictionary will throw a KeyError. The KeyError can be handled with a try/except block, or collections.defaultdict can be used to account for it.
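To make the two options concrete, here is a minimal sketch. The helper name last_result and the single stored team are made up for illustration; only the KeyError behaviour and collections.defaultdict come from the standard library.

```python
import collections

# Option 1: a plain dict, with the missing-key case handled explicitly.
plain = {'AV': '1'}

def last_result(results, team):
    """Return the stored result for team, or '' if the team has not played yet."""
    try:
        return results[team]
    except KeyError:
        return ''

# Option 2: a defaultdict that builds a default value ('' from str()) on a miss.
with_default = collections.defaultdict(str)
with_default['AV'] = '1'

print(repr(last_result(plain, 'MU')))
print(repr(with_default['MU']))
```

One difference worth knowing: looking up a missing key in a defaultdict also stores the default under that key, whereas the try/except version leaves the plain dict unchanged.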

I like to use operator.itemgetter when extracting items from a sequence - it makes the code more readable when I look at it later.
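As a quick illustration of what itemgetter returns, here is a small sketch applied to one already-split row from the sample data. With several indices it returns a tuple; with a single index it returns the item itself.

```python
import operator

# One data row from the question, already split into fields.
row = ['AV', 'MU', '1', '0']

home = operator.itemgetter(0, 2)   # (home team, home team result)
away = operator.itemgetter(1, 3)   # (away team, away team result)
team = operator.itemgetter(0)      # first item only

h = home(row)
a = away(row)
print(h, a, team(h))
```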

import operator, collections

home = operator.itemgetter(0,2)    #first and third item
away = operator.itemgetter(1,3)    #second and fourth item
team = operator.itemgetter(0)      #first item

#dictionary to hold the previous game's result
#default will be a blank string
last_game = collections.defaultdict(str)

#string to format the output
out = '{}\t{}'
with open('data.txt') as f:
    #skip the header
    next(f)
    for line in f:
        #split the line into its relevant parts
        line = line.strip()
        line = line.split()
        #extract the team and game result
        #--> (team1, result), (team2, result)
        h, a = home(line), away(line)
        home_team = team(h)
        #print the result of the last game
        print(out.format(home_team, last_game[home_team]))
        #update the dictionary with the results of this game
        last_game.update([h,a])

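Instead of printing the results, you could just as easily write them to a file, or collect them in a container and write them to a file later.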
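A sketch of the "collect, then write" variant might look like this. The function name previous_home_results, the in-memory sample standing in for the data file, and the output path are all assumptions made for the example; the output has one value per line, so plain file writes are used rather than the csv module.

```python
import collections
import io
import os
import tempfile

def previous_home_results(lines):
    """For each game, return the home team's result in its previous game ('' if none)."""
    last_game = collections.defaultdict(str)
    results = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        home_team, away_team, home_win, away_win = fields
        results.append(last_game[home_team])
        # remember this game's result for both teams
        last_game[home_team] = home_win
        last_game[away_team] = away_win
    return results

# In-memory stand-in for the data file shown in the question.
sample = io.StringIO(
    "HomeTeam AwayTeam HomeTeamWin AwayTeamWin\n"
    "AV MU 1 0\n"
    "BR QPR 1 0\n"
    "C E 0 1\n"
    "MU BR 1 0\n"
    "QPR C 0 1\n"
    "E AV 0 1\n"
)
next(sample)  # skip the header
results = previous_home_results(sample)

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), 'home_team_last_result.txt')
with open(out_path, 'w') as result_file:
    for result in results:
        result_file.write(result + '\n')
```

Opening the output file once, after the loop, also avoids reopening it in append mode for every row.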


If you want something other than an empty string as the default for the defaultdict, you can do this

class Foo(object):
    def __init__(self, foo):
        self.__foo = foo
    def __call__(self):
        return self.__foo
blank = Foo('-blank-')
last_game = collections.defaultdict(blank)
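
A shorter alternative, offered as a sketch rather than a replacement for the class above: defaultdict accepts any zero-argument callable as its factory, so a lambda that returns the placeholder should behave the same way.

```python
import collections

# The lambda is called with no arguments whenever a key is missing.
last_game = collections.defaultdict(lambda: '-blank-')
last_game['AV'] = '1'

print(last_game['AV'])
print(last_game['MU'])
```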