Python - loop through a file to find the most recent row containing a value from the current row
I'm trying to improve my Python skills and my basic coding in general. I have a CSV file whose first 7 rows (including the header) look like this:
HomeTeam AwayTeam HomeTeamWin AwayTeamWin
AV MU 1 0
BR QPR 1 0
C E 0 1
MU BR 1 0
QPR C 0 1
E AV 0 1
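I'm trying to implement the code below to produce an output file that shows whether the home team is coming off a win, based on the result of its most recent game. I'm stuck at the parts marked ******: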
#start loop
for row in file:
    #create empty list to put value we will find into
    observation_list=[]
    #define variable a as being row[0], i.e. the cell
    #in the current row that contains the 'hometeam'
    a=row[0]
    #*****stuck here*******#
    #call the last row to contain variable a i.e. where toprow = the most recent row
    #above the current row to have contained variable a i.e. the value from row[0]
    for toprow in file:
        #*****stuck here*******#
        if (toprow[0] or toprow[1])==a:
            #implement the following if statement
            #where toprow[0] is the 1st column containing the value
            #of the hometeam from the toprow
            if (toprow[0]==a):
                #implement the following to generate an output file showing
                #1 or 0 for home team coming off a win
                b=toprow[2]
                observation_list.append(b)
                with open(Output_file, "ab") as resultFile:
                    writer = csv.writer(resultFile, lineterminator='\n')
                    writer.writerow(observation_list)
            elif (toprow[1]==a):
                #implement the following if statement
                #where toprow[1] is the 2nd column containing the value
                #of the awayteam from the toprow
                b=toprow[3]
                observation_list.append(b)
                #implement the following to generate an output file showing
                #1 or 0 for home team coming off a win
                with open(Output_file, "ab") as resultFile:
                    writer = csv.writer(resultFile, lineterminator='\n')
                    writer.writerow(observation_list)
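From what I have done and read so far, I can see two problems:
Problem 1: How do I get the second for loop (marked ****) to iterate back over the previously read rows until it reaches the most recent row containing the value held in 'a'?
Problem 2: How do I start the code block at row 3? This is needed to avoid A. reading the header and, more importantly, B. trying to read a nonexistent/negative row, i.e. row1 - 1 = row0, and row0 doesn't exist!?
Note that the desired output file is as follows: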
-blank- #first cell would be empty as there is no data to fill it
-blank- #second cell would be empty as there is no data to fill it
-blank- #third cell would be empty as there is no data to fill it
0 #fourth cell contains 0 as MU lost their most recent game
0 #fifth cell contains 0 as QPR lost their most recent game
1 #sixth cell contains 1 as E won their most recent game
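It is a good idea to write down, in words, the steps you think are needed to solve the problem. For this problem I would:
- Skip the first line of the file.
- Read a line and split it into its parts.
- If this is the home team's first game, print a blank; otherwise print the result of its previous game.
- Repeat until the file runs out.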
While reading the file, store the result of each team's most recent game so it can be looked up later. Dictionaries are made for this - {team1 : result_of_last_game, team2 : result_of_last_game, ...}. When looking up a team's first game there won't be a previous game, so a plain dictionary will raise a KeyError. The KeyError can be handled with a try/except block, or a collections.defaultdict can be used to account for it.
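To illustrate the try/except option, here is a minimal sketch using a plain dict. The function name `previous_results` and the in-memory `rows` list are assumptions made for the example; the rows themselves are copied from the sample data in the question.

```python
# Sample rows taken from the question's data (header already removed).
rows = [
    ("AV", "MU", "1", "0"),
    ("BR", "QPR", "1", "0"),
    ("C", "E", "0", "1"),
    ("MU", "BR", "1", "0"),
    ("QPR", "C", "0", "1"),
    ("E", "AV", "0", "1"),
]


def previous_results(games):
    """For each game, return the home team's result in its previous game ('' if none)."""
    last_game = {}  # plain dict: team -> result of its most recent game
    observations = []
    for home, away, home_win, away_win in games:
        try:
            observations.append(last_game[home])
        except KeyError:
            # first appearance of this team: there is nothing to look up yet
            observations.append("")
        # remember this game's result for both teams
        last_game[home] = home_win
        last_game[away] = away_win
    return observations


print(previous_results(rows))
```

Because the dictionary is updated only after the lookup, each row sees the result of the team's earlier game and never its own.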
I like to use operator.itemgetter when extracting items from a sequence - it makes the code more readable when I look at it later.
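As a small illustration of what operator.itemgetter returns, the sketch below applies two getters to one split line from the sample data. The variable name `fields` is just for this example.

```python
import operator

home = operator.itemgetter(0, 2)  # picks (home team, home result)
away = operator.itemgetter(1, 3)  # picks (away team, away result)

fields = "AV MU 1 0".split()  # ['AV', 'MU', '1', '0']

print(home(fields))  # ('AV', '1')
print(away(fields))  # ('MU', '0')
```

When given more than one index, itemgetter returns a tuple, which is why each getter yields a ready-made (team, result) pair.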
import operator, collections

home = operator.itemgetter(0,2) #first and third item
away = operator.itemgetter(1,3) #second and fourth item
team = operator.itemgetter(0) #first item
#dictionary to hold the previous game's result
#default will be a blank string
last_game = collections.defaultdict(str)
#string to format the output
out = '{}\t{}'
with open('data.txt') as f:
    #skip the header (next(f) works in both Python 2 and 3)
    next(f)
    for line in f:
        #split the line into its relevant parts
        line = line.strip()
        line = line.split()
        #extract the team and game result
        #--> (team1, result), (team2, result)
        h, a = home(line), away(line)
        home_team = team(h)
        #print the result of the last game
        print(out.format(home_team, last_game[home_team]))
        #update the dictionary with the results of this game
        last_game.update([h,a])
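Instead of printing the results, you could just as easily write them to a file, or collect them in a container and write them to a file later.
If you want something other than an empty string as the defaultdict default, you can do this: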
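Collecting the results and writing them out afterwards might look like the sketch below. It assumes Python 3, keeps the sample lines in memory instead of reading data.txt, and writes to a temporary path; `collect_observations`, `lines` and `out_path` are hypothetical names introduced for the example.

```python
import collections
import csv
import os
import tempfile

# Sample lines from the question, header included.
lines = [
    "HomeTeam AwayTeam HomeTeamWin AwayTeamWin",
    "AV MU 1 0",
    "BR QPR 1 0",
    "C E 0 1",
    "MU BR 1 0",
    "QPR C 0 1",
    "E AV 0 1",
]


def collect_observations(lines):
    """Return (home team, result of its previous game) pairs; '' when there is none."""
    last_game = collections.defaultdict(str)
    observations = []
    rows = iter(lines)
    next(rows)  # skip the header
    for line in rows:
        home_team, away_team, home_win, away_win = line.split()
        observations.append((home_team, last_game[home_team]))
        last_game[home_team] = home_win
        last_game[away_team] = away_win
    return observations


observations = collect_observations(lines)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
with open(out_path, "w", newline="") as result_file:
    writer = csv.writer(result_file, lineterminator="\n")
    writer.writerows(observations)
```

Opening the output file once and writing all rows together avoids reopening it in append mode for every row, as the code in the question does.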
class Foo(object):
    def __init__(self, foo):
        self.__foo = foo
    def __call__(self):
        return self.__foo

blank = Foo('-blank-')
last_game = collections.defaultdict(blank)
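A shorter alternative, offered as a sketch and not as part of the original answer, is to pass a lambda as the default factory. defaultdict only needs a callable that takes no arguments, so this should behave the same way as the Foo instance.

```python
import collections

# The lambda is called with no arguments whenever a missing key is looked up.
last_game = collections.defaultdict(lambda: '-blank-')

print(last_game['AV'])  # -blank-
last_game['AV'] = '1'
print(last_game['AV'])  # 1
```

Note that looking up a missing key also stores the default under that key, which is harmless here because every team's entry is overwritten after its game.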