Extract a specific word and the value after it from a text file
My input file is:
1 sentences, 6 words, 1 OOVs
1 zeroprobs, logprob= -21.0085 ppl= 15911.4 ppl1= 178704
6 words, rank1= 0 rank5= 0 rank10= 0
7 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss= 0.925606 absloss= 0.856944
file input.txt : 1 sentences, 6 words, 1 OOVs
1 zeroprobs, logprob= -21.0085 ppl= 15911.4 ppl1= 178704
6 words, rank1= 0 rank5= 0 rank10= 0
7 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss= 0.925606 absloss= 0.856944
I want to extract the word ppl and the value that follows it, in this case: ppl=15911.4
I am using this code:
with open("input.txt") as openfile:
    for line in openfile:
        for part in line.split():
            if "ppl=" in part:
                print(part)
However, this only extracts the word ppl, not its value. I also want to print the file name.
Expected output:
input.txt, ppl=15911.4
How can I fix this?
You can use the enumerate function:
with open("input.txt") as openfile:
    for line in openfile:
        s = line.split()
        for i, j in enumerate(s):
            # "ppl=" is its own token; the value is the next token
            if j == "ppl=" and i + 1 < len(s):
                print(s[i], s[i + 1])
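The answer above prints the token and its value but not the file name. A minimal sketch that produces the expected `input.txt, ppl=15911.4` format, assuming `ppl=` and its value are always separate whitespace-delimited tokens as in the sample input; the function name `find_ppl` is hypothetical:

```python
import os


def find_ppl(path):
    """Return 'filename, ppl=value' strings for every 'ppl=' token in the file."""
    results = []
    name = os.path.basename(path)  # report just the file name, not the full path
    with open(path) as openfile:
        for line in openfile:
            tokens = line.split()
            for i, token in enumerate(tokens):
                # 'ppl=' is a token of its own; the value is the next token
                if token == "ppl=" and i + 1 < len(tokens):
                    results.append(f"{name}, ppl={tokens[i + 1]}")
    return results
```

Calling `for result in find_ppl("input.txt"): print(result)` on the sample file prints `input.txt, ppl=15911.4` twice, because the value appears on two lines.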
Example:
>>> fil = '''1 zeroprobs, logprob= -21.0085 ppl= 15911.4 ppl1= 178704
... 6 words, rank1= 0 rank5= 0 rank10= 0'''.splitlines()
>>> for line in fil:
...     s = line.split()
...     for i, j in enumerate(s):
...         if j == "ppl=":
...             print(s[i], s[i + 1])
...
ppl= 15911.4
>>>
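To stop at the first match in each line (note that break only exits the inner loop, so the remaining lines are still scanned):
>>> for line in fil:
...     s = line.split()
...     for i, j in enumerate(s):
...         if j == "ppl=":
...             print(s[i], s[i + 1])
...             break
...
ppl= 15911.4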
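Because `break` only leaves the inner `for` loop, the session above keeps reading the remaining lines. A minimal sketch of stopping at the first match in the whole input, assuming the same token layout; `first_ppl` is a hypothetical helper name. Returning from the function exits both loops at once:

```python
def first_ppl(lines):
    """Return the value after the first 'ppl=' token, or None if there is none."""
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token == "ppl=" and i + 1 < len(tokens):
                return tokens[i + 1]  # leaves both loops immediately
    return None
```

An open file iterates over its lines, so `with open("input.txt") as openfile: print(first_ppl(openfile))` works directly on the input file.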
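You can fix it with a simple counter; the found flag makes it stop after the first occurrence in the file:
found = False
with open("input.txt") as openfile:
    for line in openfile:
        if not found:
            parts = line.split()
            counter = 0
            for part in parts:
                # after incrementing, counter is the index of the token after `part`
                counter = counter + 1
                if "ppl=" in part and counter < len(parts):
                    print(part)
                    print(parts[counter])
                    found = True
                    break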
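The manual counter tracks the position of the current token by hand. As an alternative (a different technique from the counter in the answer above), `list.index()` can locate the token directly; it raises `ValueError` when the token is absent, which this sketch catches. `ppl_from_line` is a hypothetical name, and the sketch assumes `ppl=` is a standalone token:

```python
def ppl_from_line(line):
    """Return the token after 'ppl=' in one line, or None if absent."""
    tokens = line.split()
    try:
        i = tokens.index("ppl=")  # position of the first exact 'ppl=' token
    except ValueError:
        return None  # no 'ppl=' token on this line
    # guard against 'ppl=' being the last token on the line
    return tokens[i + 1] if i + 1 < len(tokens) else None
```

Exact matching with `index()` also avoids accidental substring hits that `"ppl=" in part` would allow.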
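You can assign the list produced by line.split() to a variable, then iterate over it with a while loop using i as a counter. When you hit 'ppl=', print 'ppl=' together with the element at the next index:
with open("input.txt") as openfile:
    for line in openfile:
        phrases = line.split()
        i = 0
        while i < len(phrases):
            if 'ppl=' in phrases[i] and i + 1 < len(phrases):
                print("ppl= " + str(phrases[i + 1]))
            i += 1  # must run on every iteration, or the loop never ends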
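All of the answers above depend on `ppl=` and its value being separate tokens, so they would miss input such as `ppl=15911.4` written without a space. A regular expression handles both spacings; this is a swapped-in technique using the standard `re` module, and `PPL_RE` and `ppl_values` are hypothetical names:

```python
import re

# 'ppl=' at a word boundary (so 'ppl1=' does not match), optional spaces or tabs
# on the same line, then the value as a run of non-whitespace characters
PPL_RE = re.compile(r"\bppl=[ \t]*(\S+)")


def ppl_values(text):
    """Return every value that follows 'ppl=' in the text."""
    return PPL_RE.findall(text)
```

Usage: `with open("input.txt") as f: print(ppl_values(f.read()))`. Restricting the gap to spaces and tabs keeps a trailing `ppl=` from capturing the first token of the next line.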