Python: iteration problems
I am doing text processing on a text file and have been trying to iterate over its fields inside a for loop.
fields = [1, 2, 3, 4, 5]
i = 0
with open('file path', 'r') as f:
    for line in f:
        # while i is smaller than the number of fields (=5)
        while i <= len(fields)-1:
            currentfield = fields[i]
            # if the first character of the line matches currentfield
            # (that being a number)
            if line[0] == currentfield:
                print(line[4:]) # print the value in the "third column"
            i += 1
The text file "f" contains something like this (the numbers between the dashes are years, and each year has its own "entry"):
-------------2000--------------
1 17824
2 20131125192004.9
3 690714s1969 dcu 000 0 eng
4 a 75601809
4 a DLC
4 b eng
4 c DLC
5 a WA 750
-------------2001--------------
1 3224
2 20w125192004.9
3 690714s1969 dcu 000 0 eng
5 a WA 120
-------------2002--------------
1 6563453
2 2013341524626245.9
3 484914s1969 dcu 000 0 eng
4 a 75601809
4 a eng
4 c DLC
5 a WA 345
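The text file does not actually have columns, but there are two tab spaces between the field number (i.e. 1, 2, 3, 4, 5) and the value that follows (e.g. 17824). I just didn't know what else to call 17824.
What I want to do is iterate over all the fields of each entry/year, but the output only gives me the value of the first field, 1. Thus I get output like this: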
17824
3224
6563453
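It does not iterate over all the fields, only over the first one. How can I fix my code so that the output comes out in a table-like form, with fields 2, 3, 4 and 5 iterated over as well? Like this: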
17824 20131125192004.9 690714s1969 dcu 000 0 eng ...and so on
3224 20w125192004.9 690714s1969 dcu 000 0 eng ...and so on
6563453 2013341524626245.9 484914s1969 dcu 000 0 eng ...and so on
Edit: I realize I wasn't clear, so I have added some parts.
This should help you:
for line in f:
    print('\nline[0] is %s' % line[0])
    for currentfield in fields:  # loop through all fields
        # convert currentfield to string
        if line[0] == str(currentfield):  # if the first character of the line matches currentfield (that being a number)
            print('Printing field %d' % currentfield)  # debugging
            print(line[4:])  # print the value in the "third column"
This gives me:
u'''line[0] is -
line[0] is 1
Printing field 1
17824
line[0] is 2
Printing field 2
20131125192004.9
line[0] is 3
Printing field 3
690714s1969 dcu 000 0 eng
line[0] is 4
Printing field 4
a 75601809
line[0] is 4
Printing field 4
a DLC
line[0] is 4
Printing field 4
b eng
line[0] is 4
Printing field 4
c DLC
line[0] is 5
Printing field 5
a WA 750
line[0] is -
line[0] is 1
Printing field 1
3224
line[0] is 2
Printing field 2
20w125192004.9
line[0] is 3
Printing field 3
690714s1969 dcu 000 0 eng
line[0] is 5
Printing field 5
a WA 120
line[0] is -
line[0] is 1
Printing field 1
6563453
line[0] is 2
Printing field 2
2013341524626245.9
line[0] is 3
Printing field 3
484914s1969 dcu 000 0 eng
line[0] is 4
Printing field 4
a 75601809
line[0] is 4
Printing field 4
a eng
line[0] is 4
Printing field 4
c DLC
line[0] is 5
Printing field 5
a WA 345'''
By the way, changing line[4:] to line[8:] will give you the third column, based on the data you pasted above.
You can then use a regular expression to strip everything after the space that follows the third-column data.
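As a rough sketch of that regular-expression step: the helper name third_column and the pattern are my own assumptions, and I am assuming the columns are separated by runs of tabs or spaces, as in the data pasted above. It skips the first two columns and captures the third one up to the next whitespace.

```python
import re

def third_column(line):
    # Hypothetical helper, not part of the original code.
    # Skip two whitespace-separated columns, then capture the third one
    # up to (but not including) the next whitespace character.
    match = re.match(r"\s*\S+\s+\S+\s+(\S+)", line)
    return match.group(1) if match else None

print(third_column("4\t\ta\t\t75601809\n"))  # third column of a "4" line
print(third_column("1\t\t17824\n"))          # this line has no third column
```

Lines with only two columns (fields 1 and 2 in the sample) return None, so you would still read their value with a plain slice or split.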
Edit for your updated question:
Here I split each line on whitespace, which drops all the spaces and gives the columns as a list: l = line.split(). You can then index a column directly, e.g. l[1] for the value right after the field number.
for line in f:
    l = line.split()  # split on whitespace; empty strings are dropped
    if not l:
        continue  # skip blank lines
    print('\nline[0] is %s' % line[0])
    for currentfield in fields:  # loop through all fields
        # convert currentfield to string
        if l[0] == str(currentfield):  # if the first column of the line matches currentfield (that being a number)
            print('Printing field %d' % currentfield)  # debugging
            print(' '.join(l[1:]))  # print everything after the field number
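To get the table-like output asked for in the question, one option is to collect the values per entry first and then print one line per year. This is only a sketch: the function name entries_to_rows, the "; " joiner for repeated fields, and the assumption that every entry starts with a line of dashes are mine, and the sample below is a shortened copy of the pasted data with tabs as separators.

```python
import io

FIELDS = ("1", "2", "3", "4", "5")

def entries_to_rows(lines, fields=FIELDS):
    """Group lines into one dict per year entry: field number -> list of values."""
    rows = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("-"):  # separator such as -----2000----- starts a new entry
            current = {field: [] for field in fields}
            rows.append(current)
            continue
        parts = line.split(None, 1)  # field number, then the rest of the line
        if current is None or len(parts) < 2 or parts[0] not in current:
            continue  # blank line, unknown field, or data before the first separator
        current[parts[0]].append(parts[1].strip())
    return rows

# Shortened sample in the same shape as the file in the question.
sample = io.StringIO(
    "-------------2000--------------\n"
    "1\t\t17824\n"
    "2\t\t20131125192004.9\n"
    "4\t\ta\t\tDLC\n"
    "4\t\tb\t\teng\n"
    "-------------2001--------------\n"
    "1\t\t3224\n"
)
for row in entries_to_rows(sample):
    # One tab-separated line per entry; a field that repeats is joined with "; ".
    print("\t".join("; ".join(row[field]) for field in FIELDS))
```

With a real file you would pass the open file object instead of the StringIO sample, e.g. entries_to_rows(f) inside the with block. Fields that are missing from an entry (like field 4 in the 2001 entry) come out as empty cells, which keeps the columns aligned.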