IndexError: list index out of range in a loop of readlines()

I don't understand why this gives me 'IndexError: list index out of range'. I'm reading a simple CSV file and trying to get the comma-separated values.

with open('project_twitter_data.csv','r') as twf:

    tw = twf.readlines()[1:]  # I don't need the very first line

    for i in tw:
        linelst = i.strip().split(",")

        RT = linelst[1]
        RP = linelst[2]

        rows = "{}, {}".format(RT,RP)

My output looks like this:


print(tw) # the original strings.
..\nBORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6\nREASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0\nHOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0\n\n"

print (i)
..
BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6

REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0

HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0

print(linelst)
..
['BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.', '4', '6']
["REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love", '19', '0']
["HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a", '0', '0']
['']

print(rows) 
..
4, 6
19, 0
0, 0


# the error
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-f27e87689f41> in <module>
     6         linelst = i.strip().split(",")
     7 #        print(linelst)
----> 8         RT = linelst[1]
     9         RP = linelst[2]
   

IndexError: list index out of range

What am I doing wrong?

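I also noticed that after I use strip().split(","), an empty entry, [''], shows up at the very end of my lists. I can remove it with twf.readlines()[1:][:-1], but the error still persists. Thanks for any advice.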

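Your last line is empty after stripping, so split produces a list containing a single empty string, ['']. That list only has index 0, so linelst[1] raises the IndexError.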
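To make that concrete, here is a minimal sketch of what happens to a blank line. The string below is a made-up stand-in for the trailing empty line, not data read from the asker's file.

```python
# A blank line read from a file is just a newline character (assumed sample).
blank = "\n"

fields = blank.strip().split(",")
print(fields)       # ['']
print(len(fields))  # 1

# Index 0 exists, but index 1 does not, hence the IndexError.
try:
    fields[1]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

Note that split(",") on an empty string never returns an empty list; it returns a one-element list, which is why the failure shows up at index 1 rather than index 0.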

The simplest solution is to skip empty lines explicitly:

with open('project_twitter_data.csv','r') as twf:

    next(twf, None)  # Advance past first line without needing to slurp whole file into memory and
                     # slice it, tying peak memory usage to max line size, not size of file

    for line in twf:
        line = line.strip()
        if not line:
            continue
        linelst = line.split(",")

        # If non-empty, but incomplete lines should be ignored:
        if len(linelst) < 3:
            continue

        RT = linelst[1]
        RP = linelst[2]

        rows = "{}, {}".format(RT,RP)

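Or, simpler still, use the EAFP ("easier to ask forgiveness than permission") pattern and the csv module, which you should always use when working with CSV files (the format is much more complicated than "split on commas"):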

import csv

with open('project_twitter_data.csv', 'r', newline='') as twf:  # newline='' needed for proper CSV dialect handling
    csvf = csv.reader(twf)
    next(csvf, None)  # Advance past first row without needing to slurp whole file into memory and
                      # slice it, tying peak memory usage to max line size, not size of file

    for row in csvf:
        try:
            RT, RP = row[1:3]
        except ValueError:
            continue  # Didn't have enough elements, incomplete line
 
        rows = "{}, {}".format(RT,RP)
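As a rough illustration of why plain comma splitting is fragile, the sketch below parses the same text both ways. The header names and the quoted "Hello, world" row are hypothetical sample data, chosen only to show a field that itself contains a comma.

```python
import csv
import io

# Hypothetical sample: the first field of the data row contains a comma,
# so it is quoted. The trailing blank line mimics the asker's file.
sample = 'tweet,retweets,replies\n"Hello, world",4,6\n\n'

# Naive approach: skip the header, drop blank lines, split on commas.
naive = [line.strip().split(",")
         for line in sample.splitlines()[1:] if line.strip()]

# csv module: blank lines come back as empty lists, which are filtered out;
# the [1:] slice drops the header row.
parsed = [row for row in csv.reader(io.StringIO(sample, newline="")) if row][1:]

print(naive)   # [['"Hello', ' world"', '4', '6']]
print(parsed)  # [['Hello, world', '4', '6']]
```

With the naive split, the counts end up at indexes 2 and 3 instead of 1 and 2, so linelst[1] would silently return part of the tweet text rather than raise an error.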

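Note: In both cases, I made some minor improvements to avoid large temporary lists, and tweaked a few small things for readability (naming a str variable i is bad form; i is typically used for indexes, or at least for integers, and you had a clearer name readily available, so even a placeholder like x wouldn't have been appropriate).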