IndexError: list index out of range in Python Script

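I'm new to Python, so I apologize if this has already been answered. I've used this script before and it worked, so I'm not sure what went wrong.

I'm trying to convert a MALLET output document into a long list of topic, weight, value rows, rather than a wide list of documents with their topics and weights.

This is what the original CSV I'm trying to convert looks like, with 30 topics in it (it's a text file named mb_composition.txt):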

0   file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Abizaid.txt    6.509147794508226E-6    1.8463345214533957E-5   3.301298069640119E-6    0.003825178550032757    0.15240841618294929 0.03903974304065183 0.10454783676528623 0.1316719812119471  1.8018057013225344E-5   4.869261713020613E-6    0.0956868156114931  1.3521101623203115E-5   9.514591058923748E-6    1.822741355900598E-5    4.932324961835634E-4    2.756817586271138E-4    4.039186874601744E-5    1.0503346606335033E-5   1.1466132458804392E-5   0.007003443189848799    6.7094360963952E-6  0.2651753488982284  0.011727025879070194    0.11306132549594633 4.463460490946615E-6    0.0032751230536005056   1.1887304822238514E-5   7.382714572306351E-6    3.538808652077042E-5    0.07158823129977483
1   file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Jeffrey,%20Jim%20-%20Chk5-%20ASC%20-%20FINAL%20-%20Sept%202017.docx.txt    4.296636200313062E-6    1.218750594272488E-5    1.5556725986514498E-4   0.043172816021532695    0.04645757277949794 0.01963429696910822 0.1328206370818606  0.116826297071711   1.1893574776047563E-5   3.2141605637859693E-6   0.10242945223692496 0.010439315937573735    0.2478814493196687  1.2031769351093548E-5   0.010142417179693447    2.858721603853616E-5    2.6662348272204834E-5   6.9331747684835E-6  7.745091995495631E-4    0.04235638910274044 4.428844900369446E-6    0.0175105406405736  0.05314379308820005 0.11788631730736487 2.9462944350793084E-6   4.746133386282654E-4    7.846714475661223E-6    4.873270616886766E-6    0.008919869163605806    0.02884824479155971

This is the Python script I'm trying to use to convert it:

infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')

outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    #outfile.write(fn[46:] + ",")
    for i in range(0,59):
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')

I run this in the terminal with python reshape.py and get this error:

Traceback (most recent call last):
  File "reshape.py", line 12, in <module>
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
IndexError: list index out of range

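Any idea what I'm doing wrong? I can't seem to figure it out and I'm frustrated, because I know I've used this script successfully many times before! If it helps, I'm on Mac OS X with Python version 2.7.10.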

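The problem is that the loop looks for 59 topic/weight pairs in every row of the CSV: range(0,59) runs i from 0 to 58, so it reads up to topics[117], which is more values than your rows contain.

If you only want to print the topics that are actually in the list, up to the last one on each line, define the range from the actual number of topics on each line: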

for i in range(len(topics) // 2):
    outfile.write(fn[46:] + ",")
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')

Expressed more Pythonically, it would look like this:

# Group the topics into tuple-pairs for easier management
paired_topics = [tuple(topics[i:i+2]) for i in range(0, len(topics), 2)]
# Iterate the paired topics and print them each on a line of output
for topic in paired_topics:
    outfile.write(fn[46:] + ',' + ','.join(topic) + '\n')
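One caveat: the sample rows in the question appear to hold one weight per topic, with no topic number in front of each weight. If that is the case, pairing up neighbouring fields would put two weights on each output line. Here is a minimal sketch for that layout, assuming the topic number is simply the column position. The function name reshape_line, the prefix_len parameter (standing in for the hard-coded 46) and the shortened sample path are made up for illustration.

```python
def reshape_line(line, prefix_len=0):
    """Turn one wide row into a list of (file, topicnum, weight) tuples."""
    # Strip the trailing newline so it does not stick to the last weight
    tokens = line.rstrip('\n').split('\t')
    fn = tokens[1][prefix_len:]
    weights = tokens[2:]
    # Assumption: one weight per topic, so the position is the topic number
    return [(fn, topicnum, weight) for topicnum, weight in enumerate(weights)]

# Made-up, shortened row in the same tab-separated shape as the question's file
sample = '0\tfile:/txt/Abizaid.txt\t0.25\t0.5\t0.25\n'
for row in reshape_line(sample, prefix_len=len('file:/txt/')):
    print('{},{},{}'.format(*row))
```

Because it enumerates whatever is on the line, this never indexes past the end of the list, whether a row has 30 topics or 60. Some of the file names contain commas, so writing the rows with the csv module's writer, which quotes such fields, may be safer than joining them by hand.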

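Does your 'topics' list only have 30 elements? It looks like you're trying to access items far beyond the available range, i.e. you're accessing topics[x] where x >= 30.

You need to debug your code. Try printing out the variables.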

infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')

outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    # outfile.write(fn[46:] + ",")
    for i in range(0,59):
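        # Add a print statement like this (str.format rather than an f-string,
        # since f-strings need Python 3.6+ and you are on 2.7)
        print('Topics {}: {} and {}'.format(i, i*2, i*2+1))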
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
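It may be quicker still to print how many fields each line really has and compare that with the highest index the loop will read. This is a rough sketch that should run on both Python 2.7 and 3. The helper name count_topic_fields and the shortened sample row are made up for illustration.

```python
def count_topic_fields(line):
    """Return how many fields follow the document id and file name."""
    tokens = line.rstrip('\n').split('\t')
    return len(tokens[2:])

# Made-up, shortened row in the same tab-separated shape as the question's file
sample = '0\tfile:/txt/Abizaid.txt\t0.25\t0.5\t0.25\n'
n = count_topic_fields(sample)
print('fields after the file name: {}'.format(n))
# range(0,59) ends at i = 58, and the loop reads topics[i*2+1]
print('highest index the loop reads: {}'.format(58 * 2 + 1))
```

If the first number is not larger than the second, the IndexError is expected.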