IndexError: list index out of range in Python Script
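I'm new to Python, so apologies if this has already been answered. I've used this script before and it worked, so I'm not sure what has gone wrong.
I'm trying to convert a MALLET output document from a wide table (one row per document, one weight column per topic) into a long list of file, topic, weight rows.
Here is what the original file I'm trying to convert looks like; it contains 30 topics (it's a text file named mb_composition.txt):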
0 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Abizaid.txt 6.509147794508226E-6 1.8463345214533957E-5 3.301298069640119E-6 0.003825178550032757 0.15240841618294929 0.03903974304065183 0.10454783676528623 0.1316719812119471 1.8018057013225344E-5 4.869261713020613E-6 0.0956868156114931 1.3521101623203115E-5 9.514591058923748E-6 1.822741355900598E-5 4.932324961835634E-4 2.756817586271138E-4 4.039186874601744E-5 1.0503346606335033E-5 1.1466132458804392E-5 0.007003443189848799 6.7094360963952E-6 0.2651753488982284 0.011727025879070194 0.11306132549594633 4.463460490946615E-6 0.0032751230536005056 1.1887304822238514E-5 7.382714572306351E-6 3.538808652077042E-5 0.07158823129977483
1 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Jeffrey,%20Jim%20-%20Chk5-%20ASC%20-%20FINAL%20-%20Sept%202017.docx.txt 4.296636200313062E-6 1.218750594272488E-5 1.5556725986514498E-4 0.043172816021532695 0.04645757277949794 0.01963429696910822 0.1328206370818606 0.116826297071711 1.1893574776047563E-5 3.2141605637859693E-6 0.10242945223692496 0.010439315937573735 0.2478814493196687 1.2031769351093548E-5 0.010142417179693447 2.858721603853616E-5 2.6662348272204834E-5 6.9331747684835E-6 7.745091995495631E-4 0.04235638910274044 4.428844900369446E-6 0.0175105406405736 0.05314379308820005 0.11788631730736487 2.9462944350793084E-6 4.746133386282654E-4 7.846714475661223E-6 4.873270616886766E-6 0.008919869163605806 0.02884824479155971
Here is the Python script I'm using to convert it:
infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')
outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    #outfile.write(fn[46:] + ",")
    for i in range(0,59):
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
I run this in the terminal with python reshape.py and get this error:
Traceback (most recent call last):
  File "reshape.py", line 12, in <module>
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
IndexError: list index out of range
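Any idea what I'm doing wrong? I can't figure it out and I'm frustrated, because I know I've used this script successfully many times before! If it helps, I'm on Mac OS X with Python 2.7.10.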
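The problem is that the loop bound is hard-coded: range(0,59) runs 59 times and reads topics[i*2] and topics[i*2+1], so it expects 118 values on every line of the file. Your lines have only 30, so the lookup fails at i = 15, when it asks for topics[30].
If you just want to write out however many topics each line actually has, derive the range from the number of values on that line: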
for i in range(len(topics) // 2):
    outfile.write(fn[46:] + ",")
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
Written more Pythonically, it looks like this:
# Group the topics into tuple-pairs for easier management
paired_topics = [tuple(topics[i:i+2]) for i in range(0, len(topics), 2)]
# Iterate the paired topics and print them each on a line of output
for topic in paired_topics:
    outfile.write(fn[46:] + ',' + ','.join(topic) + '\n')
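One caveat with both snippets above: they treat the values as alternating topic number / weight pairs. The sample rows in the question appear to hold 30 bare weights for 30 topics, with no topic numbers, in which case pairing them up would put two weights on each output line. Here is a minimal sketch for that layout, assuming the column position is the topic number and the fields are tab-separated (the reshape_line and reshape helpers and the prefix_len parameter are made up for illustration, they are not part of MALLET):

```python
def reshape_line(line, prefix_len=46):
    """Turn one wide composition row into (file, topicnum, weight) tuples.

    Assumes tab-separated fields: doc id, file path, then one weight per topic.
    prefix_len mirrors the fn[46:] slice used in the question.
    """
    # Strip the line ending so the last weight does not carry a newline
    tokens = line.rstrip('\r\n').split('\t')
    fn = tokens[1][prefix_len:]
    weights = tokens[2:]
    # enumerate() supplies the topic number from the column position
    return [(fn, topicnum, weight) for topicnum, weight in enumerate(weights)]


def reshape(infile, outfile):
    """Write one 'file,topicnum,weight' row per topic for every input line."""
    outfile.write('file,topicnum,weight\n')
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        for fn, topicnum, weight in reshape_line(line):
            outfile.write('{},{},{}\n'.format(fn, topicnum, weight))
```

You would call reshape with the two file objects opened in the question. Note that a file name containing a comma, like the second sample row, would still need quoting (for example via the csv module) to produce a valid CSV.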
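Does your 'topics' list have only 30 elements? It looks like you're trying to access items well beyond the end of the list, i.e. topics[x] where x >= 30.
You need to debug the code. Try printing out the variables.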
infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')
outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    # Show how many values this line really has
    print('Values on this line: {}'.format(len(topics)))
    # outfile.write(fn[46:] + ",")
    for i in range(0,59):
        # Add a print statement like this
        # (str.format rather than an f-string, since f-strings need Python 3.6+)
        print('Topics {}: {} and {}'.format(i, i*2, i*2+1))
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
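Before changing the loop, it can also help to check how many values each line actually carries. A small sketch along those lines, assuming tab-separated fields as in the script above (count_values is a made-up helper name):

```python
from collections import Counter


def count_values(lines):
    """Count how many lines have each number of values after the id and file path."""
    counts = Counter()
    for line in lines:
        tokens = line.rstrip('\r\n').split('\t')
        # Everything after the first two fields is treated as topic data
        counts[len(tokens[2:])] += 1
    return counts
```

Passing it the open mb_composition.txt file and printing the result shows at a glance whether every line has the 30 values you expect or the 118 that range(0,59) assumes.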