IndexError: list index out of range (In whoosh Search Engine library) error at

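I am trying to create an index of a 150 MB file with whoosh, but it fails with the error "list index out of range". The error comes from the loop that starts with for x in range(len(id)):. The idea is that the number of indexed records should equal the number of Ids in the document.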

import os

from whoosh import index
from whoosh.fields import Schema, ID, TEXT, NUMERIC

id = []
body = []
Score = []
count=0
doc_path='C:/Users/Abhi/Desktop/My_Experiments_with_truth/extracted_xml.txt'
with open(doc_path, 'r+', encoding="utf8") as line:
    for f in line:
        count = count + 1
        if f.startswith('Id : '):
            a = f.replace('Id : ', '')
            id.append(a)
            # print(a)
        elif f.startswith('body : '):
            b = f.replace('body : ', '')
            body.append(b)
            # print(b)
        elif f.startswith('Score :'):
            c = f.replace('Score :', '')
            Score.append(c)
            # print(c)

if not os.path.exists("index"):
    os.mkdir("index")

# Design the schema
schema = Schema(id_details=ID(stored=True), body_details=TEXT(stored=True), Score_details=NUMERIC(stored=True))

print(schema)


#creation of the index

ix = index.create_in("index", schema)

writer = ix.writer()
#Opening writer


for x in range(len(id)):
    writer.add_document(id_details=id[x],body_details=body[x],Score_details=Score[x])
writer.commit()
print("Index created")

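I think the problem is not with whoosh but with the way you parse the input file. If the data in the input file is not consistent (for example, a record is missing its body or Score line), the lists id, body and Score end up with different sizes, and this line fails: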

  writer.add_document(id_details=id[x],body_details=body[x],Score_details=Score[x])

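because the loop bound only takes the length of id into account: range(len(id)). As soon as x reaches the length of a shorter list, body[x] or Score[x] raises IndexError.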
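A quick way to confirm this is to compare the three lengths before the loop. The following is only a minimal sketch: the sample lists are made up to stand in for the parsed data, and first_failing_index is a hypothetical helper, not part of whoosh.

```python
# Hypothetical sample data: the second record is missing its body line.
ids = ["1\n", "2\n", "3\n"]
bodies = ["first\n", "third\n"]
scores = ["5\n", "7\n", "9\n"]


def first_failing_index(ids, bodies, scores):
    """Return the first x in range(len(ids)) that would raise IndexError,
    or None if body and Score are at least as long as id."""
    shortest = min(len(ids), len(bodies), len(scores))
    return shortest if shortest < len(ids) else None


# Print the three lengths, then the index at which the loop would fail.
print(len(ids), len(bodies), len(scores))
print(first_failing_index(ids, bodies, scores))
```

If the three lengths differ on the real file, the parsing step is dropping or missing fields for some records.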

Try to improve your parsing of the file, or at least bound x by the length of the shortest of the lists id, body and Score.
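One possible way to make the parsing more robust is to build one record per Id instead of three independent lists, so that a missing field cannot shift the other fields out of alignment. This is only a sketch under assumptions that are not confirmed by the question: that every record starts with an "Id : " line, that body fits on a single line, and that Score is an integer. The names parse_records, sample and complete are hypothetical.

```python
def parse_records(lines):
    """Group 'Id : ', 'body : ' and 'Score :' lines into one dict per record.

    Assumption: every record starts with an 'Id : ' line.
    """
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Id : "):
            # A new Id closes the previous record, complete or not.
            if current is not None:
                records.append(current)
            current = {"id": line[len("Id : "):].strip(), "body": "", "score": None}
        elif current is None:
            continue  # ignore anything before the first Id line
        elif line.startswith("body : "):
            current["body"] = line[len("body : "):]
        elif line.startswith("Score :"):
            try:
                current["score"] = int(line[len("Score :"):].strip())
            except ValueError:
                current["score"] = None  # leave malformed scores unset
    if current is not None:
        records.append(current)
    return records


# Made-up sample input: record 2 has no body line.
sample = [
    "Id : 1\n", "body : hello\n", "Score : 5\n",
    "Id : 2\n", "Score : 3\n",
    "Id : 3\n", "body : bye\n", "Score : -1\n",
]
records = parse_records(sample)

# Keep only records that have every field before indexing them.
complete = [r for r in records if r["body"] and r["score"] is not None]
print(len(records), len(complete))
```

Each entry in complete can then be passed to writer.add_document(id_details=r["id"], body_details=r["body"], Score_details=r["score"]), using the field names from the question's schema. The shortest-list bound is quicker to apply: looping over zip(id, body, Score) stops at the shortest list and so avoids the IndexError, but it can silently pair a body with the wrong Id when a field is missing in the middle of the file.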