IndexError: list index out of range (In whoosh Search Engine library) error at
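I am trying to create an index of a 150 MB file with whoosh, but it fails with the error "list index out of range". I have marked the line that causes the error, i.e. for x in range(len(id)):
The logical index record should correspond to the ID number of the document.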
import os

from whoosh import index
from whoosh.fields import Schema, ID, TEXT, NUMERIC

id = []
body = []
Score = []
count = 0
doc_path = 'C:/Users/Abhi/Desktop/My_Experiments_with_truth/extracted_xml.txt'
with open(doc_path, 'r+', encoding="utf8") as line:
    for f in line:
        count = count + 1
        if f.startswith('Id : '):
            a = f.replace('Id : ', '')
            id.append(a)
            #print(a)
        elif f.startswith('body : '):
            b = f.replace('body : ', '')
            body.append(b)
            #print(b)
        elif f.startswith('Score :'):
            c = f.replace('Score :', '')
            Score.append(c)
            #print(c)
if not os.path.exists("index"):
    os.mkdir("index")
#design the Schema
schema = Schema(id_details=ID(stored=True), body_details=TEXT(stored=True), Score_details=NUMERIC(stored=True))
print(schema)
#creation of the index
ix = index.create_in("index", schema)
writer = ix.writer()
#Opening writer
# The loop below is where the IndexError is raised
for x in range(len(id)):
    writer.add_document(id_details=id[x], body_details=body[x], Score_details=Score[x])
writer.commit()
print("Index created")
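I think the problem is not in whoosh but in the way you parse your input file. If the data in the input file is inconsistent (for example, a record with no body line), the lists id, body and Score end up with different sizes, which makes this line fail:
writer.add_document(id_details=id[x],body_details=body[x],Score_details=Score[x])
because the loop is bounded only by the length of id: range(len(id))
Try to improve your parsing of the file, or at least bound x by the length of the shortest of the lists id, body and Score.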
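One way to make the parsing more robust, as suggested in the answer above, is to build one record per document instead of three independent lists. The following is only a minimal sketch: the helper name parse_records and the sample lines are hypothetical, and it assumes every record in the file starts with an "Id : " line, which the question does not confirm.

```python
def parse_records(lines):
    """Group 'Id : ', 'body : ' and 'Score :' lines into one dict per record.

    Assumption: every record begins with an 'Id : ' line.
    Records that lack a body or a score are dropped.
    """
    records = []
    current = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Id : "):
            # A new Id starts a new record; keep the previous one only if complete.
            if len(current) == 3:
                records.append(current)
            current = {"id": line[len("Id : "):].strip()}
        elif line.startswith("body : ") and current:
            current["body"] = line[len("body : "):].strip()
        elif line.startswith("Score :") and current:
            current["score"] = line[len("Score :"):].strip()
    # Do not forget the last record in the file.
    if len(current) == 3:
        records.append(current)
    return records


# Hypothetical sample input; the second record has no body line.
sample = [
    "Id : 1\n",
    "body : first post\n",
    "Score : 5\n",
    "Id : 2\n",
    "Score : 3\n",
    "Id : 3\n",
    "body : third post\n",
    "Score : 7\n",
]
records = parse_records(sample)
```

Each complete record could then be passed to writer.add_document(id_details=rec["id"], body_details=rec["body"], Score_details=rec["score"]), so an index into a shorter list is never needed. If you prefer to keep the three lists, iterating with zip(id, body, Score) stops at the shortest list, but it may silently pair an id with the body of a different record once a line is missing.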