How to return the corresponding line that matches our search keyword in Whoosh?
Suppose I have a file a.txt:
hello world
good morning world
good night world
Given that the keyword I want to search for is morning, I want to use the Whoosh Python library to return the line of a.txt that matches the keyword morning. So it would return good morning world. How can I do this?
Update: here is my schema:

    schema = Schema(title=TEXT(stored=True),
                    path=ID(stored=True),
                    content=TEXT(stored=True))

I then call the writer's add_document to add the text to the content field.
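Index the text file line by line, storing the line number as a NUMERIC field and the whole line as an ID field (storage is cheap, right!).
Something like the following (untested):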
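
    import os

    from whoosh import index
    from whoosh.fields import ID, NUMERIC, TEXT, Schema

    schema = Schema(
        title=TEXT(stored=True),
        path=ID(stored=True),
        content=TEXT(stored=True),
        line_number=NUMERIC(int, 32, stored=True, signed=False),
        line_text=ID(stored=True),
    )

    # Create the index so that it uses the schema above; open_dir() would
    # reuse whatever schema an existing index was created with.
    os.makedirs("index", exist_ok=True)
    ix = index.create_in("index", schema)
    writer = ix.writer()
    with open('a.txt') as f:
        # One document per line; line_number is zero-based.
        for line_number, line in enumerate(f):
            line = line.rstrip('\n')  # do not store the trailing newline
            writer.add_document(
                title='This is a title',
                path='a.txt',
                content=line,
                line_number=line_number,
                line_text=line,
            )
    writer.commit()  # documents are not searchable until the writer commits
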
Obviously, you can extend this to index multiple text files:
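
    from whoosh import index

    files_to_index = [
        {'title': 'Title A', 'path': 'a.txt'},
        {'title': 'Title B', 'path': 'b.txt'},
        {'title': 'Title C', 'path': 'c.txt'},
    ]

    # Open the existing index, which must already use the line-based schema.
    ix = index.open_dir("index")
    writer = ix.writer()
    for file_to_index in files_to_index:
        with open(file_to_index['path']) as f:
            # One document per line, tagged with the file it came from.
            for line_number, line in enumerate(f):
                line = line.rstrip('\n')  # do not store the trailing newline
                writer.add_document(
                    title=file_to_index['title'],
                    path=file_to_index['path'],
                    content=line,
                    line_number=line_number,
                    line_text=line,
                )
    writer.commit()  # documents are not searchable until the writer commits
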