Python text search library
I'm looking for a library that would let me do something like this:
```python
matches(
    user_input="hello world how are you what are you doing",
    keywords='+world -tigers "how are" -"bye bye"'
)
```
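Basically, I want it to match strings based on the presence of words, the absence of words, and sequences of words. I don't need a search engine like Solr, because the strings are not known in advance and will only be searched once. Does such a library already exist, and if so, where can I find it? Or am I doomed to write a regex generator?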
The regex module supports named lists:
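```python
import regex

def match_words(words, string):
    # \L<words> expands to an alternation of the literal strings in `words`
    return regex.search(r"\b\L<words>\b", string, words=words)

def match(string, include_words, exclude_words):
    # every included word or phrase must be present ("+" means required),
    # and none of the excluded ones; the guard skips the search for an empty list
    return (all(match_words([word], string) for word in include_words) and
            not (exclude_words and match_words(exclude_words, string)))
```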
Example:

```python
if match("hello world how are you what are you doing",
         include_words=["world", "how are"],
         exclude_words=["tigers", "bye bye"]):
    print('matches')
```
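The `\b` anchors are what turn this into a whole-word test rather than a substring test. A quick check, written with the standard `re` module so it runs without extra installs (assumption: `re` and `regex` treat `\b` the same way for plain ASCII words like these):

```python
import re

# \b matches at a boundary between a word character and a non-word character
print(bool(re.search(r"\bworld\b", "hello world")))      # True
print(bool(re.search(r"\bworld\b", "hello worldwide")))  # False: no boundary after "world"
print(bool(re.search(r"\bhow are\b", "how are you")))    # True: the phrase is matched as one unit
```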
You can emulate named lists with the standard re module, for example:
```python
import re

def match_words(words, string):
    # escape each word so it is matched literally, and put longer words first
    re_words = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    return re.search(r"\b(?:{words})\b".format(words=re_words), string)
```
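A sketch of why the longest-first sort is there: a regex alternation takes the first alternative that succeeds, so with `"how"` listed before `"how are"` the match stops after `"how"`. For a plain yes/no test both orders succeed on this input; the ordering matters when you use the matched text.

```python
import re

def match_words(words, string):
    # longest alternatives first, each escaped so it is matched literally
    re_words = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    return re.search(r"\b(?:{words})\b".format(words=re_words), string)

m = match_words(["how", "how are"], "hello world how are you")
print(m.group())  # how are

# Without the sort, the alternation tries "how" first and stops there
m_unsorted = re.search(r"\b(?:how|how are)\b", "hello world how are you")
print(m_unsorted.group())  # how
```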
How do I build the lists of included and excluded words from the +, -, and "" grammar?
You can use shlex.split():
```python
import shlex

include_words, exclude_words = [], []
for word in shlex.split('+world -tigers "how are" -"bye bye"'):
    # a leading '-' marks an excluded term; strip the +/- prefix either way
    (exclude_words if word.startswith('-') else include_words).append(word.lstrip('-+'))
print(include_words, exclude_words)
# -> ['world', 'how are'] ['tigers', 'bye bye']
```
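Putting the parsing and the matching together gives roughly the `matches()` call from the question. This is a minimal sketch using only the standard library: `matches` and `_contains` are hypothetical names, and it assumes that every term without a leading `-` is required and that each term starts and ends with a word character (otherwise `\b` will not match at its edges).

```python
import re
import shlex

def _contains(phrase, text):
    # whole-word / whole-phrase test; assumes phrase starts and ends with a word character
    return re.search(r"\b{}\b".format(re.escape(phrase)), text) is not None

def matches(user_input, keywords):
    """Return True if user_input satisfies a '+word -word "phrase"' query."""
    include_words, exclude_words = [], []
    for term in shlex.split(keywords):
        if term.startswith('-'):
            exclude_words.append(term.lstrip('-'))
        else:
            include_words.append(term.lstrip('+'))  # '+' is treated as optional
    return (all(_contains(w, user_input) for w in include_words) and
            not any(_contains(w, user_input) for w in exclude_words))

print(matches("hello world how are you what are you doing",
              '+world -tigers "how are" -"bye bye"'))  # True
print(matches("hello world how are you, bye bye",
              '+world -tigers "how are" -"bye bye"'))  # False: "bye bye" is excluded
```

Requiring all included terms (`all`) while rejecting on any excluded term (`any`) mirrors the usual search-engine reading of `+` and `-`.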
Judging from your example, you don't need regular expressions unless you are looking for patterns/expressions within words.
```python
d = "---your string ---"
mylist = d.split()
M = []
Excl = ["---excluded words---"]
# keep only the words that are not in the exclusion list
for word in mylist:
    if word not in Excl:
        M.append(word)
print(M)
```
You can write a generic function that works with any string and any exclusion list.
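A generic function of this kind can also cover presence, absence, and word sequences without any regular expression, by treating a phrase as a contiguous run of whitespace-separated tokens. This is a sketch under stated assumptions: `contains_phrase` and `matches_tokens` are hypothetical names, and tokenisation is plain `str.split()`, so punctuation stays attached to words.

```python
def contains_phrase(tokens, phrase_tokens):
    # True if phrase_tokens occurs as a contiguous run inside tokens
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))

def matches_tokens(text, include, exclude):
    # assumption: whitespace tokenisation only, so "world," would not count as "world"
    tokens = text.split()
    return (all(contains_phrase(tokens, p.split()) for p in include) and
            not any(contains_phrase(tokens, p.split()) for p in exclude))

print(matches_tokens("hello world how are you what are you doing",
                     ["world", "how are"], ["tigers", "bye bye"]))  # True
print(matches_tokens("hello world how are you what are you doing",
                     ["world", "are how"], []))  # False: wrong word order
```

The trade-off against the regex versions is punctuation: `\b` treats "world," as containing "world", while this token comparison does not.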