高效寻找最长匹配前缀串
Efficiently finding the longest matching prefix string
我目前的实现是这样的:
def find_longest_matching_option(option, options):
options = sorted(options, key=len)
longest_matching_option = None
for valid_option in options:
# Don't want to treat "oreo" as matching "o",
# match only if it's "o reo"
if re.match(ur"^{}\s+".format(valid_option), option.strip()):
longest_matching_option = valid_option
return longest_matching_option
我尝试做的一些例子:
"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.
大多数情况下,我在这里寻找效率。当前的实现有效,但有人告诉我它是 O(m^2 * n)
,这很糟糕。
提前致谢!
正则表达式在这里有点矫枉过正;您可以简单地将 space 附加到每个字符串,然后再比较它们以获得相同的结果。
您也不需要对数据进行排序。简单地遍历每个值会更有效。
def find_longest_matching_option(option, options):
# append a space so that find_longest_matching_option("a b", ["a b"])
# works as expected
option += ' '
longest = None
for valid_option in options:
# append a space to each option so that only complete
# words are matched
valid_option += ' '
if option.startswith(valid_option):
# remember the longest match
if longest is None or len(longest) < len(valid_option):
longest = valid_option
if longest is not None:
# remove the trailing space
longest = longest[:-1]
return longest
让我们从 foo
开始。
def foo(x, y):
x, y = x.strip(), y.strip()
return x == y or x.startswith(y + " ")
foo
returns 如果两个字符串相等,或者一个(加上 space)是另一个字符串的子字符串,则为真。
接下来,给定一个 case 字符串和一个选项列表,您可以使用 filter
找到给定 case 字符串的所有有效子字符串,然后应用 max
找到最长的一个(见下面的测试)。
这里有一些 foo
的测试用例。出于演示目的,我将使用 partial
将 foo
柯里化到高阶函数。
from functools import partial
cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
options = [
["foo", "foo bar", "foo bar baz"],
["foo", "foo bar", "foo bar baz"],
["hello", "something_else"],
["a", "a b"],
["a", "a b\t"]
]
p_list = [partial(foo, c) for c in cases]
for p, o in zip(p_list, options):
print(max(filter(p, o), key=len))
foo bar baz
foo bar
hello
a b
a b
我目前的实现是这样的:
def find_longest_matching_option(option, options):
options = sorted(options, key=len)
longest_matching_option = None
for valid_option in options:
# Don't want to treat "oreo" as matching "o",
# match only if it's "o reo"
if re.match(ur"^{}\s+".format(valid_option), option.strip()):
longest_matching_option = valid_option
return longest_matching_option
我尝试做的一些例子:
"foo bar baz something", ["foo", "foo bar", "foo bar baz"]
# -> "foo bar baz"
"foo bar bazsomething", (same as above)
# -> "foo bar"
"hello world", ["hello", "something_else"]
# -> "hello"
"a b", ["a", "a b"]
# -> "a b" # Doesn't work in current impl.
大多数情况下,我在这里寻找效率。当前的实现有效,但有人告诉我它是 O(m^2 * n)
,这很糟糕。
提前致谢!
正则表达式在这里有点矫枉过正;您可以简单地将 space 附加到每个字符串,然后再比较它们以获得相同的结果。
您也不需要对数据进行排序。简单地遍历每个值会更有效。
def find_longest_matching_option(option, options):
# append a space so that find_longest_matching_option("a b", ["a b"])
# works as expected
option += ' '
longest = None
for valid_option in options:
# append a space to each option so that only complete
# words are matched
valid_option += ' '
if option.startswith(valid_option):
# remember the longest match
if longest is None or len(longest) < len(valid_option):
longest = valid_option
if longest is not None:
# remove the trailing space
longest = longest[:-1]
return longest
让我们从 foo
开始。
def foo(x, y):
x, y = x.strip(), y.strip()
return x == y or x.startswith(y + " ")
foo
returns 如果两个字符串相等,或者一个(加上 space)是另一个字符串的子字符串,则为真。
接下来,给定一个 case 字符串和一个选项列表,您可以使用 filter
找到给定 case 字符串的所有有效子字符串,然后应用 max
找到最长的一个(见下面的测试)。
这里有一些 foo
的测试用例。出于演示目的,我将使用 partial
将 foo
柯里化到高阶函数。
from functools import partial
cases = ["foo bar baz something", "foo bar bazsomething", "hello world", "a b", "a b"]
options = [
["foo", "foo bar", "foo bar baz"],
["foo", "foo bar", "foo bar baz"],
["hello", "something_else"],
["a", "a b"],
["a", "a b\t"]
]
p_list = [partial(foo, c) for c in cases]
for p, o in zip(p_list, options):
print(max(filter(p, o), key=len))
foo bar baz
foo bar
hello
a b
a b