查找 "random" 字符串列表中至少 2 个元素的最长前缀
Finding longest prefix present on at least 2 elements on a list of "random" strings
给定一个字符串列表,例如:
myList = ["foo", "foobar", "football", "footbag", "bar"]
找到列表中至少 2 个字符串中出现的最长前缀:
Longest prefix is "footba" present in "football" and "footbag"
列表将通过输入来填充,并不是所有的输入都有一个共同的前缀。
要被视为一个选项,前缀出现在列表中的两个字符串上就足够了。如果有多个选项,它必须return最长的一个。
在我的研究中,我已经能够找到如何获得所有字符串的最长公共前缀,例如:
列表:["foo_a","foo_b","foo_c","fnord"]
输出:Longest common prefix is "f"
但是,我列表中的字符串甚至可能不是以相同的字母开头。
您可以构建 prefix trie 的森林,然后搜索具有两个(非空)子节点的最深节点的 "height"(距离根有多远)。该节点代表最长公共前缀。
如果你不关心性能,你可以简单地迭代列表中的所有单词,并将它们中的每一个(它的前缀)与其余的进行比较,同时不断更新最大值:
def common_prefix_size(s1, s2):
res, i = 0, 0
while i < min(len(s1), len(s2)):
if s1[i] == s2[i]:
res += 1
i += 1
else:
break
return res
def longest_prefix(lst):
res = ''
maxsize = 0
for i in range(len(lst) - 1):
for j in range(i + 1, len(lst)):
t = common_prefix_size(lst[i], lst[j])
maxsize = max(maxsize, t)
if maxsize == t:
res = lst[i][:maxsize]
return res
myList = ["foo", "foobar", "football", "footbag", "bar"]
print(longest_prefix(myList)) # footba
这是一个混乱的实现,对于大型列表来说效率不高,但它可以完成工作。我建议查看提到的前缀尝试,但如果您有一点时间,它们会工作得更好。
这从全尺寸单词向后工作,直到两个单词共享相同的前缀。它会截断单词的末尾并计算同一个单词出现的次数,如果至少出现 2 次,则 returns。
from collections import defaultdict
def list_to_text(x):
x = list(map(str, x))
first = '", "'.join(x[:-1]) #Join together (a, b, c)
if first:
return '" and "'.join((first, x[-1])) #Add the last element (a, b, c and d)
return x[0] #Return a single value if list length is 1
def find_longest_prefix(x):
x_max = len(max(x, key=len))
for i in range(1, x_max)[::-1]:
#Chop off the end of every word
trim = [j[:i] for j in x]
#Iterate through every unique value
result = defaultdict(list)
for j in set(trim):
result[trim.count(j)].append(j)
#Finish iterating if there are more than 2 words that share a prefix
highest_count = max(result)
if highest_count >= 2:
prefix = result[highest_count]
words = [j for k in prefix for j in x if j.startswith(k)]
return prefix, words
myList = ["foo", "foobar", "football", "footbag", "bar"]
prefix, words = find_longest_prefix(myList)
#Put together the string
print('The longest common prefix{} "{}" present in "{}".'.format(' is' if len(prefix) ==1 else 'es are',
list_to_text(prefix), list_to_text(words)))
它会根据结果的数量对字符串进行一些格式化。您的列表仍将导致:
The longest common prefix is "footba" present in "football" and "footbag".
但是添加另一个具有相同长度和结果数量的前缀将导致如下结果:
The longest common prefixes are "footba" and "testin" present in "football", "footbag", "testing" and "testin".
给定一个字符串列表,例如:
myList = ["foo", "foobar", "football", "footbag", "bar"]
找到列表中至少 2 个字符串中出现的最长前缀:
Longest prefix is "footba" present in "football" and "footbag"
列表将通过输入来填充,并不是所有的输入都有一个共同的前缀。
要被视为一个选项,前缀出现在列表中的两个字符串上就足够了。如果有多个选项,它必须return最长的一个。
在我的研究中,我已经能够找到如何获得所有字符串的最长公共前缀,例如:
列表:["foo_a","foo_b","foo_c","fnord"]
输出:Longest common prefix is "f"
但是,我列表中的字符串甚至可能不是以相同的字母开头。
您可以构建 prefix trie 的森林,然后搜索具有两个(非空)子节点的最深节点的 "height"(距离根有多远)。该节点代表最长公共前缀。
如果你不关心性能,你可以简单地迭代列表中的所有单词,并将它们中的每一个(它的前缀)与其余的进行比较,同时不断更新最大值:
def common_prefix_size(s1, s2):
res, i = 0, 0
while i < min(len(s1), len(s2)):
if s1[i] == s2[i]:
res += 1
i += 1
else:
break
return res
def longest_prefix(lst):
res = ''
maxsize = 0
for i in range(len(lst) - 1):
for j in range(i + 1, len(lst)):
t = common_prefix_size(lst[i], lst[j])
maxsize = max(maxsize, t)
if maxsize == t:
res = lst[i][:maxsize]
return res
myList = ["foo", "foobar", "football", "footbag", "bar"]
print(longest_prefix(myList)) # footba
这是一个混乱的实现,对于大型列表来说效率不高,但它可以完成工作。我建议查看提到的前缀尝试,但如果您有一点时间,它们会工作得更好。
这从全尺寸单词向后工作,直到两个单词共享相同的前缀。它会截断单词的末尾并计算同一个单词出现的次数,如果至少出现 2 次,则 returns。
from collections import defaultdict
def list_to_text(x):
x = list(map(str, x))
first = '", "'.join(x[:-1]) #Join together (a, b, c)
if first:
return '" and "'.join((first, x[-1])) #Add the last element (a, b, c and d)
return x[0] #Return a single value if list length is 1
def find_longest_prefix(x):
x_max = len(max(x, key=len))
for i in range(1, x_max)[::-1]:
#Chop off the end of every word
trim = [j[:i] for j in x]
#Iterate through every unique value
result = defaultdict(list)
for j in set(trim):
result[trim.count(j)].append(j)
#Finish iterating if there are more than 2 words that share a prefix
highest_count = max(result)
if highest_count >= 2:
prefix = result[highest_count]
words = [j for k in prefix for j in x if j.startswith(k)]
return prefix, words
myList = ["foo", "foobar", "football", "footbag", "bar"]
prefix, words = find_longest_prefix(myList)
#Put together the string
print('The longest common prefix{} "{}" present in "{}".'.format(' is' if len(prefix) ==1 else 'es are',
list_to_text(prefix), list_to_text(words)))
它会根据结果的数量对字符串进行一些格式化。您的列表仍将导致:
The longest common prefix is "footba" present in "football" and "footbag".
但是添加另一个具有相同长度和结果数量的前缀将导致如下结果:
The longest common prefixes are "footba" and "testin" present in "football", "footbag", "testing" and "testin".