字符串中的单词等于另一个字符串中单词的排列
Words in string equal to a permutation of words in another string
我想知道一个字符串是否至少等于另一个字符串的排列之一。
比如下面两个比较的输出一定是true
:
print('united states of America' == 'America united states of')
print('united states of America' == 'states of united America')
我使用了 itertools.permutations
方法来查找排列,但是
我想要它用于单词而不是句子或字符串中的字母。
所以 'abcd' == 'bcad'
不等于 True
,而是 'a b c d' == 'b c a d'
应该是 True
.
应该这样做:
import itertools
def f(s1, s2):
perms = itertools.permutations(s1.split(" "))
return tuple(s2.split(" ")) in perms
>>> f('united states of America', 'America united states of')
True
>>> f('united states of America', 'states of united America')
True
>>> f("a b c d", "b a c d")
True
你不需要itertools.permutations
。相反,比较 重新排序的 字符串,所有单词都排序:
def reorder_str(s):
return " ".join(sorted(s.split()))
print(reorder_str('united states of America') == reorder_str('America united states of'))
# True
print(reorder_str('united states of America') == reorder_str('america united states of'))
# False
与排列相比,一种在计算上更有效的方法是拆分字符串并对其进行排序,然后比较排序后的列表。生成排列的一个挑战是,一旦你开始只得到几个单词,例如10,那你就有10了!排列数 (3628800)
string_1 = "states of united America"
string_2 = "united states of America"
def check_if_perm(s1, s2):
return sorted(s1.split()) == sorted(s2.split())
输出:
True
我推荐的解决方案是找出每个字符串中每个单词的频率并进行比较。
实际实现有点棘手,我们首先找到第一个字符串中每个单词的频率,然后对于第二个字符串中的每个单词我们将频率减1,如果频率小于0,我们知道第二个字符串比第一个字符串包含该单词的次数更多,我们可以跳过其余的单词。
def compare(s1, s2):
count = {} # this is the frequencies dict
for word in s1.split():
count[word] = count.get(word, 0) + 1 # increment the frequency of the word
for word in s2.split():
count[word] = count.get(word, 0) - 1 # decrement the frequency of the word
if count[word] < 0:
# s2 has more of this word than s1
return False
return True
此算法比排序或 itertools.permutations
更快,但需要更多 space。这个算法的时间复杂度是O(n)
其中n
是单词的个数,相比之下排序是O(nlog n)
,itertools.permutations
是O(n!)
.
如果 itertools.permutations
不是必须的,我会使用集合:
>>> set('united states of America'.split()) == set('America united states of'.split())
True
>>> set('united states of America'.split()) == set('America united states off'.split())
False
只需检查单词频率
from collections import Counter
c = Counter('united states of America')
c1 = Counter('America united states of')
c2 = Counter('states of united America'
print(c == c1 == c2)
# True
我想知道一个字符串是否至少等于另一个字符串的排列之一。
比如下面两个比较的输出一定是true
:
print('united states of America' == 'America united states of')
print('united states of America' == 'states of united America')
我使用了 itertools.permutations
方法来查找排列,但是
我想要它用于单词而不是句子或字符串中的字母。
所以 'abcd' == 'bcad'
不等于 True
,而是 'a b c d' == 'b c a d'
应该是 True
.
应该这样做:
import itertools
def f(s1, s2):
perms = itertools.permutations(s1.split(" "))
return tuple(s2.split(" ")) in perms
>>> f('united states of America', 'America united states of')
True
>>> f('united states of America', 'states of united America')
True
>>> f("a b c d", "b a c d")
True
你不需要itertools.permutations
。相反,比较 重新排序的 字符串,所有单词都排序:
def reorder_str(s):
return " ".join(sorted(s.split()))
print(reorder_str('united states of America') == reorder_str('America united states of'))
# True
print(reorder_str('united states of America') == reorder_str('america united states of'))
# False
与排列相比,一种在计算上更有效的方法是拆分字符串并对其进行排序,然后比较排序后的列表。生成排列的一个挑战是,一旦你开始只得到几个单词,例如10,那你就有10了!排列数 (3628800)
string_1 = "states of united America"
string_2 = "united states of America"
def check_if_perm(s1, s2):
return sorted(s1.split()) == sorted(s2.split())
输出:
True
我推荐的解决方案是找出每个字符串中每个单词的频率并进行比较。
实际实现有点棘手,我们首先找到第一个字符串中每个单词的频率,然后对于第二个字符串中的每个单词我们将频率减1,如果频率小于0,我们知道第二个字符串比第一个字符串包含该单词的次数更多,我们可以跳过其余的单词。
def compare(s1, s2):
count = {} # this is the frequencies dict
for word in s1.split():
count[word] = count.get(word, 0) + 1 # increment the frequency of the word
for word in s2.split():
count[word] = count.get(word, 0) - 1 # decrement the frequency of the word
if count[word] < 0:
# s2 has more of this word than s1
return False
return True
此算法比排序或 itertools.permutations
更快,但需要更多 space。这个算法的时间复杂度是O(n)
其中n
是单词的个数,相比之下排序是O(nlog n)
,itertools.permutations
是O(n!)
.
如果 itertools.permutations
不是必须的,我会使用集合:
>>> set('united states of America'.split()) == set('America united states of'.split())
True
>>> set('united states of America'.split()) == set('America united states off'.split())
False
只需检查单词频率
from collections import Counter
c = Counter('united states of America')
c1 = Counter('America united states of')
c2 = Counter('states of united America'
print(c == c1 == c2)
# True