How to check how many characters a variable has in common with another variable
If I have two variables and want to see how many characters they have in common, how can I get the number that are wrong? For example:
a = "word"
b = "wind"
a - b = 2
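Is there a way to do this, or to make something like the above work?
Edit: the calculation should also take order into account.
Edit 2: all of these should give the results shown below.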
a = bird
b = word
<program to find answer> 2
a = book
b = look
<program to find answer> 3
a = boat
b = obee
<program to find answer> 0
a = fizz
b = faze
<program to find answer> 2
You can do it like this:
sum(achar != bchar for achar, bchar in zip(a,b))
This works when the strings have the same length. If they might have different lengths, you can account for that too:
sum(achar != bchar for achar, bchar in zip(a,b)) + abs(len(a) - len(b))
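However, this only lines the words up from their start, so the difference between wordy and word will be 1, while the difference between wordy and a word such as ordy will be 5. If you want that difference to be 1, you will need more sophisticated logic.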
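As a minimal sketch of the length-aware count above, the two terms can be folded into a single pass with `itertools.zip_longest`, which pads the shorter string so that every unmatched trailing position counts as a difference. The function name `positional_difference` is made up for illustration.

```python
from itertools import zip_longest

def positional_difference(a, b):
    """Count positions where a and b differ; extra trailing characters count as differences."""
    # zip_longest pads the shorter string with None, which never equals a character.
    return sum(x != y for x, y in zip_longest(a, b))

print(positional_difference("word", "wind"))   # 2
print(positional_difference("wordy", "word"))  # 1
print(positional_difference("wordy", "ordy"))  # 5
```

The last call shows the limitation described above: shifting the letters by one position makes every position disagree.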
This may not suit every case, but if you want to compare characters, you can use a set:
a = "word"
b = "wind"
diff = set.intersection(set(a),set(b))
print(len(diff))
>> 2
This ignores order, because each string is reduced to a set of unique characters.
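A set also drops repeated letters. If repeats should count, one possible variation (not part of the answer above) is to intersect two `collections.Counter` objects instead, which keeps each character with the smaller of its two counts. The helper name `shared_characters` is hypothetical.

```python
from collections import Counter

def shared_characters(a, b):
    """Count characters common to both strings, respecting repeats but not order."""
    # Counter & Counter keeps each character with the smaller of its two counts.
    overlap = Counter(a) & Counter(b)
    return sum(overlap.values())

print(len(set("book") & set("look")))     # 2: the set sees 'o' only once
print(shared_characters("book", "look"))  # 3: 'o', 'o', 'k'
```

Like the set version, this still ignores where in the string each character appears.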
Another interesting module in the Python standard library that you can use is difflib.
from difflib import Differ
d = Differ()
a = "word"
b = "wind"
[i for i in d.compare(a,b) if i.startswith('-')]
>>['- o', '- r']
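difflib essentially provides methods for comparing sequences such as strings. With the Differ object above, you can compare two strings and identify the characters that were added or removed, which tracks the changes from string a to string b. In the example given, a list comprehension keeps only the characters removed when going from a to b; you can also check for entries starting with + to find the added characters.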
[i for i in d.compare(a,b) if i.startswith('+')]
>>['+ i', '+ n']
Or the characters common to both sequences:
common = [i for i in d.compare(a,b) if i.startswith(' ')]
print(common, len(common))
>> ['  w', '  d'] 2
You can read more about the Differ object in the Python documentation for difflib.
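If only the count of shared characters is needed, a sketch using `difflib.SequenceMatcher` (the class Differ builds on) avoids filtering the text output. The function name `common_in_order` is made up for illustration, and note that SequenceMatcher uses a heuristic, so the total is not guaranteed to be the longest common subsequence for every input.

```python
from difflib import SequenceMatcher

def common_in_order(a, b):
    """Total length of the matching blocks SequenceMatcher finds between a and b."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-length sentinel, which adds nothing to the sum.
    return sum(block.size for block in matcher.get_matching_blocks())

print(common_in_order("word", "wind"))   # 2
print(common_in_order("wordy", "ordy"))  # 4
```

Unlike a position-by-position comparison, this still finds the shared run "ordy" when the letters are shifted.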
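What you are describing is an edit distance metric between words. Hamming distance has been mentioned, but it does not handle words of different lengths correctly, because it only accounts for substitutions. Other common metrics include "longest common substring", "Levenshtein distance", "Jaro distance", and so on.
Your question seems to describe the Levenshtein distance, which is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to get from one word to the other. The Wikipedia page for this is pretty thorough if you'd like to read and understand more on the topic (or go on a Wikipedia tangent), but as far as coding goes, there is already a library on pip:
pip install python-Levenshtein
It implements the algorithm in C for faster execution.
Example:
Here is a recursive implementation from Rosetta Code, with plenty of comments to help you understand how it works.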
from functools import lru_cache

@lru_cache(maxsize=4095)  # recursive approach will calculate some substrings many times,
                          # so we can cache the result and re-use it to speed things up.
def ld(s, t):
    if not s: return len(t)  # if one of the substrings is empty, we've reached our maximum recursion
    if not t: return len(s)  # the difference in length must be added to edit distance (insert that many chars.)
    if s[0] == t[0]:  # equal chars do not increase edit distance
        return ld(s[1:], t[1:])  # remove chars that are the same and find distance
    else:  # we must edit next char so we'll try insertion, deletion and swapping
        l1 = ld(s, t[1:])      # insert char (delete from `t`)
        l2 = ld(s[1:], t)      # delete char (insert to `t`)
        l3 = ld(s[1:], t[1:])  # swap chars
        # take minimum distance of the three cases we tried and add 1 for this edit
        return 1 + min(l1, l2, l3)
And to test it:
>>>ld('kitten', 'sitting') #swap k->s, swap e->i, insert g
Out[3]: 3
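The recursive version slices the strings at every step and can run into Python's recursion limit on long inputs. As an alternative sketch (not taken from Rosetta Code), the same distance can be computed iteratively with two rows of the usual dynamic-programming table. The function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(s, t):
    """Edit distance between s and t using two rows of the dynamic-programming table."""
    # previous[j] holds the distance between the prefix of s handled so far and t[:j].
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        current = [i]  # turning i chars of s into an empty string takes i deletions
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep when the chars match
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("wordy", "ordy"))      # 1
```

The second call shows the behaviour a purely positional comparison cannot give: dropping the leading letter costs a single edit.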
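Count the number of common characters and subtract it from the length of the longer string.
Based on your edits and comments, I think you are looking for this: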
def find_uncommon_chars(word1, word2):
    # select shorter and longer word
    shorter = word1
    longer = word2
    if len(shorter) > len(longer):
        shorter = word2
        longer = word1
    # count common chars
    count = 0
    for i in range(len(shorter)):
        if shorter[i] == longer[i]:
            count += 1
    # if you return just count you have number of common chars
    return len(longer) - count
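The numbers listed under Edit 2 in the question are counts of matching positions, which is the `count` value in the function above rather than its return value. A compact sketch that returns that count directly is below; the name `count_matching_positions` is made up for illustration.

```python
def count_matching_positions(a, b):
    """Count the positions at which a and b hold the same character."""
    # zip stops at the shorter string, so extra trailing characters are never counted as matches.
    return sum(x == y for x, y in zip(a, b))

for a, b in [("bird", "word"), ("book", "look"), ("boat", "obee"), ("fizz", "faze")]:
    print(a, b, count_matching_positions(a, b))
# the counts printed are 2, 3, 0 and 2, matching the question's Edit 2
```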