Python - getting just the difference between strings
What is the best way to get the difference between two multi-line strings?
import difflib
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
diff = difflib.ndiff(a, b)
print(''.join(diff))
This produces:
t e s t i n g t h i s i s w o r k i n g
t e s t i n g t h i s i s w o r k i n g 1
+ + t+ e+ s+ t+ i+ n+ g+ + t+ h+ i+ s+ + i+ s+ + w+ o+ r+ k+ i+ n+ g+ + 2
What is the best way to get exactly this:
testing this is working 2
Would regular expressions be the solution here?
Based on @Chris_Rands' comment, you can also use splitlines() (if your strings are multi-line and you want the lines that are present in one string but missing from the other):
b_s = b.splitlines()
a_s = a.splitlines()
[x for x in b_s if x not in a_s]
The expected output is:
[' testing this is working 2']
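As a side note on the difflib call in the question: ndiff compares two sequences element by element, and a plain string is a sequence of characters, which is why the output there is spelled out letter by letter. A minimal sketch, assuming you pass lists of lines instead and only want the lines that ndiff marks as added (the function name added_lines is made up for illustration):

```python
import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

def added_lines(old, new):
    """Return the lines that ndiff reports as present only in `new`."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    # Each ndiff line starts with a two-character code; '+ ' means "only in new".
    return [line[2:] for line in delta if line.startswith('+ ')]

print(added_lines(a, b))  # [' testing this is working 2']
```

Unlike the list comprehension above, this follows the diff itself, so a line that is merely repeated in b would also be reported.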
The simplest hack is to use split().
Note: you need to work out which string is the longer one and call split on that one.
if len(a) > len(b):
    res = ''.join(a.split(b))  # get diff
else:
    res = ''.join(b.split(a))  # get diff
print(res.strip())  # remove whitespace on either side
# Driver values
IN : a = 'testing this is working \n testing this is working 1 \n'
IN : b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
OUT : testing this is working 2
Edit: another hack uses replace, which does not need any join step.
if len(a) > len(b):
    res = a.replace(b, '')  # get diff
else:
    res = b.replace(a, '')  # get diff
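One caveat with both hacks: they only work when the shorter string appears verbatim inside the longer one. If it does not, split and replace hand back the longer string unchanged without any warning. A rough sketch that makes this case explicit (substring_diff is a hypothetical helper name, and returning None for "no match" is just one possible choice):

```python
def substring_diff(a, b):
    """Remove the shorter string from the longer one, or return None if it is not contained."""
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if shorter not in longer:
        # replace() would silently return `longer` unchanged here.
        return None
    return longer.replace(shorter, '').strip()

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

print(substring_diff(a, b))                  # testing this is working 2
print(substring_diff('abc\n', 'abd\nxyz'))   # None
```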
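import itertools as it
# Pair the characters position by position and keep those of b that differ from a.
# fillvalue="" keeps join() from failing on None when b is the shorter string.
"".join(y for x, y in it.zip_longest(a, b, fillvalue="") if x != y)
# ' testing this is working 2'
Or: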
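import collections as ct
# Count each line; subtracting Counters keeps only lines that occur more often in b.
ca = ct.Counter(a.split("\n"))
cb = ct.Counter(b.split("\n"))
diff = cb - ca
"".join(diff.keys())
# ' testing this is working 2'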
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
splitA = set(a.split("\n"))
splitB = set(b.split("\n"))
diff = splitB.difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, more things if there were...'
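Basically, this turns each string into a set of lines and takes the set difference, i.e. everything in B that is not in A. The result is then joined into a single string.
Edit: this is a generalized way of expressing what @ShreyasG said - [x for x if x not in y]...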
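Keep in mind that a set has no order, so the joined result can come out in any order when more than one line differs. A small sketch that combines the two ideas, assuming you want the original line order preserved (lines_only_in is a made-up name, and the sample strings are only for illustration):

```python
def lines_only_in(new, old):
    """Lines of `new` that never occur in `old`, in their original order."""
    seen = set(old.splitlines())  # set lookup keeps the membership test cheap
    return [line for line in new.splitlines() if line not in seen]

a = 'one\ntwo\n'
b = 'three\none\nfour\ntwo\nfive'

print(lines_only_in(b, a))  # ['three', 'four', 'five']
```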
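This is basically @Godron629's answer, but since I can't comment, I'm posting it here with a slight modification: I changed difference to symmetric_difference so that the order of the sets doesn't matter.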
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
splitA = set(a.split("\n"))
splitB = set(b.split("\n"))
diff = splitB.symmetric_difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, some more things...'
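One thing to watch with the symmetric version: a ends with a newline, so a.split("\n") yields a trailing empty string that b does not have, and that empty string becomes part of the result. A possible variation, assuming splitlines() is acceptable for your input and that a sorted list is good enough for display (symmetric_line_diff is a hypothetical name):

```python
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# split("\n") keeps the empty piece after a's trailing newline,
# so the symmetric difference contains '' as well.
with_empty = set(a.split("\n")) ^ set(b.split("\n"))

def symmetric_line_diff(a, b):
    """Lines found in exactly one of the two strings, sorted for stable output."""
    # splitlines() does not produce the trailing empty piece.
    return sorted(set(a.splitlines()) ^ set(b.splitlines()))

print(symmetric_line_diff(a, b))  # [' testing this is working 2']
```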
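You can use the following functions:
def helper(a, b):
    # Return the index of the first element of a equal to b, or -1 if there is none.
    for i, l_a in enumerate(a):
        if b == l_a:
            return i
    return -1

def diff(a, b):
    # Remove from b the characters of a that can be matched in order.
    t_b = b
    c_i = 0
    for c in a:
        t_i = helper(t_b, c)
        if t_i != -1 and (t_i > c_i or t_i == c_i):
            c_i = t_i
            t_b = t_b[:c_i] + t_b[c_i+1:]
    # Do the same in the other direction.
    t_a = a
    c_i = 0
    for c in b:
        t_i = helper(t_a, c)
        if t_i != -1 and (t_i > c_i or t_i == c_i):
            c_i = t_i
            t_a = t_a[:c_i] + t_a[c_i+1:]
    return t_b + t_a
Usage example: print(diff(a, b))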