How to fit strings using spaces, minimizing edit distance?
I'm looking for an algorithm that fits two strings, padding them with spaces where necessary to minimize the edit distance between them:
fit('algorithm', 'lgrthm') == ' lg r thm'
Surely there is some pre-written algorithm for this. Any ideas?
I took a naive but simple approach:
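def fit(word1, word2):
    A, B = list(word1), list(word2)
    if len(B) < len(A):
        # Pad B to the length of A with the sentinel '1' (assumes '1' never occurs in the input).
        B += (len(A) - len(B)) * ['1']
    else:
        # B is at least as long as A: keep the characters of A that occur anywhere in B.
        return ''.join(x if x in B else ' ' for x in A)
    # range(len(B)) is evaluated once, so i only runs over the positions of A.
    for i in range(len(B)):
        if A[i] != B[i]:
            B.insert(i, ' ')  # shift the rest of B right until it lines up with A
    return ''.join(x for x in B if x != '1')  # drop the sentinel padding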
Test results:
algorithm lgrthm
 lg r thm
---
pineapple pine
pine
---
pineapple apple
    apple
---
pineapple eale
   ea  le
---
foo fo
fo
---
stack sak
s a k
---
over or
o  r
---
flow lw
 l w
---
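One way to compare the outputs above with other candidates is to count the positions at which a fitted string differs from the target. The helper below is only a sketch: the name `mismatches` is hypothetical, and it uses a position-by-position count as a simple proxy for edit distance, assuming the fitted string is never longer than the target.

```python
def mismatches(target, fitted):
    """Count the positions where fitted differs from target.

    fitted is right-padded with spaces to len(target) first, because
    some implementations omit trailing spaces.
    """
    padded = fitted.ljust(len(target))
    if len(padded) != len(target):
        raise ValueError("fitted string is longer than target")
    return sum(a != b for a, b in zip(target, padded))


print(mismatches('algorithm', ' lg r thm'))  # 3
print(mismatches('pineapple', 'pine'))       # 5
```

A lower count means more characters of the second word ended up under their counterparts in the first.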
You can do the following:
def fit(target, source):
    i, j = 0, 0  # i indexes source, j indexes target
    result = []
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            # Match: emit the character and advance in source.
            result.append(source[i])
            i += 1
        else:
            # No match: leave a space under target[j].
            result.append(' ')
        j += 1  # always advance in target
    return ''.join(result)


test = [('algorithm', 'lgrthm'), ('pineapple', 'pine'), ('pineapple', 'apple'), ('pineapple', 'eale'),
        ('foo', 'fo'), ('stack', 'sak'), ('over', 'or'), ('flow', 'lw')]

for t, s in test:
    print(t)
    print(fit(t, s))
    print('---')
Output:
algorithm
 lg r thm
---
pineapple
pine
---
pineapple
    apple
---
pineapple
   ea  le
---
foo
fo
---
stack
s a k
---
over
o  r
---
flow
 l w
---
Perhaps a better version is the following:
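from collections import deque


def peek(q, default=None):
    """Perform a safe peek: return the first item, or default if the queue is empty."""
    # default is None so that an empty queue never compares equal to a space in target.
    return q[0] if q else default


def fit(target, source):
    ds = deque(source)
    # Consume the next source character whenever it matches the current target character.
    return ''.join([ds.popleft() if peek(ds) == e else ' ' for e in target])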
This is better because you do not need to keep track of the state variables i and j as in the previous approach.
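Both versions above scan greedily, which works when source is a subsequence of target, as it is in every test case here. When it is not, one option is a dynamic-programming alignment along a longest common subsequence. The following is a minimal sketch rather than a definitive implementation: the name `fit_lcs` is hypothetical, and it assumes the result should have the length of target, with source characters that cannot be placed simply dropped.

```python
def fit_lcs(target, source):
    """Place source under target along a longest common subsequence."""
    n, m = len(target), len(source)
    # lcs[i][j] = length of the longest common subsequence of target[i:] and source[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if target[i] == source[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    result = [' '] * n
    i = j = 0
    while i < n and j < m:
        if target[i] == source[j]:
            result[i] = source[j]  # matched character keeps its position in target
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1  # leave a space under target[i]
        else:
            j += 1  # drop source[j]; it has no place in target
    return ''.join(result)


print(fit_lcs('algorithm', 'lgrthm'))  # prints ' lg r thm'
print(fit_lcs('banana', 'xana'))       # prints '   ana'
```

Under these assumptions the sketch maximizes the number of matched characters, so it minimizes the number of positions left blank. The table costs O(len(target) * len(source)) time and memory, whereas the greedy versions are linear.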