Calculate Hamming distance between strings of variable length in Matlab

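I would like to calculate the Hamming distance between two strings of variable length in Matlab. For fixed-length strings, the following syntax solves my problem: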

str1 = 'abcde';
str2 = 'abedc';

sum(str1 ~= str2)

ans = 2

How can I do this efficiently for variable-length strings?

Thanks!

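EDIT: Because this is a legitimate question: for every character by which one string is longer than the other, the Hamming distance should be incremented by one. So, for example, for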

str1 = 'abcdef';
str2 = 'abc';

the answer should be 3.

Here is one way to do it:

str1 = 'abcdef';
str2 = 'abc';
clear t
t(1,:) = str1+1; % +1 to make sure there are no zeros
t(2,1:numel(str2)) = str2+1; % if needed, this right-pads with zero or causes t to grow
result = sum(t(1,:)~=t(2,:));
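
The same left-aligned rule can also be written without the intermediate matrix: count the mismatches over the overlapping prefix, then add the length difference. This is only a minimal sketch of the definition given in the question's edit; the variable names `n` and `d` are illustrative and not part of the answer above.

```matlab
str1 = 'abcdef';
str2 = 'abc';
n = min(numel(str1), numel(str2));       % length of the overlapping prefix
d = sum(str1(1:n) ~= str2(1:n)) ...      % mismatches inside the overlap
    + abs(numel(str1) - numel(str2));    % one extra per unmatched character
```

For this pair the overlap `'abc'` matches completely and three characters are left over, so `d` is 3.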

Although @LuisMendo's answer works for the given example (which may well be good enough for you), it does not work for this one:

str1 = 'abcdef';
str2 = 'bcd';
clear t
t(1,:) = str1+1; % +1 to make sure there are no zeros
t(2,1:numel(str2)) = str2+1; % if needed, this right-pads with zero or causes t to grow
result = sum(t(1,:)~=t(2,:)) % result = 6

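To make sure the result is still correct when the shorter string occurs in the middle of the longer one, you should check every possible alignment. One way to do this is: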

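str1 = 'bcd';       % shorter string (this code assumes str1 is not longer than str2)
str2 = 'abcdef';    % longer string
len1 = length(str1);
len2 = length(str2);
n = len2 - len1;    % extra characters in str2; there are n+1 possible alignments
str1rep_temp = repmat(str1,[1,n+1]);      % one copy of str1 per alignment
str1rep = -ones(n+1,len2);                % -1 never equals a character code
str1rows = repmat(1:n+1,[len1,1]);        % row index of each copied character
str1cols = bsxfun(@plus,(1:len1)',0:n);   % column index: copy k is shifted right by k-1
str1idxs = sub2ind(size(str1rep),str1rows(:),str1cols(:));
str1rep(str1idxs) = str1rep_temp;         % row k now holds str1 shifted by k-1
str2rep = double(repmat(str2,[n+1, 1]));  % str2 repeated on every row
res = min(sum(str1rep ~= str2rep,2)); % res = 3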
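
The same idea can be sketched with a plain loop over the alignments, which may be easier to follow and does not depend on which string is passed first. This is an illustrative alternative and not the code from the answer above; the names `shorter`, `longer`, `lenS` and `best` are made up for this sketch.

```matlab
str1 = 'bcd';
str2 = 'abcdef';

% Order the inputs so that `shorter` is never longer than `longer`.
if numel(str1) <= numel(str2)
    shorter = str1; longer = str2;
else
    shorter = str2; longer = str1;
end

lenS = numel(shorter);
n = numel(longer) - lenS;     % characters of `longer` left unmatched at any shift
best = numel(longer);         % no alignment can score worse than this

for k = 0:n
    % Mismatches inside the window starting at k+1, plus the n unmatched characters.
    d = sum(shorter ~= longer(k+1:k+lenS)) + n;
    best = min(best, d);
end
```

For this pair the shift `k = 1` lines `'bcd'` up exactly, leaving only the three unmatched characters, so `best` is 3, matching `res` above.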