如何找出二进制数的相似度

how to find out binary number similarity

最近我参加了一个编码挑战,其中一个问题如下

有 N 辆汽车,每辆都有 M 种特征。汽车特征列表以二进制字符串形式给出。

例如:0000111

0 表示不支持该功能。

如果两辆车的特征描述至少有一个特征不同,则它们是相似的。例如:11001,每辆车有 11000 辆相似,在具有特征的汽车列表中找到相似汽车的数量

我已经尝试用异或运算符解决问题。但它适用于少数测试用例

cars = ["100", "110", "010", "011", "100"]

此处索引为 0 的汽车与索引 1 和 4 相似。所以输出应该是2. similar for all index car need to be found out with which it is similar.

我试过的解决方案:

def solution(cars):
    ret_list = []
    for i in range(len(cars)):
        count = 0
        for j in range(len(cars)):
            if i != j:
                if (int(cars[i]) ^ int(cars[j])) <= 100:
                    count += 1
        ret_list.append(count)
        print(ret_list)
    return ret_list

输出:[2, 3, 2, 1, 2]

但这不适用于以下输入:

cars = ["1000", "0110", "0010", "0101", "1010"]

有人可以提出一个适用于所有二进制数的更好的解决方案

试试这个:

def isPowerOf2(x):
    return (x & (x - 1)) == 0

def areSimilar(x, y):
    return isPowerOf2(x ^ y)

def solution(cars):
    similarPairCount = 0
    for i in range(len(cars) - 1):
        for j in range(i + 1, len(cars)):
            if areSimilar(int(cars[i], 2), int(cars[j], 2)):
                similarPairCount += 1
    return similarPairCount
import numpy as np

# ...

# conversion to int
cars_int = np.array([int(c,2) for c in cars]).reshape(1,-1)

# matrix to compare all against all
compare = (cars_int ^ cars_int.T) # or exclusive

# result of cars with 0 or 1 hamming distance 
# (not including comparison of each car with itself)
result = (np.sum(compare & (compare-1) == 0) - cars_int.size) // 2

如果你想要一个与汽车相似的列表:

# ...
result_per_car = np.sum(compare & (compare-1) == 0, axis=1) - 1