How to calculate similarity of numbers (in a list)

Question

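I'm looking for a way to calculate a similarity score for a list of numbers. Ideally, the method should give a result in a fixed range, for example from 0 to 1, where 0 means not similar at all and 1 means all numbers are identical.

For clarity, let me give a few examples: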

0 1 2 3 4 5 6 7 8 9 10 => the similarity should be 0 or close to zero as all numbers are different
1 1 1 1 1 1 1 => 1
10 9 11 10.5 => close to 1
1 1 1 1 1 1 1 1 1 1 100 => score should be still pretty high as only the last value is different

I tried calculating the similarity based on normalization and the mean, but that gives me really bad results when there is a single 'bad number'.

Thank you.

Answer 1

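Similarity tests are always quite subjective, and using one correctly depends heavily on what you are trying to use it for. We already have three typical measures of central tendency (mean, median, mode). It's hard to say which test will work for you, because different measures can all satisfy your requirements and still give wildly different results on other lists (such as [1]*7 + [100]*7). Here is one solution: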

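import statistics as stats

def tester(ell):
    # Share of repeated values: 0 when every element is distinct.
    mode_measure = 1 - len(set(ell)) / len(ell)
    # One minus the coefficient of variation (stdev / mean); needs at
    # least two values and a nonzero mean, and can go negative.
    avg_measure = 1 - stats.stdev(ell) / stats.mean(ell)
    # Keep the more favorable of the two scores.
    return max(avg_measure, mode_measure)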

math

similarity