Similarity between lists of floats
I have a list of floats that I want to compare with other lists to get a similarity ratio in Python:
The list I want to compare:
[0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001]
One of the other lists:
[0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000]
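I tried converting them to strings and comparing the strings with the fuzzywuzzy library, python-Levenshtein and difflib to get a ratio, but that didn't give me the results I wanted, and it was very slow. I searched around and couldn't find anything relevant.
What is the best way to compare two lists of floats?
Is there a built-in way to measure the similarity of float lists, or a library that does the job, the way there are many for comparing strings?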
The question isn't entirely clear to me, but see whether the following approach helps:
import numpy as np

l1 = np.array([0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001])
l2 = np.array([0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])

# Mean squared error: the average of the squared element-wise differences
mse1 = ((l1 - l2)**2).mean()
# 6.699999999999999e-08

# For contrast, a list whose first three values differ strongly from l1
l3 = np.array([1.0000,1.0002,1.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])
mse2 = ((l1 - l3)**2).mean()
# 0.15000006700000001

# A smaller MSE means the lists are more similar
print(mse1 < mse2)
# True
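You won't get a value between 0 and 1, but you can compare the results: the closer to 0, the more similar the lists are. MSE stands for mean squared error. There are other metrics that may be relevant to you as well, such as MSLE (mean squared logarithmic error) and MAE (mean absolute error).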
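A minimal sketch, assuming the lists always have equal length, that computes MSE, MAE and MSLE side by side with NumPy and ranks several candidate lists by MSE. The helper names `error_metrics` and `rank_by_mse` are hypothetical, not part of any library. MSLE is computed here with `np.log1p`, so it assumes every value is greater than -1 (MSLE is normally used for non-negative data); that holds for the small values in the question.

```python
import numpy as np


def error_metrics(a, b):
    """Return MSE, MAE and MSLE between two equal-length float sequences.

    Lower values mean more similar; 0 means the sequences are identical.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("lists must have the same length")
    diff = a - b
    return {
        "mse": np.mean(diff ** 2),         # mean squared error
        "mae": np.mean(np.abs(diff)),      # mean absolute error
        # mean squared logarithmic error; log1p assumes every value > -1
        "msle": np.mean((np.log1p(a) - np.log1p(b)) ** 2),
    }


def rank_by_mse(target, candidates):
    """Return candidate indices ordered from most to least similar by MSE."""
    target = np.asarray(target, dtype=float)
    stacked = np.asarray(candidates, dtype=float)  # shape: (n_candidates, len(target))
    mse = ((stacked - target) ** 2).mean(axis=1)   # target is broadcast over each row
    return np.argsort(mse)


target = [0.0000, 0.0003, -0.0001, 0.0002, 0.0001, 0.0003, 0.0000, 0.0000, -0.0002, 0.0002,
          -0.0002, 0.0002, 0.0000, 0.0000, -0.0002, 0.0000, 0.0000, 0.0000, -0.0002, -0.0001]
other = [0.0000, 0.0002, 0.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000, 0.0001, 0.0003,
         -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002, 0.0002, 0.0005, -0.0010, 0.0000]
far = [1.0000, 1.0002, 1.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000, 0.0001, 0.0003,
       -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002, 0.0002, 0.0005, -0.0010, 0.0000]

print(error_metrics(target, other))
print(rank_by_mse(target, [far, other]))  # -> [1 0]
```

Stacking the candidates into one 2-D array lets NumPy compute every MSE in a single vectorized step, which avoids the per-pair string conversion that made the fuzzywuzzy/difflib approach slow.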