Calculating Euclidean distance from a dataframe with several column features
I have a dataframe like the one below, and I need to calculate the Euclidean distance.
a,b,c,d,e
10,11,13,14,9
11,12,14,15,10
12,13,15,16,11
13,14,16,17,12
14,15,17,18,13
15,16,18,19,14
16,17,19,20,15
17,18,20,21,16
18,19,21,22,17
19,20,22,23,18
20,21,23,24,19
21,22,24,25,20
22,23,25,26,21
23,24,26,27,22
24,25,27,28,23
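If there were only two feature columns, say a and b, I could easily do:
import numpy as np

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b)**2))
How do I calculate the Euclidean distance for a dataframe with several feature columns, such as a, b, c, d, e above?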
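If I understand the question correctly, you want to create a distance matrix for all of your rows?
import pandas as pd
from scipy.spatial.distance import pdist, squareform
df = pd.DataFrame([{'a':1,'b':2,'c':3}, {'a':4,'b':5,'c':6}])
# pdist returns the condensed pairwise distances; squareform expands them into a square matrix
distances = squareform(pdist(df.values, metric='euclidean'))
This results in a matrix containing
array([[0.        , 5.19615242],
       [5.19615242, 0.        ]])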
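Applied to the data from the question, the same pdist/squareform approach could look like the sketch below. The frame is rebuilt from shifted ranges rather than read from a file, and the names base and row_dist are only illustrative; each row is treated as one 5-dimensional point.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Rebuild the 15-row frame from the question: every column is a shifted range.
base = np.arange(10, 25)
df = pd.DataFrame({'a': base, 'b': base + 1, 'c': base + 3,
                   'd': base + 4, 'e': base - 1})

# Row-to-row distances, labelled with the original row index.
row_dist = pd.DataFrame(squareform(pdist(df.values, metric='euclidean')),
                        index=df.index, columns=df.index)

print(row_dist.shape)       # one entry per pair of rows
print(row_dist.iloc[0, 1])  # adjacent rows differ by 1 in all 5 columns
```

Since adjacent rows differ by 1 in each of the five columns, the distance between rows 0 and 1 is sqrt(5), which is a quick way to sanity-check the result.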
How about cdist:
from scipy.spatial.distance import cdist
arr = df[['a','b','c','d']].values
dist_mat = cdist(arr,arr)
If you would rather not rely on an extra package, the distance matrix is:
dist_mat = ((arr[None,:,:] - arr[:,None,:])**2).sum(-1)**.5
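To see why the broadcasting one-liner works, it may help to look at the intermediate shapes. The sketch below uses a small random array as stand-in data (an assumption, not the data from the question) and checks the result against cdist.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical stand-in data: 6 points with 4 features each.
rng = np.random.default_rng(0)
arr = rng.normal(size=(6, 4))

# (1, 6, 4) minus (6, 1, 4) broadcasts to (6, 6, 4):
# diff[i, j] is the feature-wise difference between point j and point i.
diff = arr[None, :, :] - arr[:, None, :]

# Summing the squared differences over the feature axis leaves a (6, 6) matrix.
dist_mat = np.sqrt((diff ** 2).sum(axis=-1))

same = np.allclose(dist_mat, cdist(arr, arr))
print(diff.shape, dist_mat.shape, same)
```

The intermediate array holds n * n * d values, so for large inputs cdist or pdist is likely the more memory-friendly choice.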
Your data has (15 dimensions, 5 points), and if I'm not mistaken, you want the Euclidean distance between each pair of these points.
import numpy as np
import pandas as pd
# copied and pasted your data to a text file
df = pd.read_table("euclidean.txt", sep=',')
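> df.shape
(15, 5)
For data of shape (15, 5), the distance matrix will be 5x5. Initialize this matrix, use for loops to calculate the Euclidean distance between each pair of the 5 points, and fill the results into the distance matrix.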
n = df.shape[1] # this number is 5 for the dataset you provided
dm = np.zeros((n,n)) # initialize the distance matrix to zero
for i in range(n):
    for j in range(n):
        dm[i,j] = np.sqrt(np.sum((df.iloc[:,i] - df.iloc[:,j])**2))
dm
The output is:
> dm
array([[ 0.        ,  3.87298335, 11.61895004, 15.49193338,  3.87298335],
       [ 3.87298335,  0.        ,  7.74596669, 11.61895004,  7.74596669],
       [11.61895004,  7.74596669,  0.        ,  3.87298335, 15.49193338],
       [15.49193338, 11.61895004,  3.87298335,  0.        , 19.36491673],
       [ 3.87298335,  7.74596669, 15.49193338, 19.36491673,  0.        ]])
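The double loop can probably be replaced by a single pdist call on the transposed values, so that each column becomes one 15-dimensional point. This is a sketch under the same column-as-point reading; the frame is rebuilt inline instead of being read from euclidean.txt, and col_dist is just an illustrative name.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Same data as in the question, rebuilt from shifted ranges.
base = np.arange(10, 25)
df = pd.DataFrame({'a': base, 'b': base + 1, 'c': base + 3,
                   'd': base + 4, 'e': base - 1})

# Transpose so that each column is treated as one point in 15 dimensions.
col_dist = pd.DataFrame(squareform(pdist(df.values.T, metric='euclidean')),
                        index=df.columns, columns=df.columns)

print(col_dist.round(8))
```

Labelling the result with the column names makes it easier to read off a single pair, for example col_dist.loc['a', 'b'].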