Calculating euclidean distance from a dataframe with several column features

I have a dataframe like the one below, and I need to calculate the Euclidean distance.

a,b,c,d,e
10,11,13,14,9
11,12,14,15,10
12,13,15,16,11
13,14,16,17,12
14,15,17,18,13
15,16,18,19,14
16,17,19,20,15
17,18,20,21,16
18,19,21,22,17
19,20,22,23,18
20,21,23,24,19
21,22,24,25,20
22,23,25,26,21
23,24,26,27,22
24,25,27,28,23

With only 2 column features, say a and b, I can easily do:

import numpy as np

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b)**2))

How can I calculate the Euclidean distance for a dataframe with several column features, such as a, b, c, d, e above?

If I understand the question correctly, you want to create a distance matrix across all of your rows?

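import pandas as pd
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame([{'a':1,'b':2,'c':3}, {'a':4,'b':5,'c':6}])
# pdist returns the condensed pairwise distances between rows;
# squareform expands them into a full symmetric matrix
distances = squareform(pdist(df.values, metric='euclidean'))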

This results in a matrix containing

array([[0.        , 5.19615242],
       [5.19615242, 0.        ]])
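Applied to the data in the question, the same call gives a 15x15 matrix, with one row and one column per dataframe row. The following is only a sketch: the name df_q and the np.arange construction are shortcuts assumed here to rebuild the pasted table, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Rebuild the question's table (assumed shortcut): column a runs 10..24,
# and the other columns are fixed offsets from it
# (b = a + 1, c = a + 3, d = a + 4, e = a - 1).
a = np.arange(10, 25)
df_q = pd.DataFrame({'a': a, 'b': a + 1, 'c': a + 3, 'd': a + 4, 'e': a - 1})

# One distance per pair of rows, using all five columns as features.
row_distances = squareform(pdist(df_q.values, metric='euclidean'))

print(row_distances.shape)   # (15, 15)
print(row_distances[0, 1])   # sqrt(5): adjacent rows differ by 1 in each of the 5 columns
```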

How about cdist:

from scipy.spatial.distance import cdist
arr = df[['a','b','c','d']].values
dist_mat = cdist(arr,arr)

If you would rather not rely on an external library, the distance matrix is:

dist_mat = ((arr[None,:,:] - arr[:,None,:])**2).sum(-1)**.5
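To see why the broadcasting expression works, it can help to check it against cdist on a small array. This is a minimal sketch; the 3x4 array below is made up for illustration and is not the question's data.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical small input: 3 points with 4 features each.
arr = np.array([[10., 11., 13., 14.],
                [11., 12., 14., 15.],
                [13., 14., 16., 17.]])

# arr[None, :, :] has shape (1, 3, 4) and arr[:, None, :] has shape (3, 1, 4).
# Subtracting broadcasts to (3, 3, 4): one difference vector per pair of points.
diff = arr[None, :, :] - arr[:, None, :]

# Square, sum over the feature axis, and take the square root.
dist_mat = (diff ** 2).sum(-1) ** .5

print(dist_mat.shape)                          # (3, 3)
print(np.allclose(dist_mat, cdist(arr, arr)))  # True
```

Note that the intermediate array holds n * n * k values for n points with k features, so for large inputs cdist is likely to be the more memory-friendly choice.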

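Your data has (15 dimensions, 5 points), that is, each column is a point with 15 coordinates. If I am not mistaken, you want the Euclidean distance between each pair of these points.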

import numpy as np
import pandas as pd

# copied and pasted your data to a text file
df = pd.read_table("euclidean.txt", sep=',') 

> df.shape 
(15, 5)

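For (15,5) data the distance matrix will be 5x5. Initialize this matrix, use a for loop to compute the Euclidean distance between each pair of the 5 points, and fill the results into the distance matrix.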

n = df.shape[1] # this number is 5 for the dataset you provided
dm = np.zeros((n,n)) # initialize the distance matrix to zero

for i in range(n):
    for j in range(n):
        dm[i,j] = np.sqrt(np.sum((df.iloc[:,i] - df.iloc[:,j])**2))

The output for dm is:

> dm
array([[ 0.        ,  3.87298335, 11.61895004, 15.49193338,  3.87298335],
       [ 3.87298335,  0.        ,  7.74596669, 11.61895004,  7.74596669],
       [11.61895004,  7.74596669,  0.        ,  3.87298335, 15.49193338],
       [15.49193338, 11.61895004,  3.87298335,  0.        , 19.36491673],
       [ 3.87298335,  7.74596669, 15.49193338, 19.36491673,  0.        ]])
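The double loop can also be replaced by a single pdist call on the transposed data, so that each column becomes one point. This is a sketch under the same reading of the question (distances between columns); df_cols and dm_fast are names assumed here, and the table is rebuilt in memory instead of being read from euclidean.txt.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Rebuild the question's table (assumed shortcut): column a runs 10..24,
# and b, c, d, e are fixed offsets from it.
a = np.arange(10, 25)
df_cols = pd.DataFrame({'a': a, 'b': a + 1, 'c': a + 3, 'd': a + 4, 'e': a - 1})

# Transposing turns each column into a "point" with 15 coordinates,
# so pdist compares columns instead of rows.
dm_fast = squareform(pdist(df_cols.values.T, metric='euclidean'))

print(dm_fast.shape)        # (5, 5)
print(dm_fast.round(8))
```

The entry for columns a and b is sqrt(15), about 3.87298335, which matches the first off-diagonal value of dm.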