Compute cumulative Euclidean distances between subsequent pairwise coordinates

Question

I have the following two arrays:

X = array([37., 42., 31., 27., 37.])

Y = array([52., 57., 62., 68., 69.])

I can also combine them as follows:

XY = np.array((X, Y)).T

which yields

 ([[37., 52.],
   [42., 57.],
   [31., 62.],
   [27., 68.],
   [37., 69.]])

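I want to compute the cumulative distance between consecutive pairs of points, including the distance from the last point back to the first.

For example, I would like to do this: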

(
np.linalg.norm(np.array([37, 52]) - np.array([42, 57]))
+ np.linalg.norm(np.array([42, 57]) - np.array([31, 62]))
+ np.linalg.norm(np.array([31, 62]) - np.array([27, 68]))
+ np.linalg.norm(np.array([27, 68]) - np.array([37, 69]))
+ np.linalg.norm(np.array([37, 69]) - np.array([37, 52]))
)

which yields 53.41509195750892

I wrote a function that does this:

def distance(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
        oldx = x
        oldy = y
    return T

print(distance(X, Y))

which also yields 53.41509195750891

I am curious whether there is a more elegant/efficient way to do this with numpy array operations.

Edit: Sorry, the original example function I gave was wrong; it should be correct now.

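Edit: Thank you all for the answers! Here is a benchmark on my arrays of size 50. Dani's answer appears to be the fastest, although Akshay's answer was faster for arrays of size 5.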

def distance_charel(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
        oldx = x
        oldy = y
    return T

def distance_dani(X, Y):
    XY = np.array((X, Y)).T
    diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
    ss = np.power(diff, 2).sum(axis=1)
    res = np.sqrt(ss).sum()
    return res

def distance_akshay(X, Y):
    XY = np.array((X, Y)).T
    pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
    total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
    return total

def distance_gwang(X, Y):
    XY = np.array((X, Y)).T
    return sum([sum((p1 - p2) ** 2) ** .5 for p1, p2 in zip(XY, XY[1:])])

def distance_andy(X, Y):
    arr = np.array((X, Y)).T
    return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()

Then

print(distance_charel(X, Y))
print(distance_dani(X, Y))
print(distance_akshay(X, Y))
print(distance_gwang(X, Y))  # I think it misses the distance between last and first element
print(distance_andy(X, Y))
%timeit distance_charel(X, Y)
%timeit distance_dani(X, Y)
%timeit distance_akshay(X, Y)
%timeit distance_gwang(X, Y)
%timeit distance_andy(X, Y)

Output

2586.769647563161
2586.76964756316
2586.7696475631597
2568.8811037431624
2586.7696475631597
2.49 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
385 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

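Edit: I have now accepted Dani's answer, since I find his code the best fit for my case (it uses numpy vector operations, is readable, and is the fastest, by a small margin). Thanks everyone for the answers!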

Edit: I updated the benchmark using 280 coordinates.

Answer 1

import numpy as np
from scipy.spatial.distance import euclidean

X = np.array([37., 42., 31., 27., 37.])

Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T    
sum1 = euclidean(XY[0],XY[-1])


for i in range(len(XY)-1):
    sum1 += euclidean(XY[i],XY[i+1])

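This should do it: start with the hardest term (the wrap-around distance between the first and last points), then iterate over the easier ones, adding them all together.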

As a check, euclidean(XY[0],XY[1]) = 7.0710678118654755, which matches the value you provided.

Answer 2

You can compute the formula by hand in a vectorized way, using diff, power and sqrt:

import numpy as np

# setup
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T


# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))

# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)

# find the square root and sum
res = np.sqrt(ss).sum()

print(res)

Output

53.41509195750891

The first step:

# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))

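computes the coordinate-wise differences between consecutive points (x1 - y1 and x2 - y2, writing x and y for two consecutive points). The second step: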

# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)

raises these values to the power of 2, i.e. (x1 - y1)^2, and sums them per point pair. Finally:

# find the square root and sum
res = np.sqrt(ss).sum()

takes the square root, as the comment says, and sums the resulting per-segment distances.

To understand it better, let's look at a smaller example:

# setup
X = np.array([37., 42.])
Y = np.array([52., 57])
XY = np.array((X, Y)).T

diff = np.diff(XY, axis=0)
# [[5. 5.]] (42 - 37) (57 - 52)

ss = np.power(diff, 2).sum(axis=1)
# [50.] 5^2 + 5^2

res = np.sqrt(ss).sum()
# 7.0710678118654755
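The steps above can be wrapped into a small reusable function. This is only a sketch, not part of the original answer: the name path_length and the closed flag are made up for illustration, and np.hypot is swapped in for the explicit power/sum/sqrt chain (it computes sqrt(dx**2 + dy**2) element-wise). It assumes a NumPy version that supports the prepend argument of np.diff.

```python
import numpy as np

def path_length(X, Y, closed=True):
    """Sum of Euclidean distances between consecutive points.

    `closed` is a hypothetical flag: when True, the segment from the
    last point back to the first one is included in the total.
    """
    XY = np.column_stack((X, Y))  # shape (n, 2)
    if closed:
        # Prepend the last point so the first difference is (first - last).
        diff = np.diff(XY, axis=0, prepend=XY[-1:])
    else:
        diff = np.diff(XY, axis=0)
    # Length of each segment, then the total.
    return np.hypot(diff[:, 0], diff[:, 1]).sum()

X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
print(path_length(X, Y))                # closed loop, about 53.415
print(path_length(X, Y, closed=False))  # open path, about 36.415
```

The two totals differ by exactly 17, the distance from (37, 69) back to (37, 52).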

Answer 3

In [2]: df = pd.DataFrame([[37., 42., 31., 27., 37.],
   ...:                    [52., 57., 62., 68., 69.]]).T.rename(columns={0:"X", 1:"y"})
   ...: df
Out[2]: 
      X     y
0  37.0  52.0
1  42.0  57.0
2  31.0  62.0
3  27.0  68.0
4  37.0  69.0

In [3]: from scipy.spatial.distance import euclidean
   ...: np.sum([euclidean(df.iloc[i], df.iloc[i+1]) for i in range(len(df)-1)])
Out[3]: 36.41509195750892

Answer 4

You can fully vectorize this in a single line of code, without using any loops, by using broadcasting as follows -

First, (5,1,2) broadcast with (1,5,2) -> (5,5,2)
Subtracting with this broadcast gives (5,5,2)
Then square the (5,5,2) array
Sum over the last axis to get (5,5)
Finally, take the square root!

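Next, you can take just the shifted (first off-)diagonal of the array, which holds the distances between (1,2), (2,3), ... Sum those up and, since you also want the distance back to the first point, add the value at [0,-1].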

#This get all pairwise distances calculated with broadcasting
pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))

#This takes sum of the first diagonal elements instead of 0th
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
print(total)

53.41509195750892
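To make the shapes in the broadcasting steps concrete, here is a sketch that splits the one-liner into named intermediates. The variable names delta, consecutive and closing are illustrative only and do not appear in the original answer.

```python
import numpy as np

X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T                         # shape (5, 2)

# (5,1,2) - (1,5,2) broadcasts to (5,5,2): every point minus every other point.
delta = XY[:, None, :] - XY[None, :, :]

# Square, sum over the coordinate axis, take the root: (5,5) distance matrix.
pairwise = np.sqrt((delta ** 2).sum(axis=-1))

# First off-diagonal: distances between points (0,1), (1,2), (2,3), (3,4).
consecutive = np.diag(pairwise, k=1)

# Distance between the first and the last point closes the loop.
closing = pairwise[0, -1]

total = consecutive.sum() + closing
print(delta.shape, pairwise.shape, consecutive.shape)
print(total)
```

Note that the full matrix holds n*n distances while only n of them are used, so memory and work grow quadratically with the number of points. That is a plausible reason this approach falls behind the diff-based one on larger arrays.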

Another approach is the following, but the approach above is still faster -

np.sum(np.sqrt(np.sum(np.square(np.diff(np.vstack([XY,XY[0]]), axis=0)), axis=-1)))
#The np.vstack adds the first coordinate into the array so that you can
#calculate the distance from the last to the first again

Benchmarks -

Akshay Sehgal - 19.9 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Gwang - 21.5 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ombk - 60.4 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Dani Mesejo - 16.4 µs ± 6.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Andy L - 17.6 µs ± 3.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As expected, numpy vectorization always dominates! Gj Dani!
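Timings like these depend heavily on array size and machine, so it is worth re-running them on your own data. The following is a rough sketch of how that could be done outside IPython with the standard timeit module. The function names distance_diff and distance_roll are placeholders for two of the approaches in this thread, and the random 280-point input is an assumption loosely based on the size mentioned in the question, not the asker's actual data.

```python
import timeit
import numpy as np

def distance_diff(X, Y):
    # diff-based approach: prepend the last point to close the loop
    XY = np.array((X, Y)).T
    diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
    return np.sqrt(np.power(diff, 2).sum(axis=1)).sum()

def distance_roll(X, Y):
    # roll-based approach: compare each point with its successor
    arr = np.array((X, Y)).T
    return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability
X = rng.uniform(0, 100, size=280)     # made-up coordinates
Y = rng.uniform(0, 100, size=280)

calls = 1000
for fn in (distance_diff, distance_roll):
    # Best of 5 repeats, each timing `calls` invocations.
    best = min(timeit.repeat(lambda: fn(X, Y), number=calls, repeat=5))
    print(f"{fn.__name__}: {best / calls * 1e6:.1f} us per call")
```

Taking the minimum over repeats reduces noise from other processes. No timings are quoted here because they will vary from machine to machine.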

Answer 5

#preparation:
x = np.array([37., 42., 31., 27., 37.]) 
y = np.array([52., 57., 62.,68.,69.]) 
xy = np.array((x, y)).T 

def euclidean_distance(p1, p2): 
     return sum((p1 - p2) ** 2) ** .5

You can do it more elegantly with functional programming. Here, you want to reduce over the list of consecutive element pairs in xy:

from functools import reduce
from operator import add
reduce(add, [euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
## 36.41509195750892

Reducing over a list [1, 2, 3, 4, ..., k] by applying a binary function func(a, b) does this: func( ... func(func(func(func(1, 2), 3), 4), 5) ..., k).

@DaniMesejo pointed out that reduce(add, lst) is just sum(lst).

So it gets even simpler:

sum([euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])

The best trick here is actually zip(xy, xy[1:]), which creates pairs from a list [1, 2, 3, 4, ..., k]: [(1, 2), (2, 3), (3, 4), ... (k-1, k)]
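Because zip stops at the shorter input, the pairs it produces never include (k, 1). The sum therefore covers the open path only, which is why it gives 36.415... rather than the 53.415... from the question. A minimal sketch of the pairing and of one way to add the closing segment (the names pairs, open_total and closed_total are illustrative):

```python
import numpy as np

xy = np.array([[37., 52.], [42., 57.], [31., 62.], [27., 68.], [37., 69.]])

def euclidean_distance(p1, p2):
    return sum((p1 - p2) ** 2) ** .5

# zip pairs each element with its successor and stops at the shorter input.
labels = [1, 2, 3, 4, 5]
pairs = list(zip(labels, labels[1:]))
print(pairs)  # [(1, 2), (2, 3), (3, 4), (4, 5)]

# Open path: consecutive segments only.
open_total = sum(euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:]))

# Closed loop: add the segment from the last point back to the first.
closed_total = open_total + euclidean_distance(xy[-1], xy[0])
print(open_total, closed_total)
```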

Answer 6

You can use np.roll together with np.linalg.norm and sum

#arr = np.stack([X,Y], axis=1)

arr = np.array((X, Y)).T #as suggested in the comment

Out[50]:
array([[37., 52.],
       [42., 57.],
       [31., 62.],
       [27., 68.],
       [37., 69.]])

In [52]: np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
Out[52]: 53.41509195750892
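To see why this handles the wrap-around automatically, it helps to look at what np.roll returns. In this sketch (the names shifted and segments are illustrative), rolling by -1 along axis 0 moves every row up by one and puts the first point in the last row, so the last difference is exactly the closing segment.

```python
import numpy as np

arr = np.array([[37., 52.], [42., 57.], [31., 62.], [27., 68.], [37., 69.]])

# Row i of `shifted` is row i+1 of `arr`; the last row wraps to the first point.
shifted = np.roll(arr, -1, axis=0)
print(shifted[-1])  # [37. 52.]

# One Euclidean norm per row: five segment lengths, the last being the closing one.
segments = np.linalg.norm(arr - shifted, axis=1)
print(segments)
print(segments.sum())
```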
