Compute cumulative Euclidean distances between subsequent pairwise coordinates

I have the following two arrays:

X = array([37., 42., 31., 27., 37.])

Y = array([52., 57., 62., 68., 69.])

I can also combine them as follows:

XY = np.array((X, Y)).T

which yields

array([[37., 52.],
       [42., 57.],
       [31., 62.],
       [27., 68.],
       [37., 69.]])

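I want to compute the total distance between consecutive pairs of points, including the segment from the last point back to the first.

For example, I want to do this: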

(
np.linalg.norm(np.array([37, 52]) - np.array([42, 57]))
+ np.linalg.norm(np.array([42, 57]) - np.array([31, 62]))
+ np.linalg.norm(np.array([31, 62]) - np.array([27, 68]))
+ np.linalg.norm(np.array([27, 68]) - np.array([37, 69]))
+ np.linalg.norm(np.array([37, 69]) - np.array([37, 52]))
)

which yields 53.41509195750892

I wrote a function that does this:

def distance(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
        oldx = x
        oldy = y
    return T

print(distance(X, Y))

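which also yields 53.41509195750891

I would like to know whether there is a more elegant/efficient way to do this with NumPy array operations.

Edit: Sorry, the original example function I gave was wrong; it should be correct now.

Edit: Thanks everyone for the answers! Here is a benchmark on my arrays of size 50. It looks like Dani's answer is the fastest, although Akshay's answer was faster for arrays of size 5.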

def distance_charel(X, Y):
    N = len(X)
    T = 0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        T += np.linalg.norm((np.array([x,y])-np.array([oldx,oldy])))
        oldx = x
        oldy = y
    return T

def distance_dani(X, Y):
    XY = np.array((X, Y)).T
    diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))
    ss = np.power(diff, 2).sum(axis=1)
    res = np.sqrt(ss).sum()
    return res

def distance_akshay(X, Y):
    XY = np.array((X, Y)).T
    pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))
    total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
    return total

def distance_gwang(X, Y):
    XY = np.array((X, Y)).T
    return sum([sum((p1 - p2) ** 2) ** .5 for p1, p2 in zip(XY, XY[1:])])

def distance_andy(X, Y):
    arr = np.array((X, Y)).T
    return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()

Then

print(distance_charel(X, Y))
print(distance_dani(X, Y))
print(distance_akshay(X, Y))
print(distance_gwang(X, Y))  # I think it misses the distance between last and first element
print(distance_andy(X, Y))
%timeit distance_charel(X, Y)
%timeit distance_dani(X, Y)
%timeit distance_akshay(X, Y)
%timeit distance_gwang(X, Y)
%timeit distance_andy(X, Y)

which outputs

2586.769647563161
2586.76964756316
2586.7696475631597
2568.8811037431624
2586.7696475631597
2.49 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
385 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.09 ms ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
31.2 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
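
The %timeit lines above only work inside IPython or Jupyter. As a rough sketch of how a similar comparison could be run as a plain script, the standard timeit module can be used instead. The random data, the seed, and the names distance_loop and distance_roll are assumptions made for illustration; they are not the 280 coordinates used above, so the timings will differ.

```python
import timeit
import numpy as np

# Hypothetical test data: 280 random points (not the original coordinates).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=280)
Y = rng.uniform(0, 100, size=280)

def distance_loop(X, Y):
    # Pure Python loop, starting from the last point so the path is closed.
    total = 0.0
    oldx, oldy = X[-1], Y[-1]
    for x, y in zip(X, Y):
        total += np.hypot(x - oldx, y - oldy)
        oldx, oldy = x, y
    return total

def distance_roll(X, Y):
    # Vectorized: subtract each point from its successor (with wrap-around).
    arr = np.column_stack((X, Y))
    return np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()

# Both variants should agree up to floating point rounding.
assert np.isclose(distance_loop(X, Y), distance_roll(X, Y))

for fn in (distance_loop, distance_roll):
    runs = 200
    seconds = timeit.timeit(lambda: fn(X, Y), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```

Checking that the candidates return the same value before timing them helps catch cases like the one noted above, where one function leaves out the closing segment.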

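Edit: I have now accepted Dani's answer, since I find his code works best for my case (it uses NumPy vectorized operations, is readable, and is the fastest, by a small margin). Thanks everyone for your answers!

Edit: I updated the benchmark using 280 coordinates.

import numpy as np
from scipy.spatial.distance import euclidean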

X = np.array([37., 42., 31., 27., 37.])

Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T    
sum1 = euclidean(XY[0],XY[-1])


for i in range(len(XY)-1):
    sum1 += euclidean(XY[i],XY[i+1])

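This should do it: start with the hardest term (the distance between the first and last points), then iterate over the easier ones and add them all together.

As a check, euclidean(XY[0], XY[1]) = 7.0710678118654755, which matches the value you provided.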
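
The same loop can also be sketched without SciPy, assuming Python 3.8 or later, where math.dist is available in the standard library. The function name closed_path_length is hypothetical and the points are the ones from the question.

```python
import math

# The points from the question as plain (x, y) tuples.
points = [(37., 52.), (42., 57.), (31., 62.), (27., 68.), (37., 69.)]

def closed_path_length(pts):
    """Sum of consecutive distances plus the closing segment (last -> first)."""
    total = math.dist(pts[0], pts[-1])        # the wrap-around term first
    for i in range(len(pts) - 1):             # then each consecutive pair
        total += math.dist(pts[i], pts[i + 1])
    return total

total = closed_path_length(points)
print(total)
```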

You can compute the formula by hand in a vectorized way, using diff, power and sqrt:

import numpy as np

# setup
X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
XY = np.array((X, Y)).T


# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))

# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)

# find the square root and sum
res = np.sqrt(ss).sum()

print(res)

Output

53.41509195750891

The first step:

# find the differences, prepend the last value at the front
diff = np.diff(XY, axis=0, prepend=XY[-1].reshape((1, -1)))

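computes the coordinate-wise differences between each point and the previous one (x_i - x_{i-1} and y_i - y_{i-1}); because the last point is prepended, the first row is the difference between the first and last points. The second step: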

# raise to the power of 2 and sum
ss = np.power(diff, 2).sum(axis=1)

raises those differences to the power of 2, i.e. (x_i - x_{i-1})^2 and (y_i - y_{i-1})^2, and sums them for each row. Finally:

# find the square root and sum
res = np.sqrt(ss).sum()

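takes the square root of each row sum, as the comment says, and adds up the resulting distances.

To understand it better, let's look at a smaller example: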

# setup
X = np.array([37., 42.])
Y = np.array([52., 57])
XY = np.array((X, Y)).T

diff = np.diff(XY, axis=0)
# [[5. 5.]] (42 - 37) (57 - 52)

ss = np.power(diff, 2).sum(axis=1)
# [50.] 5^2 + 5^2

res = np.sqrt(ss).sum()
# 7.0710678118654755 
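
For completeness, the three steps can be wrapped into one function. This is only a sketch: the name closed_path_length is hypothetical, np.hypot is swapped in for the explicit power/sum/sqrt sequence, and the prepend argument of np.diff assumes NumPy 1.16 or later.

```python
import numpy as np

def closed_path_length(X, Y):
    XY = np.column_stack((X, Y))                     # shape (n, 2)
    # Prepending the last point turns the first difference into the
    # closing segment (first point minus last point).
    diff = np.diff(XY, axis=0, prepend=XY[-1:])      # shape (n, 2)
    # hypot(dx, dy) == sqrt(dx**2 + dy**2), computed per row.
    return np.hypot(diff[:, 0], diff[:, 1]).sum()

X = np.array([37., 42., 31., 27., 37.])
Y = np.array([52., 57., 62., 68., 69.])
res = closed_path_length(X, Y)
print(res)
```

With only two points the closed path covers the same segment twice, so the result is double the 7.07... from the small example.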
In [2]: df = pd.DataFrame([[37., 42., 31., 27., 37.],
   ...:                    [52., 57., 62., 68., 69.]]).T.rename(columns={0:"X", 1:"y"})
   ...: df
Out[2]: 
      X     y
0  37.0  52.0
1  42.0  57.0
2  31.0  62.0
3  27.0  68.0
4  37.0  69.0

In [3]: from scipy.spatial.distance import euclidean
   ...: np.sum([euclidean(df.iloc[i], df.iloc[i+1]) for i in range(len(df)-1)])
Out[3]: 36.41509195750892
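
Note that 36.415... covers only the consecutive pairs; the value in the question (53.415...) also includes the segment from the last point back to the first. A minimal sketch of the difference, using plain NumPy instead of pandas and SciPy (the variable names are chosen here for illustration):

```python
import numpy as np

XY = np.array([[37., 52.], [42., 57.], [31., 62.], [27., 68.], [37., 69.]])

# Distances between consecutive rows only (open path).
open_length = np.linalg.norm(np.diff(XY, axis=0), axis=1).sum()

# The closing segment from the last point back to the first.
closing = np.linalg.norm(XY[0] - XY[-1])

closed_length = open_length + closing
print(open_length, closing, closed_length)
```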

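You can fully vectorize this into a one-liner without any loops, using broadcasting, as follows -

  1. First, (5,1,2) is broadcast with (1,5,2) -> (5,5,2)
  2. Subtracting with this broadcasting gives (5,5,2)
  3. Then square each element of the (5,5,2) array
  4. Sum over the last axis to get (5,5)
  5. Finally, take the square root!

Next, you can take just the shifted diagonal (k=1) of that array, which holds the distances between points (1,2), (2,3), ..., and sum it. Since you also want the distance back to the first point, add the value at [0,-1].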

#This get all pairwise distances calculated with broadcasting
pairwise = np.sqrt(np.sum(np.square(np.subtract(XY[:,None,:],XY[None,:,:])), axis=-1))

#This takes sum of the first diagonal elements instead of 0th
total = np.sum(np.diag(pairwise,k=1))+pairwise[0,-1]
print(total)
53.41509195750892
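
To make the shapes in the steps above concrete, here is the same computation unrolled into intermediate variables. The variable names are illustrative only; the logic follows the one-liner.

```python
import numpy as np

XY = np.array([[37., 52.], [42., 57.], [31., 62.], [27., 68.], [37., 69.]])

a = XY[:, None, :]                                # shape (5, 1, 2)
b = XY[None, :, :]                                # shape (1, 5, 2)
delta = a - b                                     # broadcast to (5, 5, 2)
pairwise = np.sqrt((delta ** 2).sum(axis=-1))     # shape (5, 5)

consecutive = np.diag(pairwise, k=1)              # distances (0,1), (1,2), (2,3), (3,4)
closing = pairwise[0, -1]                         # distance between first and last point
total = consecutive.sum() + closing

print(a.shape, b.shape, delta.shape, pairwise.shape)
print(total)
```

One thing to keep in mind: this builds all n*n pairwise distances even though only n of them are used, so memory and work grow quadratically with the number of points.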

Another way is as follows, but the approach above is still faster -

np.sum(np.sqrt(np.sum(np.square(np.diff(np.vstack([XY,XY[0]]), axis=0)), axis=-1)))
#The np.vstack adds the first coordinate into the array so that you can
#calculate the distance from the last to the first again

Benchmarks -

  • Akshay Sehgal - 19.9 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • Gwang - 21.5 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • ombk - 60.4 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • Dani Mesejo - 16.4 µs ± 6.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
  • Andy L - 17.6 µs ± 3.08 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As expected, NumPy vectorization dominates every time! Gj Dani!

# preparation:
import numpy as np

x = np.array([37., 42., 31., 27., 37.])
y = np.array([52., 57., 62., 68., 69.])
xy = np.array((x, y)).T

def euclidean_distance(p1, p2):
    # square the coordinate differences, sum them, take the square root
    return sum((p1 - p2) ** 2) ** .5

You can do it more elegantly with functional programming. Here, you want to reduce over the list of pairs of consecutive elements in xy:
from functools import reduce
from operator import add
reduce(add, [euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])
## 36.41509195750892

reduce walks through a list [1, 2, 3, 4, ..., k], applying a binary function func(a, b) like this: func( ... func(func(func(func(1, 2), 3), 4), 5) ..., k).
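
A small sketch that makes this nesting visible by recording every call reduce makes. The list calls and the function func are made up for illustration.

```python
from functools import reduce

calls = []

def func(a, b):
    # Record the arguments of each call, then combine them.
    calls.append((a, b))
    return a + b

result = reduce(func, [1, 2, 3, 4, 5])

print(calls)   # each call receives the running result and the next element
print(result)
```

The first argument of every call after the first is the result of the previous call, which is exactly the nested func(func(...), next) pattern.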

@DaniMesejo pointed out that reduce(add, lst) is just sum(lst).

So it gets even simpler:

sum([euclidean_distance(p1, p2) for p1, p2 in zip(xy, xy[1:])])

The best trick here is actually zip(xy, xy[1:]), which turns a list [1, 2, 3, 4, ..., k] into pairs: [(1, 2), (2, 3), (3, 4), ..., (k-1, k)]
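
A short illustration of the pairing on a plain list. It also shows one possible way to include the wrap-around pair (last, first), which the open pairing leaves out; the variable names are illustrative.

```python
lst = [1, 2, 3, 4, 5]

# Consecutive pairs only: zip stops when the shorter sequence ends.
open_pairs = list(zip(lst, lst[1:]))

# Appending the first element to the shifted copy adds the closing pair.
# (This uses list concatenation; for NumPy arrays "+" would add element-wise,
# so something like np.roll or np.vstack would be needed instead.)
closed_pairs = list(zip(lst, lst[1:] + lst[:1]))

print(open_pairs)
print(closed_pairs)
```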

You can use np.roll, np.linalg.norm and sum together
#arr = np.stack([X,Y], axis=1)

arr = np.array((X, Y)).T #as suggested in the comment

Out[50]:
array([[37., 52.],
       [42., 57.],
       [31., 62.],
       [27., 68.],
       [37., 69.]])

In [52]: np.linalg.norm(arr - np.roll(arr, -1, axis=0), axis=1).sum()
Out[52]: 53.41509195750892
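
To see why this closes the loop, it may help to look at what np.roll produces before the subtraction. A small sketch with the same data (the names shifted and segments are illustrative):

```python
import numpy as np

arr = np.array([[37., 52.], [42., 57.], [31., 62.], [27., 68.], [37., 69.]])

# Shift rows up by one; the first row wraps around to the end.
shifted = np.roll(arr, -1, axis=0)

# Row i of (arr - shifted) is point i minus point i+1; the last row is
# the last point minus the first, i.e. the closing segment.
segments = np.linalg.norm(arr - shifted, axis=1)
total = segments.sum()

print(shifted)
print(segments)
print(total)
```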