More time-efficient way to implement this code?

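Is there a more time-efficient way to implement the following toy code? That is, can I use numpy functions (or other functions) in place of the nested for loops?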

import numpy as np
import time

x =np.asarray([[0.13528165, 0.75680003, 0.34140618, 0.27220936],
 [0.05577458, 0.10935562, 0.10391207, 0.27655284],
 [0.69261246, 0.473227  , 0.74132719, 0.49142857],
 [0.47410374, 0.16312079, 0.32911195, 0.9621932 ],
 [0.68697019, 0.29684091, 0.90821942, 0.17157798],
 [0.62682866, 0.50055864, 0.86398873, 0.70907045],
 [0.73800433, 0.92377443, 0.98588321, 0.84503027],
 [0.38787016, 0.13099305, 0.47687691, 0.54611905],
 [0.40795951, 0.43677015, 0.49634543, 0.1169693 ],
 [0.96947452, 0.64037515, 0.81471111, 0.85956936]])

clusters =[0, 0, 1, 0, 1, 1, 1, 0, 0, 1]

centroids =[[0.29219793, 0.31940793, 0.34953051, 0.43480875],
 [0.74277803, 0.56695522, 0.86282593, 0.61533533]]


tic=time.perf_counter()
d = np.zeros(x.shape[1])
for k in range(d.size):
    d[k] = 0
    for j in range(x.shape[0]):
        d[k] += abs(x[j][k] - centroids[clusters[j]][k])
print(d)
print(time.perf_counter()-tic)

#d = [1.24007222 1.96998689 0.88754977 2.41271772] #Output

And how could I replace the nested for loops when they contain if statements, as in the following?

import numpy as np

d = np.array( [1.24007222, 1.96998689, 0.88754977, 2.41271772])
weights=np.array([0.25,0.25,0.25,0.25])
new_weights = np.zeros(weights.size)
eps=1e-3
beta=1.2
for k in range(new_weights.size):
    if abs(d[k]) < eps:
        continue
    for current_d in d:
        if abs(current_d) < eps:
            continue
        new_weights[k] += (d[k] / current_d) ** (1 / (beta - 1))
    new_weights[k] = 1 / new_weights[k]
weights = new_weights
print(weights)
#weights=[0.15482004 0.0153019  0.82432504 0.00555302] #output

Reading through your code, this can be solved as follows.

First, convert your cluster labels and centroids to arrays:

clusters = np.array([0, 0, 1, 0, 1, 1, 1, 0, 0, 1])

centroids = np.array([[0.29219793, 0.31940793, 0.34953051, 0.43480875],
 [0.74277803, 0.56695522, 0.86282593, 0.61533533]])

Then:

d = np.sum( np.abs(x - centroids[clusters]), axis=0)
# d = array([1.24007222, 1.9699869 , 0.88754978, 2.4127177 ])
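The one-liner works because indexing `centroids` with the integer array `clusters` selects, for every row of `x`, the centroid of that row's cluster, which gives an array with the same shape as `x`. A minimal, self-contained sketch that checks this against the original nested loops (the random data and the names `assigned`, `d_vec` and `d_loop` are only illustrative, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative data, not the values from the question
x = rng.random((10, 4))
clusters = np.array([0, 0, 1, 0, 1, 1, 1, 0, 0, 1])
centroids = rng.random((2, 4))

# Integer-array indexing: row j of `assigned` is the centroid of x[j]'s cluster.
assigned = centroids[clusters]          # shape (10, 4), same as x

# Vectorized: sum of absolute deviations down each column.
d_vec = np.abs(x - assigned).sum(axis=0)

# Reference: the original nested loops.
d_loop = np.zeros(x.shape[1])
for k in range(x.shape[1]):
    for j in range(x.shape[0]):
        d_loop[k] += abs(x[j, k] - centroids[clusters[j], k])

print(np.allclose(d_vec, d_loop))
```

Note that this only works once `clusters` and `centroids` are numpy arrays; indexing a plain Python list with another list raises a `TypeError`.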

On my machine this speeds things up from 47 microseconds to 7 microseconds.
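Timings like these depend on the machine, and a single `time.perf_counter()` measurement is noisy. A rough sketch of comparing the two versions with `timeit`, averaging over many runs (the function names `loop_version` and `vectorized_version` and the random data are made up for this example):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)          # illustrative data of the same shape as in the question
x = rng.random((10, 4))
clusters = rng.integers(0, 2, size=10)
centroids = rng.random((2, 4))

def loop_version():
    # The original nested loops.
    d = np.zeros(x.shape[1])
    for k in range(d.size):
        for j in range(x.shape[0]):
            d[k] += abs(x[j, k] - centroids[clusters[j], k])
    return d

def vectorized_version():
    # The numpy one-liner.
    return np.abs(x - centroids[clusters]).sum(axis=0)

n = 1000                                # number of repetitions to average over
t_loop = timeit.timeit(loop_version, number=n) / n
t_vec = timeit.timeit(vectorized_version, number=n) / n
print(f"loop: {t_loop * 1e6:.1f} us, vectorized: {t_vec * 1e6:.1f} us")
```

With only 10 rows the fixed overhead of each numpy call is a noticeable part of the vectorized time, so the gap would likely widen for larger `x`.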


Edit

To answer your follow-up question, here is one vectorized way to perform that loop:

new_weights = np.where( d < eps, 0, 1./np.sum( (d[None,:]/np.where(d<eps,np.inf, d)[:,None])** (1 / (beta - 1)), axis=0))
# = array([0.15482004, 0.0153019 , 0.82432504, 0.00555302])
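The same computation can also be written in a few steps, which may be easier to read. This is only a sketch of an alternative, not a replacement for the one-liner: it first masks out the entries the loop skips, then builds the matrix of ratios by broadcasting. The names `valid`, `dv`, `p` and `ratios` are introduced here for illustration.

```python
import numpy as np

d = np.array([1.24007222, 1.96998689, 0.88754977, 2.41271772])
eps = 1e-3
beta = 1.2
p = 1 / (beta - 1)

# Entries the loop does not skip (same test as `abs(d[k]) < eps: continue`).
valid = np.abs(d) >= eps
dv = d[valid]

# ratios[k, i] = (dv[k] / dv[i]) ** p, a column broadcast against a row.
ratios = (dv[:, None] / dv[None, :]) ** p

# Skipped entries keep weight 0, as in the loop.
new_weights = np.zeros_like(d)
new_weights[valid] = 1 / ratios.sum(axis=1)
print(new_weights)
```

Because the small entries are removed before dividing, no `np.inf` placeholder is needed, and this form avoids the divide-by-zero warning the one-liner can emit when an entry of `d` is exactly 0.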