How can I speed up my 3D Euclidean distance matrix code?
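I have written code that computes the distance between every pair of objects (tagID) from their x, y, z coordinates (TX, TY, TZ) at each time step (Frame). The code works, but it is too slow for what I need. My current test data has about 538,792 rows, and my real data will have about 6,880,000 rows. Building these distance matrices currently takes several minutes (maybe 10-15), and since I will have 40 data sets, I would like to speed it up.
The current code is as follows: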
import numpy as np
import pandas as pd
# Sample data frame with correct columns:
data2 = ({'Frame' :[1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7],
'tagID' : ['nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3','nb1','nb2','nb3'],
'TX':[5,2,3,4,5,6,7,5,np.nan,5,2,3,4,5,6,7,5,4,8,3,2],
'TY':[4,2,3,4,5,9,3,2,np.nan,5,2,3,4,5,6,7,5,4,8,3,2],
'TZ':[2,3,4,6,7,8,4,3,np.nan,5,2,3,4,5,6,7,5,4,8,3,2]})
df = pd.DataFrame(data2)
Frame tagID TX TY TZ
0 1 nb1 5.0 4.0 2.0
1 1 nb2 2.0 2.0 3.0
2 1 nb3 3.0 3.0 4.0
3 2 nb1 4.0 4.0 6.0
4 2 nb2 5.0 5.0 7.0
5 2 nb3 6.0 9.0 8.0
6 3 nb1 7.0 3.0 4.0
7 3 nb2 5.0 2.0 3.0
8 3 nb3 NaN NaN NaN
9 4 nb1 5.0 5.0 5.0
10 4 nb2 2.0 2.0 2.0
11 4 nb3 3.0 3.0 3.0
12 5 nb1 4.0 4.0 4.0
13 5 nb2 5.0 5.0 5.0
14 5 nb3 6.0 6.0 6.0
15 6 nb1 7.0 7.0 7.0
16 6 nb2 5.0 5.0 5.0
17 6 nb3 4.0 4.0 4.0
18 7 nb1 8.0 8.0 8.0
19 7 nb2 3.0 3.0 3.0
20 7 nb3 2.0 2.0 2.0
# Calculate the squared distance between all x points:
TXdf = []
for i in range(1,df['Frame'].max()+1):
    boox = df['Frame'] == i
    tempx = df[boox]
    tx=tempx['TX'].apply(lambda x : (tempx['TX']-x)**2)
    tx.columns=tempx.tagID
    tx['ID']=tempx.tagID
    tx['Frame'] = tempx.Frame
    TXdf.append(tx)
TXdfFinal = pd.concat(TXdf) # once all df for every
print(TXdfFinal)
TXdfFinal.info()
# Calculate the squared distance between all y points:
print('y-diff sum')
TYdf = []
for i in range(1,df['Frame'].max()+1):
    booy = df['Frame'] == i
    tempy = df[booy]
    ty=tempy['TY'].apply(lambda x : (tempy['TY']-x)**2)
    ty.columns=tempy.tagID
    ty['ID']=tempy.tagID
    ty['Frame'] = tempy.Frame
    TYdf.append(ty)
TYdfFinal = pd.concat(TYdf)
print(TYdfFinal)
TYdfFinal.info()
# Calculate the squared distance between all z points:
print('z-diff sum')
TZdf = []
for i in range(1,df['Frame'].max()+1):
    booz = df['Frame'] == i
    tempz = df[booz]
    tz=tempz['TZ'].apply(lambda x : (tempz['TZ']-x)**2)
    tz.columns=tempz.tagID
    tz['ID']=tempz.tagID
    tz['Frame'] = tempz.Frame
    TZdf.append(tz)
TZdfFinal = pd.concat(TZdf)
# Add all squared differences together:
euSum = TXdfFinal + TYdfFinal + TZdfFinal
# Square root the sum of the differences of each coordinate for Euclidean distance and add Frame and ID columns back on:
euDist = euSum.loc[:, euSum.columns !='ID'].apply(lambda x: x**0.5)
euDist['tagID'] = list(TXdfFinal['ID'])
euDist['Frame'] = list(TXdfFinal['Frame'])
# Add the distance matrix to the original dataframe based on Frame and ID columns:
new_df = pd.merge(df, euDist, how='left', left_on=['Frame','tagID'], right_on = ['Frame','tagID'])
Frame tagID TX TY TZ nb1 nb2 nb3
0 1 nb1 5.0 4.0 2.0 0.0000 3.7417 3.0000
1 1 nb2 2.0 2.0 3.0 3.7417 0.0000 1.7321
2 1 nb3 3.0 3.0 4.0 3.0000 1.7321 0.0000
3 2 nb1 4.0 4.0 6.0 0.0000 1.7321 5.7446
4 2 nb2 5.0 5.0 7.0 1.7321 0.0000 4.2426
5 2 nb3 6.0 9.0 8.0 5.7446 4.2426 0.0000
6 3 nb1 7.0 3.0 4.0 0.0000 2.4495 NaN
7 3 nb2 5.0 2.0 3.0 2.4495 0.0000 NaN
8 3 nb3 NaN NaN NaN NaN NaN NaN
9 4 nb1 5.0 5.0 5.0 0.0000 5.1962 3.4641
10 4 nb2 2.0 2.0 2.0 5.1962 0.0000 1.7321
11 4 nb3 3.0 3.0 3.0 3.4641 1.7321 0.0000
12 5 nb1 4.0 4.0 4.0 0.0000 1.7321 3.4641
13 5 nb2 5.0 5.0 5.0 1.7321 0.0000 1.7321
14 5 nb3 6.0 6.0 6.0 3.4641 1.7321 0.0000
15 6 nb1 7.0 7.0 7.0 0.0000 3.4641 5.1962
16 6 nb2 5.0 5.0 5.0 3.4641 0.0000 1.7321
17 6 nb3 4.0 4.0 4.0 5.1962 1.7321 0.0000
18 7 nb1 8.0 8.0 8.0 0.0000 8.6603 10.3923
19 7 nb2 3.0 3.0 3.0 8.6603 0.0000 1.7321
20 7 nb3 2.0 2.0 2.0 10.3923 1.7321 0.0000
I tried using both euclidean() and pdist() with metric='euclidean', but could not get the iteration right.
Any suggestions on how to get the same result faster would be much appreciated.
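You could try reducing the number of for loops from three to one. It looks like you are iterating over the same frames three times. Try doing all of the calculations in a single loop.
This should cut the time by roughly two-thirds.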
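As a rough illustration of the single-loop idea, the sketch below visits each frame once and handles all three coordinates together with NumPy broadcasting. The helper name `distances_single_loop` and the variables `sample` and `result` are made up for this example, and it assumes each tagID appears at most once per frame and that the data frame index is unique. The sample rows are the first three frames from the question.

```python
import numpy as np
import pandas as pd

def distances_single_loop(df):
    """Hypothetical helper: one pass over the frames, all three coordinates at once."""
    pieces = []
    for _, grp in df.groupby('Frame', sort=False):
        xyz = grp[['TX', 'TY', 'TZ']].to_numpy(dtype=float)  # shape (n_tags, 3)
        # Pairwise coordinate differences, shape (n_tags, n_tags, 3)
        diff = xyz[:, None, :] - xyz[None, :, :]
        # Euclidean distance; NaN coordinates propagate to NaN distances
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        pieces.append(
            pd.DataFrame(dist, index=grp.index, columns=grp['tagID'].to_numpy())
        )
    # Rows keep the original index, so the result joins straight back onto df
    return df.join(pd.concat(pieces))

# First three frames of the question's sample data
sample = pd.DataFrame({
    'Frame': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'tagID': ['nb1', 'nb2', 'nb3'] * 3,
    'TX': [5, 2, 3, 4, 5, 6, 7, 5, np.nan],
    'TY': [4, 2, 3, 4, 5, 9, 3, 2, np.nan],
    'TZ': [2, 3, 4, 6, 7, 8, 4, 3, np.nan],
})
result = distances_single_loop(sample)
print(result.round(4))
```

Besides removing two of the three loops, this avoids the row-wise `apply` with a lambda and the final merge, which are likely to account for a good share of the runtime on millions of rows. That is an expectation rather than a measurement, so it is worth timing on the real data.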
An approach using scipy:
from scipy.spatial import distance
df['nb1'],df['nb2'],df['nb3']=np.concatenate([distance.cdist(y, y, metric='euclidean') for x , y in df[['TX','TY','TZ']].groupby(df['Frame'])]).T
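The one-liner above hard-codes the three column names, so it only fits data with exactly three tags per frame. If every frame contains the same tags exactly once (as in the sample, but an assumption for the real data), the per-frame loop can be dropped altogether by reshaping the coordinates into a (frames, tags, 3) array. The following is a sketch under that assumption, using plain NumPy broadcasting instead of `cdist`; `distances_vectorized`, `sample` and `result` are names invented for the example.

```python
import numpy as np
import pandas as pd

def distances_vectorized(df):
    """Hypothetical helper: all frames at once; needs identical tags in every frame."""
    ordered = df.sort_values(['Frame', 'tagID'])
    tags = sorted(set(ordered['tagID']))
    n_tags = len(tags)
    n_frames = ordered['Frame'].nunique()
    if len(ordered) != n_frames * n_tags:
        raise ValueError('every frame must contain each tag exactly once')
    tag_grid = ordered['tagID'].to_numpy().reshape(n_frames, n_tags)
    if not (tag_grid == np.array(tags, dtype=object)).all():
        raise ValueError('every frame must contain each tag exactly once')

    # coords[f, i] holds the (x, y, z) position of tag i in frame f
    coords = ordered[['TX', 'TY', 'TZ']].to_numpy(dtype=float)
    coords = coords.reshape(n_frames, n_tags, 3)
    # diff[f, i, j] = coords[f, i] - coords[f, j]
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # shape (n_frames, n_tags, n_tags)

    out = pd.DataFrame(
        dist.reshape(n_frames * n_tags, n_tags),
        index=ordered.index,
        columns=tags,
    )
    return df.join(out)

# First three frames of the question's sample data
sample = pd.DataFrame({
    'Frame': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'tagID': ['nb1', 'nb2', 'nb3'] * 3,
    'TX': [5, 2, 3, 4, 5, 6, 7, 5, np.nan],
    'TY': [4, 2, 3, 4, 5, 9, 3, 2, np.nan],
    'TZ': [2, 3, 4, 6, 7, 8, 4, 3, np.nan],
})
result = distances_vectorized(sample)
print(result.round(4))
```

The intermediate `diff` array has n_frames × n_tags × n_tags × 3 elements, so with many tags per frame it may be necessary to process the frames in chunks. When frames hold differing sets of tags, a per-frame loop remains the safer choice.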