How can I select equally spaced points from a set of points using Python?
Suppose there is a list of points (nodes), each with x, y and z coordinates. The distance between two points i and j is D(i,j) = sqrt((xi-xj)^2 + (yi-yj)^2 + (zi-zj)^2). My dataset has 400,000 points.
I now want to select a subset of these nodes that are all the same distance apart (a pre-specified spacing of 0.05), so that the selected points are evenly distributed.
Doing this with a while loop takes about 3 hours for the whole dataset. I am looking for the fastest way to do it.
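import numpy as np

# df: DataFrame with x, y, z in columns 1-3 and a distance column at position 6
no_rows = len(df)
i = 1
while i < no_rows:
    # coordinates of the previous (kept) row and of the current row
    a1 = df.iloc[i-1, 1]
    a2 = df.iloc[i, 1]
    b1 = df.iloc[i-1, 2]
    b2 = df.iloc[i, 2]
    c1 = df.iloc[i-1, 3]
    c2 = df.iloc[i, 3]
    dist = np.round(((a2-a1)**2 + (b2-b1)**2 + (c2-c1)**2)**0.5, 5)
    df.iloc[i, 6] = dist
    if dist < 0.05000:
        # too close to the previous point: drop it, renumber the rows,
        # and check the new row i against the same previous point
        df = df.drop(i)
        df.reset_index(drop=True, inplace=True)
        no_rows = len(df)
        i = i - 1
    i += 1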
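The loop above keeps a point only if it is at least 0.05 from the last point kept before it. Much of its running time probably comes from `df.drop(i)` and `reset_index`, which build a new frame on every removal, so the total work grows roughly quadratically. Below is a minimal sketch of the same rule applied to a plain NumPy array that only records which indices to keep. The helper name `thin_sequential` is hypothetical (not from the question or the answer), and it assumes the coordinates sit in columns 1-3 as in the loop. It is still a Python loop, but each step does a constant amount of work.

```python
import numpy as np


def thin_sequential(coords, min_dist=0.05, decimals=5):
    """Greedy pass over the points in their given order.

    A point is kept only if its distance to the most recently kept point,
    rounded to `decimals` places, is >= min_dist. This is the same rule as
    the while loop in the question, but nothing is deleted from a DataFrame.
    Returns the positional indices of the kept points.
    """
    coords = np.asarray(coords, dtype=float)
    if len(coords) == 0:
        return np.array([], dtype=int)
    keep = [0]          # the first point is always kept, as in the loop
    last = coords[0]    # coordinates of the last kept point
    for i in range(1, len(coords)):
        d = np.round(np.sqrt(np.sum((coords[i] - last) ** 2)), decimals)
        if d >= min_dist:
            keep.append(i)
            last = coords[i]
    return np.array(keep)


# Hypothetical usage, assuming x, y, z are in columns 1-3 as in the question:
# kept = thin_sequential(df.iloc[:, 1:4].to_numpy())
# df_thin = df.iloc[kept].reset_index(drop=True)
```

Building the result once with `df.iloc[kept]` avoids copying the frame for every dropped row.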
One option is to work directly in pandas and merge the DataFrame with itself, for example:
import pandas as pd
import numpy as np

df = pd.DataFrame([
    [131.404866, 16.176877, 128.120177],
    [131.355045, 16.176441, 128.115972],
    [131.305224, 16.176005, 128.111767],
    [131.255403, 16.175569, 128.107562],
    [131.205582, 16.175133, 128.103357],
    [131.158858, 16.174724, 128.099413],
    [131.15576, 16.174702, 128.09916],
    [131.105928, 16.174342, 128.095089],
    [131.05988, 16.174009, 128.091328],
    [131.056094, 16.173988, 128.09103],
    [131.006249, 16.173712, 128.087107],
    [130.956404, 16.173436, 128.083184],
    ],
    columns=['x', 'y', 'z']
)
# keep the original row number as an 'index' column so it survives the merge
df.reset_index(drop=False, inplace=True)

dist = 0.05

# cross join: merging on a constant key pairs every point with every point
df['CROSS'] = 1
df = df.merge(df, on="CROSS")
df.reset_index(drop=True, inplace=True)

# Euclidean distance for every pair, rounded to 5 decimals
df['distance'] = np.round(
    np.sqrt(
        np.square(df['x_x'] - df['x_y'])
        + np.square(df['y_x'] - df['y_y'])
        + np.square(df['z_x'] - df['z_y'])
    ),
    5
)

# drop rows where the distance is 0 (a point paired with itself)
ix = df[df.distance == 0].index
df.drop(ix, inplace=True)

print('These are all pairs of points matching the distance', dist)
ix = df[df.distance == dist].index
print(df.loc[ix].sort_values('distance'))
print('-' * 50)

# collect both ends of every matching pair; each pair appears twice, as (i, j) and (j, i)
points = pd.DataFrame(
    df.loc[ix, ['index_x', 'x_x', 'y_x', 'z_x']].values.tolist()
    + df.loc[ix, ['index_y', 'x_y', 'y_y', 'z_y']].values.tolist(),
    columns=['index', 'x', 'y', 'z'])
points.drop_duplicates(keep='first', inplace=True)
points = points.astype({'index': int})  # .values.tolist() turned the ids into floats
print('These are all the points which have another point at distance', dist)
print(points)
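NumPy's vectorized operations are much faster than an explicit Python loop and process a whole column at once. Keep in mind, though, that the self-merge creates n² rows (1.6 × 10^11 for 400,000 points), so it is only feasible for fairly small inputs.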
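For comparison, the same pairwise test can be written with NumPy broadcasting on an (n, 3) array instead of a DataFrame self-merge. This is a minimal sketch with a hypothetical helper name, `pairs_at_distance`. It reuses the answer's rule (round to 5 decimals, then compare with the target), and like the merge it builds an n × n distance matrix, so it shares the same quadratic memory cost.

```python
import numpy as np


def pairs_at_distance(coords, target=0.05, decimals=5):
    """Return index pairs (i, j), i < j, whose rounded distance equals target.

    Same test as the merge-based answer (round to `decimals` places, then
    compare with `target`). Builds an n x n matrix, so memory is O(n**2).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]      # shape (n, n, 3)
    dist = np.round(np.sqrt((diff ** 2).sum(axis=-1)), decimals)
    iu, ju = np.triu_indices(len(coords), k=1)           # each unordered pair once, no self-pairs
    match = dist[iu, ju] == target
    return np.column_stack([iu[match], ju[match]])
```

Taking only the upper triangle returns each pair once, which removes the need for the self-pair filter and `drop_duplicates` step in the merge version.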
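Another possibility is geopandas. It can also be very fast, but I'm not sure that would be the case here: its fastest methods rely on pyproj's distance calculations (written in C), and I don't think those handle 3D.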
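If the goal is to find pairs at a given distance in the full 400,000-point set without building all n² pairs, one alternative (not from the original answer) is a KD-tree from SciPy, `scipy.spatial.cKDTree`. Its `query_pairs(r)` method returns only the pairs whose distance is at most `r`, so memory grows with the number of nearby pairs rather than with n². The sketch below uses a hypothetical helper, `pairs_near_distance`. It assumes a tolerance `tol` of 5e-6, chosen to roughly mimic rounding to 5 decimals, and keeps the pairs whose distance lies within `tol` of the target.

```python
import numpy as np
from scipy.spatial import cKDTree


def pairs_near_distance(coords, target=0.05, tol=5e-6):
    """Return index pairs (i, j), i < j, with |distance - target| <= tol.

    query_pairs only enumerates pairs closer than target + tol, so memory
    grows with the number of nearby pairs instead of n**2.
    """
    coords = np.asarray(coords, dtype=float)
    tree = cKDTree(coords)
    # set of (i, j) tuples with i < j and distance <= target + tol
    close = tree.query_pairs(r=target + tol)
    pairs = np.array(sorted(close), dtype=int).reshape(-1, 2)  # (0, 2) if nothing is close
    d = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
    return pairs[np.abs(d - target) <= tol]
```

How fast this runs on the real data depends on how many points lie within 0.05 of each other, so it is worth timing it on a subset first.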