Calculate nearest distance to certain points in Python
I have a dataset like the one below; each sample has x and y values and a corresponding result
Sr. X Y Result
1 2 12 Positive
2 4 3 Positive
....
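Visualization
The grid size is 12 * 8
How can I calculate, for each sample, the distance to the nearest red (positive) point?
Red = Positive, Blue = Negative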
Sr. X Y Result Nearest-distance-red
1 2 23 Positive ?
2 4 3 Negative ?
....
Dataset
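scipy's cKDTree can compute that distance for you. Something along these lines should work:
from scipy.spatial import cKDTree
df['Distance_To_Red'] = cKDTree(coordinates_of_red_points).query(df[['x', 'y']].to_numpy(), k=1)[0]  # query() returns (distances, indices); [0] keeps the distances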
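To make the one-liner above concrete, here is a minimal runnable sketch. The four sample rows, the lowercase column names `x` and `y`, and the helper variable `red_xy` are assumptions made up for illustration; they are not the asker's real data.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Hypothetical sample data (not the asker's real dataset).
df = pd.DataFrame({
    "x": [2, 4, 5, 9],
    "y": [12, 3, 7, 3],
    "Result": ["Positive", "Negative", "Positive", "Negative"],
})

# Coordinates of the red (positive) points as an (n_red, 2) array.
red_xy = df.loc[df["Result"] == "Positive", ["x", "y"]].to_numpy()

# Build the tree on the red points only, then query every sample.
tree = cKDTree(red_xy)
dist, idx = tree.query(df[["x", "y"]].to_numpy(), k=1)

# dist[i] is the Euclidean distance from sample i to its nearest red point;
# idx[i] is that red point's position within red_xy.
df["Distance_To_Red"] = dist
print(df)
```

Note that a red sample finds itself, so its distance is 0. The points must be passed as an array of shape (n_samples, 2), not as a tuple of two columns.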
This is much easier to answer when sample data is provided, so be sure to include it next time.
I'll generate some random data:
import numpy as np
import pandas as pd
import sklearn
x = np.linspace(1,50)
y = np.linspace(1,50)
GRID = np.meshgrid(x,y)
grid_colors = 1* ( np.random.random(GRID[0].size) > .8 )
sample_data = pd.DataFrame( {'X': GRID[0].flatten(), 'Y':GRID[1].flatten(), 'grid_color' : grid_colors})
sample_data.plot.scatter(x="X",y='Y', c='grid_color', colormap='bwr', figsize=(10,10))
A BallTree (or KDTree) can build a tree to query:
from sklearn.neighbors import BallTree
red_points = sample_data[sample_data.grid_color == 1]
blue_points = sample_data[sample_data.grid_color != 1]
tree = BallTree(red_points[['X','Y']], leaf_size=15, metric='minkowski')
and query it with
distance, index = tree.query(sample_data[['X','Y']], k=1)
Now add the results to the DataFrame:
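# tree.query returns arrays of shape (n_samples, k); take column 0 to get 1-D values
sample_data['nearest_point_distance'] = distance[:, 0]
# index holds positions within red_points, not labels of sample_data
sample_data['nearest_point_X'] = red_points.X.values[index[:, 0]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[:, 0]]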
This gives
X Y grid_color nearest_point_distance nearest_point_X \
0 1.0 1.0 0 2.0 3.0
1 2.0 1.0 0 1.0 3.0
2 3.0 1.0 1 0.0 3.0
3 4.0 1.0 0 1.0 3.0
4 5.0 1.0 1 0.0 5.0
nearest_point_Y
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
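As a small sanity check on the shapes involved, the sketch below rebuilds only the first five rows of the output above (red points at X = 3 and X = 5, all with Y = 1) instead of the full random grid. The array names `red_xy`, `all_xy`, `nearest_distance` and `nearest_xy` are made up for this illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree

# First five grid points from the output above; two of them are red.
all_xy = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0]])
red_xy = np.array([[3.0, 1.0], [5.0, 1.0]])

tree = BallTree(red_xy, leaf_size=15, metric="minkowski")
distance, index = tree.query(all_xy, k=1)

# Both results are 2-D with one column per requested neighbour.
print(distance.shape, index.shape)

# Column 0 is the nearest neighbour; index refers to rows of red_xy.
nearest_distance = distance[:, 0]
nearest_xy = red_xy[index[:, 0]]
print(nearest_distance)
```

The point at X = 4 is equally far from both red points, so which of the two is reported for it is a tie and should not be relied on.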
A modification so that red points do not find themselves: query the two nearest neighbours (k=2) instead of one (k=1);
distance, index = tree.query(sample_data[['X','Y']], k=2)
and, with the help of NumPy indexing, make the red points use the second neighbour instead of the first:
sample_size = GRID[0].size
sample_data['nearest_point_distance'] = distance[np.arange(sample_size),sample_data.grid_color]
sample_data['nearest_point_X'] = red_points.X.values[index[np.arange(sample_size),sample_data.grid_color]]
sample_data['nearest_point_Y'] = red_points.Y.values[index[np.arange(sample_size),sample_data.grid_color]]
The output has the same form, but because the data is random it does not match the picture made earlier.
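One way to cross-check the k=2 trick on small inputs is a brute-force version in plain NumPy. This is a different technique from the tree query (it computes every sample-to-red distance), and the function name `nearest_other_red` is hypothetical; treat it as a sketch for verification rather than a replacement.

```python
import numpy as np

def nearest_other_red(xy, is_red):
    """Distance from each sample to the nearest red point other than itself."""
    xy = np.asarray(xy, dtype=float)
    is_red = np.asarray(is_red, dtype=bool)
    red_idx = np.flatnonzero(is_red)  # row numbers of the red samples

    # Pairwise Euclidean distances, shape (n_samples, n_red).
    diff = xy[:, None, :] - xy[red_idx][None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    # A red sample must not match itself: mask its own column with infinity.
    d[red_idx, np.arange(red_idx.size)] = np.inf

    nearest = d.argmin(axis=1)  # column of the closest remaining red point
    # With a single red point, that point's own distance stays infinite.
    return d[np.arange(len(xy)), nearest], red_idx[nearest]

# Five points on a line; the samples at X = 3 and X = 5 are red.
xy = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
is_red = [0, 0, 1, 0, 1]
dist, idx = nearest_other_red(xy, is_red)
print(dist)
print(idx)
```

This needs memory proportional to the number of samples times the number of red points, which is fine for a 50 x 50 grid but is the reason a tree is preferable for large datasets.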