Find closest value in array column 4 where array columns 1 and 2 match data of another array. Create a new array extracting the results

Question

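I have an extensive dataset in array format, a=[X, Y, Z, value]. I also have another array, b=[X,Y], which contains all the unique coordinate combinations (X,Y) of the same dataset.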
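A minimal sketch of how `b` relates to `a`, assuming both are NumPy arrays: the unique (X, Y) pairs can be taken from the first two columns of `a` with `np.unique(..., axis=0)`. The small array below is made-up illustration data, not data from the question.

```python
import numpy as np

# Hypothetical toy data: rows are [X, Y, Z, value]
a = np.array([
    [0, 0,  90.0, 0.1],
    [0, 0, 105.0, 0.2],
    [0, 1,  98.0, 0.3],
    [1, 0, 120.0, 0.4],
    [1, 0, 101.0, 0.5],
])

# All unique (X, Y) combinations, one per row, sorted lexicographically
b = np.unique(a[:, :2], axis=0)
print(b)
```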

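I want to generate a new array which, for a given z=100, contains the records of the original array a[X,Y,Z,value] whose Z is closest to the given z=100, for every possible X,Y combination.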

The aim is to extract a Z-slice of the original dataset at a given depth.

A description of the desired result would be something like this:

np.in1d(a[:,0], b[:,0]) & np.in1d(a[:,1], b[:,1]) # for each row
# where both of these conditions are True

abs(a[:,2] - z) == min(abs(a[:,2] - z)) # find the rows where Z is closest to z=100
# and append these rows to a new array c[X,Y,Z,value]

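The idea is to first find the unique X,Y data and effectively split the dataset into the X,Y columns of the domain, then search each column to extract the row whose Z is closest to the given z value.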
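The approach described above can be written directly as a loop over the rows of `b`. This is a minimal sketch, assuming `a` and `b` are NumPy arrays laid out as in the question; the function name `closest_z_slice` and the toy data are hypothetical. It is simple, but it scans all of `a` once per (X, Y) pair, so it may be slow on a large dataset.

```python
import numpy as np

def closest_z_slice(a, b, z=100.0):
    """For each (X, Y) row of b, return the row of a whose Z is closest to z."""
    rows = []
    for x, y in b:
        # Boolean mask of rows in a with this exact (X, Y) combination
        mask = (a[:, 0] == x) & (a[:, 1] == y)
        subset = a[mask]
        if subset.size == 0:
            continue  # no data for this coordinate pair
        # Index (within subset) of the Z value nearest to z
        best = np.abs(subset[:, 2] - z).argmin()
        rows.append(subset[best])
    return np.array(rows)

# Hypothetical toy data: rows are [X, Y, Z, value]
a = np.array([
    [0, 0,  90.0, 0.1],
    [0, 0, 105.0, 0.2],
    [0, 1,  98.0, 0.3],
    [1, 0, 120.0, 0.4],
    [1, 0, 101.0, 0.5],
])
b = np.unique(a[:, :2], axis=0)
c = closest_z_slice(a, b, z=100.0)
print(c)
```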

Any suggestions, even a very different approach, would be greatly appreciated.

Answer 1

import numpy as np

a = np.random.rand(10000, 4) * [[20, 20, 200, 1]]  # data in a 20*20*200 space
a[:, :2] //= 1  # integer coords for X, Y
bj = a.T[0] + 1j * a.T[1]  # trick for sorting on 2 columns: complex numbers sort by real part, then imaginary part
b = np.unique(bj)
ib = bj.argsort()  # indices for sorting by (X, Y)
splits = bj[ib].searchsorted(b)  # indices for splitting
xy = np.split(a[ib], splits)  # list of subsets of data grouped by (X, Y); xy[0] is empty because splits[0] == 0
c = np.array([s[abs(s.T[2] - 100).argmin()] for s in xy[1:]])  # locate the best point in each group
print(c[:10])

This gives:

[[   0.            0.          110.44068611    0.71688432]
 [   0.            1.          103.64897184    0.31287547]
 [   0.            2.          100.85948189    0.74353677]
 [   0.            3.          105.28286975    0.98118126]
 [   0.            4.           99.1188121     0.85775638]
 [   0.            5.          107.53733825    0.61015178]
 [   0.            6.          100.82311896    0.25322798]
 [   0.            7.          104.16430907    0.26522796]
 [   0.            8.          100.47370563    0.2433701 ]
 [   0.            9.          102.40445547    0.89028359]]
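The complex-number trick above sorts rows by X and then by Y. As an alternative sketch (not part of the original answer), `np.lexsort` can sort by (X, Y, distance to z) in one pass, after which the first row of each (X, Y) group is the one with Z closest to z; `np.unique(..., axis=0, return_index=True)` then picks those first rows without a Python loop. This assumes `a` is laid out as [X, Y, Z, value]; the function name and toy data are hypothetical.

```python
import numpy as np

def closest_z_slice_vectorized(a, z=100.0):
    """Return, for each (X, Y) pair in a, the row whose Z is closest to z."""
    dist = np.abs(a[:, 2] - z)
    # np.lexsort treats the LAST key as primary: sort by X, then Y, then dist
    order = np.lexsort((dist, a[:, 1], a[:, 0]))
    s = a[order]
    # return_index gives the first occurrence of each (X, Y) pair in s,
    # which is the row with the smallest dist within that group
    _, first = np.unique(s[:, :2], axis=0, return_index=True)
    return s[first]

# Hypothetical toy data, deliberately unsorted: rows are [X, Y, Z, value]
a = np.array([
    [1, 0, 120.0, 0.4],
    [0, 0,  90.0, 0.1],
    [0, 1,  98.0, 0.3],
    [1, 0, 101.0, 0.5],
    [0, 0, 105.0, 0.2],
])
c = closest_z_slice_vectorized(a, z=100.0)
print(c)
```

The result comes out ordered by (X, Y), because `np.unique` returns the unique pairs in sorted order.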

At a higher level, with pandas:

import pandas as pd

labels = list('xyzt')
df = pd.DataFrame(a, columns=labels)
df['dist'] = abs(df.z - 100)
indices = df.groupby(['x', 'y'])['dist'].idxmin()  # row label of the smallest dist in each (x, y) group
c = df.loc[indices, labels].reset_index(drop=True)
print(c.head())

gives:

   x  y           z         t
0  0  0  110.440686  0.716884
1  0  1  103.648972  0.312875
2  0  2  100.859482  0.743537
3  0  3  105.282870  0.981181
4  0  4   99.118812  0.857756

This is clearer, but 8 times slower.
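Another pandas idiom, offered as an untimed sketch rather than a benchmarked alternative: sort by distance to z and keep the first row of each (x, y) pair with `drop_duplicates`. Column names follow the answer above; the toy data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same [x, y, z, t] layout as the answer
a = np.array([
    [1, 0, 120.0, 0.4],
    [0, 0,  90.0, 0.1],
    [0, 1,  98.0, 0.3],
    [1, 0, 101.0, 0.5],
    [0, 0, 105.0, 0.2],
])
labels = list('xyzt')
df = pd.DataFrame(a, columns=labels)

z = 100.0
c = (
    df.assign(dist=(df['z'] - z).abs())        # distance of each row's z to the target
      .sort_values('dist', kind='mergesort')   # stable sort: nearest rows first
      .drop_duplicates(subset=['x', 'y'])      # keep the nearest row per (x, y)
      .sort_values(['x', 'y'])                 # restore (x, y) order for display
      .loc[:, labels]                          # drop the helper dist column
      .reset_index(drop=True)
)
print(c)
```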


Tags: arrays, comparison, numpy, geospatial, python-2.7