如何在数据立方体中搜索具有非 NaN 值的最近邻点？

Question

我正在使用数据立方体，例如 data[x,y,z]。每个点都是一个通过时间的速度，[x,y] 网格对应于坐标。如果我选择一个坐标点 x 和 y，时间序列很可能是不完整的（有一些 NaN）。我创建了一个函数，它搜索具有值的最近邻居，并用它替换我的 xy 点的 NaN。但是我想知道是否有更有效的方法来编写具有相同功能的代码？

加入此消息的是该函数如何评估邻居的照片。每个点的编号代表它的排名（5 是评估的第 5 个邻居）。

我试过这样的事情：

假设我有一个 10x10x100 的数据立方体（100 是时间序列）：

import math
import numpy as np

Vel = np.random.rand(10,10,100)
Vel[4:7,4:7,0:10] = np.nan

x = 5
y = 5

Vpoint = Vel[5,5,:]


for i in range(0,len(Vpoint)):
        
        xx = x
        yy = y
        
        
        if math.isnan(Vel[xx,yy,i]) == True:
        
            for n in range(0,50):
                
                n = n + 1
            
                if n > 10:
                    raise Exception("The interpolation is way too far") 
            
                xx = x + n
                yy = y
            
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
            
                xx = x-n
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
            
                xx = x
                yy = y + n
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
                
                yy = y-n
                if math.isnan(Vel[xx,yy,i]) == False:
                    Vpoint[i] = Vel[xx,yy,i]
                    break
            
                for p in range(1,n):
                    
                    xx = x+p
                    if math.isnan(Vel[xx,yy,i]) == False:
                        Vpoint[i] = Vel[xx,yy,i]
                        break
                    xx = x-p
                    if math.isnan(Vel[xx,yy,i]) == False:
                        Vpoint[i] = Vel[xx,yy,i]
                        break
                    
                    
                    for p in range(1,n):
                        
                        
                        yy = y+n
                        xx = x+p
                        
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break
                        xx = x-p
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break    
                        
                        
                        yy = y-n
                        xx = x+p
                        
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break
                        xx = x-p
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break
                        
                        xx = x+n
                        yy = y+p
                        
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break
                        yy = y-p
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break    
                        
                        
                        xx = x-n
                        yy = y-p
                        
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break
                        yy = y-p
                        if math.isnan(Vel[xx,yy,i]) == False:
                            Vpoint[i] = Vel[xx,yy,i]
                            break

        print(n,xx,yy)

Ps：实际上我的时间序列接近 330x300x38000，最近的非 nan 邻居每次都应该改变。

Answer 1

这是我想出的：

import numpy as np
from scipy import interpolate

Vel = np.random.rand(10,10,100) 
Vel[4:7,4:7,0:10] = np.nan
Vel[4:7,4:7,20:30] = np.nan

def gap_filling(vect, interpolation):

    time = np.arange(0, np.shape(vect)[0])
    mask = np.isfinite(vect)
    f = interpolate.interp1d(time[mask], vect[mask], 
                             kind=interpolation, bounds_error=False)

    vect_filled = np.copy(vect)
    vect_filled[np.isnan(vect)] = f(time[np.isnan(vect)])

    return vect_filled

Vel_filled_nn = np.apply_along_axis(gap_filling, -1, Vel, 'nearest')

Vel_filled_li = np.apply_along_axis(gap_filling, -1, Vel, 'linear')

我根据随时间变化的可用数据创建一个插值函数，然后将其映射到缺失值以及数据集的每个时间序列的缺失值。

但是因为我知道您正在为哪个应用程序开发此代码（地球科学中的数据分析），所以我建议您使用插值法而不是最近邻法（此处 Vel_filled_li）。其中一个时间序列的结果：

import matplotlib.pyplot as plt

plt.figure()
plt.plot(Vel_filled_nn[6, 6, :], 'o-', label='nearest neighbour')
plt.plot(Vel_filled_li[6, 6, :], 'o-', label='linear interpolation')
plt.plot(Vel[6, 6, :], 'o-', label='raw')
plt.legend(loc='upper right')
plt.xlabel('Time', fontsize=15)
plt.ylabel('Variable', fontsize=15)

它只是一个基础，can/should被矢量化，使用interpolate.interp1d的轴参数。

如何在数据立方体中搜索具有非 NaN 值的最近邻点？

How to search for a nearest-neighbor point with a value else than NaN in a datacube?

python

arrays

loops

time-series

nearest-neighbor