Taking np.average while ignoring NaNs?

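I have a matrix with shape (64,17) corresponding to time and latitude. I want to take a weighted latitude average, which I know np.average can do because, unlike np.nanmean (which I used to average over the longitudes), it accepts weights as an argument. However, np.average doesn't ignore NaN the way np.nanmean does, so the first 5 entries of each row are included in the latitude average and turn the entire time series into NaN.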

Is there a way to take a weighted average without including the NaNs in the calculation?

import numpy as np
from netCDF4 import Dataset   # assuming Dataset comes from the netCDF4 package

file = Dataset("sst_aso_1951-2014latlon_seasavgs.nc")
sst = file.variables['sst']
lat = file.variables['lat']

sst_filt = np.asarray(sst)
missing_values_indices = sst_filt < -8000000   #missing values have value -infinity
sst_filt[missing_values_indices] = np.nan      #all missing values set to NaN

weights = np.cos(np.deg2rad(lat))
sst_zonalavg = np.nanmean(sst_filt, axis=2)
print(sst_zonalavg[0,:])
sst_ts = np.average(sst_zonalavg, axis=1, weights=weights)
print(sst_ts[:])

Output:

[ nan nan nan nan nan
 27.08499908 27.33333397 28.1457119 28.32899857 28.34454346
 28.27285767 28.18571472 28.10199928 28.10812378 28.03411865
 28.06411552 28.16529465]

[ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan]
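
The behaviour can be reproduced without the data file. The snippet below is only a minimal sketch on made-up values (`row` and `w` are illustrative, not taken from the dataset): np.nanmean skips the NaNs but has no weights argument, while np.average accepts weights but lets a single NaN poison the result.

```python
import numpy as np

# Made-up row: two missing latitudes followed by three valid values.
row = np.array([np.nan, np.nan, 27.0, 28.0, 29.0])
w = np.array([1.0, 1.0, 2.0, 1.0, 1.0])  # made-up weights

unweighted = np.nanmean(row)           # NaNs are skipped, but weights cannot be passed
weighted = np.average(row, weights=w)  # NaN * weight is NaN, so the whole sum is NaN

print(unweighted)  # 28.0
print(weighted)    # nan
```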

You can create a masked array like this:

import numpy as np

data = np.array([[1,2,3], [4,5,np.nan], [np.nan,6,np.nan], [0,0,0]])
masked_data = np.ma.masked_array(data, np.isnan(data))
# calculate your weighted average here instead
weights = [1, 1, 1]
average = np.ma.average(masked_data, axis=1, weights=weights)
# this gives you the result
result = average.filled(np.nan)
print(result)

This outputs:

[ 2.   4.5  6.   0. ]
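
Applied to a (time, latitude) array with cosine-of-latitude weights, the same masked-array idea might look like the sketch below. The data are synthetic (4 time steps, 6 latitudes, the first two latitudes missing), and `lat`, `row`, and the array contents are stand-ins rather than the asker's actual values.

```python
import numpy as np

# Synthetic stand-in for the (time, lat) array: 4 time steps, 6 latitudes.
lat = np.array([-25.0, -15.0, -5.0, 5.0, 15.0, 25.0])
row = np.array([np.nan, np.nan, 26.0, 27.0, 28.0, 29.0])
sst_zonalavg = np.tile(row, (4, 1))  # first two latitudes missing at every time step

weights = np.cos(np.deg2rad(lat))

# Mask the NaNs so np.ma.average drops both the values and their weights.
masked = np.ma.masked_invalid(sst_zonalavg)
sst_ts = np.ma.average(masked, axis=1, weights=weights)

# A row with no valid entries would come back masked; turn that into NaN.
sst_ts = np.ma.filled(sst_ts, np.nan)
print(sst_ts)
```

Each entry of `sst_ts` is the weighted mean over the four valid latitudes only, so the time series no longer turns into NaN.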

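You can simply multiply the input array by the weights and sum along the specified axis, ignoring NaNs with np.nansum. For your case, assuming the weights are to be applied along axis = 1 of the input array sst_filt, the sum would be: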

np.nansum(sst_filt*weights,axis=1)

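To turn this into an average that accounts for the NaNs, divide by the sum of the weights at the non-NaN positions only: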

def nanaverage(A,weights,axis):
    return np.nansum(A*weights,axis=axis)/((~np.isnan(A))*weights).sum(axis=axis)

Sample run:

In [200]: sst_filt  # 2D array case
Out[200]: 
array([[  0.,   1.],
       [ nan,   3.],
       [  4.,   5.]])

In [201]: weights
Out[201]: array([ 0.25,  0.75])

In [202]: nanaverage(sst_filt,weights=weights,axis=1)
Out[202]: array([0.75, 3.  , 4.75])
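
Note that `A*weights` broadcasts the weights along the last axis, so the function as written fits the case where the weighted axis is the last one. A possible generalisation to any axis is sketched below; `nanaverage_axis` is a hypothetical name, and the sketch assumes 1-D weights whose length matches `A.shape[axis]`.

```python
import numpy as np

def nanaverage_axis(A, weights, axis):
    """Weighted mean along `axis`, skipping NaNs (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Reshape the 1-D weights so they line up with `axis`, not the last axis.
    shape = [1] * A.ndim
    shape[axis] = weights.size
    w = weights.reshape(shape)
    valid = ~np.isnan(A)
    num = np.nansum(A * w, axis=axis)   # NaN terms contribute nothing
    den = (valid * w).sum(axis=axis)    # only the weights of valid entries
    # Slices with no valid entries give 0/0; report NaN there without a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den

A = np.array([[0.0, 1.0], [np.nan, 3.0], [4.0, 5.0]])
print(nanaverage_axis(A, [0.25, 0.75], axis=1))  # same values as the sample run above
print(nanaverage_axis(A, [1.0, 2.0, 3.0], axis=0))
```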

I'd probably just select the part of the array that isn't NaN and then use those same indices to select the weights.

For example:

import numpy as np
data = np.random.rand(10)
weights = np.random.rand(10)
data[[2, 4, 8]] = np.nan

print(data)
# [ 0.32849204,  0.90310062,         nan,  0.58580299,         nan,
#    0.934721  ,  0.44412978,  0.78804409,         nan,  0.24942098]

ii = ~np.isnan(data)
print(ii)
# [ True  True False  True False  True  True  True False  True]

result = np.average(data[ii], weights = weights[ii])
print(result)
# .6470319

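Edit: I realized this won't work with two-dimensional arrays. In that case, I'd probably just set both the values and the weights to zero wherever there is a NaN. This gives the same result as if those indices had been left out of the calculation.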

Before running np.average:

data[np.isnan(data)] = 0;
weights[np.isnan(data)] = 0;
result = np.average(data, weights=weights)

If you want to keep track of which indices were NaN, you can also make copies.

@deto

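The first line removes all the NaNs, so the second line does nothing: by then np.isnan(data) is all False, and the weights are never zeroed.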

data[np.isnan(data)] = 0;
weights[np.isnan(data)] = 0;
result = np.average(data, weights=weights)

You should make a copy before running the first line:

import copy

data_copy = copy.deepcopy(data)
data[np.isnan(data_copy)] = 0
weights[np.isnan(data_copy)] = 0
result = np.average(data, weights=weights)
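
For the two-dimensional case, the same zero-out idea could be sketched as follows. This assumes 1-D weights that apply along axis 1; the arrays and the names `nan_mask`, `weights_2d`, and `data_filled` are made up for illustration. Computing the NaN mask once up front avoids the ordering problem, and working on new arrays leaves the original data untouched.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, np.nan],
                 [np.nan, 6.0, np.nan]])
weights = np.array([0.2, 0.3, 0.5])

# Compute the NaN mask once, before anything is overwritten.
nan_mask = np.isnan(data)

# Give every element its own weight, then zero both value and weight at NaNs.
weights_2d = np.broadcast_to(weights, data.shape).copy()
data_filled = np.where(nan_mask, 0.0, data)
weights_2d[nan_mask] = 0.0

result = np.average(data_filled, axis=1, weights=weights_2d)
print(result)
```

One caveat: if a row contains only NaNs, all of its weights become zero and np.average raises a ZeroDivisionError, so such rows would need to be handled separately.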