Set values in numpy array to NaN by index
I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).
I tried
import numpy
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
Looking at x, I only see -9223372036854775808 where I expected NaN.
I thought of an alternative:
for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan
Nothing happens. What am I doing wrong?
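nan is a floating-point value. When x is an array with an integer dtype, it cannot hold a nan value. When nan is assigned to an array of integer dtype, the value is automatically converted to an int: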
In [85]: np.array(np.nan).astype(int).item()
Out[85]: -9223372036854775808
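As a quick check before assigning NaN, one can inspect the array's dtype and convert a copy to float if needed. This is only a minimal sketch; the names x_int and x_float are illustrative and not part of the original code.

```python
import numpy as np

# An array built from Python ints gets an integer dtype by default.
x_int = np.array([[0, 1, 2], [3, 4, 5]])
print(x_int.dtype)
print(np.issubdtype(x_int.dtype, np.floating))  # False: cannot represent NaN

# astype(float) returns a float copy that can hold NaN; x_int is left unchanged.
x_float = x_int.astype(float)
x_float[0, :2] = np.nan
print(np.isnan(x_float))
```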
So, to fix your code, make x an array of float dtype:
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
dtype=float)
import numpy
x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
dtype=float)
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
print(x)
yields
array([[ nan, nan, nan, nan, nan, 5., 6., 7., 8., 9.],
[ nan, nan, nan, nan, nan, nan, nan, 0., 1., 0.]])
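Since the stated goal is a row-wise mean that skips the masked entries, the float array can then be passed to np.nanmean, which ignores NaN values. The sketch below reuses the question's data and assumes the float-dtype fix; row_means is an illustrative name.

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], dtype=float)
cutoff = [5, 7]

# Blank out the first cutoff[i] entries of each row.
for i in range(len(x)):
    x[i, :cutoff[i]] = np.nan

# nanmean averages only the non-NaN entries along each row.
row_means = np.nanmean(x, axis=1)
print(row_means)
```

Here the first row averages 5 through 9 and the second row averages the last three entries. Note that a row consisting entirely of NaN would produce NaN along with a RuntimeWarning.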
Vectorized approach to set appropriate elements as NaNs
Making x a float array, as described above, should get rid of the value error you were getting. If you are looking to vectorize for performance, you can use boolean indexing like so -
import numpy as np
# Create mask of positions in x (with float datatype) where NaNs are to be put
mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])
# Put NaNs into masked region of x for the desired output
x[mask] = np.nan
Sample run -
In [92]: x = np.random.randint(0,9,(4,7)).astype(float)
In [93]: x
Out[93]:
array([[ 2., 1., 5., 2., 5., 2., 1.],
[ 2., 5., 7., 1., 5., 4., 8.],
[ 1., 1., 7., 4., 8., 3., 1.],
[ 5., 8., 7., 5., 0., 2., 1.]])
In [94]: cutoff = [5,3,0,6]
In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan
In [96]: x
Out[96]:
array([[ nan, nan, nan, nan, nan, 2., 1.],
[ nan, nan, nan, 1., 5., 4., 8.],
[ 1., 1., 7., 4., 8., 3., 1.],
[ nan, nan, nan, nan, nan, nan, 1.]])
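To see why the comparison produces a 2D mask, it may help to look at the shapes involved in the broadcast. This is a small sketch using the cutoff values from the sample run; n_cols and col_idx are illustrative names.

```python
import numpy as np

cutoff = np.asarray([5, 3, 0, 6])
n_cols = 7

# cutoff[:, None] has shape (4, 1); col_idx has shape (7,).
# Broadcasting the comparison yields a (4, 7) boolean array in which
# entry [i, j] is True exactly when j < cutoff[i].
col_idx = np.arange(n_cols)
mask = cutoff[:, None] > col_idx

print(cutoff[:, None].shape, col_idx.shape, mask.shape)
print(mask.sum(axis=1))  # count of True per row
```

As long as every cutoff is between 0 and the number of columns, the per-row count of True values equals the corresponding cutoff.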
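Vectorized approach to directly calculate row-wise mean of appropriate elements
If you were trying to get the masked mean values, you can modify the earlier proposed vectorized approach to avoid dealing with NaNs altogether and, more importantly, keep x with integer values. Here's the modified approach -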
# Get array version of cutoff
cutoff_arr = np.asarray(cutoff)
# Mask of positions in x which are to be considered for row-wise mean calculations
mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
# Mask x, calculate the corresponding sum and thus mean values for each row
masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)
Here's a sample run for such a solution -
In [61]: x = np.random.randint(0,9,(4,7))
In [62]: x
Out[62]:
array([[5, 0, 1, 2, 4, 2, 0],
[3, 2, 0, 7, 5, 0, 2],
[7, 2, 2, 3, 3, 2, 3],
[4, 1, 2, 1, 4, 6, 8]])
In [63]: cutoff = [5,3,0,6]
In [64]: cutoff_arr = np.asarray(cutoff)
In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
In [66]: mask1
Out[66]:
array([[False, False, False, False, False, True, True],
[False, False, False, True, True, True, True],
[ True, True, True, True, True, True, True],
[False, False, False, False, False, False, True]], dtype=bool)
In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] - cutoff_arr)
In [68]: masked_mean_vals
Out[68]: array([ 1. , 3.5 , 3.14285714, 8. ])
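A different way to get the same means while keeping x as integers is NumPy's masked-array module, numpy.ma. This is a swapped-in technique, not the approach shown above, and the sketch below is only one possible way to write it; exclude, masked_x and masked_means are illustrative names.

```python
import numpy as np

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8]])
cutoff = np.asarray([5, 3, 0, 6])

# In numpy.ma, True in the mask means "exclude this element",
# so the mask marks the first cutoff[i] entries of each row.
exclude = cutoff[:, None] > np.arange(x.shape[1])
masked_x = np.ma.masked_array(x, mask=exclude)

# mean() skips masked entries; x itself keeps its integer dtype.
masked_means = masked_x.mean(axis=1)
print(masked_means)
```

One difference worth noting: if a cutoff equals the row length, the sum-and-divide version divides by zero, whereas the masked array returns a masked entry for that row.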