Delete rows where specific columns contain some value
I have an array with 8 values in each row:
data = np.array([[ 1,  2,  3,  5,  6,  7, 15, 27],
                 [ 5,  6,  7,  5, 10, 12, 23, 52],
                 [ 9, 10,  0,  0,  0,  0, 27, 44]])
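I want to delete every row in which columns 2 to 5 are all zero (that is, every value in data[:, 2:6] is zero; a slice's end index is exclusive, so data[:, 2:5] would only cover columns 2 to 4).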
I found that the following works, but it is rather verbose and I can't extend it to more columns:
data_nonzero = np.delete(data, np.where(np.bitwise_and(np.bitwise_and((data[:,2]==0), (data[:,3]==0)), np.bitwise_and((data[:,4]==0), (data[:,5]==0)) ) )[0], 0)
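The chain of np.bitwise_and calls is a hand-written reduction over columns 2 to 5. A minimal sketch, using the example data from the question, showing that np.all(..., axis=1) on the column slice gives the same row mask, so covering more columns only means changing the slice bounds:

```python
import numpy as np

data = np.array([[ 1,  2,  3,  5,  6,  7, 15, 27],
                 [ 5,  6,  7,  5, 10, 12, 23, 52],
                 [ 9, 10,  0,  0,  0,  0, 27, 44]])

# Original verbose mask: True where columns 2, 3, 4 and 5 are all zero
verbose = np.bitwise_and(np.bitwise_and(data[:, 2] == 0, data[:, 3] == 0),
                         np.bitwise_and(data[:, 4] == 0, data[:, 5] == 0))

# Same mask as one reduction over the slice 2:6 (the end index is exclusive)
compact = np.all(data[:, 2:6] == 0, axis=1)

print(np.array_equal(verbose, compact))  # True

# Turn the mask into row indices and delete those rows along axis 0
data_nonzero = np.delete(data, np.flatnonzero(compact), axis=0)
print(data_nonzero.shape)  # (2, 8)
```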
I tried something like:
new_a = np.delete(data, np.s_[:,2:5] == 0, axis=0)
but that doesn't seem to work:
boolean array argument obj to delete must be one dimensional
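The error arises because np.s_[:, 2:5] only builds an index tuple, (slice(None), slice(2, 5)); it does not look anything up in data, so np.s_[:, 2:5] == 0 is just the Python value False. np.delete also needs a one-dimensional index or mask along axis, so a 2-D comparison has to be reduced to one value per row first. A minimal sketch, assuming NumPy 1.19 or later (where np.delete treats a boolean array as a mask of entries to remove):

```python
import numpy as np

data = np.array([[ 1,  2,  3,  5,  6,  7, 15, 27],
                 [ 5,  6,  7,  5, 10, 12, 23, 52],
                 [ 9, 10,  0,  0,  0,  0, 27, 44]])

idx = np.s_[:, 2:6]   # only a tuple of slices: (slice(None), slice(2, 6, None))
block = data[idx]     # applying it to data gives the 3x4 sub-array

# block == 0 is 2-D with shape (3, 4); reduce it to one boolean per row
all_zero = (block == 0).all(axis=1)   # shape (3,)

# With NumPy >= 1.19 a 1-D boolean obj is used as a mask of rows to delete
new_a = np.delete(data, all_zero, axis=0)
print(new_a)
```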
Ideally, it would check 2 conditions on the 4 columns of each row, something like:
new_a = np.delete(data, np.where(np.s_[:,2:5] == 0 | np.s_[:,2:5] > 50000), axis=0)
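Apart from the np.s_ issue, this expression has an operator-precedence problem: | binds more tightly than == and >, so a == 0 | a > 50000 is parsed as the chained comparison a == (0 | a) > 50000, which fails for arrays. Each comparison needs its own parentheses. A sketch that builds the element-wise "zero or above 50000" condition and then requires it to hold in all 4 columns of a row; the threshold comes from the question, while the sample rows are made up for illustration:

```python
import numpy as np

# Hypothetical rows; only columns 2-5 matter here
data = np.array([[1, 2,     0,     0,     0,     0, 7],
                 [1, 2, 65535, 65535, 65535, 65535, 7],
                 [1, 2,     0, 65535,     0,     0, 7],
                 [1, 2,     3,   147,   277,     2, 7]])

block = data[:, 2:6]

# Parenthesise each comparison before combining them with |
bad_cell = (block == 0) | (block > 50000)   # element-wise, shape (4, 4)

# Drop a row only if every one of its 4 cells is zero or above the threshold
bad_row = bad_cell.all(axis=1)
print(bad_row)

kept = data[~bad_row]
print(kept)
```

Note that this also drops a row mixing zeros and large values (the third sample row). If only rows that are entirely zero or entirely above the threshold should go, reduce each condition separately: (block == 0).all(axis=1) | (block > 50000).all(axis=1).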
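In this particular case I would simply negate your condition and use boolean indexing, i.e.
>>> data[(data[:, 2:6] != 0).any(axis=1), ...]
array([[ 1,  2,  3,  5,  6,  7, 15, 27],
       [ 5,  6,  7,  5, 10, 12, 23, 52]])
In other words, you want to select the rows that contain at least one non-zero value in those columns.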
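The negation is De Morgan's law: "not all of columns 2 to 5 are zero" is the same as "at least one of them is non-zero". A quick check on the question's data:

```python
import numpy as np

data = np.array([[ 1,  2,  3,  5,  6,  7, 15, 27],
                 [ 5,  6,  7,  5, 10, 12, 23, 52],
                 [ 9, 10,  0,  0,  0,  0, 27, 44]])

block = data[:, 2:6]
keep = (block != 0).any(axis=1)   # the answer's condition (rows to keep)
drop = (block == 0).all(axis=1)   # the question's condition (rows to delete)

print(np.array_equal(keep, ~drop))  # True
print(data[keep])                   # same rows as data[~drop]
```

Boolean indexing returns a new array, just as np.delete does, so neither approach modifies data in place.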
I came up with a solution:
The data.csv file contains:
var1, var2, var3, var4, var5, var6, var7
x,x,0,0,0,0,x
x,x,0,0,0,0,x
x,x,65535,65535,65535,65535,x
x,x,0,40,116,3,x
x,x,65535,95,208,2,x
x,x,3,147,277,2,x
x,x,2,203,325,2,x
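To check how genfromtxt parses a file like this, here is a sketch that feeds the same text through io.StringIO instead of reading data.csv from disk (done only to keep the example self-contained). With the default dtype=float, fields that are not numbers, such as the x placeholders, come back as nan, and skip_header=1 drops the var1, ... line:

```python
import io
import numpy as np

csv_text = """var1, var2, var3, var4, var5, var6, var7
x,x,0,0,0,0,x
x,x,0,0,0,0,x
x,x,65535,65535,65535,65535,x
x,x,0,40,116,3,x
x,x,65535,95,208,2,x
x,x,3,147,277,2,x
x,x,2,203,325,2,x
"""

# Same arguments as in the code below, but reading from an in-memory buffer
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(data.shape)    # (7, 7)
print(data[:, 2:6])  # the four numeric columns the filters look at
```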
Code:
import numpy as np

# filename comes from the surrounding script (not shown)
data = np.genfromtxt(filename[0], delimiter=',', skip_header=1)
print('------ Original data ------')
print(data[0:7, :])
# Columns 2-5 are data[:, 2:6]; np.delete treats a boolean obj as a mask (NumPy >= 1.19)
new_a = np.delete(data, ~np.any(data[:, 2:6], axis=1), axis=0)
print('------ Rows where data[:,2:6] == 0 removed ------')
print(new_a[0:5, :])
new_b = np.delete(new_a, np.all(new_a[:, 2:6] > 60000, axis=1), axis=0)
print('------ Rows where data[:,2:6] > 60000 removed ------')
print(new_b[0:4, :])
Results:
------ Original data ------
[[x x 0.00 0.00 0.00 0.00 x]
[x x 0.00 0.00 0.00 0.00 x]
[x x 65535.00 65535.00 65535.00 65535.00 x]
[x x 0.00 40.00 116.00 3.00 x]
[x x 65535.00 95.00 208.00 2.00 x]
[x x 3.00 147.00 277.00 2.00 x]
[x x 2.00 203.00 325.00 2.00 x]]
------ Rows where data[:,2:6] == 0 removed ------
[[x x 65535.00 65535.00 65535.00 65535.00 x]
[x x 0.00 40.00 116.00 3.00 x]
[x x 65535.00 95.00 208.00 2.00 x]
[x x 3.00 147.00 277.00 2.00 x]
[x x 2.00 203.00 325.00 2.00 x]]
------ Rows where data[:,2:6] > 60000 removed ------
[[66.01 -46.05 0.00 40.00 116.00 3.00 x]
[66.01 -39.46 65535.00 95.00 208.00 2.00 x]
[66.01 -32.87 3.00 147.00 277.00 2.00 x]
[66.01 -26.28 2.00 203.00 325.00 2.00 x]]
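The two np.delete passes can also be collapsed into a single boolean mask. A sketch under the same reading of the rules as the code above (drop a row if columns 2 to 5 are all zero, or all above 60000), using only the numeric columns of the sample file typed in by hand:

```python
import numpy as np

# Columns 2-5 of the sample data.csv rows (the 'x' columns are left out)
block = np.array([[    0,     0,     0,     0],
                  [    0,     0,     0,     0],
                  [65535, 65535, 65535, 65535],
                  [    0,    40,   116,     3],
                  [65535,    95,   208,     2],
                  [    3,   147,   277,     2],
                  [    2,   203,   325,     2]], dtype=float)

all_zero = (block == 0).all(axis=1)
all_high = (block > 60000).all(axis=1)

# One pass: keep the rows that meet neither removal condition
kept = block[~(all_zero | all_high)]
print(kept)
```

On the full array the same masks, computed from data[:, 2:6], would be applied as data[~(all_zero | all_high)].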