Remove variables from a recarray by variable name

Question

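I need to concatenate 2 rec.arrays (I run the same procedure on all the others in my work). The problem is that one of the files I read into an array has 2 extra variables, and I need to remove them so that it matches the variables of the other array it will be concatenated with. I have tried several approaches, such as deleting by index, and they all raise errors.

This is the array: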

vswhr1
rec.array([('ny20110325s0a06c.001', 2011.23149798,  84.49677, 11.9223, 1.000e+00, 78.923, 11.923, 0.024, 0.024, 77.286, 189.465  ,  1.688, 180.     , 0.0019, 0., 0.00167, 60., 1003.84003, -15.7, 1003.84003, 65.8, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.96541e+21, 2.60898e+19, 8.45080e+21, 7.92632e+19, 8.74633e+21, 8.68890e+19),
           ('ny20110325s0a06c.002', 2011.23150704,  84.50007, 12.0017, 2.000e+00, 78.923, 11.923, 0.024, 0.024, 77.325, 190.686  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16. , 1003.83002, 68.7, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93553e+21, 2.54199e+19, 8.43518e+21, 7.75936e+19, 8.72990e+21, 8.60191e+19),
           ('ny20110325s0a06c.003', 2011.23150736,  84.50019, 12.0045, 3.000e+00, 78.923, 11.923, 0.024, 0.024, 77.326, 190.728  ,  1.694, 180.     , 0.0019, 0., 0.00167, 60., 1003.83002, -16.1, 1003.83002, 68.9, -1., 0.    , -1., -1., 9.8765e+35, 9.8765e+35, 5.93643e+21, 2.59443e+19, 8.42675e+21, 8.17653e+19, 8.73537e+21, 8.68880e+19),
           ...,
           ('ny20180919s0i06c.0042', 2018.71887239, 262.38843,  9.3221, 1.234e+03, 78.923, 11.923, 0.024, 0.027, 78.69 , 152.737  , -1.722, 180.00999, 0.0019, 0., 0.00188, 60., 1011.84003,  -2.2, 1011.84003, 77.6, -1., 0.0125, -1., -1., 9.8765e+35, 9.8765e+35, 2.11077e+22, 8.61874e+19, 8.72151e+21, 5.33405e+19, 9.01945e+21, 7.07619e+19),
           ('ny20180920s0i06c.0491', 2018.72160282, 263.38504,  9.2407, 1.235e+03, 78.923, 11.923, 0.024, 0.034, 79.177, 151.62399, -1.735, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,   0. , 1006.65997, 62.8, -1., 0.0095, -1., -1., 9.8765e+35, 9.8765e+35, 1.96888e+22, 7.48627e+19, 8.70719e+21, 5.40175e+19, 8.97596e+21, 7.49834e+19),
           ('ny20180920s0i06c.0492', 2018.72161188, 263.38834,  9.3201, 1.236e+03, 78.923, 11.923, 0.024, 0.034, 79.072, 152.83299, -1.729, 180.00999, 0.0019, 0., 0.00188, 60., 1006.65997,  -0.6, 1006.65997, 64.6, -1., 0.0078, -1., -1., 9.8765e+35, 9.8765e+35, 1.94867e+22, 7.83111e+19, 8.71765e+21, 4.97304e+19, 8.97784e+21, 7.23055e+19)],
          dtype=[('spectrum', '<U21'), ('year', '<f8'), ('day', '<f8'), ('hour', '<f8'), ('run', '<f8'), ('lat', '<f8'), ('long', '<f8'), ('zobs', '<f8'), ('zmin', '<f8'), ('solzen', '<f8'), ('azim', '<f8'), ('osds', '<f8'), ('opd', '<f8'), ('fovi', '<f8'), ('amal', '<f8'), ('graw', '<f8'), ('tins', '<f8'), ('pins', '<f8'), ('tout', '<f8'), ('pout', '<f8'), ('hout', '<f8'), ('sia', '<f8'), ('fvsi', '<f8'), ('wspd', '<f8'), ('wdir', '<f8'), ('luft', '<f8'), ('luft_error', '<f8'), ('h2o', '<f8'), ('h2o_error', '<f8'), ('co2', '<f8'), ('co2_error', '<f8'), ('3co2', '<f8'), ('3co2_error', '<f8')])

vswhr1.shape 
(1236,)

*The numbers themselves are not relevant.

I need to remove the last 2 variables ('3co2', '

Thanks

Answer 1

If you are loading these arrays from csv files, then using usecols to select the columns you load is probably the easiest way to get two arrays with matching dtypes.
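A rough sketch of the usecols route, assuming the data come from comma-separated files with a header row and are read with np.genfromtxt. The question does not say how the files are loaded, and the inline CSV text and the column names extra and extra_error are made up for illustration:

```python
import io
import numpy as np

# Hypothetical stand-ins for the two files; only the column layout matters.
csv_a = "spectrum,year,co2\ns1,2011.2,8.4\ns2,2011.3,8.5\n"
csv_b = ("spectrum,year,co2,extra,extra_error\n"
         "s3,2018.7,8.7,9.0,0.07\n"
         "s4,2018.8,8.8,9.1,0.08\n")

# File A already has the wanted layout.
a = np.genfromtxt(io.StringIO(csv_a), delimiter=",", names=True,
                  dtype=None, encoding="utf-8")

# File B has two extra columns; usecols keeps only the first three,
# so the extra fields never end up in the dtype.
b = np.genfromtxt(io.StringIO(csv_b), delimiter=",", names=True,
                  dtype=None, encoding="utf-8", usecols=(0, 1, 2))

combined = np.concatenate((a, b))
print(combined.dtype.names)
```

If the loader is np.loadtxt or something else, the same idea should apply as long as it offers a way to pick columns at read time.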

But it is also possible to select a subset of the fields of an existing array.

To illustrate:

In [1]: dt1 = np.dtype('U10,i,f')
In [2]: dt2 = np.dtype('U10,i,f,i,i')
In [3]: x = np.ones(2,dtype=dt1)
In [4]: y = np.zeros(2,dtype=dt2)
In [5]: x
Out[5]: 
array([('1', 1, 1.), ('1', 1, 1.)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4')])
In [6]: y
Out[6]: 
array([('', 0, 0., 0, 0), ('', 0, 0., 0, 0)],
      dtype=[('f0', '<U10'), ('f1', '<i4'), ('f2', '<f4'), ('f3', '<i4'), ('f4', '<i4')])

A subset of the fields of y:

In [7]: y[['f0','f1','f2']]
Out[7]: 
array([('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

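This view has some complexities, as the offsets parameter in the new dtype shows: the view keeps the itemsize of the full record and simply skips over the unselected fields. The structured arrays documentation page discusses this. Sometimes you need to use the recfunctions.repack_fields function to make a packed copy.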
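For reference, a minimal sketch of that repacking step, reusing the toy dtype from above. It assumes NumPy 1.16 or later, where multi-field indexing returns a view; the names sub and packed are just for illustration:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt2 = np.dtype('U10,i,f,i,i')
y = np.zeros(2, dtype=dt2)

# Multi-field indexing gives a view that still spans the full record.
sub = y[['f0', 'f1', 'f2']]
print(sub.dtype.itemsize)

# repack_fields makes a copy whose dtype holds only the selected fields,
# with no gaps between them.
packed = rfn.repack_fields(sub)
print(packed.dtype.itemsize)
print(packed.dtype)
```

The packed copy is what you would typically want before saving the array or viewing it as another dtype.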

But the view seems to work fine when used in concatenate:

In [8]: np.concatenate((x,y[['f0','f1','f2']]))
Out[8]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

We can also get the list of field names from the other array's dtype:

In [9]: x.dtype.names
Out[9]: ('f0', 'f1', 'f2')

This is a tuple, which we need to convert to a list:

In [13]: np.concatenate((x,y[list(x.dtype.names)]))
Out[13]: 
array([('1', 1, 1.), ('1', 1, 1.), ('', 0, 0.), ('', 0, 0.)],
      dtype={'names': ['f0', 'f1', 'f2'], 'formats': ['<U10', '<i4', '<f4'], 'offsets': [0, 40, 44], 'itemsize': 56})

(In Python, lists and tuples are usually interchangeable, but numpy indexing interprets them differently, so the distinction matters here.)
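Applied to the situation in the question, this might look like the sketch below. It assumes the two extra fields are the last two in the dtype shown ('3co2' and '3co2_error'), uses a cut-down dtype with made-up values instead of the real 33-field array, and the array name other is hypothetical:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Cut-down stand-ins for the real arrays (values are placeholders).
dt_short = [('spectrum', 'U21'), ('year', 'f8'),
            ('co2', 'f8'), ('co2_error', 'f8')]
dt_long = dt_short + [('3co2', 'f8'), ('3co2_error', 'f8')]

other = np.rec.array([('a.001', 2011.2, 8.4e21, 7.9e19)], dtype=dt_short)
vswhr1 = np.rec.array(
    [('b.001', 2018.7, 8.7e21, 5.3e19, 9.0e21, 7.0e19),
     ('b.002', 2018.8, 8.8e21, 5.4e19, 9.1e21, 7.1e19)],
    dtype=dt_long)

# Option 1: index with a *list* of the field names to keep.
drop = {'3co2', '3co2_error'}
keep = [name for name in vswhr1.dtype.names if name not in drop]
joined = np.concatenate((other, vswhr1[keep]))

# Option 2: let recfunctions.drop_fields build a new array without them.
trimmed = rfn.drop_fields(vswhr1, ['3co2', '3co2_error'],
                          usemask=False, asrecarray=True)
joined_alt = np.concatenate((other, trimmed))

print(joined.dtype.names)
```

Option 1 is the approach shown above; option 2 copies the data up front, which avoids the offsets question altogether.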

python

numpy