Drop specific data from a data array

I am working with a data array that has time, latitude, and longitude dimensions. The data array looks like this:

print (data)
<xarray.DataArray (lon: 2, lat: 2, time: 48)>
 array([[[9.38898492, 6.65535271, 3.92192596, 1.83168364, 9.91812091,
     9.72198563, 0.23416978, ............],
  .......

    [0.38138545, 8.66420929, 4.62462928, 7.95165651, 2.06577888,
     6.0229346 , 8.26839182, .........]])

 Coordinates:
    * lon      (lon) float64 -99.83 -99.32
    * lat      (lat) float64 42.25 42.21
    * time     (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:30:00

For each hour there are two records, one at minute 00 and one at minute 30. So the time dimension looks like:

<xarray.DataArray 'time' (time: 48)>
 array(['2017-06-01T00:00:00.000000000', '2017-06-01T00:30:00.000000000',
   '2017-06-01T01:00:00.000000000', '2017-06-01T01:30:00.000000000',
   '2017-06-01T02:00:00.000000000', '2017-06-01T02:30:00.000000000',
   '2017-06-01T03:00:00.000000000', '2017-06-01T03:30:00.000000000',
   '2017-06-01T04:00:00.000000000', '2017-06-01T04:30:00.000000000',
   '2017-06-01T05:00:00.000000000', '2017-06-01T05:30:00.000000000',
   '2017-06-01T06:00:00.000000000', '2017-06-01T06:30:00.000000000',
   '2017-06-01T07:00:00.000000000', '2017-06-01T07:30:00.000000000',
   '2017-06-01T08:00:00.000000000', '2017-06-01T08:30:00.000000000',
   '2017-06-01T09:00:00.000000000', '2017-06-01T09:30:00.000000000',
   '2017-06-01T10:00:00.000000000', '2017-06-01T10:30:00.000000000',
   '2017-06-01T11:00:00.000000000', '2017-06-01T11:30:00.000000000',
   '2017-06-01T12:00:00.000000000', '2017-06-01T12:30:00.000000000',
   '2017-06-01T13:00:00.000000000', '2017-06-01T13:30:00.000000000',
   '2017-06-01T14:00:00.000000000', '2017-06-01T14:30:00.000000000',
   '2017-06-01T15:00:00.000000000', '2017-06-01T15:30:00.000000000',
   '2017-06-01T16:00:00.000000000', '2017-06-01T16:30:00.000000000',
   '2017-06-01T17:00:00.000000000', '2017-06-01T17:30:00.000000000',
   '2017-06-01T18:00:00.000000000', '2017-06-01T18:30:00.000000000',
   '2017-06-01T19:00:00.000000000', '2017-06-01T19:30:00.000000000',
   '2017-06-01T20:00:00.000000000', '2017-06-01T20:30:00.000000000',
   '2017-06-01T21:00:00.000000000', '2017-06-01T21:30:00.000000000',
   '2017-06-01T22:00:00.000000000', '2017-06-01T22:30:00.000000000',
   '2017-06-01T23:00:00.000000000', '2017-06-01T23:30:00.000000000'],
  dtype='datetime64[ns]')

I want to keep only the data recorded at minute 00 of each hour and drop the data recorded at minute 30. The result should look like:

print (data2)
<xarray.DataArray (lon: 2, lat: 2, time: 24)>
array([[[9.38898492, 6.65535271, 3.92192596, 1.83168364, 9.91812091,
    9.72198563, 0.23416978, ............],
     .......

    [0.38138545, 8.66420929, 4.62462928, 7.95165651, 2.06577888,
      6.0229346 , 8.26839182, .........]])

 Coordinates:
       * lon      (lon) float64 -99.83 -99.32
       * lat      (lat) float64 42.25 42.21
       * time     (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00

So the time dimension of the new data array (data2) would be:

array(['2017-06-01T00:00:00.000000000', '2017-06-01T01:00:00.000000000',
   '2017-06-01T02:00:00.000000000', '2017-06-01T03:00:00.000000000',
   '2017-06-01T04:00:00.000000000', '2017-06-01T05:00:00.000000000',
   '2017-06-01T06:00:00.000000000', '2017-06-01T07:00:00.000000000',
   '2017-06-01T08:00:00.000000000', '2017-06-01T09:00:00.000000000',
   '2017-06-01T10:00:00.000000000', '2017-06-01T11:00:00.000000000',
   '2017-06-01T12:00:00.000000000', '2017-06-01T13:00:00.000000000',
   '2017-06-01T14:00:00.000000000', '2017-06-01T15:00:00.000000000',
   '2017-06-01T16:00:00.000000000', '2017-06-01T17:00:00.000000000',
   '2017-06-01T18:00:00.000000000', '2017-06-01T19:00:00.000000000',
   '2017-06-01T20:00:00.000000000', '2017-06-01T21:00:00.000000000',
   '2017-06-01T22:00:00.000000000', '2017-06-01T23:00:00.000000000'],
  dtype='datetime64[ns]')

Is there a way to do this?

Here is the code to reproduce the original data:

import numpy as np
from datetime import timedelta
import xarray as xr

# 48 half-hourly time steps, so the data needs 48 values along time
precipitation = 10 * np.random.rand(2, 2, 48)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
time = np.arange('2017-06-01', '2017-06-02',
                 timedelta(minutes=30), dtype='datetime64[ns]')

data = xr.DataArray(
    data=precipitation,
    dims=["lon", "lat", "time"],
    coords=[lon, lat, time],
)

Thanks!

You can do this easily with the datetime components of the time values:

data2 = data.sel(time=data.time.dt.minute==0)

print(data2.time)

#<xarray.DataArray 'time' (time: 24)>
#array(['2017-06-01T00:00:00.000000000', '2017-06-01T01:00:00.000000000',
#       '2017-06-01T02:00:00.000000000', '2017-06-01T03:00:00.000000000',
#       '2017-06-01T04:00:00.000000000', '2017-06-01T05:00:00.000000000',
#       '2017-06-01T06:00:00.000000000', '2017-06-01T07:00:00.000000000',
#       '2017-06-01T08:00:00.000000000', '2017-06-01T09:00:00.000000000',
#       '2017-06-01T10:00:00.000000000', '2017-06-01T11:00:00.000000000',
#       '2017-06-01T12:00:00.000000000', '2017-06-01T13:00:00.000000000',
#       '2017-06-01T14:00:00.000000000', '2017-06-01T15:00:00.000000000',
#       '2017-06-01T16:00:00.000000000', '2017-06-01T17:00:00.000000000',
#       '2017-06-01T18:00:00.000000000', '2017-06-01T19:00:00.000000000',
#       '2017-06-01T20:00:00.000000000', '2017-06-01T21:00:00.000000000',
#       '2017-06-01T22:00:00.000000000', '2017-06-01T23:00:00.000000000'],
#      dtype='datetime64[ns]')
#Coordinates:
#  * time     (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00
#
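As a rough, self-contained sketch of the same idea: the mask `data.time.dt.minute == 0` is a boolean array along `time`, and selecting with it keeps only the on-the-hour steps. The array below is a hypothetical stand-in for the question's data (built with `pd.date_range`, which is an assumption and not part of the original code). When the sampling is strictly regular and starts on the hour, taking every second step by position should give the same result, though the mask is the more robust choice if a record is ever missing.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the question's array: 48 half-hourly steps.
time = pd.date_range("2017-06-01", periods=48, freq="30min")
data = xr.DataArray(
    10 * np.random.rand(2, 2, 48),
    dims=["lon", "lat", "time"],
    coords={"lon": [-99.83, -99.32], "lat": [42.25, 42.21], "time": time},
)

# Boolean mask along time: True where the timestamp falls exactly on the hour.
on_the_hour = data.time.dt.minute == 0
by_mask = data.sel(time=on_the_hour)

# Positional alternative: every second step, valid only for gap-free data
# whose first record is at minute 00.
by_step = data.isel(time=slice(None, None, 2))

print(by_mask.sizes["time"], by_mask.equals(by_step))
```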

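Alternatively, you can use resample. Resampling returns a resample object, on which you then call the pad method. Because the hourly timestamps already exist in the data, this keeps the values recorded at minute 00: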

data.resample(time='1H').pad()

Output:
<xarray.DataArray (lon: 2, lat: 2, time: 24)>
array([[[0.93092321, 8.9256469 , 2.0902752 , 1.46022299, 9.63865453,
         3.06746535, 2.84095699, 9.4583144 , 4.81973945, 1.85398961,
         5.6259217 , 0.73004426, 8.48781372, 8.67918668, 7.19521316,
         6.67589949, 2.07546901, 1.4322415 , 2.13495418, 4.37055217,
         8.85306247, 4.43165936, 4.0294716 , 1.69092842],
        [0.52261575, 5.21821873, 1.32905263, 8.92984526, 1.81558321,
         3.89992125, 1.8788682 , 7.3124596 , 2.5068265 , 9.73076981,
         0.4511222 , 9.09497158, 0.89253979, 9.53972274, 7.15277816,
         0.08596348, 2.24376496, 2.06680292, 4.03876723, 5.55558076,
         8.26049985, 3.91292107, 8.43491467, 5.48503772]],

       [[8.34117163, 1.44051784, 2.78164548, 8.55049381, 9.43753831,
         7.35745785, 1.22652596, 9.55220335, 0.99754358, 9.3994966 ,
         7.92541645, 2.68894144, 9.61408994, 7.34960423, 2.74209431,
         4.19041801, 8.92849725, 9.98010787, 9.16994776, 4.75409515,
         3.10524118, 5.12308453, 8.61494954, 1.63399851],
        [1.02355383, 5.64350097, 5.76928407, 2.76870009, 6.86109118,
         9.1430836 , 1.81166855, 3.19906641, 2.28457262, 5.30030649,
         2.86022039, 5.46551606, 0.62270996, 7.86203301, 3.38400052,
         5.22623667, 5.49521413, 6.26552406, 0.93926924, 7.98750356,
         6.72156675, 9.5673477 , 3.03319399, 9.71812105]]])
Coordinates:
  * time     (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00
  * lon      (lon) float64 -99.83 -99.32
  * lat      (lat) float64 42.25 42.21
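
A minimal sketch of the resample route, checked against the mask-based selection. It uses `.first()` on the resample object instead of `pad`, which is a swapped-in choice: `pad` may not be available in every xarray version, while `.first()` simply takes the first record in each hourly bin (here, the minute-00 value). The data is a hypothetical stand-in, and the frequency string is written `"1h"`; older pandas releases spelled it `"1H"`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the question's array: 48 half-hourly steps.
time = pd.date_range("2017-06-01", periods=48, freq="30min")
data = xr.DataArray(
    np.arange(2 * 2 * 48, dtype=float).reshape(2, 2, 48),
    dims=["lon", "lat", "time"],
    coords={"lon": [-99.83, -99.32], "lat": [42.25, 42.21], "time": time},
)

# Group the records into hourly bins and keep the first record of each bin.
hourly = data.resample(time="1h").first()

# Reference result from the boolean-mask approach.
expected = data.sel(time=data.time.dt.minute == 0)

# Compare values with an explicit dimension order, since resampling is not
# guaranteed to preserve the original order of dimensions.
same = np.array_equal(
    hourly.transpose("lon", "lat", "time").values, expected.values
)
print(hourly.sizes["time"], same)
```

Note that resampling aggregates whatever falls into each bin, so for irregular data it answers a slightly different question than "keep timestamps whose minute is 0".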