Drop specific data from a data array
I am working with a data array that has time, latitude, and longitude dimensions.
The data array looks like this:
print(data)
<xarray.DataArray (lon: 2, lat: 2, time: 48)>
array([[[9.38898492, 6.65535271, 3.92192596, 1.83168364, 9.91812091,
9.72198563, 0.23416978, ............],
.......
[0.38138545, 8.66420929, 4.62462928, 7.95165651, 2.06577888,
6.0229346 , 8.26839182, .........]])
Coordinates:
* lon (lon) float64 -99.83 -99.32
* lat (lat) float64 42.25 42.21
* time (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:30:00
Each hour has two records, one at minute 00 and one at minute 30, so the time dimension looks like this:
<xarray.DataArray 'time' (time: 48)>
array(['2017-06-01T00:00:00.000000000', '2017-06-01T00:30:00.000000000',
'2017-06-01T01:00:00.000000000', '2017-06-01T01:30:00.000000000',
'2017-06-01T02:00:00.000000000', '2017-06-01T02:30:00.000000000',
'2017-06-01T03:00:00.000000000', '2017-06-01T03:30:00.000000000',
'2017-06-01T04:00:00.000000000', '2017-06-01T04:30:00.000000000',
'2017-06-01T05:00:00.000000000', '2017-06-01T05:30:00.000000000',
'2017-06-01T06:00:00.000000000', '2017-06-01T06:30:00.000000000',
'2017-06-01T07:00:00.000000000', '2017-06-01T07:30:00.000000000',
'2017-06-01T08:00:00.000000000', '2017-06-01T08:30:00.000000000',
'2017-06-01T09:00:00.000000000', '2017-06-01T09:30:00.000000000',
'2017-06-01T10:00:00.000000000', '2017-06-01T10:30:00.000000000',
'2017-06-01T11:00:00.000000000', '2017-06-01T11:30:00.000000000',
'2017-06-01T12:00:00.000000000', '2017-06-01T12:30:00.000000000',
'2017-06-01T13:00:00.000000000', '2017-06-01T13:30:00.000000000',
'2017-06-01T14:00:00.000000000', '2017-06-01T14:30:00.000000000',
'2017-06-01T15:00:00.000000000', '2017-06-01T15:30:00.000000000',
'2017-06-01T16:00:00.000000000', '2017-06-01T16:30:00.000000000',
'2017-06-01T17:00:00.000000000', '2017-06-01T17:30:00.000000000',
'2017-06-01T18:00:00.000000000', '2017-06-01T18:30:00.000000000',
'2017-06-01T19:00:00.000000000', '2017-06-01T19:30:00.000000000',
'2017-06-01T20:00:00.000000000', '2017-06-01T20:30:00.000000000',
'2017-06-01T21:00:00.000000000', '2017-06-01T21:30:00.000000000',
'2017-06-01T22:00:00.000000000', '2017-06-01T22:30:00.000000000',
'2017-06-01T23:00:00.000000000', '2017-06-01T23:30:00.000000000'],
dtype='datetime64[ns]')
I want to keep only the data recorded at minute 00 of each hour and drop the data recorded at minute 30, so the result would look like this:
print(data2)
<xarray.DataArray (lon: 2, lat: 2, time: 24)>
array([[[9.38898492, 6.65535271, 3.92192596, 1.83168364, 9.91812091,
9.72198563, 0.23416978, ............],
.......
[0.38138545, 8.66420929, 4.62462928, 7.95165651, 2.06577888,
6.0229346 , 8.26839182, .........]])
Coordinates:
* lon (lon) float64 -99.83 -99.32
* lat (lat) float64 42.25 42.21
* time (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00
The time dimension of the new data array (data2) would then look like this:
array(['2017-06-01T00:00:00.000000000', '2017-06-01T01:00:00.000000000',
'2017-06-01T02:00:00.000000000', '2017-06-01T03:00:00.000000000',
'2017-06-01T04:00:00.000000000', '2017-06-01T05:00:00.000000000',
'2017-06-01T06:00:00.000000000', '2017-06-01T07:00:00.000000000',
'2017-06-01T08:00:00.000000000', '2017-06-01T09:00:00.000000000',
'2017-06-01T10:00:00.000000000', '2017-06-01T11:00:00.000000000',
'2017-06-01T12:00:00.000000000', '2017-06-01T13:00:00.000000000',
'2017-06-01T14:00:00.000000000', '2017-06-01T15:00:00.000000000',
'2017-06-01T16:00:00.000000000', '2017-06-01T17:00:00.000000000',
'2017-06-01T18:00:00.000000000', '2017-06-01T19:00:00.000000000',
'2017-06-01T20:00:00.000000000', '2017-06-01T21:00:00.000000000',
'2017-06-01T22:00:00.000000000', '2017-06-01T23:00:00.000000000'],
dtype='datetime64[ns]')
Is there a way to do this?
Here is the code to reproduce the original data:
import numpy as np
from datetime import timedelta
import xarray as xr

# The time axis has 48 half-hourly steps, so the data needs 48 values along time
precipitation = 10 * np.random.rand(2, 2, 48)
lon = [-99.83, -99.32]
lat = [42.25, 42.21]
time = np.arange('2017-06-01', '2017-06-02',
                 timedelta(minutes=30), dtype='datetime64[ns]')
data = xr.DataArray(
    data=precipitation,
    dims=["lon", "lat", "time"],
    coords=[lon, lat, time]
)
Thanks!
You can do this easily using the datetime components of the time values:
data2 = data.sel(time=data.time.dt.minute==0)
print(data2.time)
#<xarray.DataArray 'time' (time: 24)>
#array(['2017-06-01T00:00:00.000000000', '2017-06-01T01:00:00.000000000',
# '2017-06-01T02:00:00.000000000', '2017-06-01T03:00:00.000000000',
# '2017-06-01T04:00:00.000000000', '2017-06-01T05:00:00.000000000',
# '2017-06-01T06:00:00.000000000', '2017-06-01T07:00:00.000000000',
# '2017-06-01T08:00:00.000000000', '2017-06-01T09:00:00.000000000',
# '2017-06-01T10:00:00.000000000', '2017-06-01T11:00:00.000000000',
# '2017-06-01T12:00:00.000000000', '2017-06-01T13:00:00.000000000',
# '2017-06-01T14:00:00.000000000', '2017-06-01T15:00:00.000000000',
# '2017-06-01T16:00:00.000000000', '2017-06-01T17:00:00.000000000',
# '2017-06-01T18:00:00.000000000', '2017-06-01T19:00:00.000000000',
# '2017-06-01T20:00:00.000000000', '2017-06-01T21:00:00.000000000',
# '2017-06-01T22:00:00.000000000', '2017-06-01T23:00:00.000000000'],
# dtype='datetime64[ns]')
#Coordinates:
# * time (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00
#
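As a self-contained check of the selection above, here is a minimal sketch that rebuilds a 48-step half-hourly array and compares the minute-based selection with plain strided indexing. The use of pd.date_range, the seeded random generator, and the variable names mask and strided are assumptions made for illustration; they are not part of the original answer. The strided variant is only equivalent when the time axis is regular and starts on the hour.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed setup: one day of half-hourly timestamps (48 steps).
time = pd.date_range("2017-06-01", periods=48, freq="30min")
rng = np.random.default_rng(0)
data = xr.DataArray(
    10 * rng.random((2, 2, 48)),
    dims=["lon", "lat", "time"],
    coords={"lon": [-99.83, -99.32], "lat": [42.25, 42.21], "time": time},
)

# Boolean mask that is True only for timestamps falling exactly on the hour.
mask = data.time.dt.minute == 0
data2 = data.sel(time=mask)

# Alternative for a regular axis starting at minute 00: keep every second step.
strided = data.isel(time=slice(None, None, 2))

print(data2.sizes["time"])
```

The mask-based form is the more robust of the two, since it keeps working if some half-hour records are missing or the series does not start at minute 00.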
You can use resample. Resampling returns a resample object, on which you then call the pad method:
data.resample(time='1H').pad()
Output:
<xarray.DataArray (lon: 2, lat: 2, time: 24)>
array([[[0.93092321, 8.9256469 , 2.0902752 , 1.46022299, 9.63865453,
3.06746535, 2.84095699, 9.4583144 , 4.81973945, 1.85398961,
5.6259217 , 0.73004426, 8.48781372, 8.67918668, 7.19521316,
6.67589949, 2.07546901, 1.4322415 , 2.13495418, 4.37055217,
8.85306247, 4.43165936, 4.0294716 , 1.69092842],
[0.52261575, 5.21821873, 1.32905263, 8.92984526, 1.81558321,
3.89992125, 1.8788682 , 7.3124596 , 2.5068265 , 9.73076981,
0.4511222 , 9.09497158, 0.89253979, 9.53972274, 7.15277816,
0.08596348, 2.24376496, 2.06680292, 4.03876723, 5.55558076,
8.26049985, 3.91292107, 8.43491467, 5.48503772]],
[[8.34117163, 1.44051784, 2.78164548, 8.55049381, 9.43753831,
7.35745785, 1.22652596, 9.55220335, 0.99754358, 9.3994966 ,
7.92541645, 2.68894144, 9.61408994, 7.34960423, 2.74209431,
4.19041801, 8.92849725, 9.98010787, 9.16994776, 4.75409515,
3.10524118, 5.12308453, 8.61494954, 1.63399851],
[1.02355383, 5.64350097, 5.76928407, 2.76870009, 6.86109118,
9.1430836 , 1.81166855, 3.19906641, 2.28457262, 5.30030649,
2.86022039, 5.46551606, 0.62270996, 7.86203301, 3.38400052,
5.22623667, 5.49521413, 6.26552406, 0.93926924, 7.98750356,
6.72156675, 9.5673477 , 3.03319399, 9.71812105]]])
Coordinates:
* time (time) datetime64[ns] 2017-06-01 ... 2017-06-01T23:00:00
* lon (lon) float64 -99.83 -99.32
* lat (lat) float64 42.25 42.21
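The pad method on resample objects is an older spelling of forward fill and may not be available under that name in newer xarray releases. The sketch below uses ffill instead, on the assumption that the installed version provides it on resample objects. The "1h" frequency string, the pd.date_range setup, and the variable name hourly are likewise illustrative choices rather than part of the original answer.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed setup: one day of half-hourly timestamps (48 steps).
time = pd.date_range("2017-06-01", periods=48, freq="30min")
rng = np.random.default_rng(0)
data = xr.DataArray(
    10 * rng.random((2, 2, 48)),
    dims=["lon", "lat", "time"],
    coords={"lon": [-99.83, -99.32], "lat": [42.25, 42.21], "time": time},
)

# Resample to hourly labels; forward fill takes, for each hourly label,
# the most recent value at or before it, i.e. the minute-00 record here.
hourly = data.resample(time="1h").ffill()

print(hourly.sizes["time"])
```

Note that this reindexes onto a regular hourly grid, so if an on-the-hour record were missing, forward fill would substitute the preceding value instead of dropping that hour. The minute-based selection has no such side effect.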