Python xarray groupby not creating correct groups
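I'm trying to create weekly groups with xarray groupby based on a datetime64 time dimension. For some reason, it is creating extra groups and putting some dates in the wrong groups. I'm grouping the S coordinate by week. Each year should have five weekly groups, but it's creating seven groups.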
Groups being created:
In [38]: em.groupby('S.week').groups
Out[38]:
{1: [1, 6, 10, 15, 20, 25, 31, 35, 40, 45, 50, 56, 61, 65, 70, 75, 80, 86, 90],
2: [2, 7, 11, 16, 21, 26, 32, 36, 41, 46, 51, 57, 62, 66, 71, 76, 81, 87, 91],
3: [3, 8, 12, 17, 22, 27, 33, 37, 42, 47, 52, 58, 63, 67, 72, 77, 82, 88, 92],
4: [4, 9, 13, 18, 23, 28, 34, 38, 43, 48, 53, 59, 64, 68, 73, 78, 83, 89, 93],
5: [14, 19, 24, 29, 39, 44, 49, 54, 69, 74, 79, 84, 94],
52: [5, 60],
53: [0, 30, 55, 85]}
Information about em:
In [39]: em
Out[39]:
<xarray.Dataset>
Dimensions: (S: 95, latitude: 181, lead: 32, longitude: 360)
Coordinates:
* latitude (latitude) float64 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
* longitude (longitude) float64 0.0 1.0 2.0 3.0 ... 356.0 357.0 358.0 359.0
* lead (lead) timedelta64[ns] 0 days 12:00:00 ... 31 days 12:00:00
* S (S) datetime64[ns] 1999-01-02 1999-01-09 ... 2017-01-30
Data variables:
eto (S, lead, latitude, longitude) float64 dask.array<shape=(95, 32, 181, 360), chunksize=(1, 32, 181, 360)>
Values of S:
In [35]: em.S
Out[35]:
<xarray.DataArray 'S' (S: 95)>
array(['1999-01-02T00:00:00.000000000', '1999-01-09T00:00:00.000000000',
'1999-01-16T00:00:00.000000000', '1999-01-23T00:00:00.000000000',
'1999-01-30T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
'2000-01-09T00:00:00.000000000', '2000-01-16T00:00:00.000000000',
'2000-01-23T00:00:00.000000000', '2000-01-30T00:00:00.000000000',
'2001-01-02T00:00:00.000000000', '2001-01-09T00:00:00.000000000',
'2001-01-16T00:00:00.000000000', '2001-01-23T00:00:00.000000000',
'2001-01-30T00:00:00.000000000', '2002-01-02T00:00:00.000000000',
'2002-01-09T00:00:00.000000000', '2002-01-16T00:00:00.000000000',
'2002-01-23T00:00:00.000000000', '2002-01-30T00:00:00.000000000',
'2003-01-02T00:00:00.000000000', '2003-01-09T00:00:00.000000000',
'2003-01-16T00:00:00.000000000', '2003-01-23T00:00:00.000000000',
'2003-01-30T00:00:00.000000000', '2004-01-02T00:00:00.000000000',
'2004-01-09T00:00:00.000000000', '2004-01-16T00:00:00.000000000',
'2004-01-23T00:00:00.000000000', '2004-01-30T00:00:00.000000000',
'2005-01-02T00:00:00.000000000', '2005-01-09T00:00:00.000000000',
'2005-01-16T00:00:00.000000000', '2005-01-23T00:00:00.000000000',
'2005-01-30T00:00:00.000000000', '2006-01-02T00:00:00.000000000',
'2006-01-09T00:00:00.000000000', '2006-01-16T00:00:00.000000000',
'2006-01-23T00:00:00.000000000', '2006-01-30T00:00:00.000000000',
'2007-01-02T00:00:00.000000000', '2007-01-09T00:00:00.000000000',
'2007-01-16T00:00:00.000000000', '2007-01-23T00:00:00.000000000',
'2007-01-30T00:00:00.000000000', '2008-01-02T00:00:00.000000000',
'2008-01-09T00:00:00.000000000', '2008-01-16T00:00:00.000000000',
'2008-01-23T00:00:00.000000000', '2008-01-30T00:00:00.000000000',
'2009-01-02T00:00:00.000000000', '2009-01-09T00:00:00.000000000',
'2009-01-16T00:00:00.000000000', '2009-01-23T00:00:00.000000000',
'2009-01-30T00:00:00.000000000', '2010-01-02T00:00:00.000000000',
'2010-01-09T00:00:00.000000000', '2010-01-16T00:00:00.000000000',
'2010-01-23T00:00:00.000000000', '2010-01-30T00:00:00.000000000',
'2011-01-02T00:00:00.000000000', '2011-01-09T00:00:00.000000000',
'2011-01-16T00:00:00.000000000', '2011-01-23T00:00:00.000000000',
'2011-01-30T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
'2012-01-09T00:00:00.000000000', '2012-01-16T00:00:00.000000000',
'2012-01-23T00:00:00.000000000', '2012-01-30T00:00:00.000000000',
'2013-01-02T00:00:00.000000000', '2013-01-09T00:00:00.000000000',
'2013-01-16T00:00:00.000000000', '2013-01-23T00:00:00.000000000',
'2013-01-30T00:00:00.000000000', '2014-01-02T00:00:00.000000000',
'2014-01-09T00:00:00.000000000', '2014-01-16T00:00:00.000000000',
'2014-01-23T00:00:00.000000000', '2014-01-30T00:00:00.000000000',
'2015-01-02T00:00:00.000000000', '2015-01-09T00:00:00.000000000',
'2015-01-16T00:00:00.000000000', '2015-01-23T00:00:00.000000000',
'2015-01-30T00:00:00.000000000', '2016-01-02T00:00:00.000000000',
'2016-01-09T00:00:00.000000000', '2016-01-16T00:00:00.000000000',
'2016-01-23T00:00:00.000000000', '2016-01-30T00:00:00.000000000',
'2017-01-02T00:00:00.000000000', '2017-01-09T00:00:00.000000000',
'2017-01-16T00:00:00.000000000', '2017-01-23T00:00:00.000000000',
'2017-01-30T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
* S (S) datetime64[ns] 1999-01-02 1999-01-09 ... 2017-01-23 2017-01-30
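As a quick check of whether these groups simply follow ISO week numbering, the sketch below rebuilds `S` with the standard library only and groups the positional indices by the week returned from `date.isocalendar()`. It assumes the coordinate is exactly the January 2/9/16/23/30 pattern listed above; `iso_week_groups` is a hypothetical helper, not part of xarray.

```python
from collections import defaultdict
from datetime import date

# Assumption: S follows the listing above exactly, i.e. January 2, 9, 16,
# 23 and 30 of every year from 1999 through 2017 (95 dates in total).
S = [date(year, 1, day) for year in range(1999, 2018) for day in (2, 9, 16, 23, 30)]


def iso_week_groups(dates):
    """Hypothetical helper: map ISO week number -> positional indices,
    mimicking the shape of the .groups dict printed above."""
    groups = defaultdict(list)
    for i, d in enumerate(dates):
        iso_week = d.isocalendar()[1]  # isocalendar() -> (ISO year, ISO week, ISO weekday)
        groups[iso_week].append(i)
    return dict(sorted(groups.items()))


groups = iso_week_groups(S)
for week, indices in groups.items():
    print(week, indices)
```

Running it prints the same seven groups as `Out[38]`, including `52: [5, 60]` and `53: [0, 30, 55, 85]`, which suggests the extra groups come from ISO week numbering rather than from a groupby bug.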
So, for example, the dates in group 53 should actually all be in group 1, and the others are in the wrong groups as well. All group 53 dates:
In [40]: em.S[0].values
Out[40]: numpy.datetime64('1999-01-02T00:00:00.000000000')
In [41]: em.S[5].values
Out[41]: numpy.datetime64('2000-01-02T00:00:00.000000000')
In [42]: em.S[10].values
Out[42]: numpy.datetime64('2001-01-02T00:00:00.000000000')
In [43]: em.S[55].values
Out[43]: numpy.datetime64('2010-01-02T00:00:00.000000000')
In [44]: em.S[85].values
Out[44]: numpy.datetime64('2016-01-02T00:00:00.000000000')
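To see where each of these January 2 dates lands, the ISO calendar triple (ISO year, ISO week, ISO weekday) can be printed with the standard library. The dates are copied from the outputs above; this only inspects the dates and does not touch xarray.

```python
from datetime import date

# The five dates printed above: em.S[0], em.S[5], em.S[10], em.S[55], em.S[85].
jan2 = [date(1999, 1, 2), date(2000, 1, 2), date(2001, 1, 2),
        date(2010, 1, 2), date(2016, 1, 2)]

for d in jan2:
    iso_year, iso_week, iso_weekday = d.isocalendar()
    print(d, "-> ISO year", iso_year, "week", iso_week, "weekday", iso_weekday)

# 1999-01-02 -> ISO year 1998 week 53 weekday 6
# 2000-01-02 -> ISO year 1999 week 52 weekday 7
# 2001-01-02 -> ISO year 2001 week 1 weekday 2
# 2010-01-02 -> ISO year 2009 week 53 weekday 6
# 2016-01-02 -> ISO year 2015 week 53 weekday 6
```

A January 2 that falls on a Saturday or Sunday belongs to the last ISO week of the previous ISO year, which is week 52 or 53 depending on that year.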
Any suggestions?
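The groups created are actually not wrong, as several people have already pointed out. I had assumed that each weekly group would contain the same month-day, but that is not the case because the groups are based on ISO weeks. So, depending on the ISO week number, January 2 can actually fall in week 1, 52, or 53.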
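If the grouping originally expected is still wanted (one group per January 2/9/16/23/30 slot), one option is to key on 7-day blocks counted from January 1 instead of ISO weeks. The sketch below shows that key with the standard library, assuming the same reconstructed dates as above; `week_from_jan1` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date

def week_from_jan1(d):
    """Hypothetical helper: 7-day block counted from January 1
    (1 = Jan 1-7, 2 = Jan 8-14, ...), not an ISO week."""
    return (d.timetuple().tm_yday - 1) // 7 + 1

# Assumption: same pattern as the S listing (Jan 2/9/16/23/30, 1999-2017).
S = [date(year, 1, day) for year in range(1999, 2018) for day in (2, 9, 16, 23, 30)]

groups = defaultdict(list)
for i, d in enumerate(S):
    groups[week_from_jan1(d)].append(i)
groups = dict(sorted(groups.items()))

print(list(groups))  # five keys, one per January slot
```

In xarray the same key could presumably be built as `((em.S.dt.dayofyear - 1) // 7 + 1).rename('week')` and passed to `em.groupby(...)`, though that is untested here. Because day-of-year shifts by one after February 28 in leap years, a given month-day from March onward can land in a different block in leap years; for January-only data like this it does not matter.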