Python xarray groupby not creating correct groups

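I am trying to create weekly groups with xarray groupby on a datetime64 time dimension. For some reason it creates extra groups and puts some dates in the wrong groups. I am grouping by week on the S coordinate. There should be five weekly groups each year, but it is creating seven.

The groups being created: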

In [38]: em.groupby('S.week').groups                                                                                         
Out[38]: 
{1: [1, 6, 10, 15, 20, 25, 31, 35, 40, 45, 50, 56, 61, 65, 70, 75, 80, 86, 90],
 2: [2, 7, 11, 16, 21, 26, 32, 36, 41, 46, 51, 57, 62, 66, 71, 76, 81, 87, 91],
 3: [3, 8, 12, 17, 22, 27, 33, 37, 42, 47, 52, 58, 63, 67, 72, 77, 82, 88, 92],
 4: [4, 9, 13, 18, 23, 28, 34, 38, 43, 48, 53, 59, 64, 68, 73, 78, 83, 89, 93],
 5: [14, 19, 24, 29, 39, 44, 49, 54, 69, 74, 79, 84, 94],
 52: [5, 60],
 53: [0, 30, 55, 85]}

Information about em:

In [39]: em                                                                                                                  
Out[39]: 
<xarray.Dataset>
Dimensions:    (S: 95, latitude: 181, lead: 32, longitude: 360)
Coordinates:
  * latitude   (latitude) float64 -90.0 -89.0 -88.0 -87.0 ... 88.0 89.0 90.0
  * longitude  (longitude) float64 0.0 1.0 2.0 3.0 ... 356.0 357.0 358.0 359.0
  * lead       (lead) timedelta64[ns] 0 days 12:00:00 ... 31 days 12:00:00
  * S          (S) datetime64[ns] 1999-01-02 1999-01-09 ... 2017-01-30
Data variables:
    eto        (S, lead, latitude, longitude) float64 dask.array<shape=(95, 32, 181, 360), chunksize=(1, 32, 181, 360)>

The values of S:

In [35]: em.S                                                                                                                
Out[35]: 
<xarray.DataArray 'S' (S: 95)>
array(['1999-01-02T00:00:00.000000000', '1999-01-09T00:00:00.000000000',
       '1999-01-16T00:00:00.000000000', '1999-01-23T00:00:00.000000000',
       '1999-01-30T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
       '2000-01-09T00:00:00.000000000', '2000-01-16T00:00:00.000000000',
       '2000-01-23T00:00:00.000000000', '2000-01-30T00:00:00.000000000',
       '2001-01-02T00:00:00.000000000', '2001-01-09T00:00:00.000000000',
       '2001-01-16T00:00:00.000000000', '2001-01-23T00:00:00.000000000',
       '2001-01-30T00:00:00.000000000', '2002-01-02T00:00:00.000000000',
       '2002-01-09T00:00:00.000000000', '2002-01-16T00:00:00.000000000',
       '2002-01-23T00:00:00.000000000', '2002-01-30T00:00:00.000000000',
       '2003-01-02T00:00:00.000000000', '2003-01-09T00:00:00.000000000',
       '2003-01-16T00:00:00.000000000', '2003-01-23T00:00:00.000000000',
       '2003-01-30T00:00:00.000000000', '2004-01-02T00:00:00.000000000',
       '2004-01-09T00:00:00.000000000', '2004-01-16T00:00:00.000000000',
       '2004-01-23T00:00:00.000000000', '2004-01-30T00:00:00.000000000',
       '2005-01-02T00:00:00.000000000', '2005-01-09T00:00:00.000000000',
       '2005-01-16T00:00:00.000000000', '2005-01-23T00:00:00.000000000',
       '2005-01-30T00:00:00.000000000', '2006-01-02T00:00:00.000000000',
       '2006-01-09T00:00:00.000000000', '2006-01-16T00:00:00.000000000',
       '2006-01-23T00:00:00.000000000', '2006-01-30T00:00:00.000000000',
       '2007-01-02T00:00:00.000000000', '2007-01-09T00:00:00.000000000',
       '2007-01-16T00:00:00.000000000', '2007-01-23T00:00:00.000000000',
       '2007-01-30T00:00:00.000000000', '2008-01-02T00:00:00.000000000',
       '2008-01-09T00:00:00.000000000', '2008-01-16T00:00:00.000000000',
       '2008-01-23T00:00:00.000000000', '2008-01-30T00:00:00.000000000',
       '2009-01-02T00:00:00.000000000', '2009-01-09T00:00:00.000000000',
       '2009-01-16T00:00:00.000000000', '2009-01-23T00:00:00.000000000',
       '2009-01-30T00:00:00.000000000', '2010-01-02T00:00:00.000000000',
       '2010-01-09T00:00:00.000000000', '2010-01-16T00:00:00.000000000',
       '2010-01-23T00:00:00.000000000', '2010-01-30T00:00:00.000000000',
       '2011-01-02T00:00:00.000000000', '2011-01-09T00:00:00.000000000',
       '2011-01-16T00:00:00.000000000', '2011-01-23T00:00:00.000000000',
       '2011-01-30T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-09T00:00:00.000000000', '2012-01-16T00:00:00.000000000',
       '2012-01-23T00:00:00.000000000', '2012-01-30T00:00:00.000000000',
       '2013-01-02T00:00:00.000000000', '2013-01-09T00:00:00.000000000',
       '2013-01-16T00:00:00.000000000', '2013-01-23T00:00:00.000000000',
       '2013-01-30T00:00:00.000000000', '2014-01-02T00:00:00.000000000',
       '2014-01-09T00:00:00.000000000', '2014-01-16T00:00:00.000000000',
       '2014-01-23T00:00:00.000000000', '2014-01-30T00:00:00.000000000',
       '2015-01-02T00:00:00.000000000', '2015-01-09T00:00:00.000000000',
       '2015-01-16T00:00:00.000000000', '2015-01-23T00:00:00.000000000',
       '2015-01-30T00:00:00.000000000', '2016-01-02T00:00:00.000000000',
       '2016-01-09T00:00:00.000000000', '2016-01-16T00:00:00.000000000',
       '2016-01-23T00:00:00.000000000', '2016-01-30T00:00:00.000000000',
       '2017-01-02T00:00:00.000000000', '2017-01-09T00:00:00.000000000',
       '2017-01-16T00:00:00.000000000', '2017-01-23T00:00:00.000000000',
       '2017-01-30T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * S        (S) datetime64[ns] 1999-01-02 1999-01-09 ... 2017-01-23 2017-01-30

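So, for example, the dates in group 53 should really all be in group 1, and other dates end up in the wrong groups as well. Some of the January 2 dates (indices 0, 55, and 85 are in group 53):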

In [40]: em.S[0].values                                                                                                      
Out[40]: numpy.datetime64('1999-01-02T00:00:00.000000000')

In [41]: em.S[5].values                                                                                                      
Out[41]: numpy.datetime64('2000-01-02T00:00:00.000000000')

In [42]: em.S[10].values                                                                                                     
Out[42]: numpy.datetime64('2001-01-02T00:00:00.000000000')

In [43]: em.S[55].values                                                                                                     
Out[43]: numpy.datetime64('2010-01-02T00:00:00.000000000')

In [44]: em.S[85].values                                                                                                     
Out[44]: numpy.datetime64('2016-01-02T00:00:00.000000000')

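Any suggestions?

The groups being created are not actually wrong, as several people have already pointed out. I had assumed each weekly group would contain the same month-day, but that is not the case, because the groups are based on ISO weeks. Depending on the year, January 2 can fall in ISO week 1, 52, or 53.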
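
To make the ISO-week behaviour concrete, here is a minimal sketch using only the standard library. It rebuilds the 95 S dates from the question as plain `datetime.date` objects (an assumption: five start dates per January, 1999 through 2017), groups their positions by ISO week number, and then groups them by month-day instead. The names `by_iso_week` and `by_month_day` are made up for illustration and are not part of xarray.

```python
from collections import defaultdict
from datetime import date

# Rebuild the S dates from the question: five start dates per January, 1999-2017.
days = (2, 9, 16, 23, 30)
dates = [date(year, 1, day) for year in range(1999, 2018) for day in days]

# isocalendar() returns (ISO year, ISO week, ISO weekday); index 1 is the week.
for year in (1999, 2000, 2001):
    print(year, date(year, 1, 2).isocalendar()[1])

# Group positions by ISO week number: January 2 is split across weeks 1, 52 and 53.
by_iso_week = defaultdict(list)
for i, d in enumerate(dates):
    by_iso_week[d.isocalendar()[1]].append(i)

# Group positions by month-day instead: one group per start date.
by_month_day = defaultdict(list)
for i, d in enumerate(dates):
    by_month_day[d.strftime("%m-%d")].append(i)

print(sorted(by_iso_week))   # seven distinct week numbers
print(sorted(by_month_day))  # five distinct month-day keys
```

In xarray, a comparable month-day key could probably be built with `em.S.dt.strftime("%m-%d")` and passed to `groupby`, assuming a version that provides `dt.strftime`. That variant is untested against the dataset above.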