I have weekly satellite data, but I want to convert it to monthly data. How would I do that?

Question

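I have weekly satellite data that I want to convert to monthly data with month, longitude, and latitude dimensions, covering January 1993 through December 2019.

At first I wrote a for loop that simply averages every 4 weeks to get a monthly mean: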

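import numpy as np

sss_md_monthly = []

n_blocks = sss_md.time.size // 4  # number of complete 4-week blocks
for i in range(n_blocks):
    # average weeks 4*i .. 4*i+3 along the time axis
    sss_md_monthly.append(np.mean(sss_md[i*4:(i+1)*4], axis=0))

sss_md_monthly = np.array(sss_md_monthly)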

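However, I noticed that some months (and leap years) contain 5 weekly samples instead of 4. My for loop therefore does not produce correct monthly means: it treats every 4 weeks as one month, but some months have 5 weeks.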

time = np.array(sss_md.time) #making time array

for i in range(int(len(time)/4)):
    print(time[i*4:(i+1)*4]) # printing the time step for every 4 weeks 

['1993-01-06T12:00:00.000000000' '1993-01-13T12:00:00.000000000'
 '1993-01-20T12:00:00.000000000' '1993-01-27T12:00:00.000000000'] #all of january 1993
['1993-02-03T12:00:00.000000000' '1993-02-10T12:00:00.000000000'
 '1993-02-17T12:00:00.000000000' '1993-02-24T12:00:00.000000000'] # all of february 1993
['1993-03-03T12:00:00.000000000' '1993-03-10T12:00:00.000000000'
 '1993-03-17T12:00:00.000000000' '1993-03-24T12:00:00.000000000'] # MARCH 1993 has 5 weeks instead of 4
['1993-03-31T12:00:00.000000000' '1993-04-07T12:00:00.000000000'
 '1993-04-14T12:00:00.000000000' '1993-04-21T12:00:00.000000000']
['1993-04-28T12:00:00.000000000' '1993-05-05T12:00:00.000000000'
 '1993-05-12T12:00:00.000000000' '1993-05-19T12:00:00.000000000']
['1993-05-26T12:00:00.000000000' '1993-06-02T12:00:00.000000000'
 '1993-06-09T12:00:00.000000000' '1993-06-16T12:00:00.000000000']
['1993-06-23T12:00:00.000000000' '1993-06-30T12:00:00.000000000'
 '1993-07-07T12:00:00.000000000' '1993-07-14T12:00:00.000000000']
....

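How do I convert the weekly data into a correct monthly time series when there are leap years or when some months contain more weeks than others?

Someone kindly suggested: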

import datetime
from datetime import datetime as dt
import numpy as np
time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time] # time = np.array(sss_md.time)

month, year = time[0].month, time[0].year
group_month = {}
for i in time:
    if (i.month, i.year) in group_month:
       group_month[(i.month, i.year)].append(i)
    else:
       group_month[(i.month, i.year)] = i
print(group_month)

But I get an error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-178-cb22eada7b48> in <module>
      2 from datetime import datetime as dt
      3 import numpy as np
----> 4 time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time]
      5 
      6 month, year = time[0].month, time[0].year

<ipython-input-178-cb22eada7b48> in <listcomp>(.0)
      2 from datetime import datetime as dt
      3 import numpy as np
----> 4 time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time]
      5 
      6 month, year = time[0].month, time[0].year

IndexError: invalid index to scalar variable.

Is this error caused by the structure of my time variable?

Answer 1

The time values first have to be converted to datetime objects; then they can be compared and grouped.

from datetime import datetime
import numpy as np

# str(n) turns each numpy datetime64 value into text such as
# "1993-01-06T12:00:00.000000000"; the first 10 characters are the date.
time = [datetime.strptime(str(n)[:10], "%Y-%m-%d") for n in np.array(sss_md.time)]

# Map each (month, year) pair to the list of dates that fall in that month.
group_month = {}
for i in time:
    if (i.month, i.year) in group_month:
        group_month[(i.month, i.year)].append(i)
    else:
        group_month[(i.month, i.year)] = [i]
print(group_month)

You can use datetime.strftime to convert the values back to the original format.
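As a minimal sketch of that conversion, the snippet below formats one datetime object back into ISO-style text. The variable names are only illustrative. Note that `%f` prints microseconds (6 digits), so the 9-digit nanosecond tail shown in the question cannot be reproduced by strftime alone.

```python
from datetime import datetime

# One of the weekly time stamps from the question, as a datetime object.
stamp = datetime(1993, 1, 6, 12, 0, 0)

# Date and time without a fractional part.
as_text = stamp.strftime("%Y-%m-%dT%H:%M:%S")

# %f appends microseconds, i.e. 6 fractional digits rather than 9.
as_text_micro = stamp.strftime("%Y-%m-%dT%H:%M:%S.%f")

print(as_text)        # 1993-01-06T12:00:00
print(as_text_micro)  # 1993-01-06T12:00:00.000000
```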

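Note that I use str(n)[:10] in the list comprehension to keep only the date part, which simplifies the format string; the time-of-day part is the same in every one of your samples.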
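The grouping above only collects dates. To get the monthly means themselves, one option is to group the positions of the time steps instead and average the data at those positions. The sketch below is one possible approach, not necessarily the answerer's method. It assumes the data is a NumPy array whose first axis is time. The names `data`, `indices_by_month`, and `monthly_mean` and the small synthetic array are hypothetical stand-ins for the real `sss_md`.

```python
from collections import defaultdict

import numpy as np

# Hypothetical stand-in for the weekly data: 9 weekly time stamps and a
# (time, lat, lon) array whose value at each step equals the step index.
time = np.array(
    ["1993-01-06T12:00:00", "1993-01-13T12:00:00", "1993-01-20T12:00:00",
     "1993-01-27T12:00:00", "1993-02-03T12:00:00", "1993-02-10T12:00:00",
     "1993-02-17T12:00:00", "1993-02-24T12:00:00", "1993-03-03T12:00:00"],
    dtype="datetime64[ns]",
)
data = np.arange(time.size, dtype=float)[:, None, None] * np.ones((1, 2, 3))

# Collect the positions of the time steps that fall in each (year, month).
indices_by_month = defaultdict(list)
for idx, stamp in enumerate(time):
    year, month = str(stamp)[:7].split("-")
    indices_by_month[(int(year), int(month))].append(idx)

# Average over however many weeks each month actually contains (4 or 5).
months = sorted(indices_by_month)
monthly_mean = np.stack(
    [data[indices_by_month[m]].mean(axis=0) for m in months]
)

print(months)
print(monthly_mean.shape)
```

With the real data, `time` would be `np.array(sss_md.time)` and `data` the weekly array. Because each month is averaged over its own list of indices, months with 5 weekly samples are handled the same way as months with 4.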

Answer 2

If you have a collection of strings like this:

dates = (
    "1993-01-06T12:00:00.000000000",
    "1993-01-13T12:00:00.000000000",
    "1993-01-20T12:00:00.000000000",
    "1993-01-27T12:00:00.000000000",
    "1993-02-03T12:00:00.000000000",
    "1993-02-10T12:00:00.000000000",
    "1993-02-17T12:00:00.000000000",
    "1993-02-24T12:00:00.000000000",
    "1993-03-03T12:00:00.000000000",
    "1993-03-10T12:00:00.000000000",
    "1993-03-17T12:00:00.000000000",
    "1993-03-24T12:00:00.000000000",
    "1993-03-31T12:00:00.000000000",
    "1993-04-07T12:00:00.000000000",
    "1993-04-14T12:00:00.000000000",
    "1993-04-21T12:00:00.000000000",
    "1993-04-28T12:00:00.000000000",
    "1993-05-05T12:00:00.000000000",
    "1993-05-12T12:00:00.000000000",
    "1993-05-19T12:00:00.000000000",
    "1993-05-26T12:00:00.000000000",
    "1993-06-02T12:00:00.000000000",
    "1993-06-09T12:00:00.000000000",
    "1993-06-16T12:00:00.000000000",
    "1993-06-23T12:00:00.000000000",
    "1993-06-30T12:00:00.000000000",
    "1993-07-07T12:00:00.000000000",
    "1993-07-14T12:00:00.000000000"
)

You can then use itertools.groupby with a custom key to group the strings by year and month. This assumes the strings are already sorted by year and month.

from itertools import groupby

def key(string):
    return string.split("-")[:2]

month_groups = [list(group) for _, group in groupby(dates, key=key)]
print(month_groups)

You could make the key function fancier: instead of splitting on "-", parse each string into a datetime.datetime object and return that object's year and month attributes.
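A minimal sketch of that datetime-based key might look like the following. The function name `year_month` and the shortened `dates` tuple are illustrative only. The strings are cut to their first 10 characters before parsing, because strptime's `%f` accepts at most 6 fractional digits and the strings carry 9.

```python
from datetime import datetime
from itertools import groupby

# A shortened sample of the weekly time stamps, already sorted by date.
dates = (
    "1993-01-06T12:00:00.000000000",
    "1993-01-13T12:00:00.000000000",
    "1993-01-20T12:00:00.000000000",
    "1993-01-27T12:00:00.000000000",
    "1993-02-03T12:00:00.000000000",
    "1993-02-10T12:00:00.000000000",
)

def year_month(string):
    # Parse only the date part and group by (year, month).
    parsed = datetime.strptime(string[:10], "%Y-%m-%d")
    return (parsed.year, parsed.month)

# groupby only merges adjacent items, so the input must be sorted by the key.
month_groups = {
    key: list(group) for key, group in groupby(dates, key=year_month)
}
print(month_groups)
```

Keeping the keys in a dict makes it easy to look up a month later, for example `month_groups[(1993, 1)]`.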

python

arrays

numpy

jupyter