I have weekly satellite data, but I want to convert it to monthly data. How would I do that?
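I have weekly satellite data that I want to convert to monthly data, with dimensions of month, longitude, and latitude, covering January 1993 through December 2019.
My first attempt was a for loop that simply averages every 4 weeks to get a monthly mean: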
sss_md_monthly = []
weeks = sss_md.time.size//4
for i in range(weeks):
    sss_md_monthly.append(np.mean(sss_md[i::4],axis=0))
sss_md_monthly = np.array(sss_md_monthly)
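However, I noticed that some months contain 5 weekly samples instead of 4, and leap years shift this as well. My loop is therefore wrong: it treats every 4 weeks as one month, but some months have 5 weeks.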
time = np.array(sss_md.time) #making time array
for i in range(int(len(time)/4)):
    print(time[i*4:(i+1)*4]) # printing the time step for every 4 weeks
['1993-01-06T12:00:00.000000000' '1993-01-13T12:00:00.000000000'
'1993-01-20T12:00:00.000000000' '1993-01-27T12:00:00.000000000'] #all of january 1993
['1993-02-03T12:00:00.000000000' '1993-02-10T12:00:00.000000000'
'1993-02-17T12:00:00.000000000' '1993-02-24T12:00:00.000000000'] # all of february 1993
['1993-03-03T12:00:00.000000000' '1993-03-10T12:00:00.000000000'
'1993-03-17T12:00:00.000000000' '1993-03-24T12:00:00.000000000'] # MARCH 1993 has 5 weeks instead of 4
['1993-03-31T12:00:00.000000000' '1993-04-07T12:00:00.000000000'
'1993-04-14T12:00:00.000000000' '1993-04-21T12:00:00.000000000']
['1993-04-28T12:00:00.000000000' '1993-05-05T12:00:00.000000000'
'1993-05-12T12:00:00.000000000' '1993-05-19T12:00:00.000000000']
['1993-05-26T12:00:00.000000000' '1993-06-02T12:00:00.000000000'
'1993-06-09T12:00:00.000000000' '1993-06-16T12:00:00.000000000']
['1993-06-23T12:00:00.000000000' '1993-06-30T12:00:00.000000000'
'1993-07-07T12:00:00.000000000' '1993-07-14T12:00:00.000000000']
....
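How do I convert the weekly data into a correct monthly time series when there are leap years or when some months have more weeks than others?
Someone kindly suggested: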
import datetime
from datetime import datetime as dt
import numpy as np
time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time] # time = np.array(sss_md.time)
month, year = time[0].month, time[0].year
group_month = {}
for i in time:
    if (i.month, i.year) in group_month:
        group_month[(i.month, i.year)].append(i)
    else:
        group_month[(i.month, i.year)] = i
print(group_month)
But I get an error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-178-cb22eada7b48> in <module>
2 from datetime import datetime as dt
3 import numpy as np
----> 4 time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time]
5
6 month, year = time[0].month, time[0].year
<ipython-input-178-cb22eada7b48> in <listcomp>(.0)
2 from datetime import datetime as dt
3 import numpy as np
----> 4 time = [datetime.datetime.strptime(n[:10],"%Y-%m-%d") for n in time]
5
6 month, year = time[0].month, time[0].year
IndexError: invalid index to scalar variable.
Is this error caused by the structure of my time variable?
We have to convert the times to datetime objects first, and then compare and group them.
from datetime import datetime
import numpy as np
# str(n)[:10] keeps only the YYYY-MM-DD part of each time value
time = [datetime.strptime(str(n)[:10],"%Y-%m-%d") for n in np.array(sss_md.time)]
month, year = time[0].month, time[0].year
group_month = {}
for i in time:
    if (i.month, i.year) in group_month:
        # month already seen: add this week to its list
        group_month[(i.month, i.year)].append(i)
    else:
        # first week of a new month: start a new list
        group_month[(i.month, i.year)] = [i]
print(group_month)
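You can use datetime.strftime to convert the values back to the old format.
Note that I use str(n)[:10] in the list comprehension to keep the format string simple, since the time-of-day part is the same in every sample.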
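To get from the grouping to actual monthly means, the same idea can be applied to the data values rather than just the dates. The following is a minimal sketch using only the standard library. The names weekly_times, weekly_values, values_by_month and monthly_means are hypothetical, and the numbers are invented for illustration; they are not taken from sss_md.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Hypothetical weekly samples: timestamps in the question's format,
# values invented purely for illustration.
weekly_times = [
    "1993-03-03T12:00:00.000000000",
    "1993-03-10T12:00:00.000000000",
    "1993-03-17T12:00:00.000000000",
    "1993-03-24T12:00:00.000000000",
    "1993-03-31T12:00:00.000000000",
    "1993-04-07T12:00:00.000000000",
    "1993-04-14T12:00:00.000000000",
    "1993-04-21T12:00:00.000000000",
    "1993-04-28T12:00:00.000000000",
]
weekly_values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0]

# Collect the values of each calendar month under a "YYYY-MM" label.
values_by_month = defaultdict(list)
for stamp, value in zip(weekly_times, weekly_values):
    day = datetime.strptime(stamp[:10], "%Y-%m-%d")
    values_by_month[day.strftime("%Y-%m")].append(value)

# Average over however many weeks each month actually contains (4 or 5).
monthly_means = {month: fmean(values) for month, values in values_by_month.items()}
print(monthly_means)  # {'1993-03': 3.0, '1993-04': 25.0}
```

March 1993 contributes five samples and April four, so each mean is taken over the weeks that really fall in that month rather than over a fixed block of four.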
If you have a collection of strings like this:
dates = (
"1993-01-06T12:00:00.000000000",
"1993-01-13T12:00:00.000000000",
"1993-01-20T12:00:00.000000000",
"1993-01-27T12:00:00.000000000",
"1993-02-03T12:00:00.000000000",
"1993-02-10T12:00:00.000000000",
"1993-02-17T12:00:00.000000000",
"1993-02-24T12:00:00.000000000",
"1993-03-03T12:00:00.000000000",
"1993-03-10T12:00:00.000000000",
"1993-03-17T12:00:00.000000000",
"1993-03-24T12:00:00.000000000",
"1993-03-31T12:00:00.000000000",
"1993-04-07T12:00:00.000000000",
"1993-04-14T12:00:00.000000000",
"1993-04-21T12:00:00.000000000",
"1993-04-28T12:00:00.000000000",
"1993-05-05T12:00:00.000000000",
"1993-05-12T12:00:00.000000000",
"1993-05-19T12:00:00.000000000",
"1993-05-26T12:00:00.000000000",
"1993-06-02T12:00:00.000000000",
"1993-06-09T12:00:00.000000000",
"1993-06-16T12:00:00.000000000",
"1993-06-23T12:00:00.000000000",
"1993-06-30T12:00:00.000000000",
"1993-07-07T12:00:00.000000000",
"1993-07-14T12:00:00.000000000"
)
Then you can use itertools.groupby with a custom key to group the strings by year and month. This assumes the strings are already sorted by year and month.
from itertools import groupby
def key(string):
    return string.split("-")[:2]
month_groups = [list(group) for _, group in groupby(dates, key=key)]
print(month_groups)
You could make the key function fancier: instead of splitting on "-", parse each string into a datetime.datetime object and return that object's year and month attributes.
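A datetime-based key along those lines might look like the sketch below. The function name year_month and the shortened dates tuple are assumptions made for illustration. As with the string-splitting key, it relies on the input already being sorted.

```python
from datetime import datetime
from itertools import groupby

# A shortened, already sorted sample in the same format as the question.
dates = (
    "1993-01-06T12:00:00.000000000",
    "1993-01-27T12:00:00.000000000",
    "1993-02-03T12:00:00.000000000",
    "1993-03-03T12:00:00.000000000",
    "1993-03-31T12:00:00.000000000",
)

def year_month(string):
    # Parse only the date part; the time of day is not needed for grouping.
    parsed = datetime.strptime(string[:10], "%Y-%m-%d")
    return parsed.year, parsed.month

# groupby only merges consecutive items, so unsorted input would produce
# repeated keys, and later groups would overwrite earlier ones in this dict.
month_groups = {k: list(g) for k, g in groupby(dates, key=year_month)}
for (year, month), members in month_groups.items():
    print(year, month, len(members))
```

Returning a (year, month) tuple of integers keeps the keys hashable, so the groups can be stored in a dict and looked up by month afterwards.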