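Slice pandas time-series into n-month chunks
Given a pandas Series indexed by date, I need to split the series into chunks of n months each. The code below splits the data into 12-month chunks. How can I generalize it to slice into n-month chunks? Also, note that not all dates are present in the series, so the first and last day of each month may not exist in the series.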
# Create a pandas series indexed by date
import pandas as pd
import numpy as np
dates = pd.date_range('2000-01-01', '2009-12-31')
data = np.random.rand(len(dates))
series = pd.Series(data, dates)
# Poke holes in the data, so not all dates are represented
series = series[series > 0.50]
# Slice the series into chunks of 12 months each
for year in range(2000, 2009 + 1):
    # Partial-string indexing selects every row that falls in the given year
    chunk = series[str(year):str(year)]
    print("Start date =", chunk.index[0], " End date =", chunk.index[-1])
You can use pd.cut() to cut your time-series index into chunks, then use groupby to perform your custom calculations.
# Create a pandas series indexed by date
import pandas as pd
import numpy as np
np.random.seed(0)
dates = pd.date_range('2000-01-01', '2009-12-31', freq='D')
data = np.random.rand(len(dates))
series = pd.Series(data, dates)
# Poke holes in the data, so not all dates are represented
series = series[series > 0.8]
# Build bin edges at month starts: begin at 2000-01-01 and step 3 months at a time
date_rng = pd.date_range('2000-01-01', periods=50, freq='3MS')
# Label each bin with its right edge
labels = date_rng[1:]
# Use pd.cut to assign each timestamp in the index to a 3-month bin
grouped = series.groupby(pd.cut(series.index, bins=date_rng, labels=labels, right=False))
start_date = grouped.head(1).index
Out[206]:
DatetimeIndex(['2000-01-08', '2000-04-08', '2000-07-03', '2000-10-02',
'2001-01-03', '2001-04-04', '2001-07-01', '2001-10-02',
'2002-01-11', '2002-04-05', '2002-07-01', '2002-10-02',
'2003-01-02', '2003-04-03', '2003-07-02', '2003-10-04',
'2004-01-01', '2004-04-01', '2004-07-03', '2004-10-03',
'2005-01-07', '2005-04-08', '2005-07-12', '2005-10-05',
'2006-01-01', '2006-04-01', '2006-07-01', '2006-10-04',
'2007-01-05', '2007-04-04', '2007-07-05', '2007-10-06',
'2008-01-01', '2008-04-05', '2008-07-05', '2008-10-01',
'2009-01-02', '2009-04-04', '2009-07-04', '2009-10-02'],
dtype='datetime64[ns]', freq=None, tz=None)
end_date = grouped.tail(1).index
Out[207]:
DatetimeIndex(['2000-03-30', '2000-06-26', '2000-09-30', '2000-12-30',
'2001-03-30', '2001-06-28', '2001-09-27', '2001-12-28',
'2002-03-24', '2002-06-29', '2002-09-24', '2002-12-29',
'2003-03-27', '2003-06-22', '2003-09-28', '2003-12-31',
'2004-03-31', '2004-06-27', '2004-09-17', '2004-12-31',
'2005-03-23', '2005-06-23', '2005-09-30', '2005-12-30',
'2006-03-29', '2006-06-24', '2006-09-30', '2006-12-31',
'2007-03-26', '2007-06-27', '2007-09-29', '2007-12-31',
'2008-03-25', '2008-06-30', '2008-09-28', '2008-12-30',
'2009-03-25', '2009-06-29', '2009-09-26', '2009-12-27'],
dtype='datetime64[ns]', freq=None, tz=None)
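To generalize to an arbitrary n without building bin edges by hand, one option is to number the calendar months since the first observation and integer-divide by n. This is only a sketch of that idea, not the pd.cut approach above: the helper names month_chunk_ids and month_chunks are made up for illustration, and it assumes the series is non-empty and sorted by date, with chunks anchored to the month of the earliest timestamp. The holes here are made by keeping every third day instead of random filtering, so the result is reproducible.

```python
import numpy as np
import pandas as pd


def month_chunk_ids(index, n, start=None):
    """Return an integer chunk id per timestamp: 0 for the first n months, 1 for the next n, ...

    Hypothetical helper. `start` defaults to the earliest timestamp in `index`;
    only its year and month are used, so chunks align to calendar months.
    """
    if start is None:
        start = index.min()
    # Whole calendar months elapsed since the starting month
    months_since_start = (index.year - start.year) * 12 + (index.month - start.month)
    return np.asarray(months_since_start) // n


def month_chunks(series, n):
    """Yield consecutive n-month chunks of a date-indexed series, skipping empty ones."""
    ids = month_chunk_ids(series.index, n)
    for _, chunk in series.groupby(ids):
        yield chunk


# Example data: daily values with holes (only every third day is kept)
dates = pd.date_range('2000-01-01', '2009-12-31', freq='D')
series = pd.Series(np.arange(len(dates), dtype=float), index=dates)[::3]

n = 3
for chunk in month_chunks(series, n):
    print("Start date =", chunk.index[0], " End date =", chunk.index[-1])
```

Because the grouping key is computed from the year and month of each timestamp, it does not matter whether the first or last day of a month is actually present in the series.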