Get last date in each month of a time series pandas

Question

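I am currently generating a DatetimeIndex with a function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily, but it has some gaps.

My goal is to get the last date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, because they give the last calendar day of the month rather than the last value actually present in each month.

For example, if this is my time series, I want to select '2015-05-29', whereas the last day of the month is '2015-05-31'.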

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']

Answer 1

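My strategy would be to group by month and then select the "maximum" of each group:

If "dt" is your DatetimeIndex object:

last_dates_of_the_month = []
# Index.groupby returns a dict mapping each month number to its dates.
# Grouping by month alone merges the same month from different years.
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)

The list "last_dates_of_the_month" contains the last date of every month in your dataset. You can use this list to create a DatetimeIndex in pandas again (or do whatever else you want with it).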
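As a rough check, the approach above can be run on the sample dates from the question. This is only a sketch; the names `groups` and `last_dates` are made up for illustration.

```python
import pandas as pd

# Sample dates taken from the question
dt = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])

# Index.groupby returns a dict: {month number: DatetimeIndex of that month}
groups = dt.groupby(dt.month)

# Take the latest date of every group and rebuild a sorted DatetimeIndex
last_dates = pd.DatetimeIndex([g.max() for g in groups.values()]).sort_values()
print(last_dates)
```

For this sample, May contributes '2015-05-29' and June contributes '2015-06-01', the only June date present.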

Answer 2

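Condla's answer came closest to what I needed, except that my time index spans more than one year, so I needed to group by both month and year and then select the maximum date. Below is the code I ended up with.

import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    # Group this year's dates by month
    yearDates = pd.DatetimeIndex(dictYears[yr])
    tempYear = yearDates.groupby(yearDates.month)
    for m in tempYear.keys():
        dateRange.append(max(tempYear[m]))
# Index.order() has been removed from pandas; sort_values() replaces it
dateRange = pd.DatetimeIndex(dateRange).sort_values()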
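The nested loops can probably be collapsed into a single groupby on year and month. The following is a sketch of that idea, not the code from this answer; `trade_days` is a made-up stand-in for tempTradeDays, and `last_per_month` and `date_range` are illustrative names.

```python
import pandas as pd

# Made-up stand-in for tempTradeDays, spanning two years
trade_days = pd.DatetimeIndex([
    '2014-05-29', '2014-05-30', '2015-05-28', '2015-05-29', '2015-06-01',
])

# Group the dates by (year, month) and keep the latest date in each group
last_per_month = trade_days.to_series().groupby(
    [trade_days.year, trade_days.month]
).max()

# Back to a DatetimeIndex; the groups come out sorted by year, then month
date_range = pd.DatetimeIndex(last_per_month.values)
print(date_range)
```

Because the year is part of the group key, May 2014 and May 2015 stay separate.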

Answer 3

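An answer may no longer be needed, but while searching for an answer to the same problem I found a simpler solution:

import pandas as pd

sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
# is_month_end is True only on the last calendar day of a month
month_end_dates = sample_dates[sample_dates.is_month_end]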

Answer 4

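This is an old question, but none of the existing answers here is perfect. This is the solution I came up with (assuming the dates form a sorted index). It could even be written in one line, but I split it up for readability:

# apple is a DataFrame with a sorted DatetimeIndex
# Month number of each row, and the month number of the row after it
month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
# True where the next row falls in a different month (and on the final row)
mask = (month1 != month2)
apple[mask.values].head(10)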

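A few notes here:

Shifting the month values requires wrapping them in a separate pd.Series instance
Boolean mask indexing requires .values, because the mask carries a default integer index rather than apple's DatetimeIndex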
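A self-contained sketch of the mask approach, using the sample dates from the question. The DataFrame `apple` and its `close` column are stand-ins invented for this example.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])
# Hypothetical data: one made-up column on the question's dates
apple = pd.DataFrame({'close': range(len(idx))}, index=idx)

month = pd.Series(apple.index.month)
# Compare each row's month with the next row's month; the final row is
# compared against NaN, which counts as "different", so it is kept too
mask = month != month.shift(-1)

last_rows = apple[mask.values]
print(last_rows)
```

Note that the final row is always selected, even when its month is still incomplete.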

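By the way, if the dates are business days, it is easier to use resampling: apple.resample('BM').last() (recent pandas versions spell this alias 'BME').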

Answer 5

Suppose your DataFrame looks like this:

(image: original dataframe)

Then the code below gives you the last row of each month.

df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')

(image: transformed dataframe)

This one line of code does the job :)
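Since the images are not available, here is a small sketch of a closely related approach on made-up data. It uses groupby(...).tail(1) instead of the one-liner above, which keeps the original DatetimeIndex without the reset_index/set_index round trip. The column name `value` is an assumption for illustration.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])
# Made-up frame standing in for the DataFrame shown in the image
df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# tail(1) keeps the last row of every (year, month) group, index included
df_monthly = df.groupby([df.index.year, df.index.month]).tail(1)
print(df_monthly)
```

This assumes the index is already sorted by date, so that the last row of each group is also the latest one.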

Answer 6

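Try this: create a new diff column in which a value of 1 marks a change from one month to the next. With diff(), the 1 lands on the first row of each new month, so the last date of a month is the row just before it.

# requires: import numpy as np
df['diff'] = np.where(df['Date'].dt.month.diff() != 0, 1, 0)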
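To flag the last row of each month directly, one option is to compare each row with the following row by using diff(-1). This is a sketch of that variant, assuming a `Date` column of sorted datetimes; the column name `is_last` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime([
    '2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22',
    '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01',
])})

month = df['Date'].dt.month
# diff(-1) subtracts the NEXT row's month; a non-zero result (or NaN on the
# final row) means this row is the last one of its month in the data
df['is_last'] = np.where(month.diff(-1) != 0, 1, 0)

last_dates = df.loc[df['is_last'] == 1, 'Date']
print(last_dates)
```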

python

pandas

zipline