Selecting Data from Last X Months
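I want to select the data from the last 4 months, starting from the beginning of the month. So if today is July 28, I need the data from March 1 to July 28.
Currently I use DateOffset, and I realized it is computing March 28 to July 28 and leaving out a lot of my data.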
import datetime
import pandas as pd

df = pd.read_csv('MyData.csv')
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
# Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for the last 4 months of data
range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(months=4)
# take slice with the last 4 months of data
df = df[(df['recvd_dttm'] >= range_min) &
        (df['recvd_dttm'] <= range_max)]
I looked at other answers and found this one: How do I calculate the date six months from the current date using the datetime Python module? So I tried using relativedelta(months=-4) and got ValueError: Length mismatch: Expected axis has 1 elements, new values have 3 elements
Any help would be appreciated.
You can use pd.tseries.offsets.MonthBegin.
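As a quick illustration of how this offset behaves, here is a minimal sketch using hand-picked example dates (the variable names are only for illustration). Subtracting MonthBegin from a mid-month date first rolls back to the first of that same month, and that counts as one step. This is why a "last 4 months from the start of the month" window needs a count of 5 when the latest date is mid-month.

```python
import pandas as pd

ts = pd.Timestamp('2015-07-28')

# One step back from a mid-month date lands on the first of the SAME month.
one_back = ts - pd.tseries.offsets.MonthBegin(1)
print(one_back)   # 2015-07-01 00:00:00

# Five steps back: Jul 1, Jun 1, May 1, Apr 1, Mar 1.
five_back = ts - pd.tseries.offsets.MonthBegin(5)
print(five_back)  # 2015-03-01 00:00:00

# If the date is already the first of a month, every step is a full month,
# so the same count of 5 reaches one month further back.
on_first = pd.Timestamp('2015-07-01') - pd.tseries.offsets.MonthBegin(5)
print(on_first)   # 2015-02-01 00:00:00
```

Note the last case: if the latest date in the data can fall exactly on the first of a month, a fixed count of 5 would select one month more than intended.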
import numpy as np
import pandas as pd
# simulate some data
# =================================
np.random.seed(0)
date_rng = pd.date_range('2015-01-01', '2015-07-28', freq='D')
df = pd.DataFrame(np.random.randn(len(date_rng)), index=date_rng, columns=['col'])
df
col
2015-01-01 1.7641
2015-01-02 0.4002
2015-01-03 0.9787
2015-01-04 2.2409
2015-01-05 1.8676
2015-01-06 -0.9773
2015-01-07 0.9501
2015-01-08 -0.1514
... ...
2015-07-21 -0.2394
2015-07-22 1.0997
2015-07-23 0.6553
2015-07-24 0.6401
2015-07-25 -1.6170
2015-07-26 -0.0243
2015-07-27 -0.7380
2015-07-28 0.2799
[209 rows x 1 columns]
# processing
# ===============================
start_date = df.index[-1] - pd.tseries.offsets.MonthBegin(5)
# output: Timestamp('2015-03-01 00:00:00')
df[start_date:]
col
2015-03-01 -0.3627
2015-03-02 -0.6725
2015-03-03 -0.3596
2015-03-04 -0.8131
2015-03-05 -1.7263
2015-03-06 0.1774
2015-03-07 -0.4018
2015-03-08 -1.6302
... ...
2015-07-21 -0.2394
2015-07-22 1.0997
2015-07-23 0.6553
2015-07-24 0.6401
2015-07-25 -1.6170
2015-07-26 -0.0243
2015-07-27 -0.7380
2015-07-28 0.2799
[150 rows x 1 columns]
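The example above slices on a DatetimeIndex, while the question keeps the dates in a regular column (recvd_dttm). A possible way to apply the same idea to a column is sketched below. The helper name last_n_months is hypothetical, and instead of MonthBegin it snaps the latest date to the first of its month and then subtracts pd.DateOffset(months=n), which is a swapped-in variant that also behaves as intended when the latest date is itself the first of a month.

```python
import pandas as pd


def last_n_months(df, col, n):
    """Hypothetical helper: keep rows from the first day of the month
    n months before the latest date in `col`, up to that latest date."""
    range_max = df[col].max()
    # Snap the latest date to midnight on the first day of its month.
    month_start = range_max.normalize().replace(day=1)
    # Step back n whole calendar months from that month start.
    range_min = month_start - pd.DateOffset(months=n)
    return df[(df[col] >= range_min) & (df[col] <= range_max)]


# Simulated data standing in for MyData.csv (an assumption for this sketch).
dates = pd.date_range('2015-01-01', '2015-07-28', freq='D')
data = pd.DataFrame({'recvd_dttm': dates, 'col': range(len(dates))})

recent = last_n_months(data, 'recvd_dttm', 4)
print(recent['recvd_dttm'].min())  # 2015-03-01 00:00:00
print(len(recent))                 # 150
```

With a latest date of July 28 and n=4 this selects March 1 through July 28, the same 150 rows as the index-based slice.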