Python Pandas - Get the rows of first and last day of particular months
My dataset df looks like this:
Date Value
...
2012-07-31 61.9443
2012-07-30 62.1551
2012-07-27 62.3328
... ...
2011-10-04 48.3923
2011-10-03 48.5939
2011-09-30 50.0327
2011-09-29 51.8350
2011-09-28 50.5555
2011-09-27 51.8470
2011-09-26 49.6350
... ...
2011-08-03 61.3948
2011-08-02 61.5476
2011-08-01 64.1407
2011-07-29 65.0364
2011-07-28 65.7065
2011-07-27 66.3463
2011-07-26 67.1508
2011-07-25 67.5577
... ...
2010-10-05 57.3674
2010-10-04 56.3687
2010-10-01 57.6022
2010-09-30 58.0993
2010-09-29 57.9934
Here are the data types of the two columns:
Type            Column Name  Example Value
-----------------------------------------------------------------
datetime64[ns]  Date         2020-06-19 00:00:00
float64         Value        108.82
I want a subset of df that contains only the first October entry and the last July entry of each year:
Date Value
...
2012-07-31 61.9443
2011-10-03 48.5939
2011-07-29 65.0364
2010-10-01 57.6022
Any idea how to do this?
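You can sort by date so that you know the rows are in chronological order. Then create two data frames: one filtered to month 7, taking the last record of each group, and one filtered to month 10, taking the first record of each group.
Then you can concatenate them.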
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by='Date')
# Last July row of each (year, month) group
j = df[df['Date'].dt.month == 7].groupby([df.Date.dt.year, df.Date.dt.month]).last()
# First October row of each (year, month) group
o = df[df['Date'].dt.month == 10].groupby([df.Date.dt.year, df.Date.dt.month]).first()
pd.concat([j, o]).reset_index(drop=True)
Output
Date Value
0 2011-07-29 65.0364
1 2012-07-31 61.9443
2 2010-10-01 57.6022
3 2011-10-03 48.5939
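The approach described above can be tried end to end with a small self-contained sketch. The sample rows are a subset copied from the question; the names `july`, `october`, and `result` are only illustrative, and `groupby(...).tail(1)` / `head(1)` are used here instead of `last()` / `first()` so that the original rows come back unchanged. The final sort is an assumption about the desired order (newest first, as in the question).

```python
import pandas as pd

# Sample rows copied from the question (a subset, for illustration only)
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-07-31", "2012-07-30", "2011-10-04", "2011-10-03",
        "2011-09-30", "2011-08-01", "2011-07-29", "2011-07-28",
        "2010-10-05", "2010-10-04", "2010-10-01", "2010-09-30",
    ]),
    "Value": [61.9443, 62.1551, 48.3923, 48.5939,
              50.0327, 64.1407, 65.0364, 65.7065,
              57.3674, 56.3687, 57.6022, 58.0993],
})

# Chronological order, so "first" and "last" are meaningful
df = df.sort_values("Date")

# Keep only July rows, then take the last row of each year
july = df[df["Date"].dt.month == 7]
july_last = july.groupby(july["Date"].dt.year).tail(1)

# Keep only October rows, then take the first row of each year
october = df[df["Date"].dt.month == 10]
october_first = october.groupby(october["Date"].dt.year).head(1)

# Combine and order newest first, as in the question's expected output
result = (
    pd.concat([july_last, october_first])
    .sort_values("Date", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Grouping each filtered frame by its own year keeps the group key and the rows on the same index, which avoids relying on index alignment between a filtered frame and keys built from the full frame.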
Here is a solution based only on Pandas:
df = df.sort_values("Date")
# First row of every (year, month) group, then keep only October
october = df.groupby([df["Date"].dt.year, df["Date"].dt.month], as_index=False).first()
october = october[october.Date.dt.month == 10]
# Last row of every (year, month) group, then keep only July
july = df.groupby([df["Date"].dt.year, df["Date"].dt.month], as_index=False).last()
july = july[july.Date.dt.month == 7]
pd.concat([july, october])
The result is:
Date Value
2 2011-07-29 65.0364
6 2012-07-31 61.9443
1 2010-10-01 57.6022
5 2011-10-03 48.5939
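The result above keeps the index labels of the grouped frame and lists the July rows before the October rows. If the order shown in the question (newest first) is wanted, one possible follow-up is a sort plus an index reset. The sketch below rebuilds the four result rows by hand so it runs on its own; the name `ordered` is only illustrative.

```python
import pandas as pd

# The four rows from the result above, with the same index labels
result = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2011-07-29", "2012-07-31", "2010-10-01", "2011-10-03"]
        ),
        "Value": [65.0364, 61.9443, 57.6022, 48.5939],
    },
    index=[2, 6, 1, 5],
)

# Newest first, with a fresh 0..n-1 index
ordered = result.sort_values("Date", ascending=False).reset_index(drop=True)
print(ordered)
```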
An elegant group-free solution that uses only the index of the sorted DataFrame:
# Convert the date strings to datetime and sort the data by Date
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(by='Date')
# First row of the whole frame whose month is 7: take the first matching index label, i.e. position 0
july = df.loc[[df.index[df['Date'].dt.month == 7].tolist()[0]]]
# Last row of the whole frame whose month is 10: take the last matching index label, i.e. position -1
october = df.loc[[df.index[df['Date'].dt.month == 10].tolist()[-1]]]
# Finally, concatenate both
pd.concat([july, october]).reset_index(drop=True)
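The snippet above returns a single July row and a single October row for the whole frame. If one row per year is needed without `groupby`, a possible variation (not part of the answer above, just a sketch) is to compare each row's year and month with its neighbour in the sorted frame. The sample rows are a subset copied from the question, and `ym`, `last_july`, and `first_october` are hypothetical helper names.

```python
import pandas as pd

# Sample rows copied from the question (a subset, for illustration only)
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2012-07-31", "2012-07-30", "2011-10-04", "2011-10-03",
        "2011-09-30", "2011-08-01", "2011-07-29", "2011-07-28",
        "2010-10-05", "2010-10-04", "2010-10-01", "2010-09-30",
    ]),
    "Value": [61.9443, 62.1551, 48.3923, 48.5939,
              50.0327, 64.1407, 65.0364, 65.7065,
              57.3674, 56.3687, 57.6022, 58.0993],
})

df = df.sort_values("Date").reset_index(drop=True)
month = df["Date"].dt.month
# One integer per calendar month, e.g. 201107 for July 2011
ym = df["Date"].dt.year * 100 + month

# A July row is the last of its month when the next row belongs to a
# different calendar month (or there is no next row at all)
last_july = (month == 7) & (ym != ym.shift(-1))
# An October row is the first of its month when the previous row belongs
# to a different calendar month (or there is no previous row)
first_october = (month == 10) & (ym != ym.shift(1))

subset = df[last_july | first_october]
print(subset)
```

The comparison against the shifted series is `True` at the edges of the frame because the shifted value there is missing, so the very first and very last rows are handled without special cases.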