How to get a full day's data from an index matching one day
I have a dataframe df1 indexed by datetime, with one entry per minute, spanning several weeks.
Sample:
SAMPLE_TIME Bottom Top Out state
0 2015-07-15 16:41:56 48.625 55.812 43.875 1
1 2015-07-15 16:42:55 48.750 55.812 43.875 1
2 2015-07-15 16:43:55 48.937 55.812 43.875 1
3 2015-07-15 16:44:56 49.125 55.812 43.812 1
4 2015-07-15 16:45:55 49.312 55.812 43.812 1
I want to find the day with the lowest Avg(TempBottom, TempTop) and then get that whole day's per-minute data so I can plot it. I tried:
df2 = df1.groupby(pd.TimeGrouper('D')).agg(min) \
.sort(['TempTop','TempBottom'], ascending=[True,True])
This gives me the days ordered by lowest temperature.
Sample:
SAMPLE_TIME Bottom Top Out state
2015-10-17 19.994 25.840 21.875 0
2015-08-29 26.182 28.777 25.937 0
2015-11-19 19.244 33.027 28.937 0
2015-11-07 19.744 33.527 28.125 0
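For readers on a current pandas release, where pd.TimeGrouper and DataFrame.sort are no longer available, the grouping step above can be approximated roughly as follows. This is only a sketch: the frame is synthetic (made-up values, three days instead of weeks) and the column names TempBottom and TempTop are taken from the question text.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: three days of per-minute readings (made-up values).
idx = pd.date_range("2015-07-15", periods=3 * 24 * 60, freq="min", name="SAMPLE_TIME")
rng = np.random.default_rng(0)
df1 = pd.DataFrame(
    {
        "TempBottom": rng.uniform(20, 50, len(idx)),
        "TempTop": rng.uniform(25, 56, len(idx)),
    },
    index=idx,
)

# Daily minimum of each column, then order the days by TempTop and TempBottom.
df2 = (
    df1.groupby(pd.Grouper(freq="D"))
    .min()
    .sort_values(["TempTop", "TempBottom"], ascending=[True, True])
)
```

Note that the index of df2 holds one midnight timestamp per day, which matters for the lookup that follows.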
Then I just need to take the index of the first entry in df2:
df1[df2.index[1]]
But I get an error:
KeyError: Timestamp('2015-08-29 00:00:00')
From the docs:
Warning
The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)
dft['2013-1-15 12:30:00']
To select a single row, use .loc
In [71]: dft.loc['2013-1-15 12:30:00']
Out[71]:
A 0.193284
Name: 2013-01-15 12:30:00, dtype: float64
So in your case you need to use the loc method:
In [103]: df1.loc[df2.index[0]]
Out[103]:
SAMPLE_TIME TempBottom TempTop TempOut State Bypass
2015-07-15 16:41:56 48.625 55.812 43.875 1 1
2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2015-07-15 16:43:55 48.937 55.812 43.875 1 1
2015-07-15 16:44:56 49.125 55.812 43.812 1 1
2015-07-15 16:45:55 49.312 55.812 43.812 1 1
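EDIT
When you pass a single value, it is used for label-based access. When you pass a range, however, it is used as a slice. You can use the trick of passing the value + 1 day: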
In [276]: df2.index[0]
Out[276]: Timestamp('2015-07-15 00:00:00', offset='D')
In [277]: df2.index[0] + 1
Out[277]: Timestamp('2015-07-16 00:00:00', offset='D')
In [278]: df1.loc[df2.index[0]: df2.index[0] + 1]
Out[278]:
TempBottom TempTop TempOut State Bypass
SAMPLE_TIME
2015-07-15 16:41:56 48.625 55.812 43.875 1 1
2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2015-07-15 16:43:55 48.937 55.812 43.875 1 1
2015-07-15 16:44:56 49.125 55.812 43.812 1 1
2015-07-15 16:45:55 49.312 55.812 43.812 1 1
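In recent pandas versions, adding a plain integer to a Timestamp is no longer supported, so the "+ 1" trick above would need an explicit pd.Timedelta. The sketch below uses made-up data and a fixed Timestamp standing in for df2.index[0]. It also shows that a .loc label slice includes both endpoints, and a half-open mask as one possible way to avoid picking up the next day's midnight row.

```python
import pandas as pd

# Synthetic stand-in for df1: two days of per-minute readings (made-up values).
idx = pd.date_range("2015-07-15", periods=2 * 24 * 60, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempBottom": range(len(idx))}, index=idx)

day = pd.Timestamp("2015-07-15")      # stands in for df2.index[0]
next_day = day + pd.Timedelta(days=1)

# .loc label slices include both endpoints, so this also returns the
# row stamped exactly at midnight of the following day.
inclusive = df1.loc[day:next_day]

# A half-open mask keeps only the rows that fall on `day` itself.
one_day = df1[(df1.index >= day) & (df1.index < next_day)]
```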
EDIT2
Or you can convert the date of the Timestamp to a str:
In [355]: df2.index[0]
Out[355]: Timestamp('2015-07-15 00:00:00', offset='D')
In [356]: df2.index[0].date()
Out[356]: datetime.date(2015, 7, 15)
In [357]: str(df2.index[0].date())
Out[357]: '2015-07-15'
In [359]: df1[str(df2.index[0].date())]
Out[359]:
TempBottom TempTop TempOut State Bypass
2015-07-15 16:41:56 48.625 55.812 43.875 1 1
2015-07-15 16:42:55 48.750 55.812 43.875 1 1
2015-07-15 16:43:55 48.937 55.812 43.875 1 1
2015-07-15 16:44:56 49.125 55.812 43.812 1 1
2015-07-15 16:45:55 49.312 55.812 43.812 1 1
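The string lookup works because pandas treats a date string that is less precise than the index as a partial-string match and returns every row inside that period. A small sketch with made-up data follows. It goes through .loc rather than plain df1[...], since row selection by string through [] has, as far as I know, been removed in newer pandas releases.

```python
import pandas as pd

# Synthetic stand-in for df1: 12 hours of per-minute readings starting at 16:00.
idx = pd.date_range("2015-07-15 16:00", periods=12 * 60, freq="min", name="SAMPLE_TIME")
df1 = pd.DataFrame({"TempTop": range(len(idx))}, index=idx)

day = pd.Timestamp("2015-07-15")      # stands in for df2.index[0]
key = day.strftime("%Y-%m-%d")        # '2015-07-15'

# A day-resolution string selects every row whose timestamp falls on that day.
one_day = df1.loc[key]
```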
So here is my thought process, combined with the answer:
In [1]: df1.ix[df2]
# call trace
ValueError: Cannot index with multidimensional key
In [2]: df1.ix[df2.index]
out[2]:
SAMPLE_TIME Bottom Top Out state
2015-10-17 NaN NaN NaN NaN
2015-08-29 NaN NaN NaN NaN
2015-11-19 NaN NaN NaN NaN
2015-11-07 NaN NaN NaN NaN
In [3]: df1.ix[df2.index[4:5]]
Out[3]:
SAMPLE_TIME Bottom Top Out state
2015-11-04 NaN NaN NaN NaN
In [33]: df1.loc[df2.index[4:5]]
KeyError: "None of [DatetimeIndex(['2015-11-04'], dtype='datetime64[ns]', name=u'SAMPLE_TIME', freq=None, tz=None)] are in the [index]"
In the end I gave up on ix and decided to get loc working. As Anton suggested, I tried:
In [4]: df1.loc[df2.index[0].date()]
KeyError: 'the label [2015-11-04] is not in the [index]'
That made me think loc only accepts strings, which finally worked:
In [5]: df1.loc[df2.index[4].strftime('%Y-%m-%d')]
Out[5]:
SAMPLE_TIME Bottom Top Out state
2015-11-04 00:00:22 56.256 56.300 43.750 0
2015-11-04 00:01:22 56.256 56.300 43.812 0
2015-11-04 00:02:22 56.256 56.300 43.812 0
2015-11-04 00:03:22 56.256 56.300 43.812 0
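The failed attempts above are consistent with df2's index holding midnight timestamps that do not occur exactly in df1, so an exact label lookup finds nothing while a day-resolution string matches the whole day. As an alternative that avoids strings, here is a hedged end-to-end sketch of the original goal. It assumes "Avg(TempBottom, TempTop)" means the per-row mean of the two columns averaged over each day, and it uses made-up data in which the second day is deliberately the coldest.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: three days of per-minute readings (made-up values).
idx = pd.date_range("2015-07-15", periods=3 * 24 * 60, freq="min", name="SAMPLE_TIME")
is_cold = idx.normalize() == pd.Timestamp("2015-07-16")
df1 = pd.DataFrame(
    {
        "TempBottom": np.where(is_cold, 20.0, 40.0),
        "TempTop": np.where(is_cold, 30.0, 50.0),
    },
    index=idx,
)

# Per-row average of the two sensors, then the mean of that average per day.
row_avg = df1[["TempBottom", "TempTop"]].mean(axis=1)
daily_avg = row_avg.resample("D").mean()

coldest_day = daily_avg.idxmin()      # Timestamp at midnight of that day

# Compare each row's date (time reset to 00:00) with the chosen day.
coldest = df1[df1.index.normalize() == coldest_day]
```

The resulting frame holds every per-minute row of that day and could be passed straight to coldest.plot().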