Extracting Python Pandas records where Timestamp is within specific range
I have a dataframe, `df`, in which one column stores the processing time (Timestamp objects).
Sample dataframe:
```python
from datetime import datetime, date
import pandas as pd

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))
df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])
```
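What I want to achieve is to extract all records that satisfy a condition (or several conditions). In one case, I want to find all records whose 'ProcessingTime' has a time of day later than 13:10.

In the sample dataframe above, the desired output in this case would be the first record.

What is the correct way to apply such conditions to dataframe records?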
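P.S. I tried the following approaches, but neither worked:

```python
df.loc[df['ProcessingTime'].time().hour > 14]
```

This raises `AttributeError: 'Series' object has no attribute 'time'`

and

```python
# time here is datetime.time (from datetime import time)
df.loc[df['ProcessingTime'] > time(14, 0, 0)]
```

This raises `TypeError: Invalid comparison between dtype=datetime64[ns] and time`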
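- The problem is that Boolean Indexing isn't being performed properly using the pandas `.dt` accessor.
- Given a column with a datetime dtype, the components of the datetime objects can be accessed with `.dt.` followed by the desired method or attribute (e.g. `pandas.Series.dt.time`).
- See the Time/date components section of the pandas documentation for the full list.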
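A short sketch of the `.dt` accessor on the question's `ProcessingTime` values, pulling out a few of the Time/date components mentioned above. The frame is rebuilt with only the columns needed so the block runs on its own; the `pd.to_datetime` call is a no-op here and is only needed if the column was read in as strings, since `.dt` works only on datetime-like Series.

```python
import pandas as pd
from datetime import datetime, time

# Same ID and ProcessingTime values as the question's sample dataframe
df = pd.DataFrame({
    'ID': ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00'],
    'ProcessingTime': [datetime(2015, 9, 14, 13, 23, 40),
                       datetime(2015, 9, 17, 9, 6, 7),
                       datetime(2018, 8, 16, 7, 32, 6)],
})

# Ensure a datetime dtype so the .dt accessor is available (no-op for this data)
df['ProcessingTime'] = pd.to_datetime(df['ProcessingTime'])
ts = df['ProcessingTime']

hours = ts.dt.hour      # integer Series: 13, 9, 7
minutes = ts.dt.minute  # integer Series: 23, 6, 32
times = ts.dt.time      # object Series of datetime.time values
dates = ts.dt.date      # object Series of datetime.date values

print(pd.DataFrame({'hour': hours, 'minute': minutes, 'time': times, 'date': dates}))
```

Each of these is an ordinary Series aligned with `df`, so any comparison on it produces a boolean mask that can be passed to `df[...]` or `df.loc[...]`.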
```python
import pandas as pd
from datetime import date, datetime, time

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))
df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])

# display(df)
```
```
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
2      E-8_10AUR-18-0037_00  2018-08-16 2018-08-16 07:32:06
```
```python
# single condition
df[df.ProcessingTime.dt.hour > 7]
```
[out]:
```
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
```
```python
# multiple conditions
df[(df.ProcessingTime.dt.hour > 7) & (df.ProcessingTime.dt.minute > 10)]
```
[out]:
```
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
```
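A caveat on combining `dt.hour` and `dt.minute` like this: `(hour > 7) & (minute > 10)` is not the same as "later than 07:10", because the minute test is applied independently of the hour. A time such as 14:05 fails the minute check even though it is later than 07:10. The sketch below shows the difference and one workaround, measuring the time elapsed since midnight with `ts - ts.dt.normalize()`. The extra 14:05 row is hypothetical, added only to expose the pitfall.

```python
import pandas as pd
from datetime import datetime

# The question's three ProcessingTime values plus one hypothetical row at 14:05
df = pd.DataFrame({'ProcessingTime': [
    datetime(2015, 9, 14, 13, 23, 40),
    datetime(2015, 9, 17, 9, 6, 7),
    datetime(2018, 8, 16, 7, 32, 6),
    datetime(2018, 8, 16, 14, 5, 0),
]})
ts = df['ProcessingTime']

# Hour and minute tested independently: 14:05 is dropped because 5 > 10 is False
naive = df[(ts.dt.hour > 7) & (ts.dt.minute > 10)]

# Timedelta since midnight, so hours, minutes and seconds are compared together
time_of_day = ts - ts.dt.normalize()
later_than_0710 = df[time_of_day > pd.Timedelta(hours=7, minutes=10)]

print(naive)
print(later_than_0710)
```

Comparing `ts.dt.time > time(7, 10)`, as in the `.time` example of this answer, gives the same result as the Timedelta version.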
```python
# an entire datetime
df[df.ProcessingTime < '2015-09-17 09:06:07']
```
[out]:
```
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
```
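Since the title asks for records within a range, the full-datetime comparison extends naturally to a lower and an upper bound. A sketch using `Series.between`, which includes both bounds by default, next to the equivalent pair of masks joined with `&`. The September 2015 bounds are arbitrary values chosen for illustration.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'ID': ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00'],
    'ProcessingTime': [datetime(2015, 9, 14, 13, 23, 40),
                       datetime(2015, 9, 17, 9, 6, 7),
                       datetime(2018, 8, 16, 7, 32, 6)],
})

# Example bounds (illustrative only): all of September 2015
start = pd.Timestamp('2015-09-01 00:00:00')
end = pd.Timestamp('2015-09-30 23:59:59')

# Series.between includes both bounds by default
in_range = df[df['ProcessingTime'].between(start, end)]

# The same filter written out as two boolean masks combined with &
in_range_explicit = df[(df['ProcessingTime'] >= start) & (df['ProcessingTime'] <= end)]

print(in_range)
```

Each mask must be wrapped in parentheses when combined with `&`, because `&` binds more tightly than `>=` and `<=` in Python.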
```python
# using .time
df[df.ProcessingTime.dt.time > time.fromisoformat('07:32:06')]
```
[out]:
```
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
```
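Applied to the question's own condition (time of day later than 13:10), the `.dt.time` approach can compare against a `datetime.time` built directly instead of parsed from a string. A time-of-day window works the same way with `between`, and `DataFrame.between_time` is an alternative when the timestamps are moved into a DatetimeIndex. A sketch on the sample data; the 08:00 to 14:00 window is an arbitrary example.

```python
import pandas as pd
from datetime import date, datetime, time

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015, 9, 14), date(2015, 9, 17), date(2018, 8, 16)]
datetimes = [datetime(2015, 9, 14, 13, 23, 40), datetime(2015, 9, 17, 9, 6, 7),
             datetime(2018, 8, 16, 7, 32, 6)]
df = pd.DataFrame(list(zip(ids, dates, datetimes)),
                  columns=['ID', 'ProcessDate', 'ProcessingTime'])

# The question's condition: time of day strictly later than 13:10, on any date
after_1310 = df[df['ProcessingTime'].dt.time > time(13, 10)]

# A time-of-day window (example bounds); between() includes both ends by default
window = df[df['ProcessingTime'].dt.time.between(time(8, 0), time(14, 0))]

# Alternative: between_time() needs a DatetimeIndex, so move the column into the index
window_idx = df.set_index('ProcessingTime').between_time('08:00', '14:00')

print(after_1310)
print(window)
```

Both window filters keep the 13:23:40 and 09:06:07 records; the `between_time` version returns them with `ProcessingTime` as the index rather than as a column.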