How to select a column for a specific time range from a pandas DataFrame in Python 3?
Here is my pandas DataFrame:
time energy
0 2018-01-01 00:15:00 0.0000
1 2018-01-01 00:30:00 0.0000
2 2018-01-01 00:45:00 0.0000
3 2018-01-01 01:00:00 0.0000
4 2018-01-01 01:15:00 0.0000
5 2018-01-01 01:30:00 0.0000
6 2018-01-01 01:45:00 0.0000
7 2018-01-01 02:00:00 0.0000
8 2018-01-01 02:15:00 0.0000
9 2018-01-01 02:30:00 0.0000
10 2018-01-01 02:45:00 0.0000
11 2018-01-01 03:00:00 0.0000
12 2018-01-01 03:15:00 0.0000
13 2018-01-01 03:30:00 0.0000
14 2018-01-01 03:45:00 0.0000
15 2018-01-01 04:00:00 0.0000
16 2018-01-01 04:15:00 0.0000
17 2018-01-01 04:30:00 0.0000
18 2018-01-01 04:45:00 0.0000
19 2018-01-01 05:00:00 0.0000
20 2018-01-01 05:15:00 0.0000
21 2018-01-01 05:30:00 0.9392
22 2018-01-01 05:45:00 2.8788
23 2018-01-01 06:00:00 5.5768
24 2018-01-01 06:15:00 8.6660
25 2018-01-01 06:30:00 15.8648
26 2018-01-01 06:45:00 24.1760
27 2018-01-01 07:00:00 38.5324
28 2018-01-01 07:15:00 49.9292
29 2018-01-01 07:30:00 64.3788
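I want to select the values of the energy column that fall within a specific time range, 01:15:00 to 05:30:00, and sum them. To select the data I need both the hour and the minute values. I know how to select data from the column using the hour and the minute separately: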
import pandas as pd
from datetime import datetime as dt
# parse_dates makes "time" a datetime column so the .dt accessor works
energy_data = pd.read_csv("/home/mayukh/Downloads/Northam_january2018/output1.csv", index_col=None, parse_dates=["time"])
# Using hour: rows whose hour is between 1 and 5
hour_sum = energy_data[(energy_data.time.dt.hour >= 1) & (energy_data.time.dt.hour <= 5)]['energy'].sum()
# Using minute: rows whose minute is between 15 and 30
minute_sum = energy_data[(energy_data.time.dt.minute >= 15) & (energy_data.time.dt.minute <= 30)]['energy'].sum()
But I don't know how to use the hour and minute together to select the data. Please tell me how to proceed.
Use between_time, which works with a DatetimeIndex created by set_index:
#if necessary convert to datetime
df['time'] = pd.to_datetime(df['time'])
a = df.set_index('time').between_time('01:15:00','05:30:00')['energy'].sum()
print (a)
0.9392
Details:
print (df.set_index('time').between_time('01:15:00','05:30:00'))
energy
time
2018-01-01 01:15:00 0.0000
2018-01-01 01:30:00 0.0000
2018-01-01 01:45:00 0.0000
2018-01-01 02:00:00 0.0000
2018-01-01 02:15:00 0.0000
2018-01-01 02:30:00 0.0000
2018-01-01 02:45:00 0.0000
2018-01-01 03:00:00 0.0000
2018-01-01 03:15:00 0.0000
2018-01-01 03:30:00 0.0000
2018-01-01 03:45:00 0.0000
2018-01-01 04:00:00 0.0000
2018-01-01 04:15:00 0.0000
2018-01-01 04:30:00 0.0000
2018-01-01 04:45:00 0.0000
2018-01-01 05:00:00 0.0000
2018-01-01 05:15:00 0.0000
2018-01-01 05:30:00 0.9392
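For anyone who wants to reproduce this without the original CSV, here is a minimal self-contained sketch. It assumes the sample data can be rebuilt with pd.date_range at a 15-minute frequency; the variable names times, energy, selected and total are only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question: 30 readings, 15 minutes apart,
# starting at 2018-01-01 00:15:00 (assumed to match the CSV contents).
times = pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min")
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({"time": times, "energy": energy})

# between_time needs a DatetimeIndex, hence set_index("time").
# Both endpoints are included by default.
selected = df.set_index("time").between_time("01:15:00", "05:30:00")
total = selected["energy"].sum()

print(len(selected))  # number of rows in the window
print(total)
```

Because between_time looks only at the time of day, on a frame that spans several days it would select the same window on every day.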
You can convert the column to datetime and use the .loc accessor with pd.Series.between:
from datetime import datetime
df['time'] = pd.to_datetime(df['time'])
start = datetime.strptime('01:15:00', '%H:%M:%S').time()
end = datetime.strptime('05:30:00', '%H:%M:%S').time()
result = df.loc[df['time'].dt.time.between(start, end), 'energy'].sum()
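The same idea as a runnable sketch, assuming the sample data from the question is rebuilt in memory rather than read from the CSV. It constructs datetime.time objects directly instead of parsing strings; the names mask, start and end are only illustrative.

```python
from datetime import time

import pandas as pd

# Sample data rebuilt from the question (assumed equivalent to the CSV).
times = pd.date_range("2018-01-01 00:15:00", periods=30, freq="15min")
energy = [0.0] * 21 + [0.9392, 2.8788, 5.5768, 8.6660, 15.8648,
                       24.1760, 38.5324, 49.9292, 64.3788]
df = pd.DataFrame({"time": times, "energy": energy})

start, end = time(1, 15), time(5, 30)

# .dt.time yields datetime.time objects, so hour and minute are compared
# together. Series.between is inclusive on both ends by default.
mask = df["time"].dt.time.between(start, end)
result = df.loc[mask, "energy"].sum()

print(mask.sum())  # number of rows selected
print(result)
```

Unlike between_time, this keeps time as an ordinary column, so the original integer index is preserved.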