How to aggregate a pandas datetime series by hour only
I have a pandas time-series table containing datetime objects and scores:
datetime score
2018-11-23 08:33:02 4
2018-11-24 09:43:30 2
2018-11-25 08:21:34 5
2018-11-26 19:33:01 4
2018-11-23 08:50:40 1
2018-11-23 09:03:10 3
I want to aggregate the scores by hour, ignoring the date. The desired result is:
08:00:00 10
09:00:00 5
19:00:00 4
So basically I need to drop the day, month, and year, and then group the scores by hour.
I tried this command:
monthagg = df['score'].resample('H').sum().to_frame()
This works, but it takes the day, month, and year into account. How can I drop the DD-MM-YYYY part and aggregate by hour only?
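To illustrate the problem, here is a minimal sketch that rebuilds the sample table (assuming the datetimes are the index, as the resample call implies) and shows that resample creates one bucket per calendar hour rather than one per hour of the day. The lowercase 'h' alias is an assumption for recent pandas versions; the command above uses 'H'.

```python
import pandas as pd

# Sample data from the question, with the datetimes as the index
df = pd.DataFrame(
    {"score": [4, 2, 5, 4, 1, 3]},
    index=pd.to_datetime([
        "2018-11-23 08:33:02",
        "2018-11-24 09:43:30",
        "2018-11-25 08:21:34",
        "2018-11-26 19:33:01",
        "2018-11-23 08:50:40",
        "2018-11-23 09:03:10",
    ]),
).sort_index()

# resample builds a continuous hourly grid from the first to the last timestamp,
# so the same hour on different days lands in different buckets
hourly = df["score"].resample("h").sum()

print(len(hourly))          # far more rows than the three wanted
print(hourly[hourly > 0])   # only the buckets that actually contain scores
```

The 08:00 bucket of 2018-11-23 and the 08:00 bucket of 2018-11-25 stay separate here, which is why the date has to be stripped before grouping.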
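Setup to generate a frame with datetime objects:
import datetime
import pandas as pd
rows = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(100)]
# "score" is placeholder data added here so there is a numeric column to sum
df = pd.DataFrame({"date": rows, "score": range(100)})
You can now add an hour column like this and then group by it:
df["hour"] = df["date"].dt.hour
df.groupby("hour")["score"].sum()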
One possible solution is to use DatetimeIndex.floor to set the minutes and seconds to 0, then convert the DatetimeIndex to strings with DatetimeIndex.strftime, and then aggregate with sum:
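# 'H' is the hourly alias used when this was written; pandas 2.2+ prefers lowercase 'h'
a = df['score'].groupby(df.index.floor('H').strftime('%H:%M:%S')).sum()
# if datetime is a column rather than the index:
# a = df['score'].groupby(df['datetime'].dt.floor('H').dt.strftime('%H:%M:%S')).sum()
print(a)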
08:00:00 10
09:00:00 5
19:00:00 4
Name: score, dtype: int64
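For anyone who wants to run the floor and strftime approach end to end, here is a minimal self-contained sketch. It assumes the datetimes are set as the index and uses the lowercase 'h' alias, which is an assumption for recent pandas versions (the snippet above uses 'H').

```python
import pandas as pd

# Sample data from the question, with the datetimes as the index
df = pd.DataFrame(
    {"score": [4, 2, 5, 4, 1, 3]},
    index=pd.to_datetime([
        "2018-11-23 08:33:02",
        "2018-11-24 09:43:30",
        "2018-11-25 08:21:34",
        "2018-11-26 19:33:01",
        "2018-11-23 08:50:40",
        "2018-11-23 09:03:10",
    ]),
)

# Round every timestamp down to the hour, then keep only the time part as text
hour_labels = df.index.floor("h").strftime("%H:%M:%S")

# Group by the text labels, so equal hours on different days fall together
result = df["score"].groupby(hour_labels).sum()
print(result)
```

Because the labels are zero-padded strings, they sort in chronological order.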
Or use DatetimeIndex.hour and aggregate with sum:
a = df.groupby(df.index.hour)['score'].sum()
# if datetime is a column rather than the index:
# a = df.groupby(df['datetime'].dt.hour)['score'].sum()
print(a)
datetime
8 10
9 5
19 4
Name: score, dtype: int64
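Grouping by the integer hour gives 8, 9, 19 rather than the 08:00:00 style labels asked for in the question. A minimal sketch of one way to relabel the result, assuming datetime is a regular column; the name by_hour is only illustrative:

```python
import pandas as pd

# Sample data from the question, with datetime as a regular column
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-11-23 08:33:02",
        "2018-11-24 09:43:30",
        "2018-11-25 08:21:34",
        "2018-11-26 19:33:01",
        "2018-11-23 08:50:40",
        "2018-11-23 09:03:10",
    ]),
    "score": [4, 2, 5, 4, 1, 3],
})

# Group by the integer hour of day (0-23) and sum the scores
by_hour = df.groupby(df["datetime"].dt.hour)["score"].sum()

# Turn the integer hours into zero-padded HH:00:00 labels
by_hour.index = [f"{int(h):02d}:00:00" for h in by_hour.index]
print(by_hour)
```

Keeping the integer index until the last step means the groups are sorted numerically before they are turned into text.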
import pandas as pd
df = pd.DataFrame({'datetime':['2018-11-23 08:33:02 ','2018-11-24 09:43:30',
'2018-11-25 08:21:34',
'2018-11-26 19:33:01','2018-11-23 08:50:40',
'2018-11-23 09:03:10'],'score':[4,2,5,4,1,3]})
df['datetime']=pd.to_datetime(df['datetime'], errors='coerce')
df["hour"] = df["datetime"].dt.hour
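# select the score column explicitly; summing the datetime column raises a TypeError in recent pandas versions
df.groupby("hour")["score"].sum()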
Output:
8 10
9 5
19 4