Python Pandas Library Resample By Truncated Date
Using the Python 3 pandas library, I have data like this in an Excel file:
Id | Date | count
----+-------------------------+-----------
1 | '2019/10/01 10:40' | 1
----+-------------------------------------
2 | '2019/10/01 10:43' | 2
----+-------------------------------------
3 | '2019/10/02 10:40' | 3
----+-------------------------------------
4 | '2019/10/05 10:40' | 4
----+-------------------------------------
5 | '2019/10/08 10:40' | 5
----+-------------------------------------
6 | '2019/10/09 10:40' | 6
----+-------------------------------------
7 | '2019/10/15 10:40' | 7
I want to group by day of the week and time, as in this example. The result I need is:
Id | Week Time | count
----+-------------------------+-----------
1 | 'Tuesday 10:40' | 1
----+-------------------------------------
2 | 'Tuesday 10:43' | 2
----+-------------------------------------
3 | 'Wednesday 10:40' | 3
----+-------------------------------------
4 | 'Saturday 10:40' | 4
----+-------------------------------------
5 | 'Tuesday 10:40' | 5
----+-------------------------------------
6 | 'Wednesday 10:40' | 6
----+-------------------------------------
7 | 'Tuesday 10:40' | 7
After resampling in pandas, I get this result:
Week Time | sum | count | avg
-------------------------+-------+-------+---------
'Tuesday 10:40' | 14 | 3 | 4.66
-------------------------+-------+-------+---------
'Tuesday 10:43' | 2 | 1 | 2.00
-------------------------+-------+-------+---------
'Wednesday 10:40' | 9 | 2 | 4.50
-------------------------+-------+-------+---------
'Saturday 10:40' | 4 | 1 | 4.00
Can I get this result with the resample method of the pandas library?
I believe you need Series.dt.strftime for a custom datetime format, and then aggregate with GroupBy.agg:
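import pandas as pd
# convert to datetime, then format as "weekday HH:MM"
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%A %H:%M')
# if necessary, strip the surrounding ' characters first
#df['Date'] = pd.to_datetime(df['Date'].str.strip("'")).dt.strftime('%A %H:%M')
# sort=False keeps the groups in order of first appearance
df = df.groupby('Date', sort=False)['count'].agg(['sum', 'count', 'mean'])
print(df)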
                 sum  count      mean
Date
Tuesday 10:40     13      3  4.333333
Tuesday 10:43      2      1  2.000000
Wednesday 10:40    9      2  4.500000
Saturday 10:40     4      1  4.000000
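For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the same approach. The rows are typed in by hand from the question. The DataFrame construction, the explicit `format=` string and the `Week Time` label are my own assumptions for illustration, not part of the original data loading. Note that `%A` produces weekday names in the current locale, so the English names assume a default or English locale.

```python
import pandas as pd

# Sample rows copied from the question (assumption: in practice they come from Excel).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7],
    "Date": [
        "2019/10/01 10:40", "2019/10/01 10:43", "2019/10/02 10:40",
        "2019/10/05 10:40", "2019/10/08 10:40", "2019/10/09 10:40",
        "2019/10/15 10:40",
    ],
    "count": [1, 2, 3, 4, 5, 6, 7],
})

# Parse the strings, then reduce every timestamp to "weekday HH:MM".
week_time = pd.to_datetime(df["Date"], format="%Y/%m/%d %H:%M").dt.strftime("%A %H:%M")

# Group on that label; sort=False keeps the groups in order of first appearance.
result = (
    df.groupby(week_time.rename("Week Time"), sort=False)["count"]
      .agg(["sum", "count", "mean"])
)
print(result)
```

Grouping on a separate Series instead of overwriting `df['Date']` keeps the original timestamps available, which may be useful if you still need them later.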