Get the average for a specific range of dates in pandas
I need to group my data by website and get the average number of views for specific date ranges. My data looks like this:
date website amount_views
1/1/2021 a 23
1/2/2021 a 17
1/3/2021 a 10
1/4/2021 a 25
1/5/2021 a 2
1/1/2021 b 12
1/2/2021 b 7
1/3/2021 b 5
1/4/2021 b 17
1/5/2021 b 2
So I need the average for websites a and b over two date ranges: 1/1/2021 - 1/3/2021 (pre) and 1/3/2021 - 1/5/2021 (post).
The desired output is:
date website avg_amount_views
pre a 31.5
post a 35.6
pre b 15.5
post b 22.6
- Group by week using pandas.Grouper with the freq parameter set to 'W'.
import pandas as pd
# test dataframe
data = {'date': ['1/1/2021', '1/2/2021', '1/3/2021', '1/4/2021', '1/5/2021', '1/1/2021', '1/2/2021', '1/3/2021', '1/4/2021', '1/5/2021'], 'website': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'], 'amount_views': [23, 17, 10, 25, 2, 12, 7, 5, 17, 2]}
df = pd.DataFrame(data)
# set the date column to a datetime format - required
df.date = pd.to_datetime(df.date)
# groupby with pd.Grouper
mean_visits = df.groupby([pd.Grouper(key='date', freq='W'), 'website'])['amount_views'].mean().reset_index(name='mean_visits')
# display(mean_visits)
date website mean_visits
0 2021-01-03 a 16.666667
1 2021-01-03 b 8.000000
2 2021-01-10 a 13.500000
3 2021-01-10 b 9.500000
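The weekly bins match a pre/post split here only because 'W' means weeks ending on Sunday, and 1/3/2021 happens to be a Sunday. The sketch below is one way to check which weekly label each date gets; it uses to_period for illustration and is not necessarily how pd.Grouper computes its bins internally. The names check and week_label are made up for this example.

```python
import pandas as pd

# A few of the dates from the question
dates = pd.Series(pd.to_datetime(['2021-01-01', '2021-01-03',
                                  '2021-01-04', '2021-01-05']))

# 'W' is an alias for 'W-SUN': each weekly period ends on a Sunday.
# Take the end of the period and drop the time part to get the label date.
week_label = dates.dt.to_period('W-SUN').dt.end_time.dt.normalize()

check = pd.DataFrame({'date': dates,
                      'weekday': dates.dt.day_name(),
                      'week_label': week_label})
print(check)
```

If the cut-off date were not a Sunday, the weekly grouping would no longer line up with the ranges asked for, so this approach only fits when the ranges are calendar weeks.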
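You can use np.where with between on the date column to assign a pre or post status, then group by that status and website and take the mean. Note that between is inclusive on both ends, so 1/3/2021 counts as pre.
In one line (not that readable, though):
import numpy as np
df['date'] = pd.to_datetime(df['date'])
df.groupby([np.where(df['date'].between('1/1/2021', '1/3/2021'), 'pre', 'post'), 'website'])['amount_views'].mean().to_frame('mean')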
Step by step (more readable):
df['date']=pd.to_datetime(df['date'])
df['status']=np.where(df['date'].between('1/1/2021','1/3/2021'),'pre','post')
df.groupby(['status','website'])['amount_views'].mean().to_frame('mean')
                     mean
status website
post   a        13.500000
       b         9.500000
pre    a        16.666667
       b         8.000000
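Use np.select:
import numpy as np
dates = pd.to_datetime(df['date'])
# np.select takes the first matching condition, so 1/3/2021 is labelled 'pre'
labels = np.select((dates.between('1/1/2021', '1/3/2021'),
                    dates.between('1/3/2021', '1/5/2021')),
                   ('pre', 'pos'),
                   default='other')  # string default; some newer NumPy versions reject the implicit default 0 with string choices
new_df = (df.groupby(['website', labels])
            .amount_views
            .mean()
            .rename_axis(('website', 'date'))
            .reset_index(name='avg_amount_views'))
print(new_df)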
website date avg_amount_views
0 a pos 13.500000
1 a pre 16.666667
2 b pos 9.500000
3 b pre 8.000000
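The two ranges in the question share 1/3/2021, while np.where and np.select put each row under exactly one label. If the shared date is meant to count in both averages, one option is to filter each range separately and concatenate the results. This is a minimal sketch under that assumption; the ranges dict and the range_means helper are hypothetical names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2021', '1/2/2021', '1/3/2021',
                            '1/4/2021', '1/5/2021'] * 2),
    'website': ['a'] * 5 + ['b'] * 5,
    'amount_views': [23, 17, 10, 25, 2, 12, 7, 5, 17, 2],
})

# Hypothetical range definitions; both ends are inclusive and may overlap.
ranges = {
    'pre': ('2021-01-01', '2021-01-03'),
    'post': ('2021-01-03', '2021-01-05'),
}

def range_means(frame, ranges):
    """Mean of amount_views per website, computed separately for each range."""
    parts = []
    for label, (start, end) in ranges.items():
        in_range = frame['date'].between(start, end)  # inclusive on both ends
        means = (frame[in_range]
                 .groupby('website')['amount_views']
                 .mean()
                 .reset_index(name='avg_amount_views'))
        means.insert(0, 'date', label)  # range label as the first column
        parts.append(means)
    return pd.concat(parts, ignore_index=True)

result = range_means(df, ranges)
print(result)
```

Because each range is filtered on its own, a row can contribute to more than one label, which a single grouping key cannot express.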
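You can use pd.cut to define 'pre' and 'post'. The date column must already be datetime. With right=False each bin includes its left edge and excludes its right edge, so 1/3/2021 falls in 'post' here:
grp = pd.cut(df['date'],
             bins=[pd.Timestamp(2021, 1, 1),
                   pd.Timestamp(2021, 1, 3),
                   pd.Timestamp(2021, 1, 6)],
             labels=['pre', 'post'],
             right=False)
df.groupby([grp, 'website'])['amount_views'].agg(['mean','count']).reset_index()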
Output:
date website mean count
0 pre a 20.000000 2
1 pre b 9.500000 2
2 post a 12.333333 3
3 post b 8.000000 3
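These numbers differ from the np.where answer because of where 1/3/2021 lands. If 1/3/2021 should belong to 'pre' instead, moving the middle bin edge to 1/4/2021 is one way to get there with pd.cut. A sketch under that assumption; edges and out are names chosen for this example, and observed=True is passed only to keep the categorical grouping explicit.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1/1/2021', '1/2/2021', '1/3/2021',
                            '1/4/2021', '1/5/2021'] * 2),
    'website': ['a'] * 5 + ['b'] * 5,
    'amount_views': [23, 17, 10, 25, 2, 12, 7, 5, 17, 2],
})

# Bins are [1/1, 1/4) and [1/4, 1/6): left edge included, right edge excluded.
edges = [pd.Timestamp(2021, 1, 1), pd.Timestamp(2021, 1, 4), pd.Timestamp(2021, 1, 6)]
grp = pd.cut(df['date'], bins=edges, labels=['pre', 'post'], right=False)

out = (df.groupby([grp, 'website'], observed=True)['amount_views']
         .mean()
         .reset_index(name='avg_amount_views'))
print(out)
```

With these edges the means agree with the np.where and np.select answers: 1/1 to 1/3 are 'pre' and 1/4 to 1/5 are 'post'.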