Python: Calculate week start and week end from daily data in pandas dataframe?
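I have a daily dataset covering several months.
I want to calculate the week start (Sunday) and week end (Saturday) for each product type and country, and the value should be the average for that particular week.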
dates product country value name
2021-10-01 00:00:00 Voice Lucia 2 A
2021-10-01 00:00:00 TV Jamai 1 A
2021-10-01 00:00:00 TV Trin 5 A
2021-10-01 00:00:00 Voice Gren 5 A
2021-10-01 00:00:00 Broad Vin 7 A
2021-10-01 00:00:00 TV Gren 8 A
2021-10-01 00:00:00 Broad Barb 5 A
2021-10-01 00:00:00 Voice Jamai 23 A
2021-10-01 00:00:00 Voice Trin 6 A
2021-10-01 00:00:00 TV Cur 7 A
2021-10-02 00:00:00 Broad Jamai 2 A
2021-10-03 00:00:00 Broad Trin 8 A
2021-10-04 00:00:00 Broad Lucia 3 A
2021-10-04 00:00:00 TV Anti 1 A
2021-10-04 00:00:00 Broad Cur 8 A
2021-10-04 00:00:00 Voice Barb 0 A
2021-10-04 00:00:00 TV Vin 5 A
2021-10-04 00:00:00 Voice Vin 1 A
2021-10-05 00:00:00 NAN NAN NAN NAN
2021-10-06 00:00:00 NAN NAN NAN NAN
2021-10-07 00:00:00 NAN NAN NAN NAN
2021-10-08 00:00:00 NAN NAN NAN NAN
2021-10-09 00:00:00 NAN NAN NAN NAN
2021-10-10 00:00:00 NAN NAN NAN NAN
2021-10-11 00:00:00 NAN NAN NAN NAN
2021-10-12 00:00:00 NAN NAN NAN NAN
2021-10-13 00:00:00 NAN NAN NAN NAN
2021-10-14 00:00:00 NAN NAN NAN NAN
2021-10-15 00:00:00 NAN NAN NAN NAN
...............
..............................etc
Sample result format:
week_start week_end product country name value(**average of values for each week**)
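I tried using groupby, but I could not get the week start and week end for each product and country.
Also, the value should be the average of the values that fall in the particular week.
Any help on how to achieve this?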
The first step is to create new columns holding the week start date and week end date for each row. This can be done using offsets.Week:
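import pandas as pd
# Shift by one day first: a bare "dates - Week(weekday=6)" sends a Sunday back to the previous Sunday
df['start'] = (df['dates'] + pd.Timedelta(days=1)) - pd.offsets.Week(weekday=6)
# Saturday that closes the week beginning on 'start'
df['end'] = df['start'] + pd.offsets.Week(weekday=5)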
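As an alternative way to get the same Sunday-to-Saturday boundaries, the sketch below uses weekly periods anchored on Saturday (`W-SAT`). The `demo` frame and its four dates are made up for illustration (a Friday, Saturday, Sunday and Monday), so that the Sunday edge case is visible.

```python
import pandas as pd

# Hypothetical mini-frame: Friday, Saturday, Sunday, Monday
demo = pd.DataFrame({"dates": pd.to_datetime(
    ["2021-10-01", "2021-10-02", "2021-10-03", "2021-10-04"])})

# 'W-SAT' periods end on Saturday, so each one starts on the Sunday before
week = demo["dates"].dt.to_period("W-SAT")
demo["start"] = week.dt.start_time
# end_time is the last instant of Saturday; normalize() moves it to midnight
demo["end"] = week.dt.end_time.dt.normalize()

print(demo)
```

Here the Friday and Saturday should land in the week starting 2021-09-26, while the Sunday and Monday land in the week starting 2021-10-03.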
From there you can use groupby to group by the start, end, product and country columns, and apply the mean aggregation to the value column:
df.groupby(['start','end','product','country']).agg({'value': 'mean'}).reset_index()
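Putting both steps together, here is a minimal end-to-end sketch. The rows are a small made-up sample in the shape of the question's data, not the real dataset. The names `week_start`, `week_end` and `weekly` are chosen to match the requested output format. Including `name` in the grouping keys is an assumption based on that format.

```python
import pandas as pd

# Hypothetical sample in the question's layout, with one empty placeholder day
df = pd.DataFrame({
    "dates": pd.to_datetime(["2021-10-01", "2021-10-02", "2021-10-03",
                             "2021-10-04", "2021-10-05"]),
    "product": ["Broad", "Broad", "Broad", "Broad", None],
    "country": ["Jamai", "Jamai", "Trin", "Trin", None],
    "value": [4, 2, 8, 3, None],
    "name": ["A", "A", "A", "A", None],
})

# Rows without a product, country or value contribute nothing to an average
df = df.dropna(subset=["product", "country", "value"])

# Sunday on or before each date; the one-day shift keeps a Sunday in its own week
df["week_start"] = (df["dates"] + pd.Timedelta(days=1)) - pd.offsets.Week(weekday=6)
df["week_end"] = df["week_start"] + pd.Timedelta(days=6)

# Mean value per week, product, country and name
weekly = (
    df.groupby(["week_start", "week_end", "product", "country", "name"],
               as_index=False)["value"]
      .mean()
)
print(weekly)
```

With this sample, the two Jamai rows fall in the week of 2021-09-26 to 2021-10-02 and the two Trin rows in the week of 2021-10-03 to 2021-10-09, giving one averaged row per group.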