Group data by week in Pandas

I have this data frame:

Name        Date      author
Apple   2022-03-15    sahil_1
Orange  2022-03-16    sahil_2
Apple   2022-03-17    sahil_3
Orange  2022-03-18    sahil_1
Apple   2022-03-19    sahil_2
Banana  2022-03-20    sahil_3
Apple   2019-12-19    sahil_3
Orange  2004-01-07    sahil_1

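I want to aggregate this by Name and Date (weekly) to get the count of records.

Date: group by week; the result should show the start of the week (i.e. the Monday).

Count: add up the records that have the same Name and fall in the same week (the same 7-day interval).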
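
To make the "start of the week" rule concrete, here is a minimal sketch (one possible way, not necessarily the one used later in this post) that moves each date back to the Monday of its own week. The 2022-03-14 value is a hypothetical extra date, added only to show that a Monday maps to itself.

```python
import pandas as pd

# Sample dates from the data frame above, plus 2022-03-14 (a Monday) as an
# assumed extra value to show the boundary case.
dates = pd.Series(pd.to_datetime(
    ["2022-03-14", "2022-03-15", "2022-03-20", "2019-12-19", "2004-01-07"]
))

# dt.weekday is 0 for Monday through 6 for Sunday, so subtracting that many
# days moves every date back to the Monday of its own week.
week_start = dates - pd.to_timedelta(dates.dt.weekday, unit="D")

print(week_start.dt.strftime("%Y-%m-%d").tolist())
# ['2022-03-14', '2022-03-14', '2022-03-14', '2019-12-16', '2004-01-05']
```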

The desired output is shown below:

Name        Date      count
Apple    2019-12-16    1
Apple    2022-03-14    3

Banana   2022-03-14    1

Orange   2004-01-05    1
Orange   2022-03-14    2

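Note - All dates in the result are Mondays, i.e. the first day of the week.
If possible, the result should be sorted in ascending order, and the dates for each name should be sorted as well.

Thanks in advance.

I'm not sure how to take the next step. My attempt so far: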

import pandas as pd

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple", "Orange"]
Date = ["2022-03-15", "2022-03-16", "2022-03-17", "2022-03-18", "2022-03-19", "2022-03-20", "2019-12-19", "2004-01-07"]
author = ["sahil_1", "sahil_2", "sahil_3", "sahil_1", "sahil_2", "sahil_3", "sahil_3", "sahil_1"]

df = pd.DataFrame(zip(Name, Date, author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date'])

# closed='left', label='left' gives weekly bins [Monday, next Monday) labelled
# by their Monday, so a date that is itself a Monday stays in its own week.
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON', closed='left', label='left')])['Name'].count()
print(x)
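
To see how `freq='W-MON'` assigns dates to bins, the sketch below compares the default for weekly frequencies (right-closed, right-labelled) with `closed='left', label='left'`. The three-row `demo` frame is a hypothetical mini-sample (a Monday, the following Tuesday and the following Sunday), not part of the data above.

```python
import pandas as pd

# Hypothetical mini-sample: Monday 2022-03-14, Tuesday 2022-03-15, Sunday 2022-03-20.
demo = pd.DataFrame({
    "Name": ["Apple", "Apple", "Apple"],
    "Date": pd.to_datetime(["2022-03-14", "2022-03-15", "2022-03-20"]),
})

# Default for weekly frequencies: bins are (previous Monday, Monday], labelled
# by the closing Monday, so the Monday is separated from the rest of its week.
default_bins = demo.groupby(pd.Grouper(key="Date", freq="W-MON"))["Name"].count()

# Left-closed, left-labelled: bins are [Monday, next Monday), labelled by the
# opening Monday.
monday_bins = demo.groupby(
    pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")
)["Name"].count()

print(default_bins)
print(monday_bins)
```

With the default, the Monday is counted on its own under 2022-03-14 and the other two dates fall under 2022-03-21; with the left-closed bins all three dates are counted under 2022-03-14.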

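Thanks to @Pedrinho for the quick help. I modified the code slightly and got the result I wanted, but I'm not sure whether this is the right way to do it.

Solution code: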

import pandas as pd

Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple", "Orange"]
Date = ["2022-03-15", "2022-03-16", "2022-03-17", "2022-03-18", "2022-03-19", "2022-03-20", "2019-12-19", "2004-01-07"]
author = ["sahil_1", "sahil_2", "sahil_3", "sahil_1", "sahil_2", "sahil_3", "sahil_3", "sahil_1"]

df = pd.DataFrame(zip(Name, Date, author), columns=["Name", "Date", "Author"])

df['Date'] = pd.to_datetime(df['Date'])

# closed='left', label='left' gives weekly bins [Monday, next Monday) labelled
# by their Monday, so a date that is itself a Monday stays in its own week.
grouped = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON', closed='left', label='left')])

result = []

# Each group_id is a (Name, week-start Timestamp) tuple; groups are sorted by both keys.
for group_id, group_df in grouped:
    res = {}
    res['Name'] = group_id[0]
    res['Week'] = group_id[1].strftime('%Y-%m-%d')  # week start (Monday) as a date string
    res['count'] = len(group_df)  # number of records in this Name/week group
    result.append(res)

print(f"Result df is: {result}")

Console output:

Result df is: [{'Name': 'Apple', 'Week': '2019-12-16', 'count': 1}, {'Name': 'Apple', 'Week': '2022-03-14', 'count': 3}, {'Name': 'Banana', 'Week': '2022-03-14', 'count': 1}, {'Name': 'Orange', 'Week': '2004-01-05', 'count': 1}, {'Name': 'Orange', 'Week': '2022-03-14', 'count': 2}]
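
As a cross-check on the loop above, here is a minimal sketch that builds the same counts as a DataFrame without iterating over the groups. It assumes weeks run Monday to Sunday and uses `to_period("W")`, whose default weekly period ends on Sunday; the `Week` column name is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple", "Orange"],
    "Date": pd.to_datetime(["2022-03-15", "2022-03-16", "2022-03-17", "2022-03-18",
                            "2022-03-19", "2022-03-20", "2019-12-19", "2004-01-07"]),
})

# to_period("W") maps each date to its Monday-to-Sunday week (alias W-SUN);
# start_time is the Monday that opens that week.
df["Week"] = df["Date"].dt.to_period("W").dt.start_time

# size() counts rows per (Name, Week); groupby sorts both keys in ascending
# order by default, which gives the ordering asked for in the question.
result = (
    df.groupby(["Name", "Week"])
      .size()
      .reset_index(name="count")
)

print(result)
```

Keeping the result as a DataFrame makes it easy to sort further or export, whereas the list of dicts from the loop would need converting first.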