Group data by week in Pandas
I have this dataframe:
Name Date author
Apple 2022-03-15 sahil_1
Orange 2022-03-16 sahil_2
Apple 2022-03-17 sahil_3
Orange 2022-03-18 sahil_1
Apple 2022-03-19 sahil_2
Banana 2022-03-20 sahil_3
Apple 2019-12-19 sahil_3
Orange 2004-01-07 sahil_1
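I want to aggregate by Name and Date (weekly) to get a count of records.
Date: group by week; the result should show the start of the week (that is, the Monday).
Count: add up the records that share the same Name and fall within the same week (the same 7-day interval).
The desired output is: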
Name Date count
Apple 2019-12-16 1
Apple 2022-03-14 3
Banana 2022-03-14 1
Orange 2004-01-05 1
Orange 2022-03-14 2
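Note: every date in the result is a Monday, i.e. the first day of its week.
If possible, the result should be sorted in ascending order by Name, with the dates sorted within each name.
Thanks in advance.
This is what I have so far; I'm not sure how to proceed from here.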
import pandas as pd
Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]
df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
x = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Name'].count()
print(x)
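One thing worth checking in the attempt above is how the `W-MON` bins treat a date that is itself a Monday. By default these bins are closed and labeled on the right, so after the 7-day shift a Monday appears to land on the previous Monday rather than on itself. The sketch below uses a small hypothetical sample (it is not the data from the question) and compares the shifted approach with a variant that passes `closed="left", label="left"` and skips the shift.

```python
import pandas as pd

# Hypothetical sample: 2022-03-14 is a Monday; the other two dates are in the same week.
sample = pd.DataFrame({
    "Name": ["Apple", "Apple", "Apple"],
    "Date": pd.to_datetime(["2022-03-14", "2022-03-15", "2022-03-20"]),
})

# Approach from the attempt: shift back 7 days, then use the default right-closed W-MON bins.
shifted = sample.assign(Date=sample["Date"] - pd.to_timedelta(7, unit="D"))
shifted_counts = shifted.groupby(
    ["Name", pd.Grouper(key="Date", freq="W-MON")]
)["Name"].count()

# Variant without the shift: bins are [Monday, next Monday), labeled by their left edge.
left_counts = sample.groupby(
    ["Name", pd.Grouper(key="Date", freq="W-MON", closed="left", label="left")]
)["Name"].count()

print(shifted_counts)  # the Monday row is counted separately under 2022-03-07
print(left_counts)     # all three rows are counted under 2022-03-14
```

The data in the question contains no Mondays, which would explain why the shifted version still matches the desired output there.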
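Thanks to @Pedrinho for the quick help. I modified the code slightly and got the result I wanted, but I'm not sure this is the right way to do it.
Solution code: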
import pandas as pd
Name = ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple","Orange"]
Date = ["2022-03-15","2022-03-16","2022-03-17","2022-03-18","2022-03-19","2022-03-20","2019-12-19","2004-01-07"]
author = ["sahil_1","sahil_2","sahil_3","sahil_1","sahil_2","sahil_3","sahil_3","sahil_1"]
df = pd.DataFrame(zip(Name,Date,author), columns=["Name", "Date", "Author"])
df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
grouped = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])
result = []
# Each group_id is a (Name, week label) tuple
for group_id, group_df in grouped:
    res = {}
    res['Name'] = group_id[0]
    # Drop the trailing " 00:00:00" from the Timestamp string
    res['Week'] = str(group_id[1])[:-9]
    res['count'] = group_df['Name'].count()
    result.append(res)
print(f"Result df is: {result}")
Console output:
Result df is: [{'Name': 'Apple', 'Week': '2019-12-16', 'count': 1}, {'Name': 'Apple', 'Week': '2022-03-14', 'count': 3}, {'Name': 'Banana', 'Week': '2022-03-14', 'count': 1}, {'Name': 'Orange', 'Week': '2004-01-05', 'count': 1}, {'Name': 'Orange', 'Week': '2022-03-14', 'count': 2}]
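On the question of whether the loop is the right way: it works for this data, but the same result can probably be had without iterating over groups. Below is a minimal sketch of one alternative, assuming the `Date` column holds datetimes. It computes the Monday of each row's week by subtracting `dt.weekday` (0 for Monday) and then counts with `size()`. This is a different technique from the `pd.Grouper` approach in the question, and it returns a DataFrame instead of a list of dicts.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "Orange", "Apple", "Orange", "Apple", "Banana", "Apple", "Orange"],
    "Date": pd.to_datetime([
        "2022-03-15", "2022-03-16", "2022-03-17", "2022-03-18",
        "2022-03-19", "2022-03-20", "2019-12-19", "2004-01-07",
    ]),
})

# Monday of each date's week: weekday is 0 for Monday, so subtract that many days.
# normalize() drops any time-of-day component so rows in the same week share one key.
week_start = df["Date"].dt.normalize() - pd.to_timedelta(df["Date"].dt.weekday, unit="D")

# groupby sorts its keys by default, so the output is ordered by Name, then by Date.
result = (
    df.assign(Date=week_start)
      .groupby(["Name", "Date"])
      .size()
      .reset_index(name="count")
)
print(result)
```

Because the week start is derived from the weekday directly, a date that falls on a Monday maps to itself, with no shift needed.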