How to group by month and year and then sum the total in Pandas?
My dataset has 3 columns... I want to group by month and year... but I also want to group by Name and sum the Price.
Here is a mock dataset I made:
import pandas as pd
# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C', 'A', 'C', 'B'],
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20',
'03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
# My attempt:
totalSum = df.groupby([df['Date'].dt.year, df['Date'].dt.month]).agg({'Price':sum})
The output should look something like:
06/2019
A 56
B 120
C 48
03/2020
A 12
B 94
...
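and so on
Nice try, but you need to group by column names rather than by values as you attempted. So I converted df['Date'] to a form like '2019-06' and used the 'Date' and 'Name' columns for the groupby, as shown below: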
import pandas as pd
# initialise data of lists.
data = {
'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]
}
# Create DataFrame
df = pd.DataFrame(data)
# Parse the two-digit-year dates ('%y') and keep only Year-Month, e.g. '2019-06'
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y').dt.strftime('%Y-%m')
# Group by Year-Month and Name, then sum Price
totalSum = df.groupby(by=['Date', 'Name']).agg({'Price': 'sum'})
# Print the output.
print(totalSum)
#               Price
# Date    Name
# 2019-06 A        56
#         B       120
#         C        48
# 2020-03 A        12
#         B        94
# 2020-06 B        52
#         C        87
# 2020-07 A        10
#         B        37
#         C        39
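If the goal is the exact layout from the question (a "06/2019" header followed by one line per name), the grouped result can be printed month by month. The following is only a sketch under some assumptions: it groups on a monthly period via `dt.to_period('M')` rather than on a formatted string, so the months stay in chronological order, and the names `dates`, `totals` and the level name `Month` are chosen here purely for illustration.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19',
             '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20',
             '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37],
})

# Parse the strings once; '%y' reads the two-digit year
dates = pd.to_datetime(df['Date'], format='%m/%d/%y')

# Group by monthly period (illustrative level name 'Month') and by Name
totals = df.groupby([dates.dt.to_period('M').rename('Month'), 'Name'])['Price'].sum()

# Print one block per month, labelled MM/YYYY as in the question
for month, block in totals.groupby(level='Month'):
    print(month.strftime('%m/%Y'))
    for (_, name), price in block.items():
        print(name, price)
```

Grouping on a period instead of a 'MM/YYYY' string is a design choice: strings in that form would sort by month first, which would mix the years.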
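You're almost there, but to make it work you need to first call pandas' to_datetime()
method twice, so that the year and month are derived from 'Date', and pass 'Name' as an
additional key to the groupby call: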
totalSum = df.groupby([pd.to_datetime(df['Date']).dt.year,
                       pd.to_datetime(df['Date']).dt.month,
                       'Name']).agg({'Price': 'sum'})
totalSum
Out[17]:
                Price
Date Date Name
2019 6    A        56
          B       120
          C        48
2020 3    A        12
          B        94
     6    B        52
          C        87
     7    A        10
          B        37
          C        39
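In the output above, both date levels of the index are named 'Date', which makes them awkward to refer to later. One possible variation, sketched here under the assumption that the dates always follow the month/day/two-digit-year pattern of the mock data, is to parse the column once and rename the derived Series before grouping. The level names `Year` and `Month` and the variable `dates` are illustrative and not part of the original answer.

```python
import pandas as pd

# Mock data from the question
df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19',
             '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20',
             '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37],
})

# Convert once with an explicit format instead of calling to_datetime twice
dates = pd.to_datetime(df['Date'], format='%m/%d/%y')

# Rename the derived Series so the index levels get distinct names
totals = df.groupby(
    [dates.dt.year.rename('Year'), dates.dt.month.rename('Month'), 'Name']
)['Price'].sum()

print(totals)

# A single group can then be looked up by (year, month, name)
print(totals.loc[(2019, 6, 'B')])
```

Calling `totals.reset_index()` afterwards would turn `Year`, `Month` and `Name` back into ordinary columns if a flat table is easier to work with.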