How to group by month and year and then sum the totals in pandas?

My dataset has 3 columns. I want to group by month and year, but I also want to group by name and sum the prices.

Here is a mock dataset I made:

import pandas as pd

# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C', 'A', 'C', 'B'], 
'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', 
'03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'], 
'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

# My attempt so far:
totalSum = df.groupby([df['Date'].dt.year, df['Date'].dt.month]).agg({'Price':sum})

The output should look something like:

06/2019
    A   56
    B   120
    C   48
03/2020
    A   12
    B   94
...

and so on.

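Good attempt, but it is simpler to group by column names than by derived values as you tried (and the .dt accessor only works once 'Date' has been converted to datetime). So I reformatted df['Date'] into year-month strings such as '2019-06' and grouped by the 'Date' and 'Name' columns, as follows: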

import pandas as pd

# initialise data of lists.
data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19', '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20', '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37]
}

# Create DataFrame
df = pd.DataFrame(data)

# Parse the two-digit-year dates and keep only Year-Month, e.g. '06/01/19' -> '2019-06'
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y').dt.strftime('%Y-%m')

# Group by month and name, then sum the prices
totalSum = df.groupby(by=['Date', 'Name']).agg({'Price': 'sum'})
print(totalSum)
#              Price
#Date    Name       
#2019-06 A        56
#        B       120
#        C        48
#2020-03 A        12
#        B        94
#2020-06 B        52
#        C        87
#2020-07 A        10
#        B        37
#        C        39
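
A variation on the same idea, offered only as a sketch: instead of turning the dates into strings, the 'Date' column can be parsed once and then grouped by a monthly period via dt.to_period('M'). This assumes every date is in MM/DD/YY form, as in the mock data; the name `monthly` is just illustrative.

```python
import pandas as pd

data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19',
             '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20',
             '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37],
}
df = pd.DataFrame(data)

# Assumption: all dates use the MM/DD/YY layout shown in the question
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')

# Group by calendar month (as a Period) and by name, then sum the prices
monthly = df.groupby([df['Date'].dt.to_period('M'), 'Name'])['Price'].sum()
print(monthly)
```

Keeping the key as a Period rather than a string means the 'Date' column itself stays a real datetime, which may be handy if the data needs further date arithmetic later.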

You are almost there. To make it work, call pandas' to_datetime() on 'Date' twice (once for the year, once for the month) and pass 'Name' as an additional key in the groupby call:

totalSum = df.groupby([pd.to_datetime(df['Date'], format='%m/%d/%y').dt.year,
                       pd.to_datetime(df['Date'], format='%m/%d/%y').dt.month,
                       'Name']).agg({'Price': 'sum'})
totalSum
Out[17]: 
                Price
Date Date Name       
2019 6    A        56
          B       120
          C        48
2020 3    A        12
          B        94
     6    B        52
          C        87
     7    A        10
          B        37
          C        39
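
If the goal is the exact MM/YYYY layout shown in the question, the grouped result can be walked and printed by hand. The following is a minimal sketch under the assumption that all dates are MM/DD/YY; the level names 'Year' and 'Month' and the variables `totals` and `lines` are introduced here purely for illustration.

```python
import pandas as pd

data = {
    'Name': ['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'A', 'C', 'B'],
    'Date': ['06/01/19', '06/11/19', '06/25/19', '06/05/19', '06/02/19', '06/13/19',
             '06/21/19', '03/09/20', '03/17/20', '03/22/20', '06/30/20', '06/22/20',
             '06/10/20', '07/05/20', '07/25/20', '07/21/20'],
    'Price': [10, 27, 8, 10, 38, 38, 93, 12, 55, 39, 52, 62, 25, 10, 39, 37],
}
df = pd.DataFrame(data)

# Parse once so the year and month come from the same datetime Series
dates = pd.to_datetime(df['Date'], format='%m/%d/%y')

# Rename the key Series so the index levels are 'Year' and 'Month' instead of 'Date' twice
totals = df.groupby([dates.dt.year.rename('Year'),
                     dates.dt.month.rename('Month'),
                     'Name'])['Price'].sum()

# Build one header line per (year, month), followed by indented name/total lines
lines = []
for (year, month), block in totals.groupby(level=['Year', 'Month']):
    lines.append(f'{int(month):02d}/{int(year)}')
    for (_, _, name), price in block.items():
        lines.append(f'    {name}   {int(price)}')

print('\n'.join(lines))
```

Renaming the two key Series also avoids the duplicated 'Date' level name seen in the output above, which makes the levels easier to refer to afterwards.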