按月分组,按列对行求和,保留其他列
Group by month, sum rows based in column, and keep the other columns
我有一个 DataFrame df 如下:
|size | date | name | type | revenue |
|10 | 13/12/2021 | A | Standard | 0,2 |
|248743 | 15/12/2021 | A | Standard | 0,2 |
|234 | 03/12/2022 | A | Basic | 0,1 |
|8734684 | 31/03/2022 | B | Basic | 0,1 |
|3589749 | 01/04/2021 | C | Basic | 0,4 |
|3356943 | 02/04/2021 | A | Basic | 0,1 |
|6908746 | 21/04/2021 | A | Basic | 0,1 |
|2375940 | 21/02/2022 | D | Premium | 0,7 |
|9387295 | 21/02/2022 | D | Premium | 0,7 |
|286432 | 21/02/2022 | D | Premium | 0,7 |
|192 | 31/03/2022 | D | Premium | 0,7 |
|486 | 18/02/2022 | E | Standard | 0,9 |
|23847 | 24/10/2021 | F | Basic | 0,3 |
|82346 | 12/11/2021 | B | Premium | 0,5 |
|28352 | 03/01/2022 | A | Basic | 0,1 |
我需要按月份对名称和类型相同的行的大小总和进行分组:
|size | date | name | type | revenue |
|28352 | Jan | A | Basic | 0,1 |
|486 | Feb | E | Standard | 0,9 |
|12049667| Feb | D | Premium | 0,7 |
|192 | Mar | D | Premium | 0,7 |
|8734684 | Mar | B | Basic | 0,1 |
|3589749 | Apr | C | Basic | 0,4 |
|10265689| Apr | A | Basic | 0,1 |
|23847 | Oct | F | Basic | 0,3 |
|82346 | Nov | B | Premium | 0,5 |
|248753 | Dec | A | Standard | 0,2 |
|234 | Dec | A | Basic | 0,1 |
我试过这段代码,但没有用:
df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()
我该怎么做?
IIUC,你需要一个groupby
。您需要将“收入”列重新设置为数字。
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
group = df['date'].dt.strftime('%b')
(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
.groupby([group, 'name', 'type'])
.agg('sum')
.reset_index()
)
输出:
date name type size revenue
0 Apr A Basic 6908746 0.1
1 Dec A Standard 248753 0.4
2 Dec B Premium 82346 0.5
3 Feb A Basic 3356943 0.1
4 Feb D Premium 12049667 2.1
5 Feb E Standard 486 0.9
6 Jan C Basic 3589749 0.4
7 Mar A Basic 28586 0.2
8 Mar B Basic 8734684 0.1
9 Mar D Premium 192 0.7
10 Oct F Basic 23847 0.3
请注意,以上是将不同年份的月份汇总到同一组中。如果要将年份分开,请使用句点:
group = df['date'].dt.to_period('M')
输出:
date name type size revenue
0 2021-01 C Basic 3589749 0.4
1 2021-02 A Basic 3356943 0.1
2 2021-04 A Basic 6908746 0.1
3 2021-10 F Basic 23847 0.3
4 2021-12 A Standard 248753 0.4
5 2021-12 B Premium 82346 0.5
6 2022-02 D Premium 12049667 2.1
7 2022-02 E Standard 486 0.9
8 2022-03 A Basic 28586 0.2
9 2022-03 B Basic 8734684 0.1
10 2022-03 D Premium 192 0.7
我有一个 DataFrame df 如下:
|size | date | name | type | revenue |
|10 | 13/12/2021 | A | Standard | 0,2 |
|248743 | 15/12/2021 | A | Standard | 0,2 |
|234 | 03/12/2022 | A | Basic | 0,1 |
|8734684 | 31/03/2022 | B | Basic | 0,1 |
|3589749 | 01/04/2021 | C | Basic | 0,4 |
|3356943 | 02/04/2021 | A | Basic | 0,1 |
|6908746 | 21/04/2021 | A | Basic | 0,1 |
|2375940 | 21/02/2022 | D | Premium | 0,7 |
|9387295 | 21/02/2022 | D | Premium | 0,7 |
|286432 | 21/02/2022 | D | Premium | 0,7 |
|192 | 31/03/2022 | D | Premium | 0,7 |
|486 | 18/02/2022 | E | Standard | 0,9 |
|23847 | 24/10/2021 | F | Basic | 0,3 |
|82346 | 12/11/2021 | B | Premium | 0,5 |
|28352 | 03/01/2022 | A | Basic | 0,1 |
我需要按月份对名称和类型相同的行的大小总和进行分组:
|size | date | name | type | revenue |
|28352 | Jan | A | Basic | 0,1 |
|486 | Feb | E | Standard | 0,9 |
|12049667| Feb | D | Premium | 0,7 |
|192 | Mar | D | Premium | 0,7 |
|8734684 | Mar | B | Basic | 0,1 |
|3589749 | Apr | C | Basic | 0,4 |
|10265689| Apr | A | Basic | 0,1 |
|23847 | Oct | F | Basic | 0,3 |
|82346 | Nov | B | Premium | 0,5 |
|248753 | Dec | A | Standard | 0,2 |
|234 | Dec | A | Basic | 0,1 |
我试过这段代码,但没有用:
df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()
我该怎么做?
IIUC,你需要一个groupby
。您需要将“收入”列重新设置为数字。
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
group = df['date'].dt.strftime('%b')
(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
.groupby([group, 'name', 'type'])
.agg('sum')
.reset_index()
)
输出:
date name type size revenue
0 Apr A Basic 6908746 0.1
1 Dec A Standard 248753 0.4
2 Dec B Premium 82346 0.5
3 Feb A Basic 3356943 0.1
4 Feb D Premium 12049667 2.1
5 Feb E Standard 486 0.9
6 Jan C Basic 3589749 0.4
7 Mar A Basic 28586 0.2
8 Mar B Basic 8734684 0.1
9 Mar D Premium 192 0.7
10 Oct F Basic 23847 0.3
请注意,以上是将不同年份的月份汇总到同一组中。如果要将年份分开,请使用句点:
group = df['date'].dt.to_period('M')
输出:
date name type size revenue
0 2021-01 C Basic 3589749 0.4
1 2021-02 A Basic 3356943 0.1
2 2021-04 A Basic 6908746 0.1
3 2021-10 F Basic 23847 0.3
4 2021-12 A Standard 248753 0.4
5 2021-12 B Premium 82346 0.5
6 2022-02 D Premium 12049667 2.1
7 2022-02 E Standard 486 0.9
8 2022-03 A Basic 28586 0.2
9 2022-03 B Basic 8734684 0.1
10 2022-03 D Premium 192 0.7