R基于列值聚合数据框
R Aggregate data frame based on column values
我有一个如下所示的数据集:
> newex
Name Volume Period
1 oil 29000 Jun 21
2 gold 800 Mar 22
3 oil 21000 Jul 21
4 gold 1100 Sep 21
5 gold 3000 Feb 21
6 depower 3 Q1 21
7 oil 23000 Apr 22
8 czpower 26 Q1 23
9 oil 17000 Q1 21
10 gold 2400 May 21
11 oil 12000 Q2 21
12 gold 1800 Jan 22
13 czpower 21 Oct 21
14 api2coal 6000 Q1 22
15 api2coal 11000 Q1 21
16 depower 11 Jan 22
17 api2coal 16000 Jul 21
18 gold 1300 Mar 21
19 depower 3 Q1 22
20 oil 17000 Cal 21
我想重塑数据集以获得具有以下特征的数据框:
Name
中的值将成为新的变量(列);
Period
中的值将成为索引(应该是唯一的);
Volume
中的值是Name
和Period
的每个组合的值之和。
有人可以给我一个关于如何实现这一点的提示吗?提前谢谢你。
我们可以使用 pivot_wider
:
library(dplyr)
library(tidyr)
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume
)
Period oil gold depower czpower api2coal
<chr> <int> <int> <int> <int> <int>
1 Jun 21 29000 NA NA NA NA
2 Mar 22 NA 800 NA NA NA
3 Jul 21 21000 NA NA NA 16000
4 Sep 21 NA 1100 NA NA NA
5 Feb 21 NA 3000 NA NA NA
6 Q1 21 17000 NA 3 NA 11000
7 Apr 22 23000 NA NA NA NA
8 Q1 23 NA NA NA 26 NA
9 May 21 NA 2400 NA NA NA
10 Q2 21 12000 NA NA NA NA
11 Jan 22 NA 1800 11 NA NA
12 Oct 21 NA NA NA 21 NA
13 Q1 22 NA NA 3 NA 6000
14 Mar 21 NA 1300 NA NA NA
15 Cal 21 17000 NA NA NA NA
数据:
newex <- structure(list(Name = c("oil", "gold", "oil", "gold", "gold",
"depower", "oil", "czpower", "oil", "gold", "oil", "gold", "czpower",
"api2coal", "api2coal", "depower", "api2coal", "gold", "depower",
"oil"), Volume = c(29000L, 800L, 21000L, 1100L, 3000L, 3L, 23000L,
26L, 17000L, 2400L, 12000L, 1800L, 21L, 6000L, 11000L, 11L, 16000L,
1300L, 3L, 17000L), Period = c("Jun 21", "Mar 22", "Jul 21",
"Sep 21", "Feb 21", "Q1 21", "Apr 22", "Q1 23", "Q1 21", "May 21",
"Q2 21", "Jan 22", "Oct 21", "Q1 22", "Q1 21", "Jan 22", "Jul 21",
"Mar 21", "Q1 22", "Cal 21")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))
我在尝试之前的答案后找到了答案:
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume, values_fn = sum)
library(data.table)
dcast(setDT(newex),Period~Name, value.var="Volume",fun.aggregate = sum)
输出:
Period api2coal czpower depower gold oil
1: Apr 22 0 0 0 0 23000
2: Cal 21 0 0 0 0 17000
3: Feb 21 0 0 0 3000 0
4: Jan 22 0 0 11 1800 0
5: Jul 21 16000 0 0 0 21000
6: Jun 21 0 0 0 0 29000
7: Mar 21 0 0 0 1300 0
8: Mar 22 0 0 0 800 0
9: May 21 0 0 0 2400 0
10: Oct 21 0 21 0 0 0
11: Q1 21 11000 0 3 0 17000
12: Q1 22 6000 0 3 0 0
13: Q1 23 0 26 0 0 0
14: Q2 21 0 0 0 0 12000
15: Sep 21 0 0 0 1100 0
我有一个如下所示的数据集:
> newex
Name Volume Period
1 oil 29000 Jun 21
2 gold 800 Mar 22
3 oil 21000 Jul 21
4 gold 1100 Sep 21
5 gold 3000 Feb 21
6 depower 3 Q1 21
7 oil 23000 Apr 22
8 czpower 26 Q1 23
9 oil 17000 Q1 21
10 gold 2400 May 21
11 oil 12000 Q2 21
12 gold 1800 Jan 22
13 czpower 21 Oct 21
14 api2coal 6000 Q1 22
15 api2coal 11000 Q1 21
16 depower 11 Jan 22
17 api2coal 16000 Jul 21
18 gold 1300 Mar 21
19 depower 3 Q1 22
20 oil 17000 Cal 21
我想重塑数据集以获得具有以下特征的数据框:
Name
中的值将成为新的变量(列);Period
中的值将成为索引(应该是唯一的);Volume
中的值是Name
和Period
的每个组合的值之和。
有人可以给我一个关于如何实现这一点的提示吗?提前谢谢你。
我们可以使用 pivot_wider
:
library(dplyr)
library(tidyr)
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume
)
Period oil gold depower czpower api2coal
<chr> <int> <int> <int> <int> <int>
1 Jun 21 29000 NA NA NA NA
2 Mar 22 NA 800 NA NA NA
3 Jul 21 21000 NA NA NA 16000
4 Sep 21 NA 1100 NA NA NA
5 Feb 21 NA 3000 NA NA NA
6 Q1 21 17000 NA 3 NA 11000
7 Apr 22 23000 NA NA NA NA
8 Q1 23 NA NA NA 26 NA
9 May 21 NA 2400 NA NA NA
10 Q2 21 12000 NA NA NA NA
11 Jan 22 NA 1800 11 NA NA
12 Oct 21 NA NA NA 21 NA
13 Q1 22 NA NA 3 NA 6000
14 Mar 21 NA 1300 NA NA NA
15 Cal 21 17000 NA NA NA NA
数据:
newex <- structure(list(Name = c("oil", "gold", "oil", "gold", "gold",
"depower", "oil", "czpower", "oil", "gold", "oil", "gold", "czpower",
"api2coal", "api2coal", "depower", "api2coal", "gold", "depower",
"oil"), Volume = c(29000L, 800L, 21000L, 1100L, 3000L, 3L, 23000L,
26L, 17000L, 2400L, 12000L, 1800L, 21L, 6000L, 11000L, 11L, 16000L,
1300L, 3L, 17000L), Period = c("Jun 21", "Mar 22", "Jul 21",
"Sep 21", "Feb 21", "Q1 21", "Apr 22", "Q1 23", "Q1 21", "May 21",
"Q2 21", "Jan 22", "Oct 21", "Q1 22", "Q1 21", "Jan 22", "Jul 21",
"Mar 21", "Q1 22", "Cal 21")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))
我在尝试之前的答案后找到了答案:
newex %>%
pivot_wider(
names_from = Name,
values_from = Volume, values_fn = sum)
library(data.table)
dcast(setDT(newex),Period~Name, value.var="Volume",fun.aggregate = sum)
输出:
Period api2coal czpower depower gold oil
1: Apr 22 0 0 0 0 23000
2: Cal 21 0 0 0 0 17000
3: Feb 21 0 0 0 3000 0
4: Jan 22 0 0 11 1800 0
5: Jul 21 16000 0 0 0 21000
6: Jun 21 0 0 0 0 29000
7: Mar 21 0 0 0 1300 0
8: Mar 22 0 0 0 800 0
9: May 21 0 0 0 2400 0
10: Oct 21 0 21 0 0 0
11: Q1 21 11000 0 3 0 17000
12: Q1 22 6000 0 3 0 0
13: Q1 23 0 26 0 0 0
14: Q2 21 0 0 0 0 12000
15: Sep 21 0 0 0 1100 0