Reshape long to wide efficiently with amount variable and several ID variables
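My data resembles a much larger version of the first table below. I would like to "unmelt" it into the second table, but I have not been able to do so efficiently. At the bottom is my latest attempt, where IDVars is essentially the first three columns below. It ran for 15 minutes before I had to kill it.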
Name | ID | Trial | Variable | Amount |
---|---|---|---|---|
Name 1 | 1 | 1 | FinalSalary | 300.00 |
Name 1 | 1 | 1 | FinalDCBalance | 400.00 |
Name 1 | 1 | 2 | FinalSalary | 300.00 |
Name 1 | 1 | 2 | FinalDCBalance | 300.00 |
Name 2 | 2 | 1 | FinalSalary | 400.00 |
Name 2 | 2 | 1 | FinalDCBalance | 400.00 |
Name 2 | 2 | 2 | FinalSalary | 200.00 |
Name 2 | 2 | 2 | FinalDCBalance | 300.00 |
Name 3 | 3 | 1 | FinalSalary | 100.00 |
Name 3 | 3 | 2 | FinalDCBalance | 400.00 |

Name | ID | Trial | FinalSalary | FinalDCBalance |
---|---|---|---|---|
Name 1 | 1 | 1 | 300 | 400 |
Name 1 | 1 | 2 | 300 | 300 |
Name 2 | 2 | 1 | 400 | 400 |
Name 2 | 2 | 2 | 200 | 300 |
Name 3 | 3 | 1 | 100 | 400 |
Name 3 | 3 | 2 | 300 | 100 |

unmelt <- reshape(dataframe, idvar = IDVars, v.names = 'variable', direction = 'wide', timevar = 'Amount')
We can use pivot_wider from tidyr:
library(tidyr)
pivot_wider(df1, names_from = 'Variable', values_from = 'Amount')
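As a self-contained sketch of that call, the block below rebuilds the sample data inline from the question's first table (df1 is the answer's name for the long data; the object name wide is introduced here for illustration only). It relies on pivot_wider's default of treating every column not named in names_from or values_from as an identifier, so Name, ID and Trial do not need to be listed.

```r
library(tidyr)

# Long-format sample data, rebuilt from the question's first table.
df1 <- data.frame(
  Name     = rep(c("Name 1", "Name 2", "Name 3"), times = c(4, 4, 2)),
  ID       = rep(1:3, times = c(4, 4, 2)),
  Trial    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L),
  Variable = c(rep(c("FinalSalary", "FinalDCBalance"), 4),
               "FinalSalary", "FinalDCBalance"),
  Amount   = c(300, 400, 300, 300, 400, 400, 200, 300, 100, 400)
)

# Columns not named below (Name, ID, Trial) act as the id columns by default.
wide <- pivot_wider(df1, names_from = "Variable", values_from = "Amount")

# Combinations absent from the long data (for example Name 3, Trial 1,
# FinalDCBalance) are filled with NA.
wide
```

If the real data contains repeated Name/ID/Trial/Variable combinations, pivot_wider returns list-columns with a warning, so it may be worth checking for duplicates first.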
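timevar= should be "Variable", not "Amount". The idvar columns go down the side, the timevar column goes across the top, and everything else (Amount) goes into the body of the output as values. v.names = "Amount" could be specified, but reshape works it out because that is the only column left, so we omit it.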
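# idvar columns stay as row keys; each distinct value of timevar becomes a column.
r <- reshape(dd, direction = "wide", idvar = c("Name", "ID", "Trial"), timevar = "Variable")
# Optional: strip the "Amount." prefix; fixed = TRUE matches the dot literally.
names(r) <- sub("Amount.", "", names(r), fixed = TRUE)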
giving:
> r
     Name ID Trial FinalSalary FinalDCBalance
1  Name 1  1     1         300            400
3  Name 1  1     2         300            300
5  Name 2  2     1         400            400
7  Name 2  2     2         200            300
9  Name 3  3     1         100             NA
10 Name 3  3     2          NA            400
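The point about timevar can be illustrated with a small base R sketch. reshape with direction = "wide" creates one output column per distinct timevar value, so using Amount as timevar would produce a column for every distinct amount, which is a plausible (though unconfirmed) reason the original call ran so long on a large data set. The data is rebuilt inline from the question's table, and the names n_cols_amount, n_cols_variable, r_explicit and r_implicit are introduced here for illustration only.

```r
# Long-format sample data, rebuilt from the question's first table.
dd <- data.frame(
  Name     = rep(c("Name 1", "Name 2", "Name 3"), times = c(4, 4, 2)),
  ID       = rep(1:3, times = c(4, 4, 2)),
  Trial    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L),
  Variable = c(rep(c("FinalSalary", "FinalDCBalance"), 4),
               "FinalSalary", "FinalDCBalance"),
  Amount   = c(300, 400, 300, 300, 400, 400, 200, 300, 100, 400)
)

# Number of value columns each choice of timevar would create.
n_cols_amount   <- length(unique(dd$Amount))    # timevar = "Amount"
n_cols_variable <- length(unique(dd$Variable))  # timevar = "Variable"

# Spelling out v.names yields the same columns as letting reshape() infer it.
r_explicit <- reshape(dd, direction = "wide",
                      idvar = c("Name", "ID", "Trial"),
                      timevar = "Variable", v.names = "Amount")
r_implicit <- reshape(dd, direction = "wide",
                      idvar = c("Name", "ID", "Trial"),
                      timevar = "Variable")
identical(names(r_explicit), names(r_implicit))
```

On real data with continuous amounts, the first count can approach the number of rows, while the second stays at the number of distinct variable names.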
Note
The input in reproducible form:
dd <- structure(list(Name = c("Name 1", "Name 1", "Name 1", "Name 1",
"Name 2", "Name 2", "Name 2", "Name 2", "Name 3", "Name 3"),
ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), Trial = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L), Variable = c("FinalSalary",
"FinalDCBalance", "FinalSalary", "FinalDCBalance", "FinalSalary",
"FinalDCBalance", "FinalSalary", "FinalDCBalance", "FinalSalary",
"FinalDCBalance"), Amount = c(300, 400, 300, 300, 400, 400,
200, 300, 100, 400)), class = "data.frame", row.names = c(NA,
-10L))