Easy way to convert long to wide format with counts
I have the following dataset:
sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
Case = c(1,1,1,1,2,2,3,3,3,4,5),
Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"))
sample.data
Step Case Decision
1 1 1 Referred
2 2 1 Referred
3 3 1 Referred
4 4 1 Approved
5 1 2 Referred
6 2 2 Declined
7 1 3 Referred
8 2 3 Referred
9 3 3 Declined
10 1 4 Approved
11 1 5 Declined
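Is it possible in R to convert this to a wide table format, with the decisions as the column headers and the value of each cell being the number of occurrences, for example: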
Case Referred Approved Declined
1 3 1 0
2 1 0 1
3 2 0 1
4 0 1 0
5 0 0 1
You can do this with a simple table() statement. Setting the factor levels first gives you the columns in the order you want.
sample.data$Decision <- factor(x = sample.data$Decision,
levels = c("Referred","Approved","Declined"))
table(Case = sample.data$Case,sample.data$Decision)
Case Referred Approved Declined
1 3 1 0
2 1 0 1
3 2 0 1
4 0 1 0
5 0 0 1
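If a plain data frame is needed rather than a table object, the result can be converted afterwards. The following is only a minimal sketch, assuming the sample.data frame from the question; the names counts and wide are illustrative and not part of the original answer.

```r
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined")
)

# Fix the column order through the factor levels.
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

# Two-way contingency table: rows are cases, columns are decisions.
counts <- table(Case = sample.data$Case, Decision = sample.data$Decision)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# would return the long format (Case, Decision, Freq) instead.
wide <- as.data.frame.matrix(counts)

# The case labels end up in the row names; move them into a column.
wide <- cbind(Case = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

Whether the extra conversion is worth it depends on what happens next: a table prints nicely as is, while a data frame is easier to merge or export.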
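The aggregation parameter in the dcast function of the reshape2 package defaults to length (= count). An improved version of the dcast function is implemented in the data.table package. So in your case this would be: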
library('reshape2') # or library('data.table')
newdf <- dcast(sample.data, Case ~ Decision)
Or with the parameters specified explicitly:
newdf <- dcast(sample.data, Case ~ Decision,
value.var = "Decision", fun.aggregate = length)
This gives the following data frame:
> newdf
Case Approved Declined Referred
1 1 1 0 3
2 2 0 1 1
3 3 0 1 2
4 4 1 0 0
5 5 0 1 0
If you don't specify an aggregation function, you get a warning telling you that dcast is using length as a default.
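For the data.table route, a sketch of how the call might look is shown below. It assumes the data.table package is installed and converts the data frame with as.data.table() first; the names dt and wide_dt are illustrative only.

```r
library(data.table)

sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined")
)

# Convert to a data.table so that the data.table method of dcast is used.
dt <- as.data.table(sample.data)

# One row per Case, one column per Decision, each cell holding the
# number of matching rows (length of the Decision values in that cell).
wide_dt <- dcast(dt, Case ~ Decision,
                 value.var = "Decision", fun.aggregate = length)
wide_dt
```

Passing fun.aggregate = length explicitly also avoids the notice about the default aggregation function.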
Here is a dplyr + tidyr approach:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, tidyr)
sample.data %>%
count(Case, Decision) %>%
spread(Decision, n, fill = 0)
## Case Approved Declined Referred
## (dbl) (dbl) (dbl) (dbl)
## 1 1 1 0 3
## 2 2 0 1 1
## 3 3 0 1 2
## 4 4 1 0 0
## 5 5 0 1 0
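In tidyr 1.0.0 and later, spread() has been superseded by pivot_wider(). A rough equivalent of the pipeline above might look like the sketch below; it assumes dplyr and tidyr (1.0.0 or newer) are installed, and wide_tbl is just an illustrative name.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined")
)

wide_tbl <- sample.data %>%
  # One row per Case/Decision pair, with the number of rows in column n.
  count(Case, Decision) %>%
  # Spread the decisions into columns; missing combinations become 0.
  pivot_wider(names_from = Decision, values_from = n,
              values_fill = list(n = 0))
wide_tbl
```

Note that pivot_wider() orders the new columns by first appearance in the data rather than alphabetically, so the column order can differ from the spread() output.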
We can use xtabs from base R:
xtabs(Step~Case+Decision, transform(sample.data, Step=1))
# Decision
# Case Approved Declined Referred
# 1 1 0 3
# 2 0 1 1
# 3 0 1 2
# 4 1 0 0
# 5 0 1 0
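The left-hand side of the xtabs formula is optional. When it is left empty, xtabs counts the rows in each combination, so the transform() step should not be needed. A minimal sketch, assuming the sample.data frame from the question (counts is an illustrative name):

```r
sample.data <- data.frame(
  Step = c(1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 1),
  Case = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved", "Referred",
               "Declined", "Referred", "Referred", "Declined", "Approved",
               "Declined")
)

# With no response variable on the left of ~, each row counts as 1.
counts <- xtabs(~ Case + Decision, data = sample.data)
counts
```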