Manipulating a factor and category in R
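So I have a dataset I'm trying to manipulate, but I can't seem to find the right way to do it. I've looked into dcast and spread, but I'm not sure how to get the result I need.
So I have something like: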
ID var1 var2 var3 category
--------------------------
1 x x x a
1 x x x b
1 x x x b
2 y y y a
2 y y y b
2 y y y c
3 z z z b
3 z z z b
3 z z z c
I want it to look like this:
ID var1 var2 var3 a b c
--------------------------------
1 x x x 1 1 0
2 y y y 1 1 1
3 z z z 0 1 1
Simple example data:
ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c('x','x','x','y','y','y','z','z','z')
var2 <- c('x','x','x','y','y','y','z','z','z')
var3 <- c('x','x','x','y','y','y','z','z','z')
category <- c('a','b','b','a','b','c','b','b','c')
dat <- data.frame(ID,var1,var2,var3,category)
library(tidyr)
library(dplyr)
dat %>%
  distinct() %>%                      # keep only distinct rows
  mutate(value = 1) %>%               # indicator column: 1 = category present
  spread(category, value, fill = 0)   # reshape to wide; missing categories become 0
# ID var1 var2 var3 a b c
# 1 1 x x x 1 1 0
# 2 2 y y y 1 1 1
# 3 3 z z z 0 1 1
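Since tidyr 1.0.0, spread() is superseded and pivot_wider() is the recommended replacement. Below is a minimal sketch of the same reshape with pivot_wider(). It assumes tidyr >= 1.1.0, where values_fill accepts a plain scalar, and recreates the question's data so it runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dat <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  var1     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var2     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var3     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  category = c("a", "b", "b", "a", "b", "c", "b", "b", "c")
)

wide <- dat %>%
  distinct() %>%                        # drop repeated ID/category rows
  mutate(value = 1) %>%                 # 1 = category present for this ID
  pivot_wider(names_from  = category,   # one column per category value
              values_from = value,
              values_fill = 0)          # absent categories become 0

wide
```

The id columns (ID, var1, var2, var3) are inferred as every column other than names_from and values_from, so they do not need to be listed.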
Since the question is tagged dcast, I felt obliged to post a concise solution using dcast().
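The OP didn't explain how the columns in the wide format should be computed. Judging from the expected result, the OP is not interested in counting occurrences, but in indicating the presence or absence of each unique combination (1/0 instead of TRUE/FALSE).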
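Therefore, only unique rows are passed to the reshape operation. length() is still used as the aggregation function: after deduplication every cell holds at most one row, and dcast() fills empty cells with the aggregation function applied to a zero-length vector, which for length() is 0, as required.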
library(reshape2)
dcast(unique(dat), ... ~ category, length)
ID var1 var2 var3 a b c
1 1 x x x 1 1 0
2 2 y y y 1 1 1
3 3 z z z 0 1 1
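To make the role of unique() concrete, here is a minimal sketch comparing dcast() with and without deduplication on the question's data. Passing value.var = "category" explicitly is an addition not in the answer above. It selects the same column dcast() would guess anyway and only suppresses the message it prints when guessing.

```r
library(reshape2)

# Example data from the question
dat <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  var1     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var2     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var3     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  category = c("a", "b", "b", "a", "b", "c", "b", "b", "c")
)

# Without unique(): length() counts rows per cell, so the duplicated
# "b" rows for IDs 1 and 3 give 2 instead of 1
counts <- dcast(dat, ... ~ category, length, value.var = "category")
counts$b    # 2 1 2

# With unique(): each ID/category pair occurs at most once, giving 1/0
presence <- dcast(unique(dat), ... ~ category, length, value.var = "category")
presence$b  # 1 1 1

# Empty cells get fun.aggregate applied to a zero-length vector;
# for length() that is 0
length(character(0))  # 0
```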