Mutating dummy variables in dplyr
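I want to create 7 dummy variables, one for each day, using dplyr.
So far I have managed to do it with the sjmisc package and its to_dummy function, but I do it in two steps: (1) create a data frame of dummies, (2) append it to the original data frame.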
# Sample data frame
library(dplyr)  # for bind_cols()
mydf <- data.frame(x = letters[1:9],
                   day = c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
# 1. Create the 7 dummy variables separately
daysdummy <- sjmisc::to_dummy(mydf$day, suffix = "label")
# 2. Append them to the data frame
mydf <- bind_cols(mydf, daysdummy)
> mydf
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
My question is whether I can do this in a single workflow with dplyr, adding to_dummy to the pipe, perhaps using mutate?
* to_dummy documentation
An alternative solution using dummy() from the dummies package, which I think would be faster:
mydf = data.frame(x=rep(letters[1:9]),
day=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
library(dummies)
mydf <- cbind(mydf, dummy(mydf$day, sep = "_"))
This produces:
x day mydf_Fri mydf_Mon mydf_Sat mydf_Sun mydf_Thurs mydf_Tues mydf_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
You can then use gsub() to get cleaner names:
names(mydf) = gsub("mydf_", "", names(mydf))
head(mydf)
x day Fri Mon Sat Sun Thurs Tues Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
If you want to do this with the pipe, you can do something like:
library(dplyr)
library(sjmisc)
mydf %>%
  to_dummy(day, suffix = "label") %>%
  bind_cols(mydf) %>%
  select(x, day, everything())
Returns:
# A tibble: 9 x 9
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a Mon 0. 1. 0. 0. 0. 0. 0.
2 b Tues 0. 0. 0. 0. 0. 1. 0.
3 c Wed 0. 0. 0. 0. 0. 0. 1.
4 d Thurs 0. 0. 0. 0. 1. 0. 0.
5 e Fri 1. 0. 0. 0. 0. 0. 0.
6 f Sat 0. 0. 1. 0. 0. 0. 0.
7 g Sun 0. 0. 0. 1. 0. 0. 0.
8 h Fri 1. 0. 0. 0. 0. 0. 0.
9 i Mon 0. 1. 0. 0. 0. 0. 0.
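Since the question asks about mutate specifically, here is a minimal sketch of one way to do it without sjmisc. It assumes dplyr 1.0 or later, where an unnamed data frame returned inside mutate() is spliced into separate columns. make_dummies is a hypothetical helper written for this example, not part of any package.

```r
library(dplyr)

mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 integer column per distinct value of v,
# named "<prefix>_<value>", with values in alphabetical order.
make_dummies <- function(v, prefix) {
  v <- as.character(v)
  lv <- sort(unique(v))
  out <- lapply(lv, function(l) as.integer(v == l))
  names(out) <- paste(prefix, lv, sep = "_")
  as.data.frame(out)
}

# The data frame returned by make_dummies() is unnamed inside mutate(),
# so its columns are added to mydf individually.
result <- mydf %>%
  mutate(make_dummies(day, "day"))
```

The helper keeps everything in a single pipe step, at the cost of maintaining a small function yourself.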
With dplyr and tidyr we could do:
library(dplyr)
library(tidyr)
mydf %>%
  mutate(var = 1) %>%
  spread(day, var, fill = 0, sep = "_") %>%
  left_join(mydf) %>%
  select(x, day, everything())
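In current tidyr, spread() is superseded by pivot_wider(), so the same idea might be written as below. This is a sketch assuming tidyr 1.1.0 or later (for a scalar values_fill); day_copy is a temporary column introduced here so that the original day column survives the reshape and no join is needed.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  # var holds the 1s; day_copy is consumed by pivot_wider(), day is kept
  mutate(var = 1, day_copy = day) %>%
  pivot_wider(
    names_from   = day_copy,
    values_from  = var,
    values_fill  = 0,       # days that do not apply to a row become 0
    names_prefix = "day_"
  )
```

Note that the dummy columns come out in order of first appearance in the data rather than alphabetically.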
Using base R we could do something similar:
as.data.frame.matrix(table(rep(mydf$x, lengths(mydf$day)), unlist(mydf$day)))
Returns:
Fri Mon Sat Sun Thurs Tues Wed
a 0 1 0 0 0 0 0
b 0 0 0 0 0 1 0
c 0 0 0 0 0 0 1
d 0 0 0 0 1 0 0
e 1 0 0 0 0 0 0
f 0 0 1 0 0 0 0
g 0 0 0 1 0 0 0
h 1 0 0 0 0 0 0
i 0 1 0 0 0 0 0
Instead of sjmisc::to_dummy you could also use base R's model.matrix; a dplyr solution would be:
library(dplyr)
model.matrix(~ 0 + day, mydf) %>%
  as.data.frame() %>%
  bind_cols(mydf) %>%
  select(x, day, everything())
# x day dayFri dayMon daySat daySun dayThurs dayTues dayWed
#1 a Mon 0 1 0 0 0 0 0
#2 b Tues 0 0 0 0 0 1 0
#3 c Wed 0 0 0 0 0 0 1
#4 d Thurs 0 0 0 0 1 0 0
#5 e Fri 1 0 0 0 0 0 0
#6 f Sat 0 0 1 0 0 0 0
#7 g Sun 0 0 0 1 0 0 0
#8 h Fri 1 0 0 0 0 0 0
#9 i Mon 0 1 0 0 0 0 0
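model.matrix() names its columns dayFri, dayMon and so on, with no separator. If you prefer the day_Fri style produced by to_dummy, one possible variation is to rename the dummy columns before binding them. This is a sketch assuming dplyr 1.0 or later for rename_with().

```r
library(dplyr)

mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"),
  stringsAsFactors = FALSE
)

result <- model.matrix(~ 0 + day, mydf) %>%
  as.data.frame() %>%
  # insert an underscore after the leading "day" in every dummy column name
  rename_with(~ sub("^day", "day_", .x)) %>%
  # the dot places the dummies after the original columns
  bind_cols(mydf, .)
```

Renaming before bind_cols() means the pattern only ever sees the dummy columns, so the original day column is left alone.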