Reshape binomial data to long Bernoulli format
I'm coming back to R after a year away and want to use rpart to build a classification tree.
My data looks like this:
Category, Shape, Color, Yes, No
A, Square, Blue, 3, 2
B, Triangle, Blue, 2, 4
etc.
Any suggestions on how to reshape it into the form below so that I can use rpart? (I believe rpart needs the data in this form.)
ID, Shape, Color, Result
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, No
A, Square, Blue, No
B, Triangle, Green, Yes
etc...
Thanks!
You can use melt from reshape2, then rep:
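library(reshape2)
# Stack the Yes/No count columns into variable/value pairs
s <- melt(df, id.vars = c('Category', 'Shape', 'Color'))
# Repeat each row index as many times as its count
s[rep(seq_len(nrow(s)), s$value), ]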
Category Shape Color variable value
1 A Square Blue Yes 3
1.1 A Square Blue Yes 3
1.2 A Square Blue Yes 3
2 B Triangle Blue Yes 2
2.1 B Triangle Blue Yes 2
3 A Square Blue No 2
3.1 A Square Blue No 2
4 B Triangle Blue No 4
4.1 B Triangle Blue No 4
4.2 B Triangle Blue No 4
4.3 B Triangle Blue No 4
melt converts the data to long format; rep then repeats each row as many times as the count in its value column.
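The same stack-then-repeat idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: the data frame dat is typed in by hand from the two rows shown in the question, and the names id_cols, long_counts, bernoulli, and the count column n are made up for this example.

```r
# Example data from the question, typed in by hand (assumption).
dat <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

id_cols <- c("Category", "Shape", "Color")

# Stack the two count columns: one row per (group, outcome) pair.
long_counts <- rbind(
  data.frame(dat[id_cols], Result = "Yes", n = dat$Yes),
  data.frame(dat[id_cols], Result = "No",  n = dat$No)
)

# Repeat each row index n times, keeping only the id columns and the outcome.
bernoulli <- long_counts[rep(seq_len(nrow(long_counts)), long_counts$n),
                         c(id_cols, "Result")]

# A factor response is what a classification model would expect.
bernoulli$Result <- factor(bernoulli$Result, levels = c("No", "Yes"))
rownames(bernoulli) <- NULL
```

Calling table(bernoulli$Category, bernoulli$Result) should give back the original counts, which is a quick way to check the expansion. The result could then be passed on to something like rpart(Result ~ Shape + Color, data = bernoulli, method = "class").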
library(data.table)
melt(setDT(dat), 1:3)[, rep(variable, value), by = .(Category, Shape, Color)]
Category Shape Color V1
1: A Square Blue Yes
2: A Square Blue Yes
3: A Square Blue Yes
4: A Square Blue No
5: A Square Blue No
6: B Triangle Blue Yes
7: B Triangle Blue Yes
8: B Triangle Blue No
9: B Triangle Blue No
10: B Triangle Blue No
11: B Triangle Blue No
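Using tidyverse:
library(tidyverse)
dat %>%
  rowwise() %>%
  # Build a list-column holding "Yes" and "No" repeated by their counts
  mutate(var = list(rep(c("Yes", "No"), c(Yes, No)))) %>%
  ungroup() %>%
  select(-Yes, -No) %>%
  # Name the column explicitly; newer tidyr versions warn when it is omitted
  unnest(var)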
Category Shape Color var
<fct> <fct> <fct> <chr>
1 A Square Blue Yes
2 A Square Blue Yes
3 A Square Blue Yes
4 A Square Blue No
5 A Square Blue No
6 B Triangle Blue Yes
7 B Triangle Blue Yes
8 B Triangle Blue No
9 B Triangle Blue No
10 B Triangle Blue No
11 B Triangle Blue No
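As a possible alternative that was not part of the answers above, tidyr also provides pivot_longer() and uncount(), which express the same two steps directly. A minimal sketch, assuming tidyr 1.0 or later is installed; dat is again typed in from the question, and the column names Result and count are chosen only for illustration.

```r
library(tidyr)

# Example data from the question, typed in by hand (assumption).
dat <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

# Step 1: stack the Yes/No columns into an outcome column plus a count column.
long <- pivot_longer(dat, cols = c(Yes, No),
                     names_to = "Result", values_to = "count")

# Step 2: duplicate each row according to its count; uncount() drops
# the count column by default.
bernoulli <- uncount(long, count)
```

Here the outcome column already has the name Result asked for in the question, so no renaming is needed afterwards.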