Reshape binomial data to long Bernoulli format

I'm coming back to R after a year away and want to use rpart for classification trees.

My data looks like this:

Category, Shape, Color, Yes, No
A, Square, Blue, 3, 2
B, Triangle, Blue, 2, 4
etc. 

Any suggestions on how to reshape it into the form below so that I can use rpart? (I believe rpart needs the data in this form.)

ID, Shape, Color, Result
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, No
A, Square, Blue, No
B, Triangle, Green, Yes
etc...

Thanks!

You can use melt from reshape2, then rep:

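library(reshape2)
s <- melt(df, id.vars = c('Category', 'Shape', 'Color'))
# repeat each row index as many times as its count in the value column
s[rep(seq_len(nrow(s)), s$value), ]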
              Category     Shape Color variable value
1                    A    Square  Blue      Yes     3
1.1                  A    Square  Blue      Yes     3
1.2                  A    Square  Blue      Yes     3
2                    B  Triangle  Blue      Yes     2
2.1                  B  Triangle  Blue      Yes     2
3                    A    Square  Blue       No     2
3.1                  A    Square  Blue       No     2
4                    B  Triangle  Blue       No     4
4.1                  B  Triangle  Blue       No     4
4.2                  B  Triangle  Blue       No     4
4.3                  B  Triangle  Blue       No     4

melt converts the data to long format, and rep then repeats each row as many times as the count in its value column.
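For comparison, the same expansion can be sketched in base R without any packages. This is only a sketch: the data frame is reconstructed from the two sample rows shown in the question, and the names idx, result and long are illustrative, not from the original answer.

```r
# Sample data reconstructed from the question (only the two rows shown there).
df <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

# Repeat each row index once per trial, i.e. Yes + No times.
idx <- rep(seq_len(nrow(df)), times = df$Yes + df$No)

# For each original row, emit "Yes" Yes-times followed by "No" No-times.
result <- unlist(Map(function(y, n) rep(c("Yes", "No"), times = c(y, n)),
                     df$Yes, df$No))

# Combine the repeated predictor columns with the per-trial outcome.
long <- data.frame(df[idx, c("Category", "Shape", "Color")],
                   Result = factor(result, levels = c("No", "Yes")),
                   row.names = NULL)
```

Here the outcome column is called Result, as in the question, rather than variable.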

library(data.table)
melt(setDT(dat), id.vars = 1:3)[, rep(variable, value), by = .(Category, Shape, Color)]
            Category     Shape Color  V1
 1:                A    Square  Blue Yes
 2:                A    Square  Blue Yes
 3:                A    Square  Blue Yes
 4:                A    Square  Blue  No
 5:                A    Square  Blue  No
 6:                B  Triangle  Blue Yes
 7:                B  Triangle  Blue Yes
 8:                B  Triangle  Blue  No
 9:                B  Triangle  Blue  No
10:                B  Triangle  Blue  No
11:                B  Triangle  Blue  No

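Using the tidyverse:

library(tidyverse)

dat %>%
  rowwise() %>%
  # build one list element per row: "Yes" repeated Yes times, then "No" repeated No times
  mutate(var = list(rep(c("Yes", "No"), c(Yes, No)))) %>%
  ungroup() %>%
  select(-Yes, -No) %>%
  unnest(var)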
 Category   Shape    Color var  
  <fct>    <fct>    <fct> <chr>
 1 A        Square   Blue  Yes  
 2 A        Square   Blue  Yes  
 3 A        Square   Blue  Yes  
 4 A        Square   Blue  No   
 5 A        Square   Blue  No   
 6 B        Triangle Blue  Yes  
 7 B        Triangle Blue  Yes  
 8 B        Triangle Blue  No   
 9 B        Triangle Blue  No   
10 B        Triangle Blue  No   
11 B        Triangle Blue  No
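
Once the data is in long format, fitting the classification tree the question asks about might look roughly like this. It is a minimal sketch under assumptions: the long data frame is rebuilt by hand from the two sample rows, the formula Result ~ Shape + Color is a guess at the intended model, and minsplit is lowered only so that a split can be attempted on such a tiny sample.

```r
library(rpart)

# Long-format data equivalent to the expanded output above
# (reconstructed from the two sample rows in the question).
long <- data.frame(
  Category = rep(c("A", "B"), times = c(5, 6)),
  Shape    = factor(rep(c("Square", "Triangle"), times = c(5, 6))),
  Color    = factor(rep("Blue", 11)),
  Result   = factor(c(rep(c("Yes", "No"), times = c(3, 2)),
                      rep(c("Yes", "No"), times = c(2, 4))))
)

# method = "class" requests a classification tree.
# minsplit is lowered only because this toy data has 11 rows;
# with the default of 20, no split would be attempted here.
fit <- rpart(Result ~ Shape + Color, data = long, method = "class",
             control = rpart.control(minsplit = 2))

print(fit)
```

With real data you would likely keep the default control settings and inspect the fitted tree with print or plot.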