R dummyVars - dummy variables for a single column

Question

I am trying to convert a set of transactions into a wide matrix so that I can run some regression models.

Trans_id     item_id
  123         ABC
  123         DEF
  123         XYZ
  345         ABC
  ...         ...

I would like to convert it into this:

Trans_id     item_ABC    item_DEF   item_XYZ   
  123            1           1          1
  345            1           0          0

I am trying to do this with the dummyVars function from caret, but I can't get it to do what I need.

dv1 <- dummyVars(Trans_id ~ item_id , data = res1)
df2 <- predict(dv1, res1)

This just gives me the list of item_id values, with no dummy matrix:

 item_id
   ABC
   DEF
   XYZ
   ABC
   ...

Any suggestions?

Answer 1

Here is a solution using data.table:

# load the data
data <- read.table(
  text = "
Trans_id     item_id
  123         ABC
  123         DEF
  123         XYZ
  345         ABC
",
  header = TRUE
)

# load data.table
library(data.table)

# convert to a data.table and count each item within each transaction
dt <- setDT(data)[
  ,
  .(
    item_ABC = sum(item_id == "ABC"),
    item_DEF = sum(item_id == "DEF"),
    item_XYZ = sum(item_id == "XYZ")
  ),
  # grouping by Trans_id
  by = Trans_id
]

# display the new table
dt

   Trans_id item_ABC item_DEF item_XYZ
1:      123        1        1        1
2:      345        1        0        0

Hope this helps!

Answer 2

If we are using data.table, then dcast can be used:

library(data.table)
dcast(setDT(data), Trans_id ~ paste0("item_", item_id), length)
#   Trans_id item_ABC item_DEF item_XYZ
#1:      123        1        1        1
#2:      345        1        0        0

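Or, more generally, to get a 0/1 indicator rather than a count when an item can appear more than once in a transaction: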

dcast(setDT(data), Trans_id ~ paste0("item_", item_id), function(x) as.integer(length(x)>0))
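To make the difference between the two calls concrete, here is a minimal sketch on a made-up variant of the data in which ABC occurs twice in transaction 123. The `dup` table and the helper column `item` are assumptions for illustration only, not part of the original data; `value.var` and `fill` are spelled out just to keep the calls explicit.

```r
library(data.table)

# Hypothetical data: item ABC appears twice in transaction 123
dup <- data.table(
  Trans_id = c(123L, 123L, 123L, 345L),
  item_id  = c("ABC", "ABC", "DEF", "ABC")
)

# Helper column holding the target column names (item_ABC, item_DEF)
dup[, item := paste0("item_", item_id)]

# length counts occurrences, so item_ABC is 2 for transaction 123
counts <- dcast(dup, Trans_id ~ item,
                value.var = "item_id", fun.aggregate = length)

# The indicator version caps every cell at 1 (present) or 0 (absent)
flags <- dcast(dup, Trans_id ~ item,
               value.var = "item_id",
               fun.aggregate = function(x) as.integer(length(x) > 0),
               fill = 0L)

print(counts)
print(flags)
```

For a regression on presence/absence the `flags` form is probably what is wanted; `counts` is the one to keep if quantities matter.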

Data

data <- structure(list(Trans_id = c(123L, 123L, 123L, 345L), item_id = structure(c(1L, 
2L, 3L, 1L), .Label = c("ABC", "DEF", "XYZ"), class = "factor")),
 .Names = c("Trans_id", 
"item_id"), class = "data.frame", row.names = c(NA, -4L))

Answer 3

You made a very small mistake. Call predict like this:

df2 <- predict(dv1, newdata = res1)
View(df2)

This should work.
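
Note that a dummy matrix built row by row has one row per line of res1, not one per Trans_id, so the rows still have to be collapsed per transaction to get the layout asked for in the question. Below is a minimal sketch of that step. It uses base R's model.matrix as a stand-in for the caret call so that it runs without extra packages; res1 is rebuilt here from the question's sample, and storing item_id as a factor is an assumption.

```r
# Sample data from the question; item_id as a factor is an assumption
res1 <- data.frame(
  Trans_id = c(123L, 123L, 123L, 345L),
  item_id  = factor(c("ABC", "DEF", "XYZ", "ABC"))
)

# One indicator column per item level, one row per input row (4 x 3 here).
# model.matrix() is a base R stand-in for the dummyVars/predict pair.
dummies <- model.matrix(~ 0 + item_id, data = res1)
colnames(dummies) <- paste0("item_", levels(res1$item_id))

# Collapse to one row per transaction: sum within Trans_id, then cap at 1
counts <- rowsum(dummies, group = res1$Trans_id)
wide <- data.frame(
  Trans_id = as.integer(rownames(counts)),
  (counts > 0) * 1L
)
rownames(wide) <- NULL

print(wide)
```

The same rowsum step should apply to the matrix returned by predict on a dummyVars object, since that also has one row per row of the input data.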
