How can I code this indicator matrix without using a loop in R

Question

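I have a vector of factors, compare_comp, given by a sequence. The same factors also appear in two separate data sets, test_set and train_set. The code below finds where each factor in a data set matches an entry of the factor vector and puts a 1 at that position of a matrix. Multiplying the resulting matrix, compound_test, by compare_comp should give back test_set$Compound.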

test_set <- data.frame(Compound=letters[sample(1:3,10,replace = TRUE)])
train_set <- data.frame(Compound=letters[sample(1:3,10,replace = TRUE)])

compare_comp <- letters[1:3]
compound_test <- matrix(0, nrow(test_set), length(compare_comp))   # test indicator matrix
compound_train <- matrix(0, nrow(train_set), length(compare_comp)) # train indicator matrix

for (i in seq_along(compare_comp)) {
  compound_test[which(compare_comp[i] == test_set$Compound), i] <- 1
  compound_train[which(compare_comp[i] == train_set$Compound), i] <- 1
}
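The intended relationship between the indicator matrix and compare_comp can be checked with a short sketch. It assumes a fixed seed (not in the original) so the run is reproducible, and it multiplies by the column positions rather than by the letters themselves, since letters are not numeric.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
compare_comp <- letters[1:3]
test_set <- data.frame(Compound = letters[sample(1:3, 10, replace = TRUE)])

# Same loop as above, restricted to the test set
compound_test <- matrix(0, nrow(test_set), length(compare_comp))
for (i in seq_along(compare_comp)) {
  compound_test[which(compare_comp[i] == test_set$Compound), i] <- 1
}

# Each row contains exactly one 1
stopifnot(all(rowSums(compound_test) == 1))

# The column position of that 1 selects the matching entry of compare_comp
idx <- drop(compound_test %*% seq_along(compare_comp))
recovered <- compare_comp[idx]
stopifnot(identical(recovered, as.character(test_set$Compound)))
```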

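Is there a function in R that lets me create the same thing without the for loop? I tried model.matrix(~Compound, data = test_set), but because of the reference level it leaves out one column, and it also produces unwanted column names.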

Answer 1

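An easier option is model.matrix from base R. Adding -1 to the formula removes the intercept, so every level gets its own column instead of one being absorbed as the reference level: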

model.matrix(~ Compound-1, train_set)
model.matrix(~ Compound-1, test_set)
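One caveat: the columns come from the levels that are actually present, so a compound that happens not to be sampled would be missing from the result. A minimal sketch of one way around this, assuming the levels are fixed to compare_comp with factor(); the helper name make_indicator is hypothetical and not part of the original answer. It also strips the column names and extra attributes so the result looks like the loop output.

```r
compare_comp <- letters[1:3]
# Deliberately contains no "c", to show the effect of fixing the levels
test_set <- data.frame(Compound = c("a", "b", "a", "b"))

# Hypothetical helper: one column per entry of `levels`, in that order
make_indicator <- function(x, levels) {
  f <- factor(x, levels = levels)
  m <- model.matrix(~ f - 1)
  # Rebuild as a bare matrix: drops dimnames and the "assign"/"contrasts" attributes
  matrix(as.vector(m), nrow = nrow(m))
}

compound_test <- make_indicator(test_set$Compound, compare_comp)
print(compound_test)
```

The third column is all zeros here because "c" never occurs, which matches what the original loop would produce for the same data.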

Alternatively, use table after cbind-ing a sequence of row numbers:

table(cbind(nr = seq_len(nrow(train_set)), train_set))
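table returns an object of class "table" with integer counts and dimnames. If a plain numeric matrix like the one built by the loop is wanted, something along these lines may work. This is a variant of the call above, not the answer's exact code: it passes the two vectors to table directly and, as an assumption, fixes the levels to compare_comp so that no column is dropped.

```r
compare_comp <- letters[1:3]
train_set <- data.frame(Compound = c("b", "a", "c", "a"))

# Cross-tabulate row number against compound; each row gets a single 1
tab <- table(
  nr = seq_len(nrow(train_set)),
  Compound = factor(train_set$Compound, levels = compare_comp)
)

# unclass() removes the "table" class, unname() removes the dimnames
compound_train <- unname(unclass(tab))
storage.mode(compound_train) <- "double"  # the loop version stores doubles
print(compound_train)
```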
