Loop, create new variable as function of existing variable with conditional

Question

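I have some data with 400+ columns and about 80 observations. I would like to use a for loop to go through each column and, if its name contains the desired prefix exp_, create a new column holding that value divided by a reference column, stored under the same name but with the suffix _pp. I'd also like to add an else if for the other prefix, rev_, but I think that as long as I can figure out the first problem I can solve the rest myself. Some example data is below: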

exp_alpha     exp_bravo    rev_charlie     rev_delta     pupils
10            28           38              95            2
24            56           39              24            5
94            50           95              45            3
15            93           72              83            9
72            66           10              12            3

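On my first attempt, the loop ran properly but only stored the last column for which the if statement was true, rather than every column for which it was true. I made some adjustments and lost that code. What I have now runs without error but doesn't modify the data frame at all.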

for (i in colnames(test)) {
  if(grepl("exp_", colnames(test)[i])) {
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}

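My understanding of what this is doing:

Loop through the vector of column names
If the substring "exp_" is in the i-th element of the colnames vector == TRUE
Create a new column in the dataset, which is the i-th element of the colnames vector divided by the reference category (pupils), with "_pp" appended at the end
Else do nothing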

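Since my code executes without error but doesn't do anything, I imagine my problem is in the if() statement, but I can't figure out what I'm doing wrong. I also tried adding "==TRUE" to the if() statement, but that gave the same result.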

Answer 1

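Almost correct. Your loop hands each column name to i, but the if() condition then uses i as a position in colnames(test), so the lookup returns NA and the condition is never TRUE. Loop over the column positions instead: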

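for (i in seq_along(colnames(test))) {
  # colnames(test)[i] is the name of the i-th column
  if (grepl("exp_", colnames(test)[i])) {
    # Build the new name from the column name, not the index (paste(i, ...) would give "1_pp")
    test[paste(colnames(test)[i], "pp", sep="_")] <- test[i] / test$pupils
  }
}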
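
One way to see what goes wrong in the question's loop is to run its condition by hand. The sketch below rebuilds the sample data from the question; the names by_name and by_index are made up for illustration.

```r
# Sample data copied from the question
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

# This is what `for (i in colnames(test))` hands to the loop body
i <- "exp_alpha"

# colnames(test) is an unnamed character vector, so a character index finds nothing
by_name  <- colnames(test)[i]
# A numeric index returns the column name as expected
by_index <- colnames(test)[1]

print(by_name)                   # NA
print(grepl("exp_", by_name))    # FALSE, so the if() body is skipped
print(grepl("exp_", by_index))   # TRUE
```

Because grepl() returns FALSE for NA rather than raising an error, the loop finishes silently without ever entering the if() body.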

Answer 2

As an alternative to @timfaber's answer, you can keep your first line the same but not treat i as an index:

for (i in colnames(test)) {
  if(grepl("exp_", i)) {
    print(i)
    test[paste(i,"pp", sep="_")] <- test[i] / test$pupils
  }
}
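
The question also mentions an else if branch for the rev_ prefix. A minimal sketch of how that could look when looping over names is below. It assumes the rev_ columns should also be divided by pupils, which the question does not state, and it uses startsWith() (base R 3.3.0 or later) so that only true prefixes match. The variable names nm and ref are illustrative.

```r
# Sample data copied from the question
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

ref <- test$pupils

# colnames(test) is evaluated once, so columns added inside the loop are not revisited
for (nm in colnames(test)) {
  if (startsWith(nm, "exp_")) {
    test[[paste(nm, "pp", sep = "_")]] <- test[[nm]] / ref
  } else if (startsWith(nm, "rev_")) {
    # Assumption: same per-pupil division; replace with whatever rev_ columns need
    test[[paste(nm, "pp", sep = "_")]] <- test[[nm]] / ref
  }
}

print(colnames(test))
```

If both prefixes really do get the same treatment, a single condition such as grepl("^(exp|rev)_", nm) would do the same job with one branch.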

Answer 3

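Linear (vectorized) solution:

Don't use a loop for this! You can vectorize your code, which runs much faster than looping over columns. Here's how: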

# Extract column names
cNames <- colnames(test)
# Find columns whose names start with exp_
foo <- grep("^exp_", cNames)
# Divide by reference: ALL columns at the SAME time
# (drop = FALSE keeps a data frame even if only one column matches)
bar <- test[, foo, drop = FALSE] / test$pupils
# Append the _pp suffix: ALL columns at the SAME time
colnames(bar) <- paste(cNames[foo], "pp", sep = "_")
# Add to the original dataset in one step instead of iteratively appending
test <- cbind(test, bar)
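
The same loop-free idea can be wrapped in a small function so that it covers both prefixes from the question. This is only a sketch: add_per_pupil and its arguments are hypothetical names, not something from the answers, and the new columns use the _pp suffix that the question asks for.

```r
# Sample data copied from the question
test <- data.frame(
  exp_alpha   = c(10, 24, 94, 15, 72),
  exp_bravo   = c(28, 56, 50, 93, 66),
  rev_charlie = c(38, 39, 95, 72, 10),
  rev_delta   = c(95, 24, 45, 83, 12),
  pupils      = c(2, 5, 3, 9, 3)
)

# Hypothetical helper: add a <name>_pp column for every column whose name
# starts with one of `prefixes`, dividing by the reference column `ref`.
add_per_pupil <- function(df, prefixes = c("exp_", "rev_"), ref = "pupils") {
  pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
  hits <- grep(pattern, colnames(df), value = TRUE)
  if (length(hits) == 0) return(df)
  # drop = FALSE keeps a data frame even when a single column matches
  ratios <- df[, hits, drop = FALSE] / df[[ref]]
  colnames(ratios) <- paste(hits, "pp", sep = "_")
  cbind(df, ratios)
}

result <- add_per_pupil(test)
# Row 1 holds 5 and 47.5, row 2 holds 4.8 and 4.8
print(result[1:2, c("exp_alpha_pp", "rev_delta_pp")])
```

Calling add_per_pupil(test, prefixes = "exp_") restricts the work to the exp_ columns only.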

for-loop

r

grepl