Create a function to iterate over columns and create a new column each iteration in R

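Sometimes I receive survey data with Likert-scale items stored as strings, and I need to convert those items to numbers so I can compute basic descriptive statistics. To do this, I usually use the case_when function to create a new column for each item and assign a numeric value to each data point. I am trying to write a function that can do this for many different columns at once, so I don't have to keep copying and pasting code. I'm fairly new to this, so any help would be appreciated :)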

Here is what I have done in R so far:

library(dplyr)  # provides case_when

#create data frame
df <- data.frame(v1 = c("Definitely True", "Somewhat True","Somewhat False","Definitely False"),
                 v2 = c("Definitely False","Somewhat False","Somewhat True","Definitely True"))

#Use case_when to add numeric columns to dataframe
df$v1n <- case_when((df$v1 == "Definitely True")==TRUE ~ "1",
                         (df$v1 == "Somewhat True")==TRUE ~ "2",
                         (df$v1 == "Somewhat False")==TRUE ~ "3",
                         (df$v1 == "Definitely False")==TRUE ~ "4")
df$v2n <- case_when((df$v2 == "Definitely True")==TRUE ~ "1",
                         (df$v2 == "Somewhat True")==TRUE ~ "2",
                         (df$v2 == "Somewhat False")==TRUE ~ "3",
                         (df$v2 == "Definitely False")==TRUE ~ "4")

If I want to replace each string value with a numeric value and overwrite the data in the existing columns, this works:

for(i in colnames(data_x)) {
  data_x[[i]] <- case_when((data_x[,i] == "Definitely True")==TRUE ~ "1",
                         (data_x[,i] == "Somewhat True")==TRUE ~ "2",
                         (data_x[,i] == "Somewhat False")==TRUE ~ "3",
                         (data_x[,i] == "Definitely False")==TRUE ~ "4")
}

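But I would like to find a way to create a new column on each iteration, as I did in the copy-and-paste version. Here is something I tried that didn't work. Any help would be appreciated.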

for(i in colnames(df)) {
  df[[var[i]]] <- case_when((df[,i] == "Definitely True")==TRUE ~ "1",
                         (df[,i] == "Somewhat True")==TRUE ~ "2",
                         (df[,i] == "Somewhat False")==TRUE ~ "3",
                         (df[,i] == "Definitely False")==TRUE ~ "4")
}

dplyr

library(dplyr)

df %>%
  mutate(across(v1:v2, ~ case_when(
    . == "Definitely True" ~ "1", 
    . == "Somewhat True" ~ "2", 
    . == "Somewhat False" ~ "3", 
    TRUE ~ "4"
    ), .names = "{.col}n")
  )
#                 v1               v2 v1n v2n
# 1  Definitely True Definitely False   1   4
# 2    Somewhat True   Somewhat False   2   3
# 3   Somewhat False    Somewhat True   3   2
# 4 Definitely False  Definitely True   4   1
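  • across lets us do one thing across multiple columns. We can use the v1:v2 syntax, or one of the other dplyr selector functions such as matches or starts_with.
  • The second argument to across here is a tilde-function (rlang-style), in which . is replaced with the column's data on each iteration. For example, the first time this tilde-function is evaluated, . refers to the vector df$v1.
  • Because the default behavior of mutate(across(...)) is to replace the columns, I add .names= to control how the results are named. This notation uses glue syntax, where {.col} is replaced with the name of the column being evaluated on each iteration.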
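
Since the question asks for a reusable function, the across pattern above could be wrapped in one. This is only a sketch: the name likert_to_num and its cols argument are hypothetical, and it returns integers rather than strings on the assumption that the values are meant for descriptive statistics. Unlike the TRUE ~ "4" fallback above, unmatched values become NA here.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): recodes Likert strings
# into new integer columns named "<col>n", leaving the originals untouched.
likert_to_num <- function(data, cols) {
  data %>%
    mutate(across({{ cols }}, ~ case_when(
      .x == "Definitely True"  ~ 1L,
      .x == "Somewhat True"    ~ 2L,
      .x == "Somewhat False"   ~ 3L,
      .x == "Definitely False" ~ 4L
      # anything else falls through to NA
    ), .names = "{.col}n"))
}

df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

# Any tidyselect expression can be passed for cols, e.g. v1:v2 or starts_with("v")
res <- likert_to_num(df, starts_with("v"))
```

Passing the selection through {{ cols }} keeps the column choice at the call site, so the same helper can be reused on data frames with different item names.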

base R

I'll add the optional use of a lookup map.

lookup <- c("Definitely True" = "1", "Somewhat True" = "2", "Somewhat False" = "3", "Definitely False" = "4")
df <- cbind(df, setNames(lapply(df[,1:2], function(z) lookup[z]), paste0(names(df[,1:2]), "n")))
rownames(df) <- NULL
df
#                 v1               v2 v1n v2n
# 1  Definitely True Definitely False   1   4
# 2    Somewhat True   Somewhat False   2   3
# 3   Somewhat False    Somewhat True   3   2
# 4 Definitely False  Definitely True   4   1
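
The loop attempted in the question can also be made to work by building each new column name with paste0 instead of var[i]. The following is a minimal sketch under a few assumptions: it uses a lookup vector with integer values (rather than the strings used above), and the loop variable name col is arbitrary.

```r
df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

# Lookup map with integer values (an assumption; the answer above uses strings)
lookup <- c("Definitely True" = 1L, "Somewhat True" = 2L,
            "Somewhat False" = 3L, "Definitely False" = 4L)

# names(df) is evaluated once before the loop starts, so the columns
# added inside the loop are not themselves iterated over.
for (col in names(df)) {
  # as.character() guards against factor columns; unname() drops the lookup names
  df[[paste0(col, "n")]] <- unname(lookup[as.character(df[[col]])])
}
```

Values that are not present in lookup come back as NA, which makes unexpected responses easy to spot.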

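I would tend to go about this differently. If you convert the Likert-scale columns to factor with the levels in the correct order, you can use as.integer(...) to get the numeric levels directly, without all of the case_when(...) business.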

Here is an example using data.table:

library(data.table)
likertScale <- c("Definitely True", "Somewhat True","Somewhat False","Definitely False")
cols        <- names(df)
setDT(df)[, c(cols):=lapply(.SD, factor, levels=likertScale)]
df[, paste0(cols, 'n'):=lapply(.SD, as.integer), .SDcols=cols]
df
##                  v1               v2 v1n v2n
## 1:  Definitely True Definitely False   1   4
## 2:    Somewhat True   Somewhat False   2   3
## 3:   Somewhat False    Somewhat True   3   2
## 4: Definitely False  Definitely True   4   1
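
The same factor idea should also work without data.table. Here is a rough base R sketch, assuming every column in df is a Likert item; unlike the data.table version it leaves the original columns as strings, and the names likert_levels and num_cols are made up for illustration.

```r
likert_levels <- c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False")

df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

cols     <- names(df)            # assumes all columns are Likert items
num_cols <- paste0(cols, "n")    # v1n, v2n

# The integer code of each factor level is its position in likert_levels
df[num_cols] <- lapply(df[cols], function(x) as.integer(factor(x, levels = likert_levels)))

# The new columns are plain integers, so descriptive statistics work directly
item_means <- colMeans(df[num_cols])
```

Any response that is not listed in likert_levels becomes NA after the factor conversion.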