Create a function to iterate over columns and create a new column each iteration in R
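Sometimes I receive survey data with Likert-scale items stored as strings, and I need to convert them to numbers so I can compute basic descriptive statistics. To do this, I usually use the case_when function to create a new column for each item, assigning a numeric value to each response. I'm trying to write a function that can do this for many columns at once, so I don't have to keep copying and pasting code. I'm relatively new to this, so any help would be greatly appreciated :)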
Here's what I've been doing in R so far:
library(dplyr)

# create data frame
df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

# use case_when to add coded columns to the data frame, one item at a time
df$v1n <- case_when(df$v1 == "Definitely True"  ~ "1",
                    df$v1 == "Somewhat True"    ~ "2",
                    df$v1 == "Somewhat False"   ~ "3",
                    df$v1 == "Definitely False" ~ "4")
df$v2n <- case_when(df$v2 == "Definitely True"  ~ "1",
                    df$v2 == "Somewhat True"    ~ "2",
                    df$v2 == "Somewhat False"   ~ "3",
                    df$v2 == "Definitely False" ~ "4")
This works if I want to replace each string value with a number and overwrite the data in the existing columns:
# overwrite every column of data_x with its code
for (i in colnames(data_x)) {
  data_x[[i]] <- case_when(data_x[[i]] == "Definitely True"  ~ "1",
                           data_x[[i]] == "Somewhat True"    ~ "2",
                           data_x[[i]] == "Somewhat False"   ~ "3",
                           data_x[[i]] == "Definitely False" ~ "4")
}
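But I'd like to find a way to create a new column on each iteration, as I do in the copy-and-paste version. Here's something I tried that didn't work. Any help would be appreciated.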
for(i in colnames(df)) {
df[[var[i]]] <- case_when((df[,i] == "Definitely True")==TRUE ~ "1",
(df[,i] == "Somewhat True")==TRUE ~ "2",
(df[,i] == "Somewhat False")==TRUE ~ "3",
(df[,i] == "Definitely False")==TRUE ~ "4")
}
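As written, `var` refers to base R's variance function (unless it was defined elsewhere), so `var[i]` errors instead of producing a column name. A minimal sketch of the same loop that builds each new name with `paste0()` instead, assuming the two-column `df` from the top of the question; it also returns integer codes rather than the strings `"1"` to `"4"`, since the goal is numeric data:

```r
library(dplyr)

df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

# loop over the source columns only, so newly added columns are not revisited
for (i in c("v1", "v2")) {
  # paste0() builds the new column name, e.g. "v1" -> "v1n"
  df[[paste0(i, "n")]] <- case_when(df[[i]] == "Definitely True"  ~ 1L,
                                    df[[i]] == "Somewhat True"    ~ 2L,
                                    df[[i]] == "Somewhat False"   ~ 3L,
                                    df[[i]] == "Definitely False" ~ 4L)
}
df
```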
dplyr
library(dplyr)

df %>%
  mutate(across(v1:v2, ~ case_when(
    . == "Definitely True"  ~ 1L,
    . == "Somewhat True"    ~ 2L,
    . == "Somewhat False"   ~ 3L,
    . == "Definitely False" ~ 4L   # any unmatched response becomes NA
  ), .names = "{.col}n")
  )
# v1 v2 v1n v2n
# 1 Definitely True Definitely False 1 4
# 2 Somewhat True Somewhat False 2 3
# 3 Somewhat False Somewhat True 3 2
# 4 Definitely False Definitely True 4 1
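- `across` lets us apply one operation to several columns. We can select them with the `v1:v2` syntax or with one of the other dplyr selector functions, such as `matches`, `starts_with`, etc.
- The second argument to `across` here is a tilde function (rlang style), in which `.` is replaced by the column data on each iteration. For example, the first time this tilde function is evaluated, `.` refers to the vector `df$v1`.
- Because the default behavior of `mutate(across(...))` is to *replace* the columns, I added `.names=` to control the naming of the results. This notation uses glue syntax, where `{.col}` is replaced by the name of the column being evaluated on each iteration.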
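Since the question asks for a function, the `across()` call can be wrapped in one. A minimal sketch, assuming dplyr 1.0 or later (for `across()` and the `{{ }}` embrace operator); `likert_to_num()` and its `scale_levels` argument are hypothetical names, and `match()` is swapped in for `case_when()` so that each label's code is simply its position in `scale_levels`:

```r
library(dplyr)

# hypothetical helper: adds a "<col>n" column for every selected Likert column
likert_to_num <- function(data, cols, scale_levels) {
  data %>%
    mutate(across({{ cols }},
                  ~ match(.x, scale_levels),  # position in scale_levels; NA if absent
                  .names = "{.col}n"))
}

likert <- c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False")
df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

out <- likert_to_num(df, v1:v2, likert)
out
```

Any tidyselect helper should work for `cols`, for example `likert_to_num(df, starts_with("v"), likert)`.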
Base R
Here I also show the (optional) use of a lookup map.
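# named vector: names are the labels, values are the numeric codes
lookup <- c("Definitely True" = 1L, "Somewhat True" = 2L, "Somewhat False" = 3L, "Definitely False" = 4L)
# lookup[z] indexes by label; setNames() gives each new column the suffix "n"
df <- cbind(df, setNames(lapply(df[, 1:2], function(z) lookup[z]), paste0(names(df[, 1:2]), "n")))
rownames(df) <- NULL  # reset row names, which cbind() can pick up from the named lookup results
df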
# v1 v2 v1n v2n
# 1 Definitely True Definitely False 1 4
# 2 Somewhat True Somewhat False 2 3
# 3 Somewhat False Somewhat True 3 2
# 4 Definitely False Definitely True 4 1
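A variant of the same lookup idea, sketched under the assumption that `df` still has only the two text columns: assigning through `df[new_names] <- ...` adds the columns directly, so neither `cbind()` nor the row-name reset is needed, and `unname()` drops the label names that indexing by name carries along. The `as.character()` call is a precaution so the lookup is by label even if the columns were read in as factors.

```r
df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

# named lookup: label -> integer code
lookup <- c("Definitely True" = 1L, "Somewhat True" = 2L,
            "Somewhat False" = 3L, "Definitely False" = 4L)

cols <- c("v1", "v2")
# index the lookup by each column's labels; unname() drops the carried-over names
df[paste0(cols, "n")] <- lapply(df[cols], function(z) unname(lookup[as.character(z)]))
df
```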
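I tend to do this differently. If you convert the Likert-scale columns to factors with the levels in the correct order, you can use `as.integer(...)` to get the numeric level codes directly, without all of the `case_when(...)` business.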
Here's an example using data.table:
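library(data.table)
likertScale <- c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False")
cols <- c("v1", "v2")  # the Likert columns to convert
# convert the text columns in place to factors whose levels follow the scale order
setDT(df)[, (cols) := lapply(.SD, factor, levels = likertScale), .SDcols = cols]
# as.integer() on a factor returns each value's level position, i.e. its numeric code
df[, paste0(cols, "n") := lapply(.SD, as.integer), .SDcols = cols]
df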
## v1 v2 v1n v2n
## 1: Definitely True Definitely False 1 4
## 2: Somewhat True Somewhat False 2 3
## 3: Somewhat False Somewhat True 3 2
## 4: Definitely False Definitely True 4 1
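The factor approach does not depend on data.table. A minimal base R sketch of the same conversion (the `likert` and `cols` names are illustrative); note that a response not listed in `levels`, such as a typo, becomes `NA` instead of being silently given a code:

```r
likert <- c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False")

df <- data.frame(v1 = c("Definitely True", "Somewhat True", "Somewhat False", "Definitely False"),
                 v2 = c("Definitely False", "Somewhat False", "Somewhat True", "Definitely True"))

cols <- c("v1", "v2")
# factor() fixes the level order; as.integer() returns each value's level position
df[paste0(cols, "n")] <- lapply(df[cols], function(z) as.integer(factor(z, levels = likert)))
df

# labels outside the scale become NA
typo_codes <- as.integer(factor(c("Somewhat True", "Somewhat ture"), levels = likert))
typo_codes
```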