Using recode() with variable names generated through paste()
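I am trying to recode variables that contain answer labels as character strings into numeric values. For this I am using dplyr's recode().
To automate this, I would like to generate the variable names with paste(), but apparently recode() cannot work with the output of paste().
I have already tried noquote() and as.name(), but for both R tells me that recode cannot be applied to objects of class "noquote"/"name".
Example: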
item1 <- c("Don't agree at all", "Totally agree")
item2 <- c("Indifferent", "Totally agree")
for (i in 1:2) {
recode(paste("item", i, sep=""), "Totally agree"=1, "Indifferent"=2, "Don't agree at all"=3)
}
I would expect
> item1
[1] 3 1
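How can I solve this?
Update
I found a workaround: I first extract the relevant columns into a separate data frame and then apply recode() together with sapply(). After that I can merge the data frames back together.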
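Regarding "but apparently recode() cannot work with the output of paste()": this has nothing to do with recode; (almost) every R function works this way. paste returns a character string holding the variable's name, whereas recode expects the vector of values to recode as its first argument (a notable exception, among others, is library, to which we can pass the package name either as a string or as a bare name).
If you insist on the "for loop" approach, you can combine assign and eval(sym("a string")):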
item1 <- c("Don't agree at all", "Totally agree")
item2 <- c("Indifferent", "Totally agree")
library(dplyr)
for (i in 1:2) {
assign(paste("item", i, sep="") , recode(eval(sym(paste("item", i, sep=""))), "Totally agree"=1, "Indifferent"=2, "Don't agree at all"=3))
}
This results in:
item1
[1] 3 1
item2
[1] 2 1
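The same loop can also be sketched in base R alone. The version below is a variant under stated assumptions, not the method above: it uses get() in place of eval(sym()) to fetch the object that a name refers to, and a named lookup vector (codes, a hypothetical name) in place of recode(), so that it runs without dplyr.

```r
# Example data from the question
item1 <- c("Don't agree at all", "Totally agree")
item2 <- c("Indifferent", "Totally agree")

# Hypothetical lookup vector: names are the labels, values are the codes
codes <- c("Totally agree" = 1, "Indifferent" = 2, "Don't agree at all" = 3)

for (i in 1:2) {
  nm <- paste0("item", i)            # the variable's name, as a string
  values <- get(nm)                  # the object that the name refers to
  assign(nm, unname(codes[values]))  # overwrite it with the numeric codes
}

item1
# [1] 3 1
item2
# [1] 2 1
```

The key point is the distinction between nm (a string) and get(nm) (the vector stored under that name); only the latter can be recoded.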
Edit:
A possibly more direct and more "dplyr"-y approach would be something like:
tdd <- data.frame(item1, item2) %>%
mutate_at(vars(starts_with("item")), ~recode(., "Totally agree"=1, "Indifferent"=2, "Don't agree at all"=3))
and tdd is now:
tdd
item1 item2
1 3 2
2 1 1
You can simply put the named vector v into Map together with the list of variables obtained from mget, and subset it.
v <- c("Totally agree"=1, "Indifferent"=2, "Don't agree at all"=3)
Map(function(x, y) unname(y[x]), mget(ls(pattern="^item")), list(v))
# $item1
# [1] 3 1
#
# $item2
# [1] 2 1
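The subsetting step y[x] works because indexing a named vector with a character vector looks each label up by name. A small sketch of that mechanism follows; the label "No opinion" and the names answers and coded are made up for illustration, to show what happens to a value that is missing from the lookup.

```r
v <- c("Totally agree" = 1, "Indifferent" = 2, "Don't agree at all" = 3)

answers <- c("Indifferent", "Totally agree", "No opinion")

# Each label is matched against names(v); a label without a match gives NA
coded <- unname(v[answers])
coded
# [1]  2  1 NA
```

Unmatched labels therefore turn into NA silently, so it can be worth checking anyNA() on the result when the set of labels is not fully known.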
Or, suppose you have a data frame like this:
head(dat1)
# id item1 item2 x
# 1 1 Totally agree Totally agree 0.0356312
# 2 2 Totally agree Totally agree 1.3149588
# 3 3 Totally agree Indifferent 0.9781675
# 4 4 Totally agree Indifferent 0.8817912
# 5 5 Indifferent Indifferent 0.4822047
# 6 6 Indifferent Don't agree at all 0.9657529
Then you can do this in a similar way. We can even simplify the code, since we no longer need Map to return unnamed objects.
v1 <- c("Totally agree"=1, "Indifferent"=2, "Don't agree at all"=3)
item_nm <- c("item1", "item2")
dat1[item_nm] <- Map(`[`, list(v1), dat1[item_nm])
dat1
# id item1 item2 x
# 1 1 1 1 0.0356312
# 2 2 1 1 1.3149588
# 3 3 1 2 0.9781675
# 4 4 1 2 0.8817912
# 5 5 2 2 0.4822047
# 6 6 2 3 0.9657529
# 7 7 2 3 -0.8145709
# 8 8 1 1 0.2839578
# 9 9 3 1 -0.1616986
# 10 10 3 3 1.9355718
Map recycles its second argument, list(v1), across the iterations (i.e. list(v1, v1) would work as well).
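The recycling can be checked directly. In this sketch, cols is a small hypothetical stand-in for the item columns of the data frame, and recycled and explicit are illustrative names.

```r
v1 <- c("Totally agree" = 1, "Indifferent" = 2, "Don't agree at all" = 3)

# Two small columns standing in for the item columns of a data frame
cols <- list(
  item1 = c("Totally agree", "Indifferent"),
  item2 = c("Don't agree at all", "Totally agree")
)

recycled <- Map(`[`, list(v1), cols)      # one lookup vector, reused for both columns
explicit <- Map(`[`, list(v1, v1), cols)  # one lookup vector per column

identical(recycled, explicit)
# [1] TRUE
```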
More generally, list one vector in the second argument of Map for each column you want to recode.
head(dat2)
# id item1 item2 x
# 1 1 Totally agree Always 0.0356312
# 2 2 Totally agree Always 1.3149588
# 3 3 Totally agree Both 0.9781675
# 4 4 Totally agree Both 0.8817912
# 5 5 Indifferent Both 0.4822047
# 6 6 Indifferent Never 0.9657529
v2 <- c("Always"=1, "Both"=2, "Never"=3)
dat2[item_nm] <- Map(`[`, list(v1, v2), dat2[item_nm])
dat2
# id item1 item2 x
# 1 1 1 1 0.0356312
# 2 2 1 1 1.3149588
# 3 3 1 2 0.9781675
# 4 4 1 2 0.8817912
# 5 5 2 2 0.4822047
# 6 6 2 3 0.9657529
# 7 7 2 3 -0.8145709
# 8 8 1 1 0.2839578
# 9 9 3 1 -0.1616986
# 10 10 3 3 1.9355718
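Because Map pairs the lookup vectors with the columns by position, the order of list(v1, v2) has to match the order of item_nm. One way to make the pairing explicit, offered as a sketch rather than as part of the approach above, is to name the list of lookup vectors after the columns. The names lookups and dat are hypothetical, and dat is only a small stand-in for dat2.

```r
v1 <- c("Totally agree" = 1, "Indifferent" = 2, "Don't agree at all" = 3)
v2 <- c("Always" = 1, "Both" = 2, "Never" = 3)

# Small stand-in for dat2; character columns, not factors
dat <- data.frame(
  id    = 1:3,
  item1 = c("Totally agree", "Indifferent", "Don't agree at all"),
  item2 = c("Always", "Never", "Both"),
  stringsAsFactors = FALSE
)

# Each lookup vector is stored under the name of the column it recodes
lookups <- list(item1 = v1, item2 = v2)

dat[names(lookups)] <- Map(
  function(column, key) unname(key[column]),
  dat[names(lookups)],
  lookups
)

dat$item1
# [1] 1 2 3
dat$item2
# [1] 1 3 2
```

Selecting the columns with names(lookups) means the columns and their lookup vectors cannot get out of step when a new item is added.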
Data:
dat1 <- structure(list(id = 1:10, item1 = c("Totally agree", "Totally agree",
"Totally agree", "Totally agree", "Indifferent", "Indifferent",
"Indifferent", "Totally agree", "Don't agree at all", "Don't agree at all"
), item2 = c("Totally agree", "Totally agree", "Indifferent",
"Indifferent", "Indifferent", "Don't agree at all", "Don't agree at all",
"Totally agree", "Totally agree", "Don't agree at all"), x = c(0.0356311982051355,
1.31495884897891, 0.978167526364279, 0.881791226863203, 0.482204688262918,
0.965752878105794, -0.814570938270238, 0.283957806364306, -0.161698647607024,
1.93557176599585)), class = "data.frame", row.names = c(NA, -10L
))
dat2 <- structure(list(id = 1:10, item1 = c("Totally agree", "Totally agree",
"Totally agree", "Totally agree", "Indifferent", "Indifferent",
"Indifferent", "Totally agree", "Don't agree at all", "Don't agree at all"
), item2 = c("Always", "Always", "Both", "Both", "Both", "Never",
"Never", "Always", "Always", "Never"), x = c(0.0356311982051355,
1.31495884897891, 0.978167526364279, 0.881791226863203, 0.482204688262918,
0.965752878105794, -0.814570938270238, 0.283957806364306, -0.161698647607024,
1.93557176599585)), class = "data.frame", row.names = c(NA, -10L
))