How to create a new variable and assign it a value corresponding to another variable in R?
Here is some mock data that corresponds to the real dataset I am working with:
# Mock dataset
a <- c("a","b","c","d","e","f","g","h","i","j")
b <- 1:10
names <-c("Alex","Ale","Alexandra","Alexander","Ali","Amanda","Alix","Ajax","Aley","Ajay")
data <- data.frame(a,b,names)
# Create the new variable "gender" (%>% and mutate() come from dplyr)
library(dplyr)
data <- data %>%
mutate(gender = NA)
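I want to assign a "gender" value to each entry of the `names` variable in my dataset. I don't want to do this manually because I am working with 1000 observations. However, I do have these vectors, which contain the `names` values that correspond to each gender: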
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
But I don't know how to use them to assign the "gender" value that corresponds to each specific name in my dataset.
Here is what I tried:
data$gender[data$names== male] <- "Male"
And:
data$gender[data$names== c("Alex", "Ale", "Alexander")] <- "Male"
This code does not assign "Male" to all of the matching values, and I get a warning message:
Warning message:
In data$names == c("Alex", "Ale", "Alexander") :
  longer object length is not a multiple of shorter object length
Does anyone know how to assign values to the `gender` variable based on the corresponding `names` variable?
We can create a named `list`, then `stack` it into a two-column dataset and use that in a join:
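library(dplyr)
# stack() turns the named list into a data frame with columns "values" and "ind"
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")
# Add the matching gender to each row of data, joining on the names column
data <- data %>%
left_join(new, by = "names")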
Output:
data
a b names gender
1 a 1 Alex male
2 b 2 Ale male
3 c 3 Alexandra female
4 d 4 Alexander male
5 e 5 Ali female
6 f 6 Amanda female
7 g 7 Alix noanswer
8 h 8 Ajax noanswer
9 i 9 Aley noanswer
10 j 10 Ajay noanswer
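If you would rather not load dplyr, the same lookup-table idea can be sketched in base R with `match()`, which returns the row position of each name in the lookup table (or `NA` when the name is absent). This is a swapped-in base-R technique rather than the answer's join, and the data is rebuilt from the question so the block runs on its own:

```r
# Build the same two-column lookup table with stack(), as in the answer
male     <- c("Alex", "Ale", "Alexander")
female   <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
new <- stack(list(male = male, female = female, noanswer = noanswer))
names(new) <- c("names", "gender")

# Mock data from the question
data <- data.frame(a = letters[1:10], b = 1:10,
                   names = c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
                             "Amanda", "Alix", "Ajax", "Aley", "Ajay"))

# For each name in data, find its row in the lookup table (NA if absent);
# indexing by that position acts like a left join that keeps the row order
idx <- match(data$names, new$names)
data$gender <- as.character(new$gender)[idx]
print(data)
```

One difference worth knowing: if a name appeared in more than one of the vectors, `left_join()` would return one row per match (duplicating that row), whereas `match()` silently takes the first occurrence.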
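Regarding the OP's warning: `==` is an elementwise comparison, which works as intended mainly when one of the vectors has length 1 (it is recycled) or both vectors have the same length. Here the lengths differ, so the shorter vector is recycled, and because the longer length is not a multiple of the shorter one, R issues a warning. Sometimes no warning appears (when the longer length is an exact multiple of the shorter one), but the result is still wrong, because the comparison does something like the following. If the first vector `v1` has length 5 and the second vector `v2` has length 3: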
v1[1] == v2[1]
v1[2] == v2[2]
v1[3] == v2[3]
v1[4] == v2[1]
...
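A minimal R sketch that reproduces both situations. The vectors `v1`, `v2` and `v3` are hypothetical, built from names in the question; the warning is recorded in a flag instead of being printed, and `%in%` is shown alongside for contrast:

```r
# Hypothetical vectors built from names in the question
v1 <- c("Alex", "Ale", "Alexandra", "Alexander", "Ali")   # length 5
v2 <- c("Alex", "Ale", "Alexander")                        # length 3 (the "male" names)

# Case 1: 5 is not a multiple of 3, so R warns; record the warning instead of printing it
warned <- FALSE
eq_5 <- withCallingHandlers(
  v1 == v2,
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(eq_5)    # v1[4] ("Alexander") is compared with v2[1] ("Alex"), so it is missed
print(warned)

# Case 2: 6 is a multiple of 3, so there is no warning,
# yet only one of the five male names is matched
v3 <- c("Ale", "Alex", "Alexander", "Ali", "Alex", "Ale")  # length 6
eq_6 <- v3 == v2
print(eq_6)

# %in% compares each element against the whole lookup vector instead
print(v1 %in% v2)
print(v3 %in% v2)
```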
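Instead, we can use `%in%`, which checks whether each element on the left occurs anywhere in the vector on the right: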
data$gender[data$names %in% male] <- "Male"
data$gender[data$names %in% female] <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"
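Because each `%in%` line only fills the rows it matches, a name that is missing from all three vectors silently stays `NA`. A small base-R sketch, assuming the mock data from the question, that starts from a character `NA` column and then tallies the result with `table(useNA = "ifany")` so unassigned rows are easy to spot:

```r
male     <- c("Alex", "Ale", "Alexander")
female   <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
data <- data.frame(
  a = letters[1:10],
  b = 1:10,
  names = c("Alex", "Ale", "Alexandra", "Alexander", "Ali",
            "Amanda", "Alix", "Ajax", "Aley", "Ajay")
)

# NA_character_ makes the column character from the start
data$gender <- NA_character_
data$gender[data$names %in% male]     <- "Male"
data$gender[data$names %in% female]   <- "Female"
data$gender[data$names %in% noanswer] <- "noanswer"

# Count each label; a non-zero <NA> count would flag names missing from all three vectors
print(table(data$gender, useNA = "ifany"))
unassigned <- data$names[is.na(data$gender)]
print(unassigned)
```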
Data:
data <- structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j"), b = 1:10, names = c("Alex", "Ale", "Alexandra", "Alexander",
"Ali", "Amanda", "Alix", "Ajax", "Aley", "Ajay")),
class = "data.frame", row.names = c(NA,
-10L))
You can also use the following solution:
library(dplyr)
male <- c("Alex", "Ale", "Alexander")
female <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")
data %>%
mutate(gender = case_when(
names %in% male ~ "Male",
names %in% female ~ "Female",
names %in% noanswer ~ "Noanswer"
))
a b names gender
1 a 1 Alex Male
2 b 2 Ale Male
3 c 3 Alexandra Female
4 d 4 Alexander Male
5 e 5 Ali Female
6 f 6 Amanda Female
7 g 7 Alix Noanswer
8 h 8 Ajax Noanswer
9 i 9 Aley Noanswer
10 j 10 Ajay Noanswer
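`case_when()` returns `NA` for any row that matches none of its conditions. A sketch, assuming dplyr is installed, with a hypothetical name "Zoe" (absent from all three vectors) and a hypothetical catch-all label "Unknown", showing what a `TRUE ~` fallback changes:

```r
library(dplyr)

male     <- c("Alex", "Ale", "Alexander")
female   <- c("Alexandra", "Ali", "Amanda")
noanswer <- c("Alix", "Ajax", "Aley", "Ajay")

# Hypothetical data: "Zoe" is deliberately absent from all three lookup vectors
df <- data.frame(names = c("Alex", "Amanda", "Ajay", "Zoe"))

res <- df %>%
  mutate(
    # Without a fallback, unmatched names get NA
    gender_na = case_when(
      names %in% male     ~ "Male",
      names %in% female   ~ "Female",
      names %in% noanswer ~ "Noanswer"
    ),
    # TRUE ~ ... acts as a catch-all for anything not matched above
    gender = case_when(
      names %in% male     ~ "Male",
      names %in% female   ~ "Female",
      names %in% noanswer ~ "Noanswer",
      TRUE                ~ "Unknown"
    )
  )
print(res)
```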