Identifying Multiple Strings in a Column, Then Outputting the Result in a Separate Column; R v3.3.0
I have sample data such as:
Current data frame:
Person <- c("John","Jacob","Jill","Joan")
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
df <- as.data.frame(cbind(Person,Fruits))
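I am trying to determine whether the string contains a single fruit and, if so, put that fruit's name in a separate column. If Apples is listed together with other fruits, the value should be "Apples & Other"; if several fruits are listed and none of them is Apples, the value should be "Multiple", as shown below:
Desired output: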
Person <- c("John","Jacob","Jill","Joan")
Fruits <- c("Apples","Apples,Oranges","Bananas","Oranges,Bananas")
Fruits2 <- c("Apples","Apples & Other","Bananas","Multiple")
df2 <- cbind(Person,Fruits)
df2 <- as.data.frame(cbind(df2,Fruits2))
I tried the following ifelse statement:
df$Fruits2 <- ifelse(grep("\bApples\b",df$Fruits),"Apples",
ifelse(grep(".Apples.|.Apples|Apples.",df$Fruits),"Apples & Other",
ifelse(grep("\bOranges\b",df$Fruits),"Oranges",
ifelse(grep(".Oranges.|.Oranges|Oranges.",df$Fruits),"Multiple",
ifelse(grep("\bBananas\b",df$Fruits),"Bananas",
ifelse(grep(".Bananas.|.Bananas|Bananas.",df$Fruits),"Multiple","TBD"))))))
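But every value of df$Fruits2 comes out as Output. I'm not sure whether the logic of the nested ifelse statements is at fault. If there is a better solution, I would appreciate it.
This if-else is probably more concise for your logic. As a rule, test the most specific case first and work toward the more general ones. You also need grepl, which returns a logical vector, rather than grep, which returns either the integer positions of the matches or the matching values from the original vector: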
library(dplyr)
# as.character() keeps the fruit name if Fruits is a factor (the default in R 3.3.0)
df %>% mutate(Fruits2 = ifelse(grepl(",", Fruits),
                               ifelse(grepl("Apples", Fruits), "Apples & Other", "Multiple"),
                               as.character(Fruits)))
# Person Fruits Fruits2
# 1 John Apples Apples
# 2 Jacob Apples,Oranges Apples & Other
# 3 Jill Bananas Bananas
# 4 Joan Oranges,Bananas Multiple
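To make the grep/grepl distinction concrete, here is a small sketch on the same fruit strings. The variable names idx, hit, and bad are only illustrative and do not come from the thread.

```r
fruits <- c("Apples", "Apples,Oranges", "Bananas", "Oranges,Bananas")

idx <- grep("Apples", fruits)    # integer positions of the matching elements
hit <- grepl("Apples", fruits)   # one TRUE/FALSE per element of fruits

length(idx)   # 2: only the matches are reported
length(hit)   # 4: same length as the input, so it lines up with the rows

# ifelse() shapes its result after the test and coerces it to logical,
# so non-zero positions from grep() all count as TRUE.
bad <- ifelse(idx, "Apples", "other")
bad           # "Apples" "Apples"
```

Because grepl returns one value per row, it is the form that fits as the test argument of ifelse when building a new column.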
You can use strsplit() to split on "," and then use ifelse to check the conditions and store the string you need in a new column.
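# as.character() is needed when Fruits is a factor (the default in R 3.3.0),
# because strsplit() only accepts character input
df$Fruits2 <- sapply(strsplit(as.character(df$Fruits), ","), function(x) {
  ifelse(length(x) == 1, x[1],
         ifelse("Apples" %in% x, "Apples & Other", "Multiple"))
})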
df
Person Fruits Fruits2
1 John Apples Apples
2 Jacob Apples,Oranges Apples & Other
3 Jill Bananas Bananas
4 Joan Oranges,Bananas Multiple
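If the same labelling is needed in more than one place, the split-and-check idea can be wrapped in a function. This is only a sketch: the name label_fruits is hypothetical, and trimming spaces around the commas is an added assumption that the thread does not ask for.

```r
# Hypothetical helper, not from the thread: labels each comma-separated entry.
label_fruits <- function(x) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)  # also accepts factors
  vapply(parts, function(p) {
    p <- trimws(p)                 # assumption: tolerate "Apples, Oranges"
    if (length(p) == 1) {
      p                            # a single fruit keeps its own name
    } else if ("Apples" %in% p) {
      "Apples & Other"             # Apples plus at least one other fruit
    } else {
      "Multiple"                   # several fruits, none of them Apples
    }
  }, character(1))
}

label_fruits(c("Apples", "Apples, Oranges", "Bananas", "Oranges,Bananas"))
# "Apples" "Apples & Other" "Bananas" "Multiple"
```

vapply is used instead of sapply so that the result is always a character vector, and a plain if/else is enough here because each element is checked one at a time.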