dplyr mutate stringr str_detect with multiple conditional arguments and corresponding output
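I want to mutate strings differently depending on their format. This example has 2 formats, distinguished by the punctuation they contain. Each element of the vector contains a specific word that is uniquely associated with its format.
I have tried several approaches using ifelse and case_when, but none gives the desired result, which is the last part of the string (the "Keep" part).
I am trying to stick to simple verbs and am not proficient in regex. Any suggestions for an efficient, general approach are welcome.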
library(dplyr)
library(stringr)
df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
"xxxxx.x...Alpha..Keep.2",
"Bravo...Keep3",
"Bravo...Keep4",
"xxxxx...Charlie...Keep.5",
"xxxxx...Charlie...Keep.6"))
dot3dot3split <- function(x) strsplit(x, "..." , fixed = TRUE)[[1]][3]
dot3dot3split("xxxxx.x...Alpha...Keep.1") # returns as expected
# [1] "Keep.1"
dot3split <- function(x) strsplit(x, "...", fixed = TRUE)[[1]][2]
dot3split("Bravo...Keep3") # returns as expected
# [1] "Keep3"
df1 <- df %>% mutate_if(is.factor, as.character) %>%
  mutate(KPI.v2 = ifelse(str_detect(KPI, paste(c("Alpha", "Charlie"), collapse = '|')), dot3dot3split(KPI),
                         ifelse(str_detect(KPI, "Bravo"), dot3split(KPI), KPI))) # not working as expected
df1$KPI.v2
# [1] "Keep.1" "Keep.1" "Alpha" "Alpha" "Keep.1" "Keep.1"
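The functions you wrote (dot3dot3split and dot3split) are not vectorized. Because of the [[1]], they only ever look at the first element of their input, so when given a vector with several elements they return a single value. ifelse then recycles that single value across all rows, which is why the output repeats.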
dot3dot3split(c("xxxxx.x...Alpha...Keep.1", "xxxxx.x...Alpha..Keep.2"))
# [1] "Keep.1"
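If you would rather keep the split-based approach, one option is to index into each element of the strsplit result instead of only the first. The following is a minimal sketch, not a definitive fix: split_piece is a hypothetical helper name that does not appear in the question, and it assumes the pieces are always separated by exactly three dots.

```r
library(dplyr)
library(stringr)

df <- data.frame(KPI = c("xxxxx.x...Alpha...Keep.1",
                         "xxxxx.x...Alpha..Keep.2",
                         "Bravo...Keep3",
                         "Bravo...Keep4",
                         "xxxxx...Charlie...Keep.5",
                         "xxxxx...Charlie...Keep.6"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: return the n-th "..."-separated piece of EVERY element.
# Elements with fewer than n pieces yield NA.
split_piece <- function(x, n) {
  vapply(strsplit(x, "...", fixed = TRUE),
         function(parts) parts[n],
         character(1))
}

df1 <- df %>%
  mutate(KPI.v2 = case_when(
    str_detect(KPI, "Alpha|Charlie") ~ split_piece(KPI, 3),
    str_detect(KPI, "Bravo")         ~ split_piece(KPI, 2),
    TRUE                             ~ KPI
  ))

df1$KPI.v2
# [1] "Keep.1" NA       "Keep3"  "Keep4"  "Keep.5" "Keep.6"
```

Row 2 comes back as NA because "xxxxx.x...Alpha..Keep.2" has only two dots before "Keep", so there is no third piece. That is a sign that splitting on a fixed separator is brittle for this data.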
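Since you are already using stringr, I suggest using str_extract to pull out the part you want. It is vectorized, so you need neither ifelse nor a vectorized helper function of your own. Note that the pattern in this example matches a trailing run of letters only, so the example data has been changed to end in letter-only words.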
df <- data.frame(KPI = c("xxxxx.x...Alpha...apples",
"xxxxx.x...Alpha..bananas",
"Bravo...oranges",
"Bravo...grapes",
"xxxxx...Charlie...cherries",
"xxxxx...Charlie...guavas"))
library(dplyr)
library(stringr)
df1 <- df %>%
  mutate_if(is.factor, as.character) %>%
  mutate(KPI.v2 = str_extract(KPI, "[A-Za-z]*$"))
df1
#                          KPI   KPI.v2
# 1   xxxxx.x...Alpha...apples   apples
# 2   xxxxx.x...Alpha..bananas  bananas
# 3            Bravo...oranges  oranges
# 4             Bravo...grapes   grapes
# 5 xxxxx...Charlie...cherries cherries
# 6   xxxxx...Charlie...guavas   guavas
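The letters-only pattern would not capture values such as "Keep.1" from the question's original data, because they contain a dot and a digit. A possible variant is sketched below. It assumes the wanted part always follows the last run of two or more dots, and it removes everything up to and including that run; the variable names kpi and last_part are only illustrative.

```r
library(stringr)

kpi <- c("xxxxx.x...Alpha...Keep.1",
         "xxxxx.x...Alpha..Keep.2",
         "Bravo...Keep3",
         "Bravo...Keep4",
         "xxxxx...Charlie...Keep.5",
         "xxxxx...Charlie...Keep.6")

# Greedy ".*" consumes as much as it can, so the "\\.{2,}" that follows
# ends up matching at the LAST place where two consecutive dots occur.
# Everything through those dots is removed; single dots are left alone.
last_part <- str_remove(kpi, "^.*\\.{2,}")

last_part
# [1] "Keep.1" "Keep.2" "Keep3"  "Keep4"  "Keep.5" "Keep.6"
```

This can be dropped into the pipeline as mutate(KPI.v2 = str_remove(KPI, "^.*\\.{2,}")). It also handles the element that has only two dots before "Keep".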