Create a Column Based on Another Column Using grepl
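Consider a df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word, and whether there are additional characters before or after it. The final result should look like this: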
WORD STEM NEW
rerun run prefixed
runner run suffixed
run run none
... ... ...
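You can see my code so far below. However, it doesn't work, because the grepl expression is applied to all rows of df at once rather than row by row. Anyway, I think it makes my idea clear.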
df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
You can use mapply to apply grepl row by row, for example:
ifelse(mapply(grepl, paste0(".+", x$STEM, ".+"), x$WORD), "both",
ifelse(mapply(grepl, paste0(x$STEM, ".+"), x$WORD), "suffixed",
ifelse(mapply(grepl, paste0(".+", x$STEM), x$WORD), "prefixed", "none")))
#"prefixed" "suffixed" "none"
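To illustrate why the row-wise call matters: as far as I know, grepl is not vectorized over its pattern argument, so when it receives several patterns it uses only the first one and emits a warning. The sketch below uses a small made-up data frame x (the "walker"/"walk" row is my own addition, chosen so the stems differ) to compare the two calls.

```r
# Hypothetical example data; the second row has a different stem on purpose.
x <- data.frame(WORD = c("rerun", "walker", "run"),
                STEM = c("run", "walk", "run"),
                stringsAsFactors = FALSE)

# grepl() uses only the first pattern ("run") for every row and warns about it.
naive <- suppressWarnings(grepl(x$STEM, x$WORD))

# mapply() pairs each pattern with the word on the same row.
rowwise <- unname(mapply(grepl, x$STEM, x$WORD))

print(naive)    # "walker" is tested against "run", not "walk"
print(rowwise)  # each word is tested against its own stem
```

With identical stems in every row the two calls happen to agree, which is why the problem is easy to miss on the sample data.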
Or use startsWith and endsWith and index into a vector of labels:
c("none", "both", "suffixed", "prefixed")[1 + (1 + startsWith(x$WORD, x$STEM) +
    2*endsWith(x$WORD, x$STEM)) * (nchar(x$WORD) > nchar(x$STEM) &
    mapply(grepl, x$STEM, x$WORD))]
#[1] "prefixed" "suffixed" "none"
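The index arithmetic in that one-liner is dense, so here is the same idea spelled out with intermediate variables. This is only a sketch: the variable names, the extra "aruna" row, and the use of fixed = TRUE (literal rather than regex matching) are my own assumptions, not part of the answer.

```r
x <- data.frame(WORD = c("rerun", "runner", "run", "aruna"),
                STEM = "run", stringsAsFactors = FALSE)

starts <- startsWith(x$WORD, x$STEM)      # nothing comes before the stem
ends   <- endsWith(x$WORD, x$STEM)        # nothing comes after the stem
longer <- nchar(x$WORD) > nchar(x$STEM)   # word has characters beyond the stem
found  <- unname(mapply(grepl, x$STEM, x$WORD,
                        MoreArgs = list(fixed = TRUE)))

# Offset is 1 (stem in the middle), 2 (word starts with the stem) or
# 3 (word ends with the stem); it becomes 0 when the word equals the stem
# or does not contain it.
offset <- (1 + starts + 2 * ends) * (longer & found)

labels <- c("none", "both", "suffixed", "prefixed")
result <- labels[1 + offset]
print(result)
```

One caveat of this scheme: a longer word that both starts and ends with the stem (for example "runrun") produces an offset of 4, which indexes past the label vector and yields NA.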
You can create the new column like this:
df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
ifelse(startsWith(df$word, df$stem), 'suffixed',
ifelse(endsWith(df$word, df$stem), 'prefixed',
'both')))
Or, if you are in a dplyr pipeline and want to avoid all the annoying df$ prefixes:
df %>%
  mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'none',
               ifelse(startsWith(word, stem), 'suffixed',
               ifelse(endsWith(word, stem), 'prefixed',
                      'both'))))
Output
# word stem new
# 1 rerun run prefixed
# 2 runner run suffixed
# 3 run run none
# 4 aruna run both
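One thing to watch with the startsWith/endsWith version: a word that does not contain the stem at all neither starts nor ends with it, so it falls through to 'both'. A minimal base R sketch that returns NA in that case is shown below. classify_affix is a hypothetical helper name of my own, the "walk" row is made up, and the stem is matched literally with fixed = TRUE, which is an assumption about what is wanted.

```r
# classify_affix() is a hypothetical helper, not part of any package.
classify_affix <- function(word, stem) {
  word <- as.character(word)
  stem <- as.character(stem)
  # Position of the first literal occurrence of each stem in its own word
  # (-1 when the stem does not occur).
  start <- mapply(function(s, w) as.integer(regexpr(s, w, fixed = TRUE)),
                  stem, word, USE.NAMES = FALSE)
  end    <- start + nchar(stem) - 1L
  before <- start > 1L          # characters in front of the stem
  after  <- end < nchar(word)   # characters behind the stem
  ifelse(start < 0L, NA_character_,
  ifelse(before & after, "both",
  ifelse(before, "prefixed",
  ifelse(after, "suffixed", "none"))))
}

df <- data.frame(word = c("rerun", "runner", "run", "aruna", "walk"),
                 stem = "run", stringsAsFactors = FALSE)
df$new <- classify_affix(df$word, df$stem)
print(df)
```

Only the first occurrence of the stem is considered, so words containing the stem more than once are classified by that first match.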
Here is an approach using str_locate from stringr together with dplyr:
library(dplyr)
library(stringr)
data %>%
mutate_at(vars(WORD,STEM), as.character) %>%
mutate(NEW =
case_when(str_locate(WORD,STEM)[,"start"] > 1 &
str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "both",
str_locate(WORD,STEM)[,"start"] > 1 ~ "prefixed",
str_locate(WORD,STEM)[,"end"] < nchar(WORD) ~ "suffixed",
TRUE ~ "none"))
WORD STEM NEW
1 rerun run prefixed
2 runner run suffixed
3 run run none
I added a line to convert WORD and STEM to character in case they are factors.
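The same conversion can be done without dplyr. This matters for the base R answers too, since functions such as startsWith and nchar expect character input and, as far as I know, fail on factors. A small sketch, assuming a made-up data frame whose columns were created as factors:

```r
# Hypothetical data frame with factor columns (the default for
# data.frame() and read.csv() before R 4.0).
data <- data.frame(WORD = c("rerun", "runner", "run"),
                   STEM = c("run", "run", "run"),
                   stringsAsFactors = TRUE)

# Convert only the factor columns to character.
is_fac <- vapply(data, is.factor, logical(1))
data[is_fac] <- lapply(data[is_fac], as.character)

str(data)
print(startsWith(data$WORD, data$STEM))
```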