Finding a specific string and adding that string to a column

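I want to first find a string in a vector and then replace it with a matching vector of the same length or of length 1. I used the qdap package, which has the multigsub function, but it simply replaces everything. Below is an example of the desired output (along with a loop-based solution). In addition, I do not want "Jabad" to be matched.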

df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular", "James is bad", "Jabad is extra"))

replacements <- c("good", "regular", "bad")

df1$status <- NA

for(i in 1:3){

  df1[grepl(replacements[i], df1$string), "status"] <- replacements[i]

}

df1

Second example

df1$status <- "Status unknown"

for(i in 1:3){

  df1[grepl(replacements[i], df1$string), "status"] <- "Status known"


}

df1

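I am looking for something similar to multigsub where it is possible to specify two vectors, e.g. c("... Good ...", "... Best ...", "... Regular ...", "... Extra ...") would be replaced by c("Good", "Good", "Regular", "Best"). In this case multigsub would also return the text before/after the word (shown as ... in the example).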

If I understand your situation correctly, this is what you want. It uses the str_extract function from the stringr library.
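The part that keeps "Jabad" from counting as "bad" is the word-boundary anchor \\b in the pattern. As a minimal base-R sketch of that idea (the vector x and the names plain and whole are made up for illustration):

```r
# Two of the strings from the question
x <- c("James is bad", "Jabad is extra")

# Plain substring search: "bad" is also found inside "Jabad"
plain <- grepl("bad", x)

# Whole-word search: \\b requires a word boundary on both sides of "bad"
whole <- grepl("\\bbad\\b", x)

print(data.frame(x, plain, whole))
```

Note the doubled backslash: inside an R string literal, a single "\b" is the backspace character, not a regex word boundary.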

I added a few extra cases to the data to demonstrate.

The variable s holds the strings you are searching for, and r holds the replacement value for each string that is found.

library(stringr)

df = structure(list(string = structure(c(1L, 2L, 5L, 3L, 4L, 6L), .Label = c("Erik is pretty good",
"Fred is regular", "Jabad is extra", "Jabad is unknown", "James is bad",
"John is best"), class = "factor")), .Names = "string", row.names = c(NA,
-6L), class = "data.frame")

s = c('good', 'best', 'regular', 'bad', 'extra')
r = c('Good', 'Good', 'Regular', 'Bad', 'Best')
names(r) <- s

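# Double backslashes are required: "\b" in an R string is a backspace, "\\b" is a regex word boundary
pat = paste0("\\b(", paste0(s, collapse = "|"), ")\\b")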

z = str_extract(df$string, pat)

# Lookup function will return NA when input is NA 
lookup <- function(x, s, r){
    i = match(x, s)
    if(is.na(i)) return(NA)
    r[[i]]
}

df$Status = sapply(z, lookup, s=s, r=r)

df = transform(df, Status2 = ifelse(is.na(Status), "Status Unknown", "Status Known"))

The resulting data.frame is:

               string  Status        Status2
1 Erik is pretty good    Good   Status Known
2     Fred is regular Regular   Status Known
3        James is bad     Bad   Status Known
4      Jabad is extra    Best   Status Known
5    Jabad is unknown    <NA> Status Unknown
6        John is best    Good   Status Known
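If you would rather not depend on stringr, roughly the same result can probably be had with base R alone. The following is only a sketch under that assumption: it rebuilds the example data as plain character strings, uses regexpr() and regmatches() in place of str_extract, and replaces the lookup function with indexing into a named vector. The intermediate names m and z are just for illustration.

```r
# Example data mirroring the rows above, as character rather than factor
df <- data.frame(
  string = c("Erik is pretty good", "Fred is regular", "James is bad",
             "Jabad is extra", "Jabad is unknown", "John is best"),
  stringsAsFactors = FALSE
)

# Named vector: names are the search terms, values are the replacements
r <- c(good = "Good", best = "Good", regular = "Regular",
       bad = "Bad", extra = "Best")

# Whole-word alternation pattern built from the search terms
pat <- paste0("\\b(", paste(names(r), collapse = "|"), ")\\b")

# regexpr() gives the first match per element (-1 when there is none);
# regmatches() returns only the matched pieces, so fill a vector of NAs
m <- regexpr(pat, df$string, perl = TRUE)
z <- rep(NA_character_, nrow(df))
z[m != -1] <- regmatches(df$string, m)

# Indexing the named vector by name looks up every row at once;
# an NA name yields an NA value
df$Status <- unname(r[z])
df$Status2 <- ifelse(is.na(df$Status), "Status Unknown", "Status Known")

print(df)
```

Like str_extract, regexpr() only reports the first match in each string, so a row containing two of the search terms would get the status of whichever appears first.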