Finding a specific string and adding that string to a column
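I want to first find a string in a vector and then replace it with a matching vector of the same length or of length 1. I used the qdap package, which has the multigsub function, but it simply replaces everything. Below is an example of the desired output (along with a solution that uses a loop). In addition, I do not want "Jabad" to be matched.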
df1 <- data.frame(string = c("Erik is pretty good", "Fred is regular", "James is bad", "Jabad is extra"))
replacements <- c("good", "regular", "bad")
df1$status <- NA
for(i in 1:3){
df1[grepl(replacements[i], df1$string), "status"] <- replacements[i]
}
df1
Second example
df1$status <- "Status unknown"
for(i in 1:3){
df1[grepl(replacements[i], df1$string), "status"] <- "Status known"
}
df1
I am looking for something similar to multigsub, where one can specify two vectors, e.g. c("... Good ...", "... Best ...", "... Regular ...", "... Extra ...") would be replaced with c("Good", "Good", "Regular", "Best"). In this case, multigsub also returns the text before and after the word (represented here as ...).
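If I understand your situation correctly, this is what you want. It uses the str_extract function from the stringr library.
I added a few extra cases to demonstrate.
The variable s holds the strings you are searching for, and r holds the replacement values for the matches found.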
library(stringr)
df = structure(list(string = structure(c(1L, 2L, 5L, 3L, 4L, 6L), .Label = c("Erik is pretty good",
"Fred is regular", "Jabad is extra", "Jabad is unknown", "James is bad",
"John is best"), class = "factor")), .Names = "string", row.names = c(NA,
-6L), class = "data.frame")
s = c('good', 'best', 'regular', 'bad', 'extra')
r = c('Good', 'Good', 'Regular', 'Bad', 'Best')
names(r) <- s
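# Word boundaries (\\b) keep "bad" from matching inside "Jabad";
# the backslash must be doubled inside an R string literal
pat = paste0("\\b(", paste0(s, collapse = "|"), ")\\b")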
z = str_extract(df$string, pat)
# Lookup function will return NA when input is NA
lookup <- function(x, s, r){
i = match(x, s)
if(is.na(i)) return(NA)
r[[i]]
}
df$Status = sapply(z, lookup, s=s, r=r)
df = transform(df, Status2 = ifelse(is.na(Status), "Status Unknown", "Status Known"))
The resulting data.frame is:
               string  Status        Status2
1 Erik is pretty good    Good   Status Known
2     Fred is regular Regular   Status Known
3        James is bad     Bad   Status Known
4      Jabad is extra    Best   Status Known
5    Jabad is unknown    <NA> Status Unknown
6        John is best    Good   Status Known
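For comparison, here is a minimal base R sketch of the same idea, assuming you would rather not depend on stringr. It first checks that the word boundaries stop "bad" from matching inside "Jabad", then does the lookup in one vectorized step with match() instead of calling a helper per element. The names strings, found, and status are made up for this sketch; s, r, and the pattern are the same as above.

```r
# Hypothetical input vector; same strings as the data.frame above
strings <- c("Erik is pretty good", "Fred is regular", "James is bad",
             "Jabad is extra", "Jabad is unknown", "John is best")

s <- c("good", "best", "regular", "bad", "extra")   # words to search for
r <- c("Good", "Good", "Regular", "Bad", "Best")    # replacement for each word

# Without word boundaries "bad" is found inside "Jabad"; with them it is not
grepl("bad", "Jabad is extra")                      # TRUE
grepl("\\bbad\\b", "Jabad is extra", perl = TRUE)   # FALSE

# One pattern covering all search words, anchored on word boundaries
pat <- paste0("\\b(", paste(s, collapse = "|"), ")\\b")

# regexpr() returns -1 where nothing matched; regmatches() returns only the hits
m <- regexpr(pat, strings, perl = TRUE)
found <- rep(NA_character_, length(strings))
found[m != -1] <- regmatches(strings, m)

# Vectorized lookup: match() yields NA for unmatched rows, so r[...] is NA there
status <- r[match(found, s)]
status2 <- ifelse(is.na(status), "Status Unknown", "Status Known")

data.frame(string = strings, Status = status, Status2 = status2)
```

Because match() and indexing are both vectorized, this should give the same Status and Status2 columns as the table above. It only picks up the first matching word in each string, just as str_extract does.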