Replacing multiple patterns with a manipulated pattern
I have a text string that I want to convert from
text = "end back@drive@o correct back@drive@adjust@cats@do to tok"
to
"end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
More generally, I want to replace
"a@b@c" with "a@b b@c"
"a@b@c@d" with "a@b b@c c@d"
and so on. My attempt below uses the stringr package.
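library(stringr)

# Extract every chain of three or more "@"-joined tokens
patterns = unlist(str_extract_all(text, "([[:alnum:]]+@){2,}[[:alnum:]]+"))
# Split each chain into tokens, then rebuild it as space-separated adjacent pairs
replacements = strsplit(patterns, "@")
replacements = lapply(replacements, function(y) {
  pretuples = y[-length(y)]
  posttuples = y[-1]
  paste(paste0(pretuples, "@", posttuples), collapse = " ")
})
replacements = do.call(c, replacements)
# Vectorised over patterns: returns one result string per pattern
str_replace_all(text, pattern = patterns, replacement = replacements)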
I don't think str_replace_all is the function I'm ultimately looking for, since it (reasonably) returns
[1] "end back@drive drive@o correct back@drive@adjust to tok"
[2] "end back@drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
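Can anyone help me with this?
Thanks very much.
Edit: The replies so far have been very helpful, but the file I am parsing is large, and I really don't know how many times this a@b@c@d... pattern will be chained. Is there a more general solution that does not rely on hard-coding the pattern length (as I attempted above)?
> gsub(x = text, pattern = '@(.*?)@', replacement = '@\\1 \\1@')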
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
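You would need to provide more examples of the cases you expect to encounter, but the solution will be along the same lines as the above.
In response to the comment: you may need to run
gsub(x = text, pattern = '@([[:alnum:]]{1,})@', replacement = '@\\1 \\1@')
on your text repeatedly until it stops changing. Again, without more test cases it is impossible to be sure.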
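The "run it until it stops changing" idea can be sketched as a small loop. Repetition is needed because gsub() finds non-overlapping matches: in back@drive@adjust@cats the match on @drive@ consumes the "@" that @adjust@ would need, so adjust is only picked up on the next pass. The helper name expand_until_stable is hypothetical, and the sketch assumes tokens are purely alphanumeric.

```r
# Hypothetical helper: apply the gsub() call until a pass changes nothing.
# Assumes the chained tokens contain only alphanumeric characters.
expand_until_stable <- function(x) {
  repeat {
    # Each pass duplicates tokens that sit between two "@" signs
    y <- gsub("@([[:alnum:]]+)@", "@\\1 \\1@", x)
    if (identical(y, x)) return(y)
    x <- y
  }
}

text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
expand_until_stable(text)
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Each pass removes at least one token that is flanked by "@" on both sides, so the loop stops once none remain, without needing to know the chain length in advance.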
I would use gsub:
> text = "end back@drive@o correct back@drive@adjust to tok"
> gsub(pattern = "([[:alpha:]]+)@([[:alpha:]]+)@([[:alpha:]]+)", replacement = "\\1@\\2 \\2@\\3", x = text)
[1] "end back@drive drive@o correct back@drive drive@adjust to tok"
Try
pat <- "(\\s|\\b)[^@]+\\s(*SKIP)(*FAIL)|(?<=@)([^@]*)(?=@)"
repl <- "\\2 \\2"
gsub(pat, repl, text, perl=TRUE)
#[1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
For 'str1':
gsub(pat, repl, str1, perl=TRUE)
#[1] "a@b b@c" "a@b b@c c@d"
#[3] "a@b b@c c@d d@e e@f f@g g@h"
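If the chained tokens are known to be alphanumeric (an assumption; the pattern above allows any non-@ characters), a simpler lookaround-only pattern may be enough, since a token such as "o" followed by a space then fails the lookahead by itself. This is a sketch of that variant, not a drop-in replacement for the SKIP/FAIL pattern; the name pat_alnum is made up for illustration.

```r
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")

# Duplicate every alphanumeric token that has an "@" on both sides.
# Lookarounds consume no characters, so neighbouring tokens can share an "@"
# and a single pass handles chains of any length.
pat_alnum <- "(?<=@)([[:alnum:]]+)(?=@)"

res_text <- gsub(pat_alnum, "\\1 \\1", text, perl = TRUE)
res_str1 <- gsub(pat_alnum, "\\1 \\1", str1, perl = TRUE)
res_text
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
res_str1
# [1] "a@b b@c"                     "a@b b@c c@d"
# [3] "a@b b@c c@d d@e e@f f@g g@h"
```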
Data
text <- "end back@drive@o correct back@drive@adjust@cats@do to tok"
str1 <- c("a@b@c", "a@b@c@d", "a@b@c@d@e@f@g@h")
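Another route, closer to the attempt in the question, is to keep the extract-and-rebuild idea but write the rebuilt chains back in place with base R's regmatches<-, which avoids getting one output string per pattern. The helper names pair_up and expand_chains are hypothetical, and the sketch again assumes alphanumeric tokens.

```r
# Hypothetical helper: turn one chain such as "a@b@c" into "a@b b@c"
pair_up <- function(chain) {
  y <- strsplit(chain, "@", fixed = TRUE)[[1]]
  paste(paste0(y[-length(y)], "@", y[-1]), collapse = " ")
}

# Hypothetical helper: rewrite every chain of 3+ tokens inside each string
expand_chains <- function(x) {
  # Locate all chains of three or more "@"-joined alphanumeric tokens
  m <- gregexpr("([[:alnum:]]+@){2,}[[:alnum:]]+", x, perl = TRUE)
  # Replace each matched chain with its pairwise expansion
  regmatches(x, m) <- lapply(regmatches(x, m), function(chains) {
    vapply(chains, pair_up, character(1), USE.NAMES = FALSE)
  })
  x
}

expand_chains("end back@drive@o correct back@drive@adjust@cats@do to tok")
# [1] "end back@drive drive@o correct back@drive drive@adjust adjust@cats cats@do to tok"
```

Because the chain is split and rebuilt in R rather than inside the regex, the same function works for any chain length.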