Remove strings found in vector 1, from vector 2
I have these two vectors:
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
I am trying to remove the strings in sample1 wherever they occur in sample2. The closest I have got is iterating with sapply, which gives me this:
sapply(sample1, function(i)gsub(i, "", sample2))
.aaa .aarp .abb .abbott .abogado
[1,] "try1.aarp" "try1" "try1.aarp" "try1.aarp" "try1.aarp"
[2,] "www.tryagain" "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa" "www.tryagain.aaa"
[3,] "255.255.255.255" "255.255.255.255" "255.255.255.255" "255.255.255.255" "255.255.255.255"
[4,] "onemoretry.abb.abogado" "onemoretry.abb.abogado" "onemoretry.abogado" "onemoretry.abb.abogado" "onemoretry.abb"
The expected output, of course, would be:
[1] "www.tryagain" "try1" "onemoretry" "255.255.255.255"
Thank you for your time.
We can paste the elements of 'sample1' together, separated by '|', and use the result as the pattern argument of gsub, replacing every match with ''.
gsub(paste(sample1, collapse='|'), '', sample2)
#[1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
Or use mgsub from the qdap package:
library(qdap)
mgsub(sample1, '', sample2)
#[1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
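One caveat about the gsub pattern above: the `.` in each element is a regex metacharacter that matches any character, so a string such as "kaaa.com" would lose its "kaaa". As a minimal sketch, assuming the strings should be treated literally, each one can be removed in turn with `fixed = TRUE`. The helper name `remove_literals` and the input "kaaa.com" are hypothetical and not part of the original answer.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Hypothetical helper: remove each pattern in turn, treating it as a literal string
remove_literals <- function(x, patterns) {
  for (p in patterns) {
    x <- gsub(p, "", x, fixed = TRUE)
  }
  x
}

remove_literals(sample2, sample1)
# -> "try1" "www.tryagain" "255.255.255.255" "onemoretry"

# The unescaped regex version also matches "kaaa", because "." matches any character
gsub(paste(sample1, collapse = "|"), "", "kaaa.com")
# -> ".com"

# The literal version leaves it alone
remove_literals("kaaa.com", sample1)
# -> "kaaa.com"
```

Literal matching still removes ".abb" from inside a longer string such as ".abbott", so it is only a partial safeguard.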
Try this:
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
paste0("(", paste(sub("\\.", "\\\\.", sample1), collapse="|"), ")\\b")
# [1] "(\\.aaa|\\.aarp|\\.abb|\\.abbott|\\.abogado)\\b"
gsub(paste0("(", paste(sub("\\.", "\\\\.", sample1), collapse="|"), ")\\b"), "", sample2)
# [1] "try1" "www.tryagain" "255.255.255.255" "onemoretry"
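Explanation:
sub("\\.", "\\\\.", sample1)
escapes the dot in each element, because a dot is a special character in regular expressions. (sub replaces only the first match per element, which is enough here since each element contains a single dot; use gsub if an element may contain several.)
paste(sub("\\.", "\\\\.", sample1), collapse="|")
joins all the elements, with | as the separator.
paste0("(", paste(sub("\\.", "\\\\.", sample1), collapse="|"), ")\\b")
builds a regex with all the elements inside a capturing group, followed by a word boundary. The \\b is essential here: it makes each alternative match only as a complete word.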
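To illustrate why the word boundary matters, here is a small sketch comparing the pattern with and without the trailing `\\b`. The inputs "shop.abbott" and "my.abbey" are hypothetical and not from the question. `perl = TRUE` is my own choice here so that alternatives are tried strictly left to right; the answer itself uses the default engine.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Same escaping and joining steps as in the answer
alternatives <- paste(sub("\\.", "\\\\.", sample1), collapse = "|")
without_boundary <- paste0("(", alternatives, ")")
with_boundary <- paste0("(", alternatives, ")\\b")

# Hypothetical test inputs
hosts <- c("shop.abbott", "my.abbey")

# Without \\b, ".abb" matches inside ".abbott" and ".abbey"
gsub(without_boundary, "", hosts, perl = TRUE)
# -> "shopott" "myey"

# With \\b, only complete suffixes are removed
gsub(with_boundary, "", hosts, perl = TRUE)
# -> "shop" "my.abbey"
```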