R - Extract portions of matching and non-matching strings
I need to extract the matching and non-matching portions of the strings held in two columns:
x <- c("apple, banana, pine nuts, almond")
y <- c("orange, apple, almond, grapes, carrots")
j <- data.frame(x,y)
to get:
yonly <- c("orange, grapes, carrots")
xonly <- c("banana, pine nuts")
both <- c("apple, almond")
k <- data.frame(cbind(x,y,both,yonly,xonly))
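I have looked into str_detect, intersect, etc., but these seem to require major surgery on the existing cells to split them into separate cells. This is a fairly large dataset with other columns, so I would rather not manipulate it too much. Can you help me come up with a simpler solution?
Thanks!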
You can use setdiff and intersect:
> j <- data.frame(x,y, stringsAsFactors = FALSE)
> X <- strsplit(j$x, ",\\s*")[[1]]
> Y <- strsplit(j$y, ",\\s*")[[1]]
>
> #Yonly
> setdiff(Y, X)
[1] "orange" "grapes" "carrots"
>
> #Xonly
> setdiff(X, Y)
[1] "banana" "pine nuts"
>
> #Both
> intersect(X, Y)
[1] "apple" "almond"
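The split pattern does most of the work here, so a small sketch of what it does may help. The pattern is written ",\\s*" because the backslash has to be doubled inside an R string literal, and it matches a comma plus any whitespace that follows it. The string s below is made up for illustration and is not from the question.

```r
# strsplit() returns a list with one element per input string,
# so [[1]] pulls out the character vector for the single string.
s <- "apple, banana,pine nuts,   almond"
items <- strsplit(s, ",\\s*")[[1]]

# Splitting on a bare comma keeps the leading spaces on each item.
raw <- strsplit(s, ",")[[1]]

# " almond" (with spaces) is not equal to "almond", so the match is lost.
intersect(raw, "almond")
intersect(items, "almond")
```

With the whitespace-aware pattern, items within a field ("pine nuts") keep their internal space, while the padding after each comma is dropped before setdiff and intersect compare the values.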
To create the extra columns for a longer data frame j as you describe, you can use mapply together with the approach used in Jilber Urbina's answer...
#set up data
x <- c("apple, banana, pine nuts, almond")
y <- c("orange, apple, almond, grapes, carrots")
j <- data.frame(x,y,stringsAsFactors = FALSE)
# mapply returns one column per row of j, so transpose to get one row per row of j
j[,c("yonly","xonly","both")] <- t(mapply(function(x,y) {
  x2 <- unlist(strsplit(x, ",\\s*"))
  y2 <- unlist(strsplit(y, ",\\s*"))
  yonly <- paste(setdiff(y2, x2), collapse=", ")
  xonly <- paste(setdiff(x2, y2), collapse=", ")
  both <- paste(intersect(x2, y2), collapse=", ")
  return(c(yonly, xonly, both)) },
  j$x, j$y, USE.NAMES = FALSE))
j
                                 x                                      y
1 apple, banana, pine nuts, almond orange, apple, almond, grapes, carrots
                    yonly             xonly          both
1 orange, grapes, carrots banana, pine nuts apple, almond
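Since the question mentions a larger dataset with other columns, here is a minimal sketch of the same idea on more than one row. The helper name compare_items, the data frame j2, its id column and its second row are all made up for illustration; only the split, setdiff and intersect steps come from the answers above.

```r
# Hypothetical helper: compare two comma-separated strings and return
# the three pieces as a named character vector.
compare_items <- function(x, y) {
  x2 <- strsplit(x, ",\\s*")[[1]]
  y2 <- strsplit(y, ",\\s*")[[1]]
  c(yonly = paste(setdiff(y2, x2), collapse = ", "),
    xonly = paste(setdiff(x2, y2), collapse = ", "),
    both  = paste(intersect(x2, y2), collapse = ", "))
}

# Made-up two-row data with an extra column that is left untouched.
j2 <- data.frame(
  id = 1:2,
  x = c("apple, banana, pine nuts, almond", "kiwi, pear"),
  y = c("orange, apple, almond, grapes, carrots", "pear, plum"),
  stringsAsFactors = FALSE
)

# vapply returns a 3 x nrow(j2) matrix; t() turns it into one row per record.
res <- t(vapply(seq_len(nrow(j2)),
                function(i) compare_items(j2$x[i], j2$y[i]),
                character(3)))

# Append the three new columns next to the existing ones.
j2 <- cbind(j2, as.data.frame(res, stringsAsFactors = FALSE))
j2[, c("id", "yonly", "xonly", "both")]
```

The existing columns are never split or reshaped; the comparison happens on temporary vectors inside the helper, and only the three pasted results are added back. Rows with NA or empty strings are not handled in this sketch and would need their own check.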