R - How to compare two-word sentences if the word order is not the same?
For example:
z <- "Dikesh Faldu"
y <- "Faldu Dikesh"
I want to process these two variables so that both give the same output, "DikeshFaldu".
Another example: I have a variable with values like these:
[1] dikesh faldu
[2] xyz abc
[3] faldu dikesh
[4] anything like
[5] but only
[6] two words
[7] only but
.........
Then I want output like this:
[1] dikeshfaldu
[2] xyzabc
[3] dikeshfaldu
[4] anythinglike
[5] butonly
[6] twowords
[7] butonly
or
[1] faldudikesh
[2] xyzabc
[3] faldudikesh
[4] anythinglike
[5] onlybut
[6] twowords
[7] onlybut
Here is the R code you are looking for; it is the same as what zx8754 mentioned:
z <- "Dikesh Faldu"
y <- "Faldu Dikesh"
sort(unlist(strsplit(z,split=' '))) == sort(unlist(strsplit(y,split=' ')))
[1] TRUE TRUE
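The element-wise `==` above returns one logical per word, and it would recycle the shorter vector if the two sentences had different word counts. A minimal sketch that collapses the check into a single TRUE/FALSE, assuming words are separated by whitespace; `same_words` is a hypothetical helper name, not part of the original answer:

```r
# Hypothetical helper (not from the answer): TRUE when two strings
# contain the same words, regardless of their order.
same_words <- function(a, b) {
  # Split on runs of whitespace and sort the tokens alphabetically
  tokens <- function(s) sort(strsplit(s, "\\s+")[[1]])
  # identical() checks length and content, returning a single logical
  identical(tokens(a), tokens(b))
}

z <- "Dikesh Faldu"
y <- "Faldu Dikesh"
same_words(z, y)               # TRUE
same_words(z, "Dikesh Patel")  # FALSE
```

Using `identical()` instead of `==` means a sentence with an extra word simply compares as FALSE rather than producing a recycled, partially TRUE vector.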
For the first case, match one or more whitespace characters (\s+) and replace them with "" using sub:
sub("\\s+", "", z)
#[1] "DikeshFaldu"
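Note that `sub()` replaces only the first match. For inputs with more than two words, `gsub()` replaces every match and is vectorised over the input. A small sketch (the second string is a made-up example):

```r
x <- c("Dikesh Faldu", "Dikesh  Kumar Faldu")
# sub() stops after the first run of whitespace
sub("\\s+", "", x)   # "DikeshFaldu" "DikeshKumar Faldu"
# gsub() removes every run of whitespace
gsub("\\s+", "", x)  # "DikeshFaldu" "DikeshKumarFaldu"
```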
For the second case, capture each run of non-whitespace characters as a group and swap the backreferences.
sub("(\\S+)\\s+(\\S+)", "\\2\\1", y)
#[1] "DikeshFaldu"
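The swap should only be applied to strings that are in the "wrong" order; applied to `z` it would give "FalduDikesh". A sketch that reuses the same capture groups to pull out both words and then concatenates them in alphabetical order, assuming every string has exactly two whitespace-separated words (`two_word_key` is a hypothetical name):

```r
two_word_key <- function(s) {
  pattern <- "(\\S+)\\s+(\\S+)"
  first  <- sub(pattern, "\\1", s)  # first word
  second <- sub(pattern, "\\2", s)  # second word
  # String comparison with <= follows the current locale's collation
  ifelse(first <= second, paste0(first, second), paste0(second, first))
}

two_word_key(c("Dikesh Faldu", "Faldu Dikesh", "xyz abc"))
# "DikeshFaldu" "DikeshFaldu" "abcxyz"
```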
If the words should be sorted alphabetically, we can use stri_extract_all from stringi to extract the words, sort them, and paste them together:
library(stringi)
vapply(stri_extract_all(c(z, y), regex = "\\w+"),
       function(x) paste(sort(x), collapse = ""), character(1))
#[1] "DikeshFaldu" "DikeshFaldu"
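If you would rather not add the stringi dependency, base R can do the same extraction with `regmatches()` and `gregexpr()`. A sketch of that swap (an alternative to `stri_extract_all`, not the answer's own code):

```r
z <- "Dikesh Faldu"
y <- "Faldu Dikesh"
x <- c(z, y)
# gregexpr() finds every run of word characters; regmatches() returns
# them as a list with one character vector per input string
words <- regmatches(x, gregexpr("\\w+", x))
# Sort the words of each string and paste them together
res <- vapply(words, function(w) paste(sort(w), collapse = ""), character(1))
res
# "DikeshFaldu" "DikeshFaldu"
```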
Using the updated vector from the OP's post:
vapply(stri_extract_all(charvec, regex = "\\w+"),
       function(x) paste(sort(x), collapse = ""), character(1))
#[1] "dikeshfaldu" "abcxyz" "dikeshfaldu" "anythinglike"
#[5] "butonly" "twowords" "butonly"
Data
charvec <- c("dikesh faldu", "xyz abc", "faldu dikesh", "anything like",
"but only", "two words", "only but")
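Once every string is reduced to the same sorted form, that form can serve as a key for comparing entries. A sketch, using the same `charvec` and assuming whitespace-separated words, that flags entries repeating an earlier one up to word order:

```r
charvec <- c("dikesh faldu", "xyz abc", "faldu dikesh", "anything like",
             "but only", "two words", "only but")
# Order-independent key; the space separator keeps e.g. "ab c" and "a bc" apart
key <- vapply(strsplit(charvec, "\\s+"),
              function(w) paste(sort(w), collapse = " "), character(1))
# TRUE where the same set of words already appeared earlier
duplicated(key)
# FALSE FALSE TRUE FALSE FALSE FALSE TRUE
# Keep one representative per word set
charvec[!duplicated(key)]
```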
Based on input from @zx8754, @Dirty Sock Sniffer and @RHertel, you can try
sapply(strsplit(charvec, " "), function(x) paste(sort(x), collapse=""))
#[1] "dikeshfaldu" "abcxyz" "dikeshfaldu" "anythinglike" "butonly"
#[6] "twowords" "butonly"
where
charvec <- c("dikesh faldu", "xyz abc", "faldu dikesh", "anything like",
"but only", "two words", "only but")
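Sorting gives "abcxyz" for the second entry, while the OP's first desired output keeps "xyzabc". If the goal is instead to keep the spelling of the first string in each word set and rewrite later strings that contain the same words, one option is to map every string to the first string with the same sorted key via `match()`. A sketch under that assumption:

```r
charvec <- c("dikesh faldu", "xyz abc", "faldu dikesh", "anything like",
             "but only", "two words", "only but")
# Order-independent key for each string
key <- vapply(strsplit(charvec, " "),
              function(w) paste(sort(w), collapse = " "), character(1))
# Each string with its spaces removed, in its original word order
joined <- gsub(" ", "", charvec)
# match(key, key) gives the index of the first string with the same key
out <- joined[match(key, key)]
out
# "dikeshfaldu" "xyzabc" "dikeshfaldu" "anythinglike" "butonly" "twowords" "butonly"
```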