Partial string matching between two columns in R

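I'm trying to verify that the emails in a list are correct. My idea is to do a partial string match between the email column and the name column, and return the result as a logical vector (TRUE/FALSE) in a new column.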

In the example below, only rows 3 and 5 have the correct email, so the output for those rows should be TRUE. I tried the following, but neither attempt worked:

for (i in Test$LastName) {
  Test$Match <- agrepl(i, Test$Email, ignore.case = TRUE)
}

Test$Email %in% Test$LastName

Any other suggestions are welcome too. Thanks!

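Try something like this? You were almost there; you just need to store the TRUE/FALSE values in a vector. I used sapply to iterate over the row indices and compare the corresponding columns row by row. sapply collects the results into a vector, so you can use it directly for logical indexing: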

test = data.frame(FirstName=c("Audrey","Tammy","Stacey","Judson","Kellie"),
LastName=c("Low","Rose","Lock","Porter","Sims"),
Email=c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))

# compare each row's last name with that same row's email
matches = sapply(seq_len(nrow(test)), function(i) agrepl(test$LastName[i], test$Email[i]))

test[matches,]

  FirstName LastName                Email
3    Stacey     Lock stacy.lock@gmail.com
5    Kellie     Sims     k.sims@gmail.com
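To see why the two attempts in the question did not work, here is a minimal sketch. It assumes a copy of the example data built with `stringsAsFactors = FALSE` (my assumption, so that the columns are plain character vectors), and it uses `vapply` in place of `sapply` purely as an illustration. The loop compares one last name against all emails on every pass and overwrites `Test$Match` each time, so only the last pass survives; `%in%` tests whole strings for exact equality, so it never finds a partial match.

```r
# Copy of the example data; stringsAsFactors = FALSE is an assumption made here
Test <- data.frame(
  FirstName = c("Audrey", "Tammy", "Stacey", "Judson", "Kellie"),
  LastName  = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email     = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
                "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# The loop from the question: each pass overwrites the whole Match column
for (i in Test$LastName) {
  Test$Match <- agrepl(i, Test$Email, ignore.case = TRUE)
}

# What is left is just the final pass, i.e. "Sims" compared with every email
last_pass <- agrepl("Sims", Test$Email, ignore.case = TRUE)
same_as_last_pass <- identical(Test$Match, last_pass)

# %in% checks whether a whole email equals a whole last name, which never happens
exact <- Test$Email %in% Test$LastName

# Pairing row i's name with row i's email; vapply enforces one logical per row
row_match <- vapply(
  seq_len(nrow(Test)),
  function(i) agrepl(Test$LastName[i], Test$Email[i], ignore.case = TRUE),
  logical(1)
)
```

Note that `agrepl` does approximate matching, so it tolerates small differences between the name and the email; `grepl` is the stricter choice if only literal occurrences should count.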

Try this:

DF <- data.frame(FirstName = c("Audrey","Tammy","Stacey","Judson","Kellie"),
                 LastName = c("Low","Rose","Lock","Porter","Sims"),
                 Email = c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))
library(dplyr)

DF %>% 
  rowwise() %>%
  mutate(isMatch = grepl(LastName, Email, ignore.case = TRUE))

Output:

  FirstName LastName Email                    isMatch    
  <fct>     <fct>    <fct>                    <lgl>
1 Audrey    Low      T.Rose@gmail.com         FALSE
2 Tammy     Rose     A.Low@gmail.com          FALSE
3 Stacey    Lock     stacy.lock@gmail.com     TRUE 
4 Judson    Porter   beth.mccormick@gmail.com FALSE
5 Kellie    Sims     k.sims@gmail.com         TRUE 
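The `rowwise()` step matters because `grepl` is not vectorized over its `pattern` argument: given several patterns, it uses only the first one and issues a warning. The base R sketch below illustrates this on a copy of the data (again assuming `stringsAsFactors = FALSE`, which is my choice and not part of the answer) and contrasts it with a per-row call through `mapply`.

```r
# Copy of the example data; stringsAsFactors = FALSE is an assumption made here
DF <- data.frame(
  FirstName = c("Audrey", "Tammy", "Stacey", "Judson", "Kellie"),
  LastName  = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email     = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
                "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# Passing the whole column as the pattern: only "Low" is actually used
# (the warning is suppressed here just to keep the output quiet)
whole_column <- suppressWarnings(grepl(DF$LastName, DF$Email, ignore.case = TRUE))
first_only   <- grepl(DF$LastName[1], DF$Email, ignore.case = TRUE)

# Row-by-row pairing, which is what rowwise() arranges inside mutate()
per_row <- mapply(grepl, DF$LastName, DF$Email,
                  MoreArgs = list(ignore.case = TRUE), USE.NAMES = FALSE)
```

Here `whole_column` flags row 2, because "Low" appears in that email, while `per_row` flags rows 3 and 5, matching the output above.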

A base R option is to use grepl with mapply. The pattern FirstName|LastName matches when either the first or the last name appears in the email:

Test <- within(Test, Match <- mapply(grepl, paste(FirstName, LastName, sep = "|"), Email, ignore.case = TRUE))

which gives

> Test
  FirstName LastName                    Email Match
1    Audrey      Low         T.Rose@gmail.com FALSE
2     Tammy     Rose          A.Low@gmail.com FALSE
3    Stacey     Lock     stacy.lock@gmail.com  TRUE
4    Judson   Porter beth.mccormick@gmail.com FALSE
5    Kellie     Sims         k.sims@gmail.com  TRUE

Data

Test <- data.frame(FirstName = c("Audrey","Tammy","Stacey","Judson","Kellie"),
                 LastName = c("Low","Rose","Lock","Porter","Sims"),
                 Email = c("T.Rose@gmail.com","A.Low@gmail.com","stacy.lock@gmail.com","beth.mccormick@gmail.com","k.sims@gmail.com"))
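One caveat with the grepl-based approaches: the name is interpreted as a regular expression, so a name containing a metacharacter such as "." could match more than intended. A possible variation, sketched below, matches the name literally with `fixed = TRUE` and lowercases both sides, since `ignore.case` is not honoured in fixed mode. The helper `match_literal` and the data frame name `Emails` are hypothetical and introduced only for this illustration.

```r
# Hypothetical helper: literal, case-insensitive substring test
match_literal <- function(name, email) {
  grepl(tolower(name), tolower(email), fixed = TRUE)
}

# Same example data under a hypothetical name; stringsAsFactors = FALSE is assumed
Emails <- data.frame(
  LastName = c("Low", "Rose", "Lock", "Porter", "Sims"),
  Email    = c("T.Rose@gmail.com", "A.Low@gmail.com", "stacy.lock@gmail.com",
               "beth.mccormick@gmail.com", "k.sims@gmail.com"),
  stringsAsFactors = FALSE
)

# Apply the helper row by row; USE.NAMES = FALSE returns an unnamed vector
Emails$Match <- mapply(match_literal, Emails$LastName, Emails$Email,
                       USE.NAMES = FALSE)

# Made-up strings showing the difference between regex and literal matching:
# as a regex, "a.b" matches "axb" because "." stands for any character
regex_hit   <- grepl("a.b", "axb@example.com")
literal_hit <- match_literal("a.b", "axb@example.com")
```

On the example data this flags the same rows (3 and 5) as the regex versions; the two only diverge when a name contains regex metacharacters.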