r- 多重匹配中的部分匹配

partial matching in r- multiple matches

我正在利用下面的代码与 1 个匹配进行部分匹配,但有一个后续问题:假设我们有一个额外的鱼标准,并且我们希望 "dog fish" 被归类为鱼和犬。这可能吗?

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                 "black panther", "short cat", "red bird",
                 "short bird stuffed", "big eagle", "bad sparrow",
                 "dog fish", "head dog", "brown yorkie",
                 "lab short bulldog"), label=1:14)

在代码开头定义正则表达式

regexes <- list(c("(cat|lion|tiger|panther)","feline"),
            c("(bird|eagle|sparrow)","avian"),
            c("(dog|yorkie|bulldog)","canine"))

创建一个向量,长度与 df 相同

output_vector <- character(nrow(d))

对于每个正则表达式..

for(i in seq_along(regexes)){

#Grep through d$name, and when you find matches, insert the relevant 'tag' into
#The output vector
output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]} 

将现在填充的输出向量插入数据帧

d$species <- output_vector

期望的输出

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine, fish
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine

原栈溢出题在这里:partial string matching r

我会通过交叉连接来完成。

library(dplyr)
library(stringi)

key = data_frame(partial = c("cat", "lion", "tiger", "panther",
                             "bird", "eagle", "sparrow",
                             "dog", "yorkie", "bulldog"),
                  category = c("feline", "feline", "feline", "feline",
                               "avian", "avian", "avian",
                               "canine", "canine", "canine"))

d %>%
  merge(key) %>%
  filter(name %>% stri_detect_fixed(partial) )