How do I match a column of a data frame against a vector of keywords?

My data frame Expenses looks like this:

date        name           expenditure      type
23MAR2013   KOSH ENTRP     4000             COMPANY
23MAR2013   JOHN DOE       800              INDIVIDUAL
24MAR2013   S KHAN         300              INDIVIDUAL
24MAR2013   JASINT PVT LTD 8000             COMPANY
25MAR2013   KOSH ENTRPRISE 2000             COMPANY
25MAR2013   JOHN S DOE     220              INDIVIDUAL
25MAR2013   S KHAN         300              INDIVIDUAL
26MAR2013   S KHAN         300              INDIVIDUAL

Earlier, I identified the recurring names and patterns in the name column and stored them in a vector, NameVector, shown below.

KOSH    JOHN DOE    KHAN    JASINT

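My question is: how do I match each string in Expenses$name against the patterns in NameVector and add the result to the main data frame as a category column, like this?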

date        name           expenditure      type           category
23MAR2013   KOSH ENTRP     4000             COMPANY        KOSH
23MAR2013   JOHN DOE       800              INDIVIDUAL     JOHN DOE
24MAR2013   S KHAN         300              INDIVIDUAL     KHAN
24MAR2013   JASINT PVT LTD 8000             COMPANY        JASINT
25MAR2013   KOSH ENTRPRISE 2000             COMPANY        KOSH
25MAR2013   JOHN S DOE     220              INDIVIDUAL     JOHN DOE
25MAR2013   S KHAN         300              INDIVIDUAL     KHAN
26MAR2013   S KHAN         300              INDIVIDUAL     KHAN

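I tried using strsplit() to split each name into separate columns and then agrep() to match the patterns, but I did not get the desired output. On inspecting the data further, I noticed leading spaces and stripped them, but I still don't know why I am not getting the output shown above.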


CSV of the table above:

"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"

And the name vector that was calculated/identified:

NameVector <- c("KOSH","JOHN DOE","KHAN","JASINT")
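
For anyone reproducing this, the CSV and the vector above can be loaded roughly as follows. This is only a sketch: the variable name csv_text is made up for illustration, and the trimming step assumes that stray whitespace (such as the trailing space in the second date) should simply be removed from every character column.

```r
# CSV text copied from above; csv_text is a hypothetical variable name
csv_text <- '"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"'

Expenses <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Trim leading/trailing whitespace in every character column
# (assumption: such whitespace is never meaningful in this data)
is_chr <- vapply(Expenses, is.character, logical(1L))
Expenses[is_chr] <- lapply(Expenses[is_chr], trimws)

NameVector <- c("KOSH", "JOHN DOE", "KHAN", "JASINT")
```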

You could try:

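library(stringi)

# Split multi-word keys (e.g. "JOHN DOE") into single words and join them
# into one alternation: "KOSH|JOHN|DOE|KHAN|JASINT"
pat <- paste(unlist(strsplit(NameVector, ' ')), collapse="|")

# Extract every matching word from each name and paste the pieces back
# together, so "JOHN S DOE" becomes "JOHN DOE"
Expenses$category <- vapply(stri_extract_all_regex(Expenses$name, pat), 
           paste, collapse=' ', character(1L))
Expenses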
#       date           name expenditure       type category
#1 23MAR2013     KOSH ENTRP        4000    COMPANY     KOSH
#2 23MAR2013       JOHN DOE         800 INDIVIDUAL JOHN DOE
#3 24MAR2013         S KHAN         300 INDIVIDUAL     KHAN
#4 24MAR2013 JASINT PVT LTD        8000    COMPANY   JASINT
#5 25MAR2013 KOSH ENTRPRISE        2000    COMPANY     KOSH
#6 25MAR2013     JOHN S DOE         220 INDIVIDUAL JOHN DOE
#7 25MAR2013         S KHAN         300 INDIVIDUAL     KHAN
#8 26MAR2013         S KHAN         300 INDIVIDUAL     KHAN
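
Because the pattern matches individual words, a name containing only part of a key (say a hypothetical "JOHN SMITH") would be given the category "JOHN", which is not an entry of NameVector. If that matters, one possible base-R variant is sketched below. It assumes a key matches only when every one of its words occurs in the name, and that the first matching key wins; matches_key is a helper name invented for this sketch, not part of any package.

```r
# Sample names and keys copied from the question
Expenses <- data.frame(
  name = c("KOSH ENTRP", "JOHN DOE", "S KHAN", "JASINT PVT LTD",
           "KOSH ENTRPRISE", "JOHN S DOE", "S KHAN", "S KHAN"),
  stringsAsFactors = FALSE
)
NameVector <- c("KOSH", "JOHN DOE", "KHAN", "JASINT")

# Hypothetical helper: TRUE for each element of x that contains
# every word of key as a literal substring
matches_key <- function(key, x) {
  words <- strsplit(key, " ", fixed = TRUE)[[1]]
  Reduce(`&`, lapply(words, function(w) grepl(w, x, fixed = TRUE)))
}

category <- rep(NA_character_, nrow(Expenses))
for (key in NameVector) {
  hit <- matches_key(key, Expenses$name)
  # Only fill rows that have no category yet, so earlier keys take priority
  category[hit & is.na(category)] <- key
}
Expenses$category <- category
```

Rows that match no key are left as NA rather than the string "NA". Note that substring matching is loose: "KHAN" would also match inside a longer word, so word boundaries may be needed for messier data.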