Filling a column in a dataframe based on a column in another dataframe in R

Question

I have a dataframe of comments that looks like this (df1):

Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine 
Android phones are great these days
I lost my visa card today

I have another dataframe of merchant names, like this (df2):

Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart

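If a merchant_name from df2 appears in a comment in df1, I want to append that merchant name to a second column in df1. The match need not be exact; an approximate match is what is required. Also, df1 contains around 500K rows! My final output df might look like this: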

Comments                                                        Merchant
Apple laptops are really good for work,we should buy them       Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine                         Google
Android phones are great these days                             Android
I lost my visa card today                                       Visa

How can I do this efficiently in R? Thanks.

Answer 1

This is a job for regex. Check out the grepl command inside lapply.

comments = c(
   'Apple laptops are really good for work,we should buy them',
   'Apple Iphones are too costly,we can resort to some other brands',
   'Google search is the best search engine ',
   'Android phones are great these days',
   'I lost my visa card today'
)

brands = c(
   'Google',
   'Android',
   'Geoni',
   'Visa',
   'Apple',
   'MC',
   'WallMart'
)

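# For each brand, collect the comments that mention it (case-insensitive)
brandinpattern = lapply(
   brands,
   function(brand) {
      # logical vector: TRUE where the lower-cased comment contains the brand
      commentswithbrand = grepl(x = tolower(comments), pattern = tolower(brand))
      if (any(commentswithbrand)) {
         data.frame(
            comment = comments[commentswithbrand],
            brand = brand
         )
      } else {
         # no comment mentions this brand, so it contributes no rows
         data.frame()
      }
   }
)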

brandinpattern = do.call(rbind, brandinpattern)


> brandinpattern
                                                          comment   brand
1                        Google search is the best search engine   Google
2                             Android phones are great these days Android
3                                       I lost my visa card today    Visa
4       Apple laptops are really good for work,we should buy them   Apple
5 Apple Iphones are too costly,we can resort to some other brands   Apple
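
Note that the result above lists only the matching comments, grouped by brand, rather than adding a column to df1. Below is a minimal sketch of one way to get a column aligned with the original rows, assuming the brand names contain no regex metacharacters: combine the brands into a single alternation and call regexpr once over all comments. The names pattern, pos, merchant and df1 are illustrative and not part of the answer above.

```r
comments <- c(
   'Apple laptops are really good for work,we should buy them',
   'Apple Iphones are too costly,we can resort to some other brands',
   'Google search is the best search engine ',
   'Android phones are great these days',
   'I lost my visa card today'
)
brands <- c('Google', 'Android', 'Geoni', 'Visa', 'Apple', 'MC', 'WallMart')

# One alternation pattern, e.g. "Google|Android|...".
# Assumption: brand names contain no regex metacharacters.
pattern <- paste(brands, collapse = "|")

# Position of the first (leftmost) brand match in each comment; -1 if none
pos <- regexpr(pattern, comments, ignore.case = TRUE)

merchant <- rep(NA_character_, length(comments))
found <- pos > 0
# regmatches() returns the matched text only for elements that matched
merchant[found] <- regmatches(comments, pos)

# Map the matched text (e.g. "visa") back to the spelling used in brands
merchant <- brands[match(tolower(merchant), tolower(brands))]

df1 <- data.frame(Comments = comments, Merchant = merchant,
                  stringsAsFactors = FALSE)
```

A comment that mentions several brands gets the one that appears earliest in its text, and comments with no match get NA.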

Answer 2

Try this:

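# Empty result frame; matched (comment, merchant) pairs are appended below
final_df <- data.frame(Comments = character(), Merchant_Name = character(), stringsAsFactors = FALSE)

for(i in df1$Comments){
    for(j in df2$Merchant_Name){
        # case-insensitive test: does comment i contain merchant name j?
        if(grepl(tolower(j),tolower(i))){
            final_df[nrow(final_df) + 1,] <- c(i, j)
            break  # keep only the first matching merchant for this comment
        }
    }
}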


final_df

##                                                         Comments Merchant_Name
##1       Apple laptops are really good for work,we should buy them         Apple
##2 Apple Iphones are too costly,we can resort to some other brands         Apple
##3                        Google search is the best search engine         Google
##4                             Android phones are great these days       Android
##5                                       I lost my visa card today          Visa
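
With about 500K comments, growing final_df one row at a time is likely to be slow. Here is a sketch of the same first-match logic that loops only over the (short) merchant list and lets grepl work on the whole comment vector at once. It assumes the merchant names should be matched as literal substrings; lower_comments and merchant are illustrative names.

```r
df1 <- data.frame(
   Comments = c(
      'Apple laptops are really good for work,we should buy them',
      'Apple Iphones are too costly,we can resort to some other brands',
      'Google search is the best search engine ',
      'Android phones are great these days',
      'I lost my visa card today'
   ),
   stringsAsFactors = FALSE
)
df2 <- data.frame(
   Merchant_Name = c('Google', 'Android', 'Geoni', 'Visa', 'Apple', 'MC', 'WallMart'),
   stringsAsFactors = FALSE
)

# Lower-case the comments once instead of once per (comment, merchant) pair
lower_comments <- tolower(df1$Comments)
merchant <- rep(NA_character_, nrow(df1))

for (m in df2$Merchant_Name) {
   # fixed = TRUE treats the merchant name as a literal string, not a regex.
   # Only rows that are still unmatched are filled, so the first merchant
   # in df2 order wins, as with the break in the nested loop.
   hit <- is.na(merchant) & grepl(tolower(m), lower_comments, fixed = TRUE)
   merchant[hit] <- m
}

df1$Merchant <- merchant
```

Unlike the nested loop, comments with no matching merchant are kept, with NA in the Merchant column.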
