Filling a column in a data frame based on a column in another data frame in R
I have a data frame of comments that looks like this (df1):
Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine
Android phones are great these days
I lost my visa card today
I have another data frame of merchant names, as follows (df2):
Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart
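If a Merchant_Name from df2 appears in a comment in df1, I want to append that merchant name to a second column in df1. The match need not be exact; an approximate match is what is required. Also, df1 contains about 500K rows!
My final output df might look like this: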
Comments Merchant
Apple laptops are really good for work,we should buy them Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine Google
Android phones are great these days Android
I lost my visa card today Visa
How can I do this efficiently in R?
Thanks
This is a job for regex. Have a look at the grepl command used inside lapply.
comments = c(
  'Apple laptops are really good for work,we should buy them',
  'Apple Iphones are too costly,we can resort to some other brands',
  'Google search is the best search engine ',
  'Android phones are great these days',
  'I lost my visa card today'
)
brands = c(
  'Google',
  'Android',
  'Geoni',
  'Visa',
  'Apple',
  'MC',
  'WallMart'
)
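# For each brand, collect the comments that mention it (case-insensitive regex match)
brandinpattern = lapply(
  brands,
  function(brand) {
    commentswithbrand = grepl(x = tolower(comments), pattern = tolower(brand))
    if (any(commentswithbrand)) {
      data.frame(
        comment = comments[commentswithbrand],
        brand = brand
      )
    } else {
      # No comment mentions this brand: contribute zero rows
      data.frame()
    }
  }
)
# Stack the per-brand data frames into a single result
brandinpattern = do.call(rbind, brandinpattern)
> brandinpattern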
comment brand
1 Google search is the best search engine Google
2 Android phones are great these days Android
3 I lost my visa card today Visa
4 Apple laptops are really good for work,we should buy them Apple
5 Apple Iphones are too costly,we can resort to some other brands Apple
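Since df1 has about 500K rows, it may also be worth scanning the comments only once. The following is a minimal sketch, not the method from the answer above: it joins all merchant names into one alternation pattern and uses regexpr/regmatches to pull out the first name found in each comment. It assumes the merchant names contain no regex metacharacters; the extra "Nothing relevant here" comment and the variable names lowered, hit and merchant are made up for illustration.

```r
comments <- c(
  "Apple laptops are really good for work,we should buy them",
  "Apple Iphones are too costly,we can resort to some other brands",
  "Google search is the best search engine",
  "Android phones are great these days",
  "I lost my visa card today",
  "Nothing relevant here"  # hypothetical extra row with no merchant in it
)
brands <- c("Google", "Android", "Geoni", "Visa", "Apple", "MC", "WallMart")

df1 <- data.frame(Comments = comments, stringsAsFactors = FALSE)

# One pattern of the form "google|android|..." (assumes plain alphanumeric names)
pattern <- paste(tolower(brands), collapse = "|")

# Position of the first match in each lower-cased comment, -1 where there is none
lowered <- tolower(df1$Comments)
hit <- regexpr(pattern, lowered)

# regmatches() returns matched text only for the comments that had a match
merchant <- rep(NA_character_, nrow(df1))
merchant[as.integer(hit) > 0L] <- regmatches(lowered, hit)

# Map the lower-cased match back to the original spelling in brands
df1$Merchant <- brands[match(merchant, tolower(brands))]
```

Unlike the lapply version, this keeps every row of df1 and leaves NA where no merchant is found. A short name such as "MC" will also match inside longer words, so it may need word boundaries in practice.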
Try this:
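# Start with an empty result and append one row per matched comment
final_df <- data.frame(Comments = character(), Merchant_Name = character(), stringsAsFactors = FALSE)
for(i in df1$Comments){
  for(j in df2$Merchant_Name){
    # Case-insensitive check: does merchant name j occur in comment i?
    if(grepl(tolower(j), tolower(i))){
      final_df[nrow(final_df) + 1,] <- c(i, j)
      break  # keep only the first matching merchant for this comment
    }
  }
}
final_df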
## Comments Merchant_Name
##1 Apple laptops are really good for work,we should buy them Apple
##2 Apple Iphones are too costly,we can resort to some other brands Apple
##3 Google search is the best search engine Google
##4 Android phones are great these days Android
##5 I lost my visa card today Visa
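The question also asks for approximate rather than exact matching. Base R's agrepl does fuzzy substring matching, so one possible sketch, separate from the answers above, looks like the code below. The misspelled comments, the shortened merchant list and the names fuzzy and first_hit are all invented for illustration, and the one-edit tolerance is only an assumed starting point.

```r
# Hypothetical data: comments that misspell the merchant names
merchants <- c("Google", "WallMart")
comments <- c(
  "I shop at Walmart every week",
  "Gogle maps is handy",
  "Nothing to see"
)

# One column per merchant: TRUE where the name occurs in the comment
# within at most one edit (insertion, deletion or substitution), ignoring case
fuzzy <- sapply(merchants, function(m) {
  agrepl(m, comments, max.distance = 1, ignore.case = TRUE)
})

# For each comment, keep the first merchant that matched, or NA if none did
first_hit <- apply(fuzzy, 1, function(row) {
  if (any(row)) merchants[which(row)[1]] else NA_character_
})

result <- data.frame(Comments = comments, Merchant = first_hit, stringsAsFactors = FALSE)
```

Fuzzy matching is considerably slower than grepl and tends to produce false positives for short names: with one edit allowed, a two-letter name like "MC" would match almost any comment. For 500K rows it is probably safer to match short names exactly and reserve agrepl for the longer ones.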