How to make my str_detect code more efficient in R?
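I want to check whether each of my observations (1,867) mentions a stock from a list of 6,900 names and, if it does, write "yes" in a separate column.
Here is my code: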
for (i in 1:length(df$upvotes)){
  if (str_detect(df$text[i], pattern = paste("[:space:](\\$?)",stocks$Stocks,"(\\$?)[:space:]",collapse ="|"))){
    df$call[i] <- "yes"
  }
}
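The problem is that after 20 minutes it is still running and my computer is getting hot. If I remove all the regex, it finishes the task in a few minutes.
How can I improve the code to make it more efficient?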
The previously suggested solution seems to work with the data you provided.
For example:
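library(stringr)
stocks <- data.frame(stock = c("AAPL","AACG","AACQ","AACQU","AACQW","AAIC"))
# Build one alternation for all tickers, each flanked by whitespace or a literal "$"
stocksearch <- paste0("[\\s$]",stocks$stock,"+[\\s$]",collapse ="|")
# str_detect is vectorized, so the whole column is checked in a single call
which(str_detect(df$text, stocksearch))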
[1] 352
df$text[352]
[1] "Blew up my account with AMD and AAPL calls expiring after earnings.\n\nNow I'm going bear mode and hitting up that VIX and spy Puts"
#Dataframe update
df$call[which(str_detect(df$text, stocksearch))] <- "yes"
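If the single alternation over all 6,900 names is still slow, one alternative worth trying is to skip the big regex entirely: split each text into words once and look the words up in the ticker list with `%in%`. The following is only a sketch in base R under some assumptions: the sample `df` and `stocks` are made-up stand-ins for the real data, tickers are assumed to consist of letters only, and matching is case-sensitive. Note that it is slightly more permissive than the pattern above, because a ticker followed by punctuation (for example "AAPL,") also counts as a match.

```r
# Hypothetical sample data standing in for the real df and stocks
stocks <- data.frame(stock = c("AAPL", "AACG", "AACQ", "AACQU", "AACQW", "AAIC"))
df <- data.frame(
  text = c(
    "Blew up my account with AMD and AAPL calls expiring after earnings.",
    "Going bear mode and hitting up that VIX and spy Puts",
    "Thinking about $AACQ before the merger"
  ),
  stringsAsFactors = FALSE
)

# Split each text into runs of letters, so "$AACQ" yields the bare token "AACQ"
tokens <- strsplit(df$text, "[^A-Za-z]+")

# A row is a hit if any of its tokens exactly equals a ticker (case-sensitive)
has_ticker <- vapply(tokens, function(tok) any(tok %in% stocks$stock), logical(1))

# Write "yes" for hits and leave the other rows as NA
df$call <- ifelse(has_ticker, "yes", NA_character_)
print(df$call)
```

Because `%in%` is a hash-based lookup, the cost per word does not grow with the number of patterns the way a very long alternation can. Tickers that contain digits or dots would need a different split pattern.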