How can I add a counter for each time a row contains a character in a string?

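I have a data frame made up mostly of character columns that I want to match against a list of patterns. I am trying to add a column at the end that counts the number of hits from the list in each row. Basically, what I want to do is: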

patterns <- c("Yes", "No", "Maybe")
df <- data.frame(first_column  = c("Why", "Sure", "But", ...),
                 second_column = c("Yes", "Okay", "If Only", ...),
                 third_column  = c("No", "When", "Maybe so", ...),
                 fourth_column = c("But", "I won't", "Truth", ...))

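After running the code, a fifth column labelled "counter" should show 2, 0, 1, ... Right now I do this with a pair of nested for loops with an if statement inside. It works on a toy data set, but I suspect it will choke on the full-size data. Is there a better way using dplyr, grepl, or lapply? My instinct says dplyr, but I am not sure how to go about it. My code is below: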

filename <- choose.files(caption = "Select File")
cases <- read.csv(filename)
cases <- cbind(cases, counter = 0)
n_rows <- nrow(cases)
n_cols <- ncol(cases)
# Visit every cell and bump the row's counter on an exact match
for (i in seq_len(n_rows)) {
  for (j in seq_len(n_cols)) {
    if (cases[i, j] %in% patterns) {
      cases$counter[i] <- cases$counter[i] + 1
    }
  }
}

Try this:

df$counter <- rowSums(vapply(df, function(x, p) grepl(p, x), integer(nrow(df)), paste0(patterns, collapse = "|")))

Output:

> df
  first_column second_column third_column fourth_column counter
1          Why           Yes           No           But       2
2         Sure          Okay         When       I won't       0
3          But       If Only     Maybe so         Truth       1
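The counter for row 3 is 1 because grepl looks for the pattern anywhere inside a cell, so "Maybe so" counts as a hit for "Maybe". The nested loop in the question uses %in%, which only matches whole values. The sketch below puts the two side by side in base R on the toy data from the question; the names rx, partial and exact are made up for illustration, and it assumes the data frame has more than one row (with a single row vapply returns a plain vector rather than a matrix).

```r
# Toy data copied from the question
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(
  first_column  = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column  = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth")
)

# Partial match: a single regex alternation, TRUE where any pattern
# occurs somewhere inside the cell
rx <- paste(patterns, collapse = "|")
partial <- rowSums(vapply(df, function(x) grepl(rx, x), logical(nrow(df))))

# Exact match: the whole cell must equal one of the patterns
exact <- rowSums(vapply(df, function(x) x %in% patterns, logical(nrow(df))))

print(partial)
print(exact)
```

Which of the two is wanted depends on whether "Maybe so" should count. Note also that the partial version treats each pattern as a regular expression, so patterns containing characters such as "." or "(" would need escaping.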

In dplyr we can use rowwise and c_across. Note that %in% only matches whole values, so "Maybe so" in row 3 is not counted:

library(dplyr)
df %>% rowwise() %>% mutate(counter = sum(c_across(everything()) %in% patterns))

#  first_column second_column third_column fourth_column counter
#  <chr>        <chr>         <chr>        <chr>           <int>
#1 Why          Yes           No           But                 2
#2 Sure         Okay          When         I won't             0
#3 But          If Only       Maybe so     Truth               0

We can use map and reduce to do this column by column:

library(dplyr)
library(purrr)
# cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement there
df %>% 
  mutate(counter = map(patterns, ~ rowSums(cur_data() == .x)) %>% 
           reduce(`+`))
#  first_column second_column third_column fourth_column counter
#1          Why           Yes           No           But       2
#2         Sure          Okay         When       I won't       0
#3          But       If Only     Maybe so         Truth       0
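To make the map/reduce steps easier to follow, here is a rough base R sketch of the same idea using lapply and Reduce instead of purrr. It is not the answerer's code, and per_pattern is a name invented for the example. Each pattern is compared against the whole data frame at once, the hits are summed per row, and the per-pattern vectors are then added together.

```r
# Toy data copied from the question
patterns <- c("Yes", "No", "Maybe")
df <- data.frame(
  first_column  = c("Why", "Sure", "But"),
  second_column = c("Yes", "Okay", "If Only"),
  third_column  = c("No", "When", "Maybe so"),
  fourth_column = c("But", "I won't", "Truth")
)

# "map" step: for each pattern, df == p is a logical matrix of exact
# matches, and rowSums() turns it into a per-row hit count
per_pattern <- lapply(patterns, function(p) rowSums(df == p))

# "reduce" step: add the per-pattern count vectors element-wise
df$counter <- Reduce(`+`, per_pattern)

print(df)
```

Because the comparison is done with ==, this counts exact matches only, like the %in% versions.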

Data

df <- structure(list(first_column  = c("Why", "Sure", "But"),
                     second_column = c("Yes", "Okay", "If Only"),
                     third_column  = c("No", "When", "Maybe so"),
                     fourth_column = c("But", "I won't", "Truth")),
                class = "data.frame", row.names = c(NA, -3L))