Count the occurrences of words in a string row wise based on existing words in other columns

Question

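I have a data frame with several rows of strings. I want to count how many times each word occurs in a row, based on the words present in the other columns. How can I achieve this with the code below? Can the code be modified to do this, or can anyone suggest another approach that doesn't require loops? Thanks in advance!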

df <- data.frame(
  words = c("I want want to compare each ",
            "column to the values in",
            "If any word from the list any",
            "replace the word in the respective the word want"),
  want= c("want", "want", "want", "want"),
  word= c("word", "word", "word", "word"),
  any= c("any", "any", "any", "any"))

#add 1 for match and 0 for no match
for (i in 2:ncol(df))
{
  for (j in 1:nrow(df))
  {                 
    df[j,i] <- ifelse (grepl (df[j,i] , df$words[j]) %in% "TRUE", 1, 0)
  }
  print(i)
}

'data.frame':  4 obs. of  4 variables:
 $ words: chr  "I want want to compare each " "column to the values in" "If any word from the list any" "replace the word in the respective the word want"
 $ want : chr  "want" "want" "want" "want"
 $ word : chr  "word" "word" "word" "word"
 $ any  : chr  "any" "any" "any" "any"

The output should look like this:

    words                                                 want word any
1   I want want to compare each                            2    0   0
2   column to the values in                                0    0   0
3   If any word from the list any                          0    1   2
4   replace the word in the respective the word want       1    2   0

The current output from the existing code looks like this:

    words                                                 want word any
1   I want want to compare each                            1    0   0
2   column to the values in                                0    0   0
3   If any word from the list any                          0    1   1
4   replace the word in the respective the word want       1    1   0
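The gap between the two tables comes from grepl(), which only reports whether a pattern matches at least once, so the loop can never produce a value above 1. A minimal base R sketch of the difference, using the first row of the example data (the variable names here are illustrative only):

```r
# First row of the example data.
x <- "I want want to compare each "

# grepl() answers "is there at least one match?", so it yields TRUE or FALSE only.
has_match <- grepl("want", x, fixed = TRUE)

# gregexpr() returns the start position of every match (or -1 when there is none),
# so counting the positive positions gives the number of occurrences.
positions <- gregexpr("want", x, fixed = TRUE)[[1]]
n_matches <- sum(positions > 0)

print(has_match)  # TRUE
print(n_matches)  # 2
```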

Answer 1

Here's an idea: loop over the distinct words and count each one with str_count from the stringr package, i.e.

sapply(unique(unlist(df[-1])), function(i) stringr::str_count(df$words, i))

#     want word any
#[1,]    2    0   0
#[2,]    0    0   0
#[3,]    0    1   2
#[4,]    1    2   0
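If adding stringr is not an option, roughly the same counts can be obtained with base R. The sketch below is only one possible way to do it: count_fixed is a hypothetical helper (not part of any package), and it matches the words as fixed substrings rather than regular expressions.

```r
df <- data.frame(
  words = c("I want want to compare each ",
            "column to the values in",
            "If any word from the list any",
            "replace the word in the respective the word want"),
  want = c("want", "want", "want", "want"),
  word = c("word", "word", "word", "word"),
  any  = c("any", "any", "any", "any"),
  stringsAsFactors = FALSE)

# Hypothetical helper: number of fixed-substring matches of `pattern`
# in each element of `strings`. gregexpr() gives -1 when nothing matches.
count_fixed <- function(strings, pattern) {
  vapply(gregexpr(pattern, strings, fixed = TRUE),
         function(m) sum(m > 0), integer(1))
}

# One column of counts per distinct word found in the non-text columns.
patterns <- unique(unlist(df[-1], use.names = FALSE))
counts <- sapply(patterns, function(p) count_fixed(df$words, p))

# Overwrite the word columns with their counts.
df[patterns] <- as.data.frame(counts)
print(df)
```

The sapply call still iterates over the distinct words, but not over rows, since gregexpr is vectorised over the strings.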

Answer 2

Using the tidyverse (referring to the data frame with $ is a slight departure from tidyverse syntax):

library(tidyverse)

df %>% 
     mutate_at(vars(-words),function(x) str_count(df$words,x))
                                             words want word any
1                     I want want to compare each     2    0   0
2                          column to the values in    0    0   0
3                    If any word from the list any    0    1   2
4 replace the word in the respective the word want    1    2   0

Alternatively, use modify_at; as suggested by @Sotos, we can then use . to stay within tidyverse syntax.

df %>% 
      modify_at(2:ncol(.),function(x) str_count(.$words,x))
                                             words want word any
1                     I want want to compare each     2    0   0
2                          column to the values in    0    0   0
3                    If any word from the list any    0    1   2
4 replace the word in the respective the word want    1    2   0
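One caveat that applies to all of the approaches above: they count substring matches, so "word" would also be counted inside "words" or "reword". That makes no difference for the example data, but if only whole words should count, one option is to wrap the pattern in word boundaries (\\b). A base R sketch, where count_whole_word is a hypothetical helper and the word is assumed to contain no regex metacharacters:

```r
# Hypothetical helper: count whole-word occurrences of `word` in each string.
# Assumes `word` contains no regex metacharacters.
count_whole_word <- function(strings, word) {
  pattern <- paste0("\\b", word, "\\b")
  vapply(gregexpr(pattern, strings, perl = TRUE),
         function(m) sum(m > 0), integer(1))
}

x <- c("If any word from the words any", "anyone can reword words")

word_counts <- count_whole_word(x, "word")
any_counts  <- count_whole_word(x, "any")

print(word_counts)  # 1 0
print(any_counts)   # 2 0
```

The same boundary-wrapped pattern should also be usable as the pattern argument of str_count, since stringr interprets patterns as regular expressions by default.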

nlp

r

text-mining