Count the occurrences of words in a string row-wise based on existing words in other columns
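I have a data frame with several rows of strings. For each row, I want to count how many times the word held in each of the other columns occurs in that row's string. How can I achieve this with the code below? Can the code be modified to do this, or can anyone suggest another approach that does not need loops? Thanks in advance!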
df <- data.frame(
  words = c("I want want to compare each ",
            "column to the values in",
            "If any word from the list any",
            "replace the word in the respective the word want"),
  want = c("want", "want", "want", "want"),
  word = c("word", "word", "word", "word"),
  any  = c("any", "any", "any", "any"))

# add 1 for match and 0 for no match
for (i in 2:ncol(df)) {
  for (j in 1:nrow(df)) {
    df[j, i] <- ifelse(grepl(df[j, i], df$words[j]) %in% "TRUE", 1, 0)
  }
  print(i)
}
'data.frame': 4 obs. of 4 variables:
 $ words: chr "I want want to compare each " "column to the values in" "If any word from the list any" "replace the word in the respective the word want"
 $ want : chr "want" "want" "want" "want"
 $ word : chr "word" "word" "word" "word"
 $ any  : chr "any" "any" "any" "any"
The output should look like this:
  words                                            want word any
1 I want want to compare each                         2    0   0
2 column to the values in                             0    0   0
3 If any word from the list any                       0    1   2
4 replace the word in the respective the word want    1    2   0
The current output of the existing code looks like this:
  words                                            want word any
1 I want want to compare each                         1    0   0
2 column to the values in                             0    0   0
3 If any word from the list any                       0    1   1
4 replace the word in the respective the word want    1    1   0
One idea is to loop over the unique words and count each one with str_count from the stringr package, i.e.
sapply(unique(unlist(df[-1])), function(i) stringr::str_count(df$words, i))
# want word any
#[1,] 2 0 0
#[2,] 0 0 0
#[3,] 0 1 2
#[4,] 1 2 0
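If stringr is not available, the same idea can be sketched in base R with gregexpr. This is only a minimal sketch: count_fixed is a hypothetical helper name (not from any package), and fixed = TRUE is an assumption that the words should be matched literally rather than as regular expressions.

```r
# Same data as in the question
df <- data.frame(
  words = c("I want want to compare each ",
            "column to the values in",
            "If any word from the list any",
            "replace the word in the respective the word want"),
  want = rep("want", 4),
  word = rep("word", 4),
  any  = rep("any", 4)
)

# count_fixed() is a hypothetical helper: it counts literal occurrences
# of `pattern` in every element of `text`
count_fixed <- function(pattern, text) {
  hits <- gregexpr(pattern, text, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so only positive positions count
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# One column of counts per distinct word found in the non-text columns
targets <- unique(unlist(df[-1]))
counts  <- sapply(targets, count_fixed, text = df$words)

# Put the counts next to the original strings
result <- cbind(df["words"], counts)
```

Here counts is a matrix with one column per word, so cbind gives a data frame shaped like the desired output.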
Using tidyverse (referring to df$words with $ departs slightly from tidyverse syntax):
library(tidyverse)
df %>%
mutate_at(vars(-words),function(x) str_count(df$words,x))
  words                                            want word any
1 I want want to compare each                         2    0   0
2 column to the values in                             0    0   0
3 If any word from the list any                       0    1   2
4 replace the word in the respective the word want    1    2   0
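In dplyr 1.0.0 and later, mutate_at has been superseded by across. Assuming a recent dplyr is installed, the call above could probably be written as follows; inside across the words column can be referenced directly, which also avoids the df$words lookup. This is a sketch of an equivalent, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  words = c("I want want to compare each ",
            "column to the values in",
            "If any word from the list any",
            "replace the word in the respective the word want"),
  want = rep("want", 4),
  word = rep("word", 4),
  any  = rep("any", 4)
)

# For every column except `words`, count how often that column's word
# (.x, one pattern per row) occurs in the `words` string of the same row
result <- df %>%
  mutate(across(-words, ~ str_count(words, .x)))
```

str_count is vectorised over both the string and the pattern, so each row's string is paired with that row's pattern.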
Or use modify_at and, as suggested by @Sotos, refer to the data through the dot placeholder (.) to stay within tidyverse syntax.
df %>%
modify_at(2:ncol(.),function(x) str_count(.$words,x))
  words                                            want word any
1 I want want to compare each                         2    0   0
2 column to the values in                             0    0   0
3 If any word from the list any                       0    1   2
4 replace the word in the respective the word want    1    2   0
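One caveat, in case it matters for real data: str_count treats the word as a pattern and counts substring matches, so "any" is also found inside "many" or "anyone". If only whole words should count, wrapping the word in \\b word boundaries is one option. The texts vector below is made up for illustration, and the sketch assumes the words contain no regex metacharacters.

```r
library(stringr)

# Hypothetical strings, chosen so that "any" appears inside longer words
texts <- c("many want any", "anyone wants anything")

# Substring counting: "many", "anyone" and "anything" all contribute
substring_counts <- str_count(texts, "any")

# Whole-word counting: \\b anchors the match at word boundaries
whole_word_pattern <- paste0("\\b", "any", "\\b")
whole_word_counts  <- str_count(texts, whole_word_pattern)
```

With the question's data both variants give the same counts, because none of the three words occurs inside a longer word there.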