R - How to create a loop for a unique word frequency count

Question

I have the following data frame:

df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))


        q           combined
1       a, b, c     big big sentence
2       a, b, d     I like sentences

I want to count the frequency of each unique word for each unique q. The desired output looks like this:

      words freq V1 V2 V3
1       big    2  a  b  c
2  sentence    1  a  b  c
3         I    1  a  b  d
4      like    1  a  b  d
5 sentences    1  a  b  d

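I managed to write some code that does this for the first row of df only. How can I turn this code into a loop, so that the same data manipulation steps are applied to every row of df?

The working code I wrote for one row: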

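library(stringr)  # provides str_split_fixed()

# Take the first row only
df_1 <- df[1,]

# Count each lower-cased word in the sentence
countdf <- data.frame(table(unlist(strsplit(tolower(df_1$combined), " "))))

# Split q into three columns (splitting on "," keeps the leading spaces)
countsplit <- str_split_fixed(df_1$q, ",", 3)
countsplit <- as.data.frame(countsplit)

countdf$V1 <- countsplit$V1
countdf$V2 <- countsplit$V2
countdf$V3 <- countsplit$V3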

Answer 1

library(tidyverse)

df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))

df %>%
  as_tibble() %>%
  transmute(q, words = combined %>% map(~ .x %>% str_split(" ") %>% simplify)) %>%
  unnest(words) %>%
  separate(q, into = c("V1", "V2", "V3")) %>%
  count(V1, V2, V3, words, name = "freq")
#> # A tibble: 5 x 5
#>   V1    V2    V3    words      freq
#>   <chr> <chr> <chr> <chr>     <int>
#> 1 a     b     c     big           2
#> 2 a     b     c     sentence      1
#> 3 a     b     d     I             1
#> 4 a     b     d     like          1
#> 5 a     b     d     sentences     1

Created on 2022-02-22 by the reprex package (v2.0.0)
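
Since the question asks specifically for a loop, here is a minimal base R sketch of that route, offered as an alternative rather than as part of the pipeline above. The helper name `count_row` and the variables `pieces` and `result` are made up for illustration. It assumes every `q` holds exactly three values separated by ", " and, like the pipeline above, it does not lower-case the words.

```r
# Example data from the question
df <- data.frame(
  q = c("a, b, c", "a, b, d"),
  combined = c("big big sentence", "I like sentences")
)

# Hypothetical helper: word counts for one row plus the split-up q values
count_row <- function(q, combined) {
  words <- strsplit(combined, " ", fixed = TRUE)[[1]]
  counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
  names(counts) <- c("words", "freq")
  parts <- strsplit(q, ", ", fixed = TRUE)[[1]]  # assumes three parts
  counts$V1 <- parts[1]
  counts$V2 <- parts[2]
  counts$V3 <- parts[3]
  counts
}

# Loop over the rows, keep each per-row result, then stack them
pieces <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  # as.character() guards against factor columns in R versions before 4.0
  pieces[[i]] <- count_row(as.character(df$q[i]), as.character(df$combined[i]))
}
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Collecting the pieces in a pre-allocated list and binding once at the end avoids growing a data frame inside the loop, which tends to be slow for larger inputs.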

Answer 2

You can use separate_rows and separate:

library(tidyr)
library(dplyr)

df %>% 
  separate_rows(combined) %>% 
  group_by(q, words = combined) %>% 
  summarise(freq = n()) %>% 
  separate(q, into = c("V1", "V2", "V3"))

# A tibble: 5 x 5
  V1    V2    V3    words      freq
  <chr> <chr> <chr> <chr>     <int>
1 a     b     c     big           2
2 a     b     c     sentence      1
3 a     b     d     I             1
4 a     b     d     like          1
5 a     b     d     sentences     1
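
One difference from the code in the question: the question lower-cases the text with tolower() before counting, while the pipeline above counts words exactly as written. If case-insensitive counts are wanted, a possible variant is sketched below. The data frame `df2` is a hypothetical input invented to show the effect, and `result2` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical input with mixed capitalisation (not from the question)
df2 <- data.frame(q = "a, b, c", combined = "Big big sentence")

result2 <- df2 %>%
  mutate(combined = tolower(combined)) %>%   # fold case before splitting
  separate_rows(combined, sep = " ") %>%     # one row per word
  count(q, words = combined, name = "freq") %>%
  separate(q, into = c("V1", "V2", "V3"), sep = ", ")

result2
```

Without the tolower() step, "Big" and "big" would be counted as two different words.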
