R - how to create a loop for a unique word frequency count
I have the following data frame:
df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))
q combined
1 a, b, c big big sentence
2 a, b, d I like sentences
I want to count the frequency of each unique word for each unique q. The desired output looks like this:
words freq V1 V2 V3
1 big 2 a b c
2 sentence 1 a b c
3 I 1 a b d
4 like 1 a b d
5 sentences 1 a b d
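I managed to write some code that does this for the first row of df only. How can I turn this code into a loop so that the same data manipulation steps are applied to every row of df?
The working code I wrote for one row: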
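library(stringr)  # provides str_split_fixed()

df_1 <- df[1, ]
# Word frequencies for the first row (note: tolower() also turns "I" into "i")
countdf <- data.frame(table(unlist(strsplit(tolower(df_1$combined), " "))))
# Split q on ", " (not just ",") so V2 and V3 carry no leading space
countsplit <- as.data.frame(str_split_fixed(df_1$q, ", ", 3))
countdf$V1 <- countsplit$V1
countdf$V2 <- countsplit$V2
countdf$V3 <- countsplit$V3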
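One way to answer the loop question literally is to wrap the one-row steps in a function and apply it to every row index with lapply(), then stack the pieces with rbind(). The sketch below is an illustration under a few assumptions: count_row is a hypothetical helper name, it uses base strsplit() instead of str_split_fixed() so it needs no packages, and it skips tolower() so that "I" stays capitalised as in the desired output.

```r
df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: the one-row steps, parameterised by a one-row data frame.
count_row <- function(row) {
  # Word frequencies for this row's sentence (case is preserved here).
  words <- unlist(strsplit(row$combined, " "))
  countdf <- as.data.frame(table(words), stringsAsFactors = FALSE)
  names(countdf) <- c("words", "freq")
  # Split q on ", " and recycle each part down the rows of countdf.
  parts <- strsplit(row$q, ", ", fixed = TRUE)[[1]]
  countdf$V1 <- parts[1]
  countdf$V2 <- parts[2]
  countdf$V3 <- parts[3]
  countdf
}

# "Loop" over every row index, then stack the per-row results.
result <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) count_row(df[i, ])))
rownames(result) <- NULL
print(result)
```

An explicit for loop that appends to a list would work too; lapply() plus a single rbind() at the end avoids growing a data frame inside the loop.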
library(tidyverse)
df <- data.frame(q = c("a, b, c", "a, b, d"), combined = c("big big sentence","I like sentences"))
df %>%
as_tibble() %>%
transmute(q, words = str_split(combined, " ")) %>%
unnest(words) %>%
separate(q, into = c("V1", "V2", "V3")) %>%
count(V1, V2, V3, words, name = "freq")
#> # A tibble: 5 x 5
#> V1 V2 V3 words freq
#> <chr> <chr> <chr> <chr> <int>
#> 1 a b c big 2
#> 2 a b c sentence 1
#> 3 a b d I 1
#> 4 a b d like 1
#> 5 a b d sentences 1
Created on 2022-02-22 by the reprex package (v2.0.0)
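For comparison, the same reshaping can be done in base R without a row-by-row loop: split every sentence at once, repeat each q once per word, and count the (q, word) pairs with table(). This is a sketch of an alternative, not part of the answer above; the intermediate names (word_list, long, counts) are illustrative, and the row order of the result depends on the locale used to sort the words.

```r
df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"),
                 stringsAsFactors = FALSE)

# One character vector of words per row.
word_list <- strsplit(df$combined, " ")

# Long format: repeat each q once for every word in its sentence.
long <- data.frame(q = rep(df$q, lengths(word_list)),
                   words = unlist(word_list),
                   stringsAsFactors = FALSE)

# Count every (q, words) pair, then drop combinations that never occur.
counts <- as.data.frame(table(q = long$q, words = long$words),
                        responseName = "freq", stringsAsFactors = FALSE)
counts <- counts[counts$freq > 0, ]

# Split q into V1..V3 on ", " and bind them next to the counts.
parts <- do.call(rbind, strsplit(counts$q, ", ", fixed = TRUE))
colnames(parts) <- c("V1", "V2", "V3")
result <- cbind(counts[, c("words", "freq")], parts)
rownames(result) <- NULL
print(result)
```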
You can use separate_rows and separate:
library(tidyr)
library(dplyr)
df %>%
separate_rows(combined) %>%
group_by(q, words = combined) %>%
summarise(freq = n(), .groups = "drop") %>%
separate(q, into = c("V1", "V2", "V3"))
# A tibble: 5 x 5
V1 V2 V3 words freq
<chr> <chr> <chr> <chr> <int>
1 a b c big 2
2 a b c sentence 1
3 a b d I 1
4 a b d like 1
5 a b d sentences 1
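In tidyr 1.3.0 and later, separate() and separate_rows() are superseded by separate_wider_delim() and separate_longer_delim(). A sketch of the same pipeline with those functions follows; it assumes tidyr >= 1.3.0 is installed, and it uses count(), which accepts a named expression such as words = combined, in place of group_by() plus summarise().

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.3.0 for the *_delim() functions

df <- data.frame(q = c("a, b, c", "a, b, d"),
                 combined = c("big big sentence", "I like sentences"))

result <- df %>%
  # One row per word (split on single spaces).
  separate_longer_delim(combined, delim = " ") %>%
  # Count each word within each q; the count column is named "freq".
  count(q, words = combined, name = "freq") %>%
  # Split q into three named columns on ", ".
  separate_wider_delim(q, delim = ", ", names = c("V1", "V2", "V3"))

print(result)
```

Unlike separate(), separate_wider_delim() raises an error by default when a value splits into more or fewer pieces than names, which makes malformed q values easier to spot.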