Count the occurrence of categorical variables in R
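I have a data frame made up of three categorical variables. I want to find the frequency of each combination and sort the result by frequency in descending order, as shown below:
My data: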
A LEVEL1 PASS
A LEVEL1 FAIL
B LEVEL2 PASS
A LEVEL1 PASS
B LEVEL2 PASS
A LEVEL1 PASS
The result should look like this:
A LEVEL1 PASS 3
B LEVEL2 PASS 2
A LEVEL1 FAIL 1
I am using the plyr library:
myfreq <- count(myresult, vars = NULL, wt_var = NULL)
myfreq <- myfreq[order(-myfreq$freq), ]
At first it worked, but later it gave me this error:
Error in grouped_df_impl(data, unname(vars), drop) : Column `vars` is unknown
The other libraries I am using are rJava and dplyr.
Thanks.
You can use the table function.
ex <- data.frame("letter" = c("A", "A", "B", "A", "B", "A"),
"level" = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
"test" = c("PASS", "FAIL", rep("PASS", 4)))
ex
res <- data.frame(table(ex$level, ex$test))
colnames(res) <- c("level", "test", "freq")
You can later merge the resulting data.frame with the original one.
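The call above tabulates only level and test. As a minimal sketch in base R, assuming the goal is the count of every combination across all three columns, sorted in descending order, the whole data frame can be passed to table(). The names combos and res_all are illustrative and not part of the original answer.

```r
# Example data with the same three columns as above
ex <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  test   = c("PASS", "FAIL", rep("PASS", 4))
)

# table() on a data frame cross-tabulates all of its columns;
# as.data.frame() turns the table into one row per combination
combos <- as.data.frame(table(ex), responseName = "freq")

# Drop combinations that never occur, then sort by descending frequency
res_all <- combos[combos$freq > 0, ]
res_all <- res_all[order(-res_all$freq), ]
rownames(res_all) <- NULL
res_all
```

Filtering on freq > 0 matters here because table() reports every possible combination of levels, including those with a count of zero.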
I recommend dplyr, which is included in the tidyverse package.
I don't know the column names in your data frame, so in the example below I call them col1, col2, and col3.
library(tidyverse)
df <- tribble(
~ col1, ~col2, ~col3,
"A", "LEVEL1", "PASS",
"A", "LEVEL1", "FAIL",
"A", "LEVEL1", "PASS",
"B", "LEVEL2", "PASS",
"A", "LEVEL1", "PASS")
# here is where the magic happens
df %>% count(col1, col2, col3, sort = TRUE)
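Both plyr and dplyr export a function named count, and when both are attached the one attached last masks the other. That is a plausible, though unconfirmed, cause of the error in the question, since dplyr::count() treats the arguments passed through ... as grouping variables. A minimal sketch, assuming dplyr is installed, that makes the intended function explicit with the :: operator; freq_tbl is an illustrative name.

```r
# Same five example rows as above, built with base R
df <- data.frame(
  col1 = c("A", "A", "A", "B", "A"),
  col2 = c("LEVEL1", "LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1"),
  col3 = c("PASS", "FAIL", "PASS", "PASS", "PASS")
)

# Qualify the call so the result does not depend on package load order
freq_tbl <- dplyr::count(df, col1, col2, col3, sort = TRUE)
freq_tbl
```

Writing plyr::count(df) in the same way would select the plyr version instead, regardless of which package was attached last.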
You can use group_by in dplyr:
library(dplyr)
x <- data.frame(letter = c("A", "A", "B", "A", "B", "A"), level = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"), text = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
df <- x %>%
group_by_all() %>%
count()
Or you can do it like this:
df <- x %>%
group_by(letter, level, text) %>%
count()
Output:
> df <- x %>% group_by_all() %>% count()
> df
# A tibble: 3 x 4
# Groups:   x, y, z [3]
       x       y      z     n
  <fctr>  <fctr> <fctr> <int>
1      A LEVEL 1   FAIL     1
2      A LEVEL 1   PASS     3
3      B LEVEL 2   PASS     2
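In recent dplyr releases group_by_all() is marked as superseded in favour of across(). A sketch, assuming dplyr 1.0.0 or later, that counts every combination of all columns and sorts the result in one step; counts is an illustrative name.

```r
library(dplyr)

x <- data.frame(
  letter = c("A", "A", "B", "A", "B", "A"),
  level  = c("LEVEL 1", "LEVEL 1", "LEVEL 2", "LEVEL 1", "LEVEL 2", "LEVEL 1"),
  text   = c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS")
)

# across(everything()) selects all columns as grouping variables;
# sort = TRUE orders the rows by descending count
counts <- x %>% count(across(everything()), sort = TRUE)
counts
```

Unlike group_by_all() followed by count(), this returns an ungrouped result, so no ungroup() call is needed afterwards.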
Here it is with tidyverse and n():
library(tidyverse)
df <- tibble(
id = c("A", "A", "B", "A", "B", "A"),
level = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
type = factor(c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
)
df %>%
group_by(id, level, type) %>%
summarise(n = n()) %>%
arrange(desc(n))
# A tibble: 3 x 4
# Groups:   id, level [?]
     id  level   type     n
  <chr>  <chr> <fctr> <int>
1     A LEVEL1   FAIL     1
2     A LEVEL1   PASS     3
3     B LEVEL2   PASS     2
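By default summarise() drops only the last grouping level, so the result above is still grouped by id and level, and the printed rows are not in descending order of n. A sketch, assuming dplyr 1.0.0 or later (where the .groups argument is available), that returns an ungrouped tibble before sorting; sorted is an illustrative name.

```r
library(dplyr)

df <- tibble(
  id    = c("A", "A", "B", "A", "B", "A"),
  level = c("LEVEL1", "LEVEL1", "LEVEL2", "LEVEL1", "LEVEL2", "LEVEL1"),
  type  = factor(c("PASS", "FAIL", "PASS", "PASS", "PASS", "PASS"))
)

sorted <- df %>%
  group_by(id, level, type) %>%
  # .groups = "drop" removes all grouping from the summarised result
  summarise(n = n(), .groups = "drop") %>%
  # With no groups left, arrange() orders all rows by descending n
  arrange(desc(n))
sorted
```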