Calculate combinations of several categorical variables

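I have a data frame that consists mostly of categorical variables. I want to count how many times each combination of values occurs across three of its columns. The data in those columns looks like this: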

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)

I was reading another post that uses the following code when comparing two columns:

library(dplyr)
library(stringr)
rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements), ' - ',
       pmax(number_arms, arrangements)), name = "count") 

This is the result:

combination   count
12 - single    1            
16 - single    1            
4 - paired     3            
4 - single     4            
5 - paired     4            
5 - single     2            
6 - ornament   1            
6 - paired    81    

However, when I add a third column, the code does not give me the result I want:

rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements, approx_position), ' - ',
       pmax(number_arms, arrangements, approx_position)), name = "count") 

The code still runs without errors, but the result is wrong. Do I need different code to count combinations of three variables?
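A likely reason the three-column version goes wrong: pmin() and pmax() each return a single value per row (the smallest and the largest of their arguments), so joining them can only ever produce two parts, and the third value is silently dropped. The sketch below uses base R and the sample data from the question to illustrate this; the helper n_parts() is a hypothetical name introduced here only for the demonstration.

```r
# Sample data from the question (stringsAsFactors = FALSE so that the
# columns stay character in older R versions as well)
rg2 <- data.frame(
  number_arms     = c("6", "8", "12"),
  arrangements    = c("single", "paired", "ornament"),
  approx_position = c("top", "middle", "bottom"),
  stringsAsFactors = FALSE
)

# pmin()/pmax() each yield one value per row, so only two of the
# three values in a row survive in the combined label
two_part <- paste(
  pmin(rg2$number_arms, rg2$arrangements, rg2$approx_position),
  pmax(rg2$number_arms, rg2$arrangements, rg2$approx_position),
  sep = " - "
)

# Pasting the three columns directly keeps every value
three_part <- paste(rg2$number_arms, rg2$arrangements, rg2$approx_position,
                    sep = " - ")

# Hypothetical helper: number of " - "-separated pieces in each label
n_parts <- function(x) lengths(strsplit(x, " - ", fixed = TRUE))

n_parts(two_part)    # two pieces per row
n_parts(three_part)  # three pieces per row
```

Which of the three values pmin() and pmax() pick depends on the string collation of the current locale, so the exact two-part labels may differ between systems.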

If you are looking for the count of each combination of variables (excluding combinations with a count of 0), you can do this:

subset(data.frame(table(rg2)), Freq > 0)

   number_arms arrangements approx_position Freq
1           12     ornament          bottom    1
15           8       paired          middle    1
26           6       single             top    1

Or, with the three columns combined into one:

subset(data.frame(table(rg2)), Freq > 0) |>
  tidyr::unite("combn", -Freq, sep = " - ")

                    combn Freq
1  12 - ornament - bottom    1
15    8 - paired - middle    1
26       6 - single - top    1

Data

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)
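If tidyr is not available, the combined label can probably be built in base R alone. This is a minimal sketch, not part of the original answer: it pastes all columns of rg2 row by row with do.call() and then tabulates the labels. The names combn and counts are chosen here for illustration.

```r
# Sample data, as above
rg2 <- data.frame(
  number_arms     = c("6", "8", "12"),
  arrangements    = c("single", "paired", "ornament"),
  approx_position = c("top", "middle", "bottom"),
  stringsAsFactors = FALSE
)

# Paste every column of rg2 together row by row; c(rg2, sep = " - ")
# builds the argument list that do.call() hands to paste()
combn <- do.call(paste, c(rg2, sep = " - "))

# Tabulate the labels; only combinations that occur appear in the
# table, so no Freq > 0 filter is needed here
counts <- as.data.frame(table(combn), stringsAsFactors = FALSE)
counts
```

Because the labels are built from the rows that actually exist, this variant never creates the zero-count combinations that table(rg2) produces.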

Tidyverse option (updated to remove group_by):

library(dplyr)

rg2 %>%
  count(number_arms, arrangements, approx_position)

Result:

 number_arms arrangements approx_position     n
  <chr>       <chr>        <chr>           <int>
1 12          ornament     bottom              1
2 6           single       top                 1
3 8           paired       middle              1

You can try dplyr::count() + paste():

library(dplyr)

rg2 %>%
  count(combination = paste(number_arms, arrangements, approx_position, sep = " - "), name = "count")

#              combination count
# 1 12 - ornament - bottom     1
# 2       6 - single - top     1
# 3    8 - paired - middle     1