Calculate combinations of several categorical variables

Question

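I have a data frame consisting mostly of categorical variables. I want to count how many times each combination of values occurs across three of its columns. The data in those columns looks like this: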

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)

I read another post that uses the following code when comparing two columns:

library(dplyr)
library(stringr)
rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements), ' - ',
       pmax(number_arms, arrangements)), name = "count")

This is the result:

combination   count
12 - single    1            
16 - single    1            
4 - paired     3            
4 - single     4            
5 - paired     4            
5 - single     2            
6 - ornament   1            
6 - paired    81

However, when I add a third column, the code does not give me the result I want:

rg2 %>%
     count(combination = str_c(pmin(number_arms, arrangements, approx_position), ' - ',
       pmax(number_arms, arrangements, approx_position)), name = "count")

The code still runs without errors, but the result is wrong. Do I need different code to count combinations of three variables?

Answer 1

If you are looking for the count of each combination of variables (excluding zero counts), you can do this:

subset(data.frame(table(rg2)), Freq > 0)

   number_arms arrangements approx_position Freq
1           12     ornament          bottom    1
15           8       paired          middle    1
26           6       single             top    1

Or, with the combinations in a single column:

subset(data.frame(table(rg2)), Freq > 0) |>
  tidyr::unite("combn", -Freq, sep = " - ")

                    combn Freq
1  12 - ornament - bottom    1
15    8 - paired - middle    1
26       6 - single - top    1
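
The Freq > 0 filter matters because table() cross-tabulates every level of every column, including combinations that never occur in the data. A minimal sketch of that behaviour, using the same sample data (the names all_cells and observed are illustrative, not from the original answer):

```r
number_arms <- c("6", "8", "12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)

# table() builds a cell for every possible level combination:
# 3 levels x 3 levels x 3 levels, most of them with a count of 0.
all_cells <- data.frame(table(rg2))
nrow(all_cells)

# Keep only the combinations that actually appear in the data.
observed <- subset(all_cells, Freq > 0)
nrow(observed)
```

With many columns or many levels the full table can grow large, which is one reason to prefer an approach that only counts observed rows.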

Data

number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)

Answer 2

A tidyverse option (updated to drop group_by):

library(dplyr)

rg2 %>%
  count(number_arms, arrangements, approx_position)

Result:

 number_arms arrangements approx_position     n
  <chr>       <chr>        <chr>           <int>
1 12          ornament     bottom              1
2 6           single       top                 1
3 8           paired       middle              1
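
In the sample data every combination occurs once, so all counts are 1. As a sketch of what the same call returns when combinations repeat, here is a small made-up data frame (demo and its values are hypothetical, for illustration only), with count()'s sort argument used to list the most frequent combination first:

```r
library(dplyr)

# Hypothetical data with repeated rows, for illustration only.
demo <- data.frame(
  number_arms     = c("6", "6", "8", "6", "12"),
  arrangements    = c("single", "single", "paired", "single", "ornament"),
  approx_position = c("top", "top", "middle", "top", "bottom")
)

# One row per observed combination; sort = TRUE orders by descending count.
combo_counts <- demo %>%
  count(number_arms, arrangements, approx_position, sort = TRUE)

combo_counts
```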

Answer 3

You can try dplyr::count() + paste():

library(dplyr)

rg2 %>%
  count(combination = paste(number_arms, arrangements, approx_position, sep = " - "), name = "count")

#              combination count
# 1 12 - ornament - bottom     1
# 2       6 - single - top     1
# 3    8 - paired - middle     1
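
A likely reason the pmin()/pmax() version misbehaves with three columns: pmin() returns only the row-wise smallest string and pmax() only the largest, so whatever value sorts in between is never included in the label. The sketch below compares the two approaches on the sample data (two_part and three_part are illustrative names; the string ordering used by pmin()/pmax() depends on the locale):

```r
number_arms <- c("6", "8", "12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")

# pmin()/pmax() keep one value each per row, so the label has only
# two parts even though three columns were passed in.
two_part <- paste(pmin(number_arms, arrangements, approx_position),
                  pmax(number_arms, arrangements, approx_position),
                  sep = " - ")

# paste() keeps every column, in column order.
three_part <- paste(number_arms, arrangements, approx_position, sep = " - ")

two_part
three_part
```

The pmin()/pmax() trick is meant for treating a pair as unordered (so that "a - b" and "b - a" count as the same combination). When each column has its own meaning, as here, pasting the columns in order appears to be sufficient.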
