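Calculate combinations of several categorical variables
I have a data frame that consists mainly of categorical variables. I want to count the combinations of values found across three of its columns.
The data in the columns looks like this: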
number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)
I was reading another post that uses the following code to compare two columns:
library(dplyr)
library(stringr)
rg2 %>%
  count(combination = str_c(pmin(number_arms, arrangements), ' - ',
                            pmax(number_arms, arrangements)), name = "count")
This is the result:
combination    count
12 - single        1
16 - single        1
4 - paired         3
4 - single         4
5 - paired         4
5 - single         2
6 - ornament       1
6 - paired        81
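However, if I add a third column, as below, the code does not give me the result I want:
rg2 %>%
  count(combination = str_c(pmin(number_arms, arrangements, approx_position), ' - ',
                            pmax(number_arms, arrangements, approx_position)), name = "count")
The code still runs without any error, but the result is wrong.
Do I need different code to count combinations of three variables?
If you are looking for the count of each combination of the variables (excluding 0), you can do: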
subset(data.frame(table(rg2)), Freq > 0)
   number_arms arrangements approx_position Freq
1           12     ornament          bottom    1
15           8       paired          middle    1
26           6       single             top    1
Or, with the columns combined into one:
subset(data.frame(table(rg2)), Freq > 0) |>
  tidyr::unite("combn", -Freq, sep = " - ")
                     combn Freq
1   12 - ornament - bottom    1
15     8 - paired - middle    1
26        6 - single - top    1
Data
number_arms <- c("6","8","12")
arrangements <- c("single", "paired", "ornament")
approx_position <- c("top", "middle", "bottom")
rg2 <- data.frame(number_arms, arrangements, approx_position)
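The reason for the `Freq > 0` filter in the table() approach above may be easier to see with the full cross-tabulation in hand. The following is a minimal sketch on the same sample data; the names `full` and `observed` are only illustrative. table() builds one cell for every possible combination of levels (3 x 3 x 3 here), and the row names 1, 15 and 26 in the output are positions in that full table.

```r
# Same sample data as in the question
rg2 <- data.frame(
  number_arms     = c("6", "8", "12"),
  arrangements    = c("single", "paired", "ornament"),
  approx_position = c("top", "middle", "bottom")
)

# table() cross-tabulates every level of every column,
# including combinations that never occur in the data
full <- as.data.frame(table(rg2))
nrow(full)          # 27 cells: 3 * 3 * 3

# Keep only the combinations that were actually observed
observed <- subset(full, Freq > 0)
rownames(observed)  # "1" "15" "26"
```

With many columns or many levels the full table can become large, in which case an approach that only counts the observed rows may be preferable.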
Tidyverse option (updated to remove group_by):
library(dplyr)
rg2 %>%
  count(number_arms, arrangements, approx_position)
Result:
  number_arms arrangements approx_position     n
  <chr>       <chr>        <chr>           <int>
1 12          ornament     bottom              1
2 6           single       top                 1
3 8           paired       middle              1
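In the sample data every combination occurs exactly once, so the counts are not very informative. As a small sketch, assuming data with a repeated row (the data frame `rg2_rep` below is made up for illustration), count() can also sort the result by frequency and rename the count column through its `sort` and `name` arguments:

```r
library(dplyr)

# Hypothetical data: the first combination appears twice
rg2_rep <- data.frame(
  number_arms     = c("6", "6", "8", "12"),
  arrangements    = c("single", "single", "paired", "ornament"),
  approx_position = c("top", "top", "middle", "bottom")
)

# sort = TRUE puts the most frequent combination first;
# name = "count" replaces the default column name n
combo_counts <- rg2_rep %>%
  count(number_arms, arrangements, approx_position,
        sort = TRUE, name = "count")

combo_counts
```

Unlike table(), count() only returns combinations that are present in the data, so no filtering on zero counts is needed.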
You can try dplyr::count() + paste():
library(dplyr)
rg2 %>%
  count(combination = paste(number_arms, arrangements, approx_position, sep = " - "),
        name = "count")
#              combination count
# 1 12 - ornament - bottom     1
# 2       6 - single - top     1
# 3    8 - paired - middle     1
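As for why the three-column pmin()/pmax() version in the question runs but gives unexpected results: pmin() and pmax() return only the smallest and the largest value in each row, so with three columns the value that sorts in between is dropped. The sketch below illustrates this on the sample data and shows one possible order-insensitive alternative, assuming that is what the pmin()/pmax() trick was meant to achieve; the names `two_part`, `sorted_key` and `combo_counts` are illustrative only.

```r
# Same sample data as in the question
rg2 <- data.frame(
  number_arms     = c("6", "8", "12"),
  arrangements    = c("single", "paired", "ornament"),
  approx_position = c("top", "middle", "bottom")
)

# pmin()/pmax() keep one value each per row, so every key has
# only two parts even though three columns were passed in
two_part <- paste(
  pmin(rg2$number_arms, rg2$arrangements, rg2$approx_position),
  pmax(rg2$number_arms, rg2$arrangements, rg2$approx_position),
  sep = " - "
)

# Assumed alternative: sort all three values within each row,
# then paste them, so no value is lost
sorted_key <- apply(rg2, 1, function(row) paste(sort(row), collapse = " - "))

# Count each distinct three-part key
combo_counts <- as.data.frame(table(combination = sorted_key))
```

Sorting within a row mixes values from different columns, which only makes sense when the columns are interchangeable. When each column has its own meaning, as here, pasting the columns in a fixed order as in the answers above is the simpler choice.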