Using aggregate/group_by in R to group data and give a count for each factor variable?

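I have a data frame that looks like this. For simplicity I show only the first 6 rows, but the total number of rows is 8236. The grades range from 0 to 2; the example below shows only grades 0 and 1: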

  Telangiectasia_time      grade
  <chr>                    <int>
1 telangiectasia_tumour_0      0
2 telangiectasia_tumour_1      0
3 telangiectasia_tumour_12     0
4 telangiectasia_tumour_24     0
5 telangiectasia_tumour_0      1
6 telangiectasia_tumour_1      1

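I want to group by Telangiectasia_time (the first column) and then count how many times each grade occurs in each group. So, taking the first 6 rows as an example, the result should look like this: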

       Telangiectasia_time grade0    grade1    grade2 
1  telangiectasia_tumour_0    1      1          0
2  telangiectasia_tumour_1    1      1          0
3 telangiectasia_tumour_12    1      0          0
4 telangiectasia_tumour_24    1      0          0  

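That is, three columns at the end, one per grade, each holding the count of that grade for each Telangiectasia_time value. I tried using the aggregate function: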

aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, sum)

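But I am not sure what to put as the last argument so that it returns a count for each grade. When I pass sum, it simply adds the grade values together instead of treating 0, 1 and 2 as separate categories. With my full dataset I get the wrong output: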

      Telangiectasia_time grade
1  telangiectasia_tumour_0    18
2  telangiectasia_tumour_1    11
3 telangiectasia_tumour_12    38
4 telangiectasia_tumour_24    87

I also tried group_by(), but that just gave me the total number of rows per group:

telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count =n())
  Telangiectasia_time      count
* <chr>                    <int>
1 telangiectasia_tumour_0   2059
2 telangiectasia_tumour_1   2059
3 telangiectasia_tumour_12  2059
4 telangiectasia_tumour_24  2059

Using dplyr::count and tidyr::pivot_wider you could do:

library(dplyr)
library(tidyr)

telangiectasia_tumour_data %>% 
  count(Telangiectasia_time, grade) %>% 
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#>   Telangiectasia_time      grade0 grade1
#>   <chr>                     <int>  <int>
#> 1 telangiectasia_tumour_0       1      1
#> 2 telangiectasia_tumour_1       1      1
#> 3 telangiectasia_tumour_12      1      0
#> 4 telangiectasia_tumour_24      1      0
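
The six sample rows contain no grade 2, so no grade2 column appears in that output. If the column should always be present, one possible approach is to declare the grade levels up front and cross-tabulate in base R. This is only a sketch: it assumes the grades can only be 0, 1 or 2 (as the question states), and the names grade_levels, counts, grade_counts and wide_counts are made up for illustration.

```r
# Sample rows from the question, rebuilt so the block runs on its own
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Assumption: the only possible grades are 0, 1 and 2
grade_levels <- 0:2

# Cross-tabulate time point against grade; turning grade into a factor
# with explicit levels keeps a column for grades that never occur
counts <- table(
  telangiectasia_tumour_data$Telangiectasia_time,
  factor(telangiectasia_tumour_data$grade, levels = grade_levels)
)

# Convert the contingency table into a plain wide data frame
grade_counts <- as.data.frame.matrix(counts)
names(grade_counts) <- paste0("grade", grade_levels)
wide_counts <- cbind(
  Telangiectasia_time = rownames(counts),
  grade_counts
)
rownames(wide_counts) <- NULL

wide_counts
```

On the sample rows this should give one row per time point with a grade2 column of zeros; on the full data the same code would fill in the real grade 2 counts.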

DATA

telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
  "telangiectasia_tumour_0",
  "telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
  "telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
  0L,
  0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
  "1",
  "2", "3", "4", "5", "6"
))
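
As for what to put as the last argument of aggregate(): one option is a function that tabulates the grades instead of summing them. The sketch below is an illustration rather than the only way to do it; count_grades is a hypothetical helper, and it assumes the grades are limited to 0, 1 and 2.

```r
# Sample rows from the question, rebuilt so the block runs on its own
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Hypothetical helper: counts how often each grade (0, 1, 2) occurs in g
count_grades <- function(g) table(factor(g, levels = 0:2))

# Passing the helper instead of sum yields one count per grade per group
agg <- aggregate(
  grade ~ Telangiectasia_time,
  data = telangiectasia_tumour_data,
  FUN = count_grades
)

# agg$grade is a matrix with one column per grade; spread it into
# ordinary columns named grade0, grade1, grade2
grade_matrix <- agg$grade
colnames(grade_matrix) <- paste0("grade", 0:2)
result <- cbind(agg["Telangiectasia_time"], grade_matrix)

result
```

Because the function returns three values per group, aggregate() stores them as a matrix column, which is why the last step spreads that matrix out before printing.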