Add frequency into dataframe for each group and unique element (R)

I have a table such as:

Group Family Nb 
1     A      15
2     B      20
3     A      2
3     B      1
3     C      1
4     D      10
4     A      5
5     B      1
5     D      1

I would like to transform this data frame so that each unique Family element becomes a column, holding the frequency of Nb within each Group. I should then get:

 Group  A    B    C    D    E  F
 1      1    0    0    0    0  0
 2      0    1    0    0    0  0
 3      0.5  0.25 0.25 0    0  0
 4      0.33 0    0    0.67 0  0
 5      0    0.5  0    0.5  0  0

If it helps, here is the table in dput format:

structure(list(Group = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L), 
    Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"), 
    Nb = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-9L))

You can first group_by the Group column, then compute the frequency of Nb within each group, and finally pivot the data into a "wide" format.

library(tidyverse)

df %>% 
  group_by(Group) %>% 
  mutate(Nb = Nb/sum(Nb)) %>% 
  pivot_wider(id_cols = Group, names_from = "Family", values_from = "Nb", values_fill = 0)

# A tibble: 5 × 5
# Groups:   Group [5]
  Group     A     B     C     D
  <int> <dbl> <dbl> <dbl> <dbl>
1     1 1      0     0    0    
2     2 0      1     0    0    
3     3 0.5    0.25  0.25 0    
4     4 0.333  0     0    0.667
5     5 0      0.5   0    0.5  
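
The desired output in the question also lists E and F columns, which never occur in the sample data, so a plain pivot cannot create them. A minimal sketch of one way to get them, assuming the full set of families is known in advance (the `all_families` vector below is a hypothetical name, and A to F is taken from the desired output) and that tidyr 1.2.0 or later is installed for `names_expand`:

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the table in the question.
df <- data.frame(
  Group  = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
  Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
  Nb     = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)
)

# Assumption: the complete list of families is A to F.
all_families <- LETTERS[1:6]

res <- df %>%
  # Turn Family into a factor so the unobserved levels (E, F) are recorded.
  mutate(Family = factor(Family, levels = all_families)) %>%
  group_by(Group) %>%
  mutate(Nb = Nb / sum(Nb)) %>%
  ungroup() %>%
  # names_expand = TRUE creates one column per factor level, observed or not;
  # values_fill = 0 replaces the resulting missing cells with 0.
  pivot_wider(id_cols = Group, names_from = Family, values_from = Nb,
              names_expand = TRUE, values_fill = 0)

res
```

The `ungroup()` call is optional here; it only drops the grouping so the result prints as a plain tibble.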

Another possible solution:

library(tidyverse)

df %>% 
  pivot_wider(names_from = Family, values_from = Nb, values_fill = 0) %>% 
  mutate(aux = rowSums(.[-1]), across(-Group, ~ .x / aux), aux = NULL)

#> # A tibble: 5 × 5
#>   Group     A     B     C     D
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1 1      0     0    0    
#> 2     2 0      1     0    0    
#> 3     3 0.5    0.25  0.25 0    
#> 4     4 0.333  0     0    0.667
#> 5     5 0      0.5   0    0.5

In base R:

 prop.table(xtabs(Nb ~ ., df), 1)

#     Family
#Group         A         B         C         D
#    1 1.0000000 0.0000000 0.0000000 0.0000000
#    2 0.0000000 1.0000000 0.0000000 0.0000000
#    3 0.5000000 0.2500000 0.2500000 0.0000000
#    4 0.3333333 0.0000000 0.0000000 0.6666667
#    5 0.0000000 0.5000000 0.0000000 0.5000000

If you need it as a data.frame, just wrap the result in as.data.frame.matrix.
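
As a rough sketch of that last step, the block below rebuilds the sample data, wraps the proportion table in as.data.frame.matrix, and then moves the group labels from the row names into a regular column. The names `tab` and `wide` are only illustrative, and the final `cbind` step is an optional extra that is not part of the answer above.

```r
# Sample data rebuilt from the table in the question.
df <- data.frame(
  Group  = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
  Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
  Nb     = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)
)

# Sum Nb per Group/Family cell, then divide each row by its total (margin = 1).
tab <- prop.table(xtabs(Nb ~ Group + Family, data = df), margin = 1)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame() on a
# table would return the long Group/Family/Freq form instead.
wide <- as.data.frame.matrix(tab)

# Optional: the groups are stored as row names, so copy them into a column.
wide <- cbind(Group = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```

Because xtabs keeps unused factor levels by default, converting Family to a factor with extra levels beforehand would also produce all-zero columns for families that never occur.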