Add frequency into dataframe for each group and unique element (R)
I have a table such as:
Group Family Nb
1 A 15
2 B 20
3 A 2
3 B 1
3 C 1
4 D 10
4 A 5
5 B 1
5 D 1
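I want to transform this data frame so that each unique Family element becomes a column and, for each Group, each cell holds the frequency of that Family (its Nb divided by the Group's total Nb). I should then get: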
Group A B C D E F
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0.5 0.25 0.25 0 0 0
4 0.33 0 0 0.67 0 0
5 0 0.5 0 0.5 0 0
If it helps, here is the dput output of the table:
df <- structure(list(Group = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
Nb = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-9L))
You can first group_by the Group column, then compute the frequencies, and finally pivot the data into "wide" format.
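library(tidyverse)
df %>%
  group_by(Group) %>%
  # convert each count to its share of the group total
  mutate(Nb = Nb / sum(Nb)) %>%
  # pass id_cols by name; positional use can fail in newer tidyr versions
  pivot_wider(id_cols = Group, names_from = "Family", values_from = "Nb", values_fill = 0)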
# A tibble: 5 × 5
# Groups: Group [5]
Group A B C D
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 0
2 2 0 1 0 0
3 3 0.5 0.25 0.25 0
4 4 0.333 0 0 0.667
5 5 0 0.5 0 0.5
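The desired output in the question also lists columns E and F, which never occur in the data, so no pivot of the observed values alone can produce them. Here is a minimal sketch of one way to get them. It assumes the full set of families is A to F (the question does not say so) and a reasonably recent tidyr that supports the names_expand argument of pivot_wider:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Group  = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
  Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
  Nb     = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)
)

wide <- df %>%
  # ASSUMPTION: A..F is the complete set of families
  mutate(Family = factor(Family, levels = LETTERS[1:6])) %>%
  group_by(Group) %>%
  mutate(Nb = Nb / sum(Nb)) %>%   # share of the group total
  ungroup() %>%
  pivot_wider(
    id_cols      = Group,
    names_from   = Family,
    values_from  = Nb,
    values_fill  = 0,
    names_expand = TRUE            # also create columns for unused factor levels
  )

wide
```

Turning Family into a factor is what makes the unused levels visible to pivot_wider; values_fill = 0 then fills the empty E and F columns with zeros.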
Another possible solution:
library(tidyverse)
df %>%
pivot_wider(names_from = Family, values_from = Nb, values_fill = 0) %>%
mutate(aux = rowSums(.[-1]), across(-Group, ~ .x / aux), aux = NULL)
#> # A tibble: 5 × 5
#> Group A B C D
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 0 0
#> 2 2 0 1 0 0
#> 3 3 0.5 0.25 0.25 0
#> 4 4 0.333 0 0 0.667
#> 5 5 0 0.5 0 0.5
In base R:
prop.table(xtabs(Nb ~ ., df), 1)
# Family
#Group A B C D
# 1 1.0000000 0.0000000 0.0000000 0.0000000
# 2 0.0000000 1.0000000 0.0000000 0.0000000
# 3 0.5000000 0.2500000 0.2500000 0.0000000
# 4 0.3333333 0.0000000 0.0000000 0.6666667
# 5 0.0000000 0.5000000 0.0000000 0.5000000
If you need it as a data.frame, just wrap the result in as.data.frame.matrix.
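For example, the wrapping step might look like the sketch below. The names freq and res are only illustrative, and turning the row names back into a Group column is an optional extra that the answer itself does not mention:

```r
df <- data.frame(
  Group  = c(1L, 2L, 3L, 3L, 3L, 4L, 4L, 5L, 5L),
  Family = c("A", "B", "A", "B", "C", "D", "A", "B", "D"),
  Nb     = c(15L, 20L, 2L, 1L, 1L, 10L, 5L, 1L, 1L)
)

# Row-wise proportions of the Group x Family contingency table
freq <- prop.table(xtabs(Nb ~ Group + Family, data = df), margin = 1)

# Convert the table to a wide data.frame (one column per Family)
res <- as.data.frame.matrix(freq)

# Optional: move the row names into a regular Group column
res <- cbind(Group = as.integer(rownames(res)), res)
rownames(res) <- NULL

res
```

A plain as.data.frame() on a table returns the long format (one row per Group/Family pair), which is why as.data.frame.matrix is the one to use here.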