Creating multiple proportion tables in an R data frame
I have the following table:
Result_Group | Review |
---|---|
A | 1 |
B | 4 |
A | 1 |
C | 1 |
D | 5 |
D | 4 |
E | 5 |
C | 1 |
C | 2 |
A | 2 |
B | 3 |
E | 2 |
df = structure(list(Result_Group = structure(c(1L, 2L, 1L, 3L, 4L, 4L, 5L, 3L, 3L, 1L, 2L, 5L), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), Review = c(1L, 4L, 1L, 1L, 5L, 4L, 5L, 1L, 2L, 2L, 3L, 2L)),
class = "data.frame", row.names = c(NA, -12L))
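Does anyone know how to create a table of the proportion of each review score within each group? At the moment I am doing this one group at a time, and just subsetting the data takes quite a long time.
That is, a table like the following: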
Review | A | B | C | D | E |
---|---|---|---|---|---|
1 | 0.67 | 0 | 0.67 | 0 | 0 |
2 | 0.33 | 0 | 0.33 | 0 | 0.50 |
3 | 0 | 0.50 | 0 | 0 | 0 |
4 | 0 | 0.50 | 0 | 0.50 | 0 |
5 | 0 | 0 | 0 | 0.50 | 0.50 |
Thanks!
You could do it like this:
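library(tidyverse)
df |>
  group_by(Result_Group) |>
  # Count rows per Review within each group (the grouping is kept)
  count(Review) |>
  # Proportion of each Review within its group
  mutate(prop = n / sum(n)) |>
  ungroup() |>
  select(-n) |>
  # One column per group; missing combinations become 0
  pivot_wider(names_from = Result_Group,
              values_from = prop,
              values_fill = 0)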
# A tibble: 5 x 6
Review A B C D E
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.667 0 0.667 0 0
2 2 0.333 0 0.333 0 0.5
3 3 0 0.5 0 0 0
4 4 0 0.5 0 0.5 0
5 5 0 0 0 0.5 0.5
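For comparison, the same proportions can also be sketched in base R with `table()` and `prop.table()`. This is not part of the answer above, just an alternative that assumes the example data from the question (rebuilt here with `data.frame()` so the snippet runs on its own). With `margin = 2`, each column (group) is divided by its own total, so it should reproduce the requested table.

```r
# Rebuild the example data from the question
df <- data.frame(
  Result_Group = factor(c("A", "B", "A", "C", "D", "D", "E", "C", "C", "A", "B", "E")),
  Review = c(1L, 4L, 1L, 1L, 5L, 4L, 5L, 1L, 2L, 2L, 3L, 2L)
)

# Cross-tabulate Review (rows) against Result_Group (columns)
counts <- table(Review = df$Review, Result_Group = df$Result_Group)

# margin = 2 divides every column by its column total,
# giving the share of each Review score within each group
props <- prop.table(counts, margin = 2)

# Round only for display
round(props, 2)
```

The result is a `table` object rather than a tibble; `as.data.frame.matrix(props)` converts it to a plain data frame if that is more convenient downstream.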
Here is a tidy approach using dplyr and tidyr:
library(dplyr)
df %>%
  # Add count values (all equal to 1)
  mutate(count = 1) %>%
  # Pivot wider to get A, B, C.. as column names, and sum of count as values
  tidyr::pivot_wider(
    id_cols = Review,
    names_from = Result_Group,
    values_from = count,
    values_fn = sum,
    values_fill = 0 # NAs are turned into 0
  ) %>%
  # Mutate to get fractions instead of count
  mutate(
    across(
      -Review,
      ~ .x / sum(.x)
    )
  ) %>%
  # Sort by review
  arrange(Review)
#> # A tibble: 5 × 6
#> Review A B C D E
#> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.667 0 0.667 0 0
#> 2 2 0.333 0 0.333 0 0.5
#> 3 3 0 0.5 0 0 0
#> 4 4 0 0.5 0 0.5 0
#> 5 5 0 0 0 0.5 0.5
Created on 2022-03-22 by the reprex package (v2.0.1)
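The requested table shows two decimals, while the printed tibbles keep full precision. A minimal sketch of a sanity check plus display rounding follows, assuming dplyr (1.0 or later, for `across()`) and tidyr are installed; the name `props` is only illustrative and the data is rebuilt from the question so the snippet is self-contained.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  Result_Group = c("A", "B", "A", "C", "D", "D", "E", "C", "C", "A", "B", "E"),
  Review = c(1L, 4L, 1L, 1L, 5L, 4L, 5L, 1L, 2L, 2L, 3L, 2L)
)

props <- df %>%
  count(Result_Group, Review) %>%       # rows per group/review pair
  group_by(Result_Group) %>%
  mutate(prop = n / sum(n)) %>%         # share within each group
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = Result_Group,
              values_from = prop,
              values_fill = 0) %>%
  arrange(Review)

# Sanity check: every group column should sum to 1 before rounding
colSums(select(props, -Review))

# Round only at the end, for display
props %>% mutate(across(-Review, ~ round(.x, 2)))
```

Rounding is kept as the last step so that the column sums are checked on the unrounded proportions.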