Reproduce a nested Excel pivot table in R with tabular
I am trying to reproduce the following Excel pivot table in R:
Using tabular:
library(vcd)
library(tables)
tabular(Sex*(Treatment+1)+1~(count=ID + Percent("col")), data=Arthritis)
produces:
                  count
 Sex    Treatment ID    Percent
 Female Placebo   32     38.10
        Treated   27     32.14
        All       59     70.24
 Male   Placebo   11     13.10
        Treated   14     16.67
        All       25     29.76
 All              84    100.00
Is there a way to make the Treatment percentages within each Sex add up to 100, as in the Excel pivot table?
Everything except the final All row can be done as follows.
library(dplyr)
library(tidyr)
df <- Arthritis %>%
  group_by(Sex, Treatment) %>%
  summarise(cnt = n()) %>%
  ungroup() %>%
  spread(Treatment, cnt) %>%
  mutate(All = Placebo + Treated) %>%
  gather(Treatment, ID, -Sex) %>%
  group_by(Sex) %>%
  mutate(percent = ID / (sum(ID) / 2)) %>%
  arrange(Sex, desc(Treatment)) # forces "Treated" to the top of the Treatment column for each group
> df
Source: local data frame [6 x 4]
Groups: Sex [2]
     Sex Treatment    ID   percent
  <fctr>     <chr> <int>     <dbl>
1 Female   Treated    27 0.4576271
2 Female   Placebo    32 0.5423729
3 Female       All    59 1.0000000
4   Male   Treated    14 0.5600000
5   Male   Placebo    11 0.4400000
6   Male       All    25 1.0000000
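The division by sum(ID) / 2 in the mutate() step is easy to misread. Within each Sex group the ID column holds the two Treatment counts plus their All row, so the group sum counts every observation twice, and halving it recovers the true group total. A minimal base R sketch of that arithmetic for the Female group (the names id and id_with_all are made up for illustration, and the counts are copied from the output above):

```r
# Counts for the Female group, copied from the output above.
id <- c(Treated = 27, Placebo = 32)

# Appending the All row doubles the group sum: 27 + 32 + 59 = 118.
id_with_all <- c(id, All = sum(id))

# Halving the doubled sum gives the real group total (59),
# so each share is relative to the group and the All row comes out as 1.
pct <- id_with_all / (sum(id_with_all) / 2)
print(round(pct, 7))
```

The same reasoning applies to the Male group, where the doubled sum is 50 and the denominator is 25.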
If you want a grand total row you can use the following, though it is not very pretty.
grand_total <- data.frame(Sex = "Total" , "Treatment" = "All",
ID = nrow(Arthritis), percent = 1,
stringsAsFactors = FALSE)
df_final <- bind_rows(df, grand_total)
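Now, if you want to blank out the Sex column except for its first occurrence in each group, you can do the following. Because we sorted the Treatment column in descending order, we know that Treated sits at the top of each group. So wherever the Treatment column is not equal to Treated, we simply replace the Sex value with a blank. The %in% condition restricts this to the Female and Male rows, so the Total row we created is not blanked.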
df_final$Sex[df_final$Treatment != "Treated" &
df_final$Sex %in% c("Female", "Male")] <- ""
Source: local data frame [7 x 4]
Groups: Sex [3]
     Sex Treatment    ID   percent
   <chr>     <chr> <int>     <dbl>
1 Female   Treated    27 0.4576271
2          Placebo    32 0.5423729
3              All    59 1.0000000
4   Male   Treated    14 0.5600000
5          Placebo    11 0.4400000
6              All    25 1.0000000
7  Total       All    84 1.0000000
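The blanking step above relies on Treated being the first row of each group. A possible alternative, sketched here in base R, is to blank every repeated Sex label with duplicated(), which does not depend on how Treatment is sorted as long as the rows of each group sit together. The data frame out below is a hand-built stand-in for df_final (an assumption, so the snippet runs without vcd or dplyr); its values are copied from the output above.

```r
# Hypothetical stand-in for df_final, rebuilt by hand from the output above.
out <- data.frame(
  Sex = c("Female", "Female", "Female", "Male", "Male", "Male", "Total"),
  Treatment = c("Treated", "Placebo", "All", "Treated", "Placebo", "All", "All"),
  ID = c(27L, 32L, 59L, 14L, 11L, 25L, 84L),
  stringsAsFactors = FALSE
)

# duplicated() is FALSE for the first occurrence of a value and TRUE for
# every later one, so only the repeated labels are replaced with "".
out$Sex[duplicated(out$Sex)] <- ""
print(out)
```

Because Total occurs only once, it is left alone without needing a separate %in% condition.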