How to build a new data frame with summarize and cut for each age group
I want to build a new data frame (age_summary) that contains the total number of people in each age group. I want to use "cut", and my code is:
set.seed(12345)
#create a numeric variable Age
AGE <- sample(0:110, 100, replace = TRUE)
# Create data frame
Sample.data <-data.frame(AGE)
age_summary <- Sample.data %>% summarize(group_by(Sample.data,
cut(
AGE,
breaks=c(0,0.001, 0.083, 2, 13, 65,1000),
right=TRUE,
labels = c("Foetus(0 yr)","Neonate (0.001 - 0.082 yr)","Infant(0.083-1.999 yrs)","Child(2-12.999 yrs)", "Adolescent(13-17.999 yrs)","Adult(18-64.999 yrs.)","Elderly(65-199 yrs)")
),"Total people" = n())
)
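But my code does not work, and I am not sure what went wrong. Any suggestions on how to fix this?
Added:
I was able to get the result shown below:
Is it possible to achieve something like the following:
This is what I got by using adorn_totals(.) on the new data set. The total number of people looks fine, but the average age looks strange.
Any ideas?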
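The problem is easier to spot if we remove the summarise wrapped around the group_by. Here the labels and breaks passed to cut have incompatible lengths (cut needs exactly one label per interval, that is, one fewer label than breaks). This can be fixed by adding -Inf or Inf to the breaks: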
library(dplyr)
Sample.data %>%
  group_by(grp = cut(AGE,
                     breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                     right = TRUE,
                     labels = c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                                "Elderly(65-199 yrs)"))) %>%
  summarise(TotalPeople = n())
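The length rule behind this fix can be checked directly in base R. Note that with the breaks above the code runs, but the labels no longer line up with the ranges they name: for example, "Child(2-12.999 yrs)" is attached to the interval (0.083, 2]. The sketch below is not the answer's method. It is one possible variant, assuming the intended groups are the left-closed ranges written in the labels. The extra break at 18, the upper bound of 200, and the names age_labels, age_breaks and age_counts are my assumptions, read off the label text.

```r
set.seed(12345)
AGE <- sample(0:110, 100, replace = TRUE)

age_labels <- c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                "Elderly(65-199 yrs)")

# Assumed breaks: the lower bound of each label, plus an upper bound of 200
age_breaks <- c(0, 0.001, 0.083, 2, 13, 18, 65, 200)

# cut() needs exactly one label per interval
stopifnot(length(age_labels) == length(age_breaks) - 1)

# right = FALSE makes the intervals closed on the left: [0, 0.001), [0.001, 0.083), ...
grp <- cut(AGE, breaks = age_breaks, right = FALSE, labels = age_labels)

# Count people per group; table() keeps groups with zero people
age_counts <- as.data.frame(table(grp), responseName = "TotalPeople")
print(age_counts)
```

Because AGE holds whole numbers only, the Neonate group stays empty here, and table() reports it with a count of 0 instead of dropping it.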
If we need to apply a different function to each column to create the extra row, use add_row:
library(tibble)
library(tidyr)
Sample.data %>%
  group_by(grp = cut(AGE,
                     breaks = c(-Inf, 0, 0.001, 0.083, 2, 13, 65, 1000),
                     right = TRUE,
                     labels = c("Foetus(0 yr)", "Neonate (0.001 - 0.082 yr)",
                                "Infant(0.083-1.999 yrs)", "Child(2-12.999 yrs)",
                                "Adolescent(13-17.999 yrs)", "Adult(18-64.999 yrs.)",
                                "Elderly(65-199 yrs)"))) %>%
  summarise(TotalPeople = n(), Ave_age = mean(AGE)) %>%
  # add the age groups that have no observations, with a count of 0
  complete(grp = levels(grp), fill = list(TotalPeople = 0)) %>%
  # append a "Total" row: sum of the counts, mean of the group means
  add_row(grp = "Total", TotalPeople = sum(.$TotalPeople),
          Ave_age = mean(.$Ave_age, na.rm = TRUE))
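One thing to keep in mind about the Total row: mean(.$Ave_age) averages the group means without weighting them by group size, so it will generally differ from the mean age of all people. If the overall mean is what is wanted, a minimal base R sketch follows. The three coarse groups and the names grp_n, grp_mean, mean_of_means and overall_mean are illustrative assumptions, not part of the answer above.

```r
set.seed(12345)
AGE <- sample(0:110, 100, replace = TRUE)

# Illustrative coarse grouping only; any grouping shows the same effect
grp <- cut(AGE, breaks = c(-Inf, 12, 64, Inf),
           labels = c("young", "adult", "elderly"))

grp_n    <- tapply(AGE, grp, length)  # people per group
grp_mean <- tapply(AGE, grp, mean)    # average age per group

# Unweighted: every group counts equally, regardless of its size
mean_of_means <- mean(grp_mean)

# Weighted by group size: this reproduces the mean over all people
overall_mean <- weighted.mean(grp_mean, w = grp_n)

print(c(mean_of_means = mean_of_means, overall_mean = overall_mean))
```

In the add_row call this would correspond to weighting Ave_age by TotalPeople, or simply computing mean(Sample.data$AGE) for the Total row.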