How to create a summarized demographic table in R
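I have data from a cohort of Alzheimer's disease patients, and I want to create a summary table (or contingency table) that shows all the information in this table. This is what I would like to see for the cohort: how many men and how many women there are, the mean age at onset, the mean age at last visit, the mean age at death, and the number of samples (IID) with apoe4any. What approach should I use to create such a table in R?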
dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875",
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592",
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"),
cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L,
`2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"),
status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L,
`2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L,
`8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"),
Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L,
`5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
), .Label = "0", class = "factor"), age_onset = structure(c(NA,
NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67",
" 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L,
`2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L,
`8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74",
"77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA,
NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88",
" 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L,
`2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L,
`8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
R uses the factor class for categorical data. If you convert the ages (currently factors) to numeric, summary(dat) will give you most of what you need.
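# The age columns are stored as factors; convert them to numbers
convert_to_numeric = c("age_onset", "age_last_visit", "age_death")
# Go through as.character() so the level labels, not the internal codes, are converted
dat[convert_to_numeric] = lapply(dat[convert_to_numeric], function(x) as.numeric(as.character(x)))
summary(dat)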
# IID cohort sex status Race Ethnicity age_onset age_last_visit
# NACC000875:1 ADC8_AA:10 1:2 1:6 2:10 0:10 Min. :63 Min. :70.00
# NACC003779:1 2:8 2:4 1st Qu.:66 1st Qu.:70.25
# NACC006805:1 Median :69 Median :75.50
# NACC008215:1 Mean :70 Mean :76.70
# NACC010067:1 3rd Qu.:73 3rd Qu.:81.00
# NACC010592:1 Max. :79 Max. :89.00
# (Other) :4 NA's :6
# age_death apoe4any
# Min. :72.00 0:3
# 1st Qu.:80.00 1:7
# Median :88.00
# Mean :83.33
# 3rd Qu.:89.00
# Max. :90.00
# NA's :7
See this common FAQ for an explanation of my factor-to-numeric conversion.
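The as.character() step matters because as.numeric() applied directly to a factor returns the factor's internal integer codes (positions in its sorted levels), not the values you see printed. A small demonstration; the vector f is hypothetical, with space-padded labels chosen to mimic the age columns in this data:

```r
# Hypothetical ages stored as a factor, padded with a leading space like the data
f <- factor(c(" 79", " 63", " 71"))

levels(f)                    # " 63" " 71" " 79"  (levels are sorted as strings)
as.numeric(f)                # 3 1 2     -- internal level codes, not ages
as.numeric(as.character(f))  # 79 63 71  -- the actual values (whitespace is ignored)
as.numeric(levels(f))[f]     # 79 63 71  -- same result, converts each level once
```

The last form is the one suggested in the R FAQ; it parses each distinct level only once, which can be a little faster on large factors.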
If you only want to summarize the columns you mentioned, you can also subset the data first:
summary(dat[c("sex", convert_to_numeric, "apoe4any")])
# sex age_onset age_last_visit age_death apoe4any
# 1:2 Min. :63 Min. :70.00 Min. :72.00 0:3
# 2:8 1st Qu.:66 1st Qu.:70.25 1st Qu.:80.00 1:7
# Median :69 Median :75.50 Median :88.00
# Mean :70 Mean :76.70 Mean :83.33
# 3rd Qu.:73 3rd Qu.:81.00 3rd Qu.:89.00
# Max. :79 Max. :89.00 Max. :90.00
# NA's :6 NA's :7
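summary() reports quartiles and level counts rather than exactly the figures asked for. Below is a minimal base-R sketch that computes just those figures, plus a sex-by-apoe4any contingency table and mean ages by sex. To run on its own it re-creates the example data with the ages already numeric. Two assumptions to check against your data dictionary: sex is coded 1 = male, 2 = female (the usual NACC convention), and apoe4any = 1 marks carriers of at least one APOE e4 allele.

```r
# Example data from the question, re-created with the ages already numeric
dat <- data.frame(
  IID = c("NACC000875", "NACC003779", "NACC006805", "NACC008215", "NACC010067",
          "NACC010592", "NACC011413", "NACC015383", "NACC017476", "NACC017538"),
  sex            = factor(c(2, 2, 2, 2, 2, 1, 1, 2, 2, 2)),
  age_onset      = c(NA, NA, NA, NA, 63, NA, 79, NA, 67, 71),
  age_last_visit = c(82, 77, 74, 71, 70, 70, 89, 86, 70, 78),
  age_death      = c(NA, NA, NA, 72, NA, NA, 90, 88, NA, NA),
  apoe4any       = factor(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1))
)

# Assumption: sex coded 1 = male, 2 = female (check your data dictionary)
sex_labels <- c("1" = "male", "2" = "female")
ages <- c("age_onset", "age_last_visit", "age_death")

# One row per requested figure; mean ages skip missing values (NA)
demo <- data.frame(
  measure = c("n", paste("n", sex_labels), paste("mean", ages), "n apoe4any = 1"),
  value   = unname(c(nrow(dat),
                     table(dat$sex)[names(sex_labels)],
                     round(colMeans(dat[ages], na.rm = TRUE), 1),
                     sum(dat$apoe4any == "1")))
)
demo

# Contingency table: sex by apoe4any status
tab <- with(dat, table(sex, apoe4any))
tab

# Mean ages by sex (one column per sex code), ignoring NAs
by_sex <- round(sapply(split(dat[ages], dat$sex), colMeans, na.rm = TRUE), 1)
by_sex
```

The na.rm = TRUE is what keeps these means usable: without it, a single NA in a column (and most rows here have no onset or death age) makes that column's mean NA.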