How to create a summarized demographic table in R

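I have data from a cohort of Alzheimer's disease patients. I want to create a summary table (or contingency table) that shows the key information in this data. This is what I would like to see for the cohort: how many males and females there are, the mean age of onset, the mean age at last visit, the mean age at death, and the number of samples (IID) with apoe4any. What method should I use to create such a table in R?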

dat <- structure(list(IID = structure(1:10, .Names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"), .Label = c("NACC000875", 
"NACC003779", "NACC006805", "NACC008215", "NACC010067", "NACC010592", 
"NACC011413", "NACC015383", "NACC017476", "NACC017538"), class = "factor"), 
    cohort = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "ADC8_AA", class = "factor"), sex = structure(c(`1` = 2L, 
    `2` = 2L, `3` = 2L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 1L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("1", "2"), class = "factor"), 
    status = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 2L, `6` = 1L, `7` = 2L, `8` = 1L, `9` = 2L, `10` = 2L
    ), .Label = c("1", "2"), class = "factor"), Race = structure(c(`1` = 1L, 
    `2` = 1L, `3` = 1L, `4` = 1L, `5` = 1L, `6` = 1L, `7` = 1L, 
    `8` = 1L, `9` = 1L, `10` = 1L), .Label = "2", class = "factor"), 
    Ethnicity = structure(c(`1` = 1L, `2` = 1L, `3` = 1L, `4` = 1L, 
    `5` = 1L, `6` = 1L, `7` = 1L, `8` = 1L, `9` = 1L, `10` = 1L
    ), .Label = "0", class = "factor"), age_onset = structure(c(NA, 
    NA, NA, NA, 1L, NA, 4L, NA, 2L, 3L), .Label = c(" 63", " 67", 
    " 71", " 79", "888"), class = "factor"), age_last_visit = structure(c(`1` = 6L, 
    `2` = 4L, `3` = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 8L, 
    `8` = 7L, `9` = 1L, `10` = 5L), .Label = c("70", "71", "74", 
    "77", "78", "82", "86", "89"), class = "factor"), age_death = structure(c(NA, 
    NA, NA, 1L, NA, NA, 3L, 2L, NA, NA), .Label = c(" 72", " 88", 
    " 90", "888"), class = "factor"), apoe4any = structure(c(`1` = 1L, 
    `2` = 2L, `3` = 1L, `4` = 2L, `5` = 2L, `6` = 1L, `7` = 2L, 
    `8` = 2L, `9` = 2L, `10` = 2L), .Label = c("0", "1"), class = "factor")), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")

R uses the factor class for categorical data. If you convert the ages (currently factors) to numeric, then summary(dat) will give you most of what you need.

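# Age columns that are stored as factors but should be numeric
convert_to_numeric = c("age_onset", "age_last_visit", "age_death")
# Go through as.character() first: as.numeric() on a factor returns the level codes
dat[convert_to_numeric] = lapply(dat[convert_to_numeric], function(x) as.numeric(as.character(x)))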
summary(dat)
 #         IID        cohort   sex   status Race   Ethnicity   age_onset  age_last_visit 
 # NACC000875:1   ADC8_AA:10   1:2   1:6    2:10   0:10      Min.   :63   Min.   :70.00  
 # NACC003779:1                2:8   2:4                     1st Qu.:66   1st Qu.:70.25  
 # NACC006805:1                                              Median :69   Median :75.50  
 # NACC008215:1                                              Mean   :70   Mean   :76.70  
 # NACC010067:1                                              3rd Qu.:73   3rd Qu.:81.00  
 # NACC010592:1                                              Max.   :79   Max.   :89.00  
 # (Other)   :4                                              NA's   :6                   
 #   age_death     apoe4any
 # Min.   :72.00   0:3     
 # 1st Qu.:80.00   1:7     
 # Median :88.00           
 # Mean   :83.33           
 # 3rd Qu.:89.00           
 # Max.   :90.00           
 # NA's   :7            

See this common FAQ for an explanation of my factor-to-numeric conversion.
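As a small illustration of why the conversion goes through as.character(), here is a sketch on a toy factor. The vector `f` is made up for this example (its labels mimic the space-padded ones in age_onset). The last two lines assume that the unused "888" level in the age columns is a missing-value code, which the question does not actually say.

```r
# Toy factor with space-padded numeric labels, similar to age_onset
f <- factor(c(" 63", " 79", " 67", " 71"))

codes <- as.numeric(f)                # 1 4 2 3: the internal level codes
ages  <- as.numeric(as.character(f))  # 63 79 67 71: the values the labels spell out

# ASSUMPTION: 888 marks "missing"; if so, recode it to NA after converting
x <- as.numeric(as.character(factor(c(" 63", "888"))))
x[x == 888] <- NA
```

If 888 turns out to be a real sentinel in the full data, recoding it before calling summary() keeps it from inflating the means.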

If you only want to summarize the columns you mentioned, you can also subset the data:

summary(dat[c("sex", convert_to_numeric, "apoe4any")])
 # sex     age_onset  age_last_visit    age_death     apoe4any
 # 1:2   Min.   :63   Min.   :70.00   Min.   :72.00   0:3     
 # 2:8   1st Qu.:66   1st Qu.:70.25   1st Qu.:80.00   1:7     
 #       Median :69   Median :75.50   Median :88.00           
 #       Mean   :70   Mean   :76.70   Mean   :83.33           
 #       3rd Qu.:73   3rd Qu.:81.00   3rd Qu.:89.00           
 #       Max.   :79   Max.   :89.00   Max.   :90.00           
 #       NA's   :6                    NA's   :7
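
If you would rather have exactly the requested numbers in a single row, one possible approach is to compute them directly. This is only a sketch: `toy` is a stand-in that copies the relevant columns from the question's data after conversion, and the column names in `demog` are made up for this example. The question does not say which sex code is male and which is female, so the counts are labelled by code.

```r
# Stand-in for `dat` with the age columns already numeric (values copied from the question)
toy <- data.frame(
  sex            = factor(c(2, 2, 2, 2, 2, 1, 1, 2, 2, 2)),
  age_onset      = c(NA, NA, NA, NA, 63, NA, 79, NA, 67, 71),
  age_last_visit = c(82, 77, 74, 71, 70, 70, 89, 86, 70, 78),
  age_death      = c(NA, NA, NA, 72, NA, NA, 90, 88, NA, NA),
  apoe4any       = factor(c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1))
)

# One-row summary; na.rm = TRUE drops the missing ages before averaging
demog <- data.frame(
  n_sex_1             = sum(toy$sex == "1"),
  n_sex_2             = sum(toy$sex == "2"),
  mean_age_onset      = mean(toy$age_onset, na.rm = TRUE),
  mean_age_last_visit = mean(toy$age_last_visit, na.rm = TRUE),
  mean_age_death      = mean(toy$age_death, na.rm = TRUE),
  n_apoe4any          = sum(toy$apoe4any == "1")
)
demog

# Contingency table of sex by apoe4any
sex_by_apoe <- table(sex = toy$sex, apoe4any = toy$apoe4any)
sex_by_apoe
```

The means agree with the summary() output above; the table() call covers the contingency-table part of the question.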