Summarize the same variables from multiple dataframes in one table

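I have voter and party data from several datasets, which I have further split into separate data frames and lists to make them comparable. I could run summary() on each of them individually and compare the results by hand, but I would like to know whether there is a way to combine them all into one table.

Here is an example of what I have: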

> summary(eco$rilenew)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      3       4       4       4       4       5 
> summary(ecovoters)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   3.000   4.000   3.744   5.000  10.000      26 
> summary(lef$rilenew)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   3.000   3.000   3.692   4.000   7.000 
> summary(lefvoters)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   2.000   3.000   3.612   5.000  10.000     332
> summary(soc$rilenew)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   4.000   4.000   4.143   5.000   6.000 
> summary(socvoters)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   3.000   4.000   3.674   5.000  10.000     346 

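Is there a way to summarize these lists (ecovoters, lefvoters, socvoters, etc.) and data frame variables (eco$rilenew, lef$rilenew, soc$rilenew, etc.) together and combine them into a single table?

You can put everything into a list and then summarize each element with a small custom function.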

L <- list(eco$rilenew, ecovoters, lef$rilenew, 
          lefvoters, soc$rilenew, socvoters)

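t(sapply(L, function(x) {
  s <- summary(x)
  length(s) <- 7          # pad to 7 entries so every row has a slot for NA's
  names(s)[7] <- "NA's"
  # summary() omits the NA's entry when nothing is missing; report 0 instead
  s[7] <- if (anyNA(x)) s[7] else 0
  s
}))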
           Min.   1st Qu.   Median     Mean  3rd Qu.      Max. NA's
[1,]  0.9820673 3.3320662 3.958665 3.949512 4.625109  7.229069    0
[2,] -4.8259384 0.5028293 3.220546 3.301452 6.229384  9.585749   26
[3,] -0.3717391 2.3280366 3.009360 3.013908 3.702156  6.584659    0
[4,] -2.6569493 1.6674330 3.069440 3.015325 4.281100  8.808432  332
[5,] -2.3625651 2.4964361 3.886673 3.912009 5.327401 10.349040    0
[6,] -2.4719404 1.3635785 2.790523 2.854812 4.154936  8.491347  346
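The rows above are labelled only by position. A minimal sketch of one way to get readable row labels, assuming you are free to name the list elements: sapply() carries list names through, so they become row names after t(). The names eco_party and eco_voters, the helper summarise_one, and the small stand-in data are hypothetical and only there for illustration.

```r
# Hypothetical stand-in data; replace with the real eco, ecovoters, etc.
set.seed(1)
eco <- data.frame(rilenew = rnorm(50, 4, 1))
ecovoters <- c(rnorm(40, 4, 2), rep(NA, 10))

# Naming the list elements lets sapply() pass the names through as labels.
L_named <- list(eco_party = eco$rilenew, eco_voters = ecovoters)

# Hypothetical helper: always returns the same seven entries,
# so no padding is needed when a vector has no missing values.
summarise_one <- function(x) {
  q <- quantile(x, na.rm = TRUE)   # 0%, 25%, 50%, 75%, 100%
  c(Min = q[[1]], Q1 = q[[2]], Median = q[[3]],
    Mean = mean(x, na.rm = TRUE),
    Q3 = q[[4]], Max = q[[5]],
    n_missing = sum(is.na(x)))     # 0 when nothing is missing
}

tab <- t(sapply(L_named, summarise_one))
tab
```

Because the helper builds the vector itself instead of reshaping the output of summary(), the missing-value count is always present and the columns line up without further adjustment.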

Data

set.seed(42)
eco <- data.frame(rilenew=rnorm(800, 4, 1))
ecovoters <- rnorm(75, 4, 4)
ecovoters[sample(length(ecovoters), 26)] <- NA
lef <- data.frame(rilenew=rnorm(900, 3, 1))
lefvoters <- rnorm(700, 3, 2)
lefvoters[sample(length(lefvoters), 332)] <- NA
soc <- data.frame(rilenew=rnorm(900, 4, 2))
socvoters <- rnorm(700, 3, 2)
socvoters[sample(length(socvoters), 346)] <- NA

You can use map() from the tidyverse (the purrr package) to get a list of summaries. Then, if you want the result as a data frame, plyr::ldply() can convert the list into one:

library(purrr)  # provides map()

ll <- map(L, summary)

ll

plyr::ldply(ll, rbind)

> ll = map(L, summary)
> ll
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9821  3.3321  3.9587  3.9495  4.6251  7.2291 

[[2]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -4.331   1.347   3.726   3.793   6.653  16.845      26 

[[3]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3717  2.3360  3.0125  3.0174  3.7022  6.5847 

[[4]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -2.657   1.795   3.039   3.013   4.395   9.942     332 

[[5]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -2.363   2.503   3.909   3.920   5.327  10.349 

[[6]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 -3.278   1.449   2.732   2.761   4.062   8.171     346 

> plyr::ldply(ll, rbind)
        Min.  1st Qu.   Median     Mean  3rd Qu.      Max. NA's
1  0.9820673 3.332066 3.958665 3.949512 4.625109  7.229069   NA
2 -4.3312551 1.346532 3.725708 3.793431 6.652917 16.844796   26
3 -0.3717391 2.335959 3.012507 3.017438 3.702156  6.584659   NA
4 -2.6569493 1.795307 3.038905 3.012928 4.395338  9.941819  332
5 -2.3625651 2.503324 3.908727 3.920050 5.327401 10.349040   NA
6 -3.2779863 1.448814 2.732515 2.760569 4.061854  8.170793  346
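In the data frame above, vectors without missing values show NA in the NA's column, because summary() produced no such entry for them. If a count of 0 is preferred there, one possible follow-up step is to replace those NA values. The sketch below uses a small hypothetical stand-in (res) with rounded values rather than the real ldply() result.

```r
# Hypothetical stand-in for the data frame returned by plyr::ldply(ll, rbind);
# check.names = FALSE keeps the non-syntactic column name "NA's" intact.
res <- data.frame(
  Min.   = c(0.98, -4.33),
  Max.   = c(7.23, 16.84),
  `NA's` = c(NA, 26),
  check.names = FALSE
)

# An NA here means "summary() reported no missing values", so set it to 0.
res[["NA's"]][is.na(res[["NA's"]])] <- 0
res
```

This only touches the NA's column, so genuine NA values in other columns (if any) would be left alone.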