扩展 R 的汇总函数(或创建具有类似输出的新函数)以将因子显示为总数的百分比

Extending R's summary function (or creating a new function with similar output) to display factors as percent of total

有没有办法轻松扩展 R 的 summary() 函数(或创建具有类似输出的新函数)以将因子显示为总数的百分比?

summary(chickwts)
#     weight             feed   
# Min.   :108.0   casein   :12  
# 1st Qu.:204.5   horsebean:10  
# Median :258.0   linseed  :12  
# Mean   :261.3   meatmeal :11  
# 3rd Qu.:323.5   soybean  :14  
# Max.   :423.0   sunflower:12  

期望的输出:

pct_summary(chickwts)
#     weight             feed   
# Min.   :108.0   casein   :17%  
# 1st Qu.:204.5   horsebean:14% 
# Median :258.0   linseed  :17% 
# Mean   :261.3   meatmeal :15%  
# 3rd Qu.:323.5   soybean  :20%  
# Max.   :423.0   sunflower:17%  

# Or even this...
#     weight             feed   
# Min.   :108.0   casein   :12 17%  
# 1st Qu.:204.5   horsebean:10 14% 
# Median :258.0   linseed  :12 17% 
# Mean   :261.3   meatmeal :11 15%  
# 3rd Qu.:323.5   soybean  :14 20%  
# Max.   :423.0   sunflower:12 17%  

我找到的最接近的是Hmisc::describe()

您可以非常奇怪地重新编码部分函数体。

## Rework a piece of the body
mysummary <- summary.factor
body(mysummary)[[5]] <- quote(
    tbl <- round(table(object)/sum(table(object))*100)
)

summary.factor(chickwts$feed)
#   casein horsebean   linseed  meatmeal   soybean sunflower 
#       12        10        12        11        14        12 
mysummary(chickwts$feed)
#   casein horsebean   linseed  meatmeal   soybean sunflower 
#       17        14        17        15        20        17 

这可能是一个比您正在寻找的更复杂的解决方案,但您可以对 summary.data.frame 做类似的事情并告诉它在您的示例中使用修改后的 summary.factor

所以,它看起来像这样

mysumm <- summary.data.frame
body(mysumm)[[3]] <- quote(
    z <- lapply(X=as.list(object), FUN=function(x) if (is.factor(x)) mysummary(x) else summary(x))
)
mysumm(chickwts)
#      weight             feed   
#  Min.   :108.0   casein   :17  
#  1st Qu.:204.5   horsebean:14  
#  Median :258.0   linseed  :17  
#  Mean   :261.3   meatmeal :15  
#  3rd Qu.:323.5   soybean  :20  
#  Max.   :423.0   sunflower:17  

注意:为了缩短代码,我忽略了 summary 的其他参数,但您可以添加这些参数以通过通用 summary 方法传递。

我们可以借鉴现有的摘要例程,并通过为因子提供瞬态额外 class 属性来减少侵入性。

summary.my.factor<-function(object,...) {
  x<-prop.table(table(object))
  setNames(sprintf("%1.2f%%",100*x),names(x))
}

my.summary<-function(object,...) {
  f<-function(x) if(inherits(x,"factor")) structure(x,class=c("my.factor",class(x))) else x
  summary(as.data.frame(lapply(object,f)),...)
}

my.summary(chickwts)
     weight             feed       
 Min.   :108.0   casein   :16.90%  
 1st Qu.:204.5   horsebean:14.08%  
 Median :258.0   linseed  :16.90%  
 Mean   :261.3   meatmeal :15.49%  
 3rd Qu.:323.5   soybean  :19.72%  
 Max.   :423.0   sunflower:16.90%  

在格式化 my.factor.

时,我没有费心去尊重 digits 等任何选项

糟糕而危险的方式:

# backup original summary.factor
original_summary_factor = base::summary.factor

# our new summary.factor
summary.factor = function(object,maxsum = 100, ...){
    res = original_summary_factor(object = object, maxsum = maxsum, ...)
    pct = round(res/length(object)*100)
    setNames(paste0(res, " ", pct, "%"),names(res))
}

# DANGEROUS CODE. USE IT AT YOUR OWN RISK.
# Here we replace original summary.factor with the new one
unlockBinding("summary.factor", as.environment("package:base"))
assignInNamespace("summary.factor", summary.factor, ns="base", envir=as.environment("package:base"))
assign("summary.factor", summary.factor, as.environment("package:base"))
lockBinding("summary.factor", as.environment("package:base"))

summary(chickwts)
# weight             feed       
# Min.   :108.0   casein   :12 17%  
# 1st Qu.:204.5   horsebean:10 14%  
# Median :258.0   linseed  :12 17%  
# Mean   :261.3   meatmeal :11 15%  
# 3rd Qu.:323.5   soybean  :14 20%  
# Max.   :423.0   sunflower:12 17%