扩展 R 的汇总函数(或创建具有类似输出的新函数)以将因子显示为总数的百分比
Extending R's summary function (or creating a new function with similar output) to display factors as percent of total
有没有办法轻松扩展 R 的 summary()
函数(或创建具有类似输出的新函数)以将因子显示为总数的百分比?
summary(chickwts)
# weight feed
# Min. :108.0 casein :12
# 1st Qu.:204.5 horsebean:10
# Median :258.0 linseed :12
# Mean :261.3 meatmeal :11
# 3rd Qu.:323.5 soybean :14
# Max. :423.0 sunflower:12
期望的输出:
pct_summary(chickwts)
# weight feed
# Min. :108.0 casein :17%
# 1st Qu.:204.5 horsebean:14%
# Median :258.0 linseed :17%
# Mean :261.3 meatmeal :15%
# 3rd Qu.:323.5 soybean :20%
# Max. :423.0 sunflower:17%
# Or even this...
# weight feed
# Min. :108.0 casein :12 17%
# 1st Qu.:204.5 horsebean:10 14%
# Median :258.0 linseed :12 17%
# Mean :261.3 meatmeal :11 15%
# 3rd Qu.:323.5 soybean :14 20%
# Max. :423.0 sunflower:12 17%
我找到的最接近的是Hmisc::describe()
。
您可以非常奇怪地重新编码部分函数体。
## Rework a piece of the body
mysummary <- summary.factor
body(mysummary)[[5]] <- quote(
tbl <- round(table(object)/sum(table(object))*100)
)
summary.factor(chickwts$feed)
# casein horsebean linseed meatmeal soybean sunflower
# 12 10 12 11 14 12
mysummary(chickwts$feed)
# casein horsebean linseed meatmeal soybean sunflower
# 17 14 17 15 20 17
这可能是一个比您正在寻找的更复杂的解决方案,但您可以对 summary.data.frame
做类似的事情并告诉它在您的示例中使用修改后的 summary.factor
。
所以,它看起来像这样
mysumm <- summary.data.frame
body(mysumm)[[3]] <- quote(
z <- lapply(X=as.list(object), FUN=function(x) if (is.factor(x)) mysummary(x) else summary(x))
)
mysumm(chickwts)
# weight feed
# Min. :108.0 casein :17
# 1st Qu.:204.5 horsebean:14
# Median :258.0 linseed :17
# Mean :261.3 meatmeal :15
# 3rd Qu.:323.5 soybean :20
# Max. :423.0 sunflower:17
注意:为了缩短代码,我忽略了 summary 的其他参数,但您可以添加这些参数以通过通用 summary 方法传递。
我们可以借鉴现有的摘要例程,并通过为因子提供瞬态额外 class 属性来减少侵入性。
summary.my.factor<-function(object,...) {
x<-prop.table(table(object))
setNames(sprintf("%1.2f%%",100*x),names(x))
}
my.summary<-function(object,...) {
f<-function(x) if(inherits(x,"factor")) structure(x,class=c("my.factor",class(x))) else x
summary(as.data.frame(lapply(object,f)),...)
}
my.summary(chickwts)
weight feed
Min. :108.0 casein :16.90%
1st Qu.:204.5 horsebean:14.08%
Median :258.0 linseed :16.90%
Mean :261.3 meatmeal :15.49%
3rd Qu.:323.5 soybean :19.72%
Max. :423.0 sunflower:16.90%
在格式化 my.factor
.
时,我没有费心去尊重 digits
等任何选项
糟糕而危险的方式:
# backup original summary.factor
original_summary_factor = base::summary.factor
# our new summary.factor
summary.factor = function(object,maxsum = 100, ...){
res = original_summary_factor(object = object, maxsum = maxsum, ...)
pct = round(res/length(object)*100)
setNames(paste0(res, " ", pct, "%"),names(res))
}
# DANGEROUS CODE. USE IT AT YOUR OWN RISK.
# Here we replace original summary.factor with the new one
unlockBinding("summary.factor", as.environment("package:base"))
assignInNamespace("summary.factor", summary.factor, ns="base", envir=as.environment("package:base"))
assign("summary.factor", summary.factor, as.environment("package:base"))
lockBinding("summary.factor", as.environment("package:base"))
summary(chickwts)
# weight feed
# Min. :108.0 casein :12 17%
# 1st Qu.:204.5 horsebean:10 14%
# Median :258.0 linseed :12 17%
# Mean :261.3 meatmeal :11 15%
# 3rd Qu.:323.5 soybean :14 20%
# Max. :423.0 sunflower:12 17%
有没有办法轻松扩展 R 的 summary()
函数(或创建具有类似输出的新函数)以将因子显示为总数的百分比?
summary(chickwts)
# weight feed
# Min. :108.0 casein :12
# 1st Qu.:204.5 horsebean:10
# Median :258.0 linseed :12
# Mean :261.3 meatmeal :11
# 3rd Qu.:323.5 soybean :14
# Max. :423.0 sunflower:12
期望的输出:
pct_summary(chickwts)
# weight feed
# Min. :108.0 casein :17%
# 1st Qu.:204.5 horsebean:14%
# Median :258.0 linseed :17%
# Mean :261.3 meatmeal :15%
# 3rd Qu.:323.5 soybean :20%
# Max. :423.0 sunflower:17%
# Or even this...
# weight feed
# Min. :108.0 casein :12 17%
# 1st Qu.:204.5 horsebean:10 14%
# Median :258.0 linseed :12 17%
# Mean :261.3 meatmeal :11 15%
# 3rd Qu.:323.5 soybean :14 20%
# Max. :423.0 sunflower:12 17%
我找到的最接近的是Hmisc::describe()
。
您可以非常奇怪地重新编码部分函数体。
## Rework a piece of the body
mysummary <- summary.factor
body(mysummary)[[5]] <- quote(
tbl <- round(table(object)/sum(table(object))*100)
)
summary.factor(chickwts$feed)
# casein horsebean linseed meatmeal soybean sunflower
# 12 10 12 11 14 12
mysummary(chickwts$feed)
# casein horsebean linseed meatmeal soybean sunflower
# 17 14 17 15 20 17
这可能是一个比您正在寻找的更复杂的解决方案,但您可以对 summary.data.frame
做类似的事情并告诉它在您的示例中使用修改后的 summary.factor
。
所以,它看起来像这样
mysumm <- summary.data.frame
body(mysumm)[[3]] <- quote(
z <- lapply(X=as.list(object), FUN=function(x) if (is.factor(x)) mysummary(x) else summary(x))
)
mysumm(chickwts)
# weight feed
# Min. :108.0 casein :17
# 1st Qu.:204.5 horsebean:14
# Median :258.0 linseed :17
# Mean :261.3 meatmeal :15
# 3rd Qu.:323.5 soybean :20
# Max. :423.0 sunflower:17
注意:为了缩短代码,我忽略了 summary 的其他参数,但您可以添加这些参数以通过通用 summary 方法传递。
我们可以借鉴现有的摘要例程,并通过为因子提供瞬态额外 class 属性来减少侵入性。
summary.my.factor<-function(object,...) {
x<-prop.table(table(object))
setNames(sprintf("%1.2f%%",100*x),names(x))
}
my.summary<-function(object,...) {
f<-function(x) if(inherits(x,"factor")) structure(x,class=c("my.factor",class(x))) else x
summary(as.data.frame(lapply(object,f)),...)
}
my.summary(chickwts)
weight feed Min. :108.0 casein :16.90% 1st Qu.:204.5 horsebean:14.08% Median :258.0 linseed :16.90% Mean :261.3 meatmeal :15.49% 3rd Qu.:323.5 soybean :19.72% Max. :423.0 sunflower:16.90%
在格式化 my.factor
.
digits
等任何选项
糟糕而危险的方式:
# backup original summary.factor
original_summary_factor = base::summary.factor
# our new summary.factor
summary.factor = function(object,maxsum = 100, ...){
res = original_summary_factor(object = object, maxsum = maxsum, ...)
pct = round(res/length(object)*100)
setNames(paste0(res, " ", pct, "%"),names(res))
}
# DANGEROUS CODE. USE IT AT YOUR OWN RISK.
# Here we replace original summary.factor with the new one
unlockBinding("summary.factor", as.environment("package:base"))
assignInNamespace("summary.factor", summary.factor, ns="base", envir=as.environment("package:base"))
assign("summary.factor", summary.factor, as.environment("package:base"))
lockBinding("summary.factor", as.environment("package:base"))
summary(chickwts)
# weight feed
# Min. :108.0 casein :12 17%
# 1st Qu.:204.5 horsebean:10 14%
# Median :258.0 linseed :12 17%
# Mean :261.3 meatmeal :11 15%
# 3rd Qu.:323.5 soybean :14 20%
# Max. :423.0 sunflower:12 17%