Summarizing a dataset with continuous and categorical variables

Question

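If a dataset has mixed variables, numeric and categorical, is there a way to summarize it other than summary(dataset), so that each categorical variable gets a count for every category and each numeric variable gets its mean and sd?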

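Currently I have written a code snippet that checks whether each column is numeric or categorical and then builds a list. But a simpler function would be useful.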

An example would be data.frame(v1 = c(1:3), v2 = c("a","b","b")), and the desired output would be something like:

v1, type (num/cat), mean(v1), sd(v1)
v2, type (num/cat), a, count(a), b, count(b)
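For reference, a minimal base-R sketch that produces exactly that layout for the example data. The helper name describe_line is hypothetical, and the sketch assumes any non-numeric column (factor or character) should be treated as categorical.

```r
# Example data from the question; stringsAsFactors = TRUE makes v2 a factor
# (since R 4.0.0 the default is FALSE, which would leave v2 as character).
df <- data.frame(v1 = c(1:3), v2 = c("a", "b", "b"), stringsAsFactors = TRUE)

# Hypothetical helper: one line of text per column in the requested layout
describe_line <- function(name, x) {
  if (is.numeric(x)) {
    # numeric column: name, type, mean, sd
    paste(name, "num", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE), sep = ", ")
  } else {
    # categorical column: name, type, then each level followed by its count
    counts <- table(x)
    paste(c(name, "cat", paste(names(counts), as.vector(counts), sep = ", ")),
          collapse = ", ")
  }
}

# mapply walks the column names and the columns of the data frame in parallel
lines <- mapply(describe_line, names(df), df, USE.NAMES = FALSE)
cat(lines, sep = "\n")
```

For the example this prints "v1, num, 2, 1" and "v2, cat, a, 1, b, 2".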

Answer 1

I think you are looking for the function describe() in the package 'Hmisc'. See the documentation for details.
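A minimal sketch of that suggestion, assuming Hmisc is installed (install.packages("Hmisc")); the call is skipped if it is not. As far as I know, describe() prints frequency tables for categorical columns and the mean plus quantiles for numeric columns, but not the standard deviation (recent versions show Gini's mean difference instead), so check its output against what you need.

```r
# Example data from the question, with v2 stored as a factor
df <- data.frame(v1 = c(1:3), v2 = c("a", "b", "b"), stringsAsFactors = TRUE)

# Only run if the Hmisc package is available
if (requireNamespace("Hmisc", quietly = TRUE)) {
  d <- Hmisc::describe(df)  # one summary per column
  print(d)
}
```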

Answer 2

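Yes, I am looking for table() for the categorical variables and mean + sd for the numeric variables. This is how descriptive statistics are usually reported in research papers.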
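One common layout for such descriptive tables is "mean (SD)" for numeric variables and "n (%)" for each level of a categorical variable. A minimal base-R sketch of that formatting, assuming one decimal place; the helper name format_column is hypothetical.

```r
# Example data from the question, with v2 stored as a factor
df <- data.frame(v1 = c(1:3), v2 = c("a", "b", "b"), stringsAsFactors = TRUE)

# Hypothetical helper: format one column as "mean (SD)" or as "n (%)" per level
format_column <- function(x) {
  if (is.numeric(x)) {
    sprintf("%.1f (%.1f)", mean(x, na.rm = TRUE), sd(x, na.rm = TRUE))
  } else {
    counts <- table(x)
    pct <- 100 * as.vector(counts) / sum(counts)
    # named character vector: one "n (%)" entry per level
    setNames(sprintf("%d (%.1f%%)", as.vector(counts), pct), names(counts))
  }
}

table1 <- lapply(df, format_column)
print(table1)
```

Here v1 becomes "2.0 (1.0)" and v2 becomes a = "1 (33.3%)", b = "2 (66.7%)".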

I wrote the following:

agg_function <- function(data_agg)
{
  desc_list <- list()

  for (j in seq_along(data_agg))
  {
    x <- data_agg[[j]]  ## [[ ]] returns a plain vector, also for tibbles
    if (is.numeric(x))
    {
      ## First and second moments of numerical variables
      desc_list[[j]] <- data.frame(Variable = names(data_agg)[j],
                                   Mean = mean(x, na.rm = TRUE),
                                   SD = sd(x, na.rm = TRUE))
    }
    else
    {
      ## Table of counts of the labels of categorical (factor or character) variables
      desc_list[[j]] <- list(Variable = names(data_agg)[j], Counts = table(x))
    }
  }
  return(desc_list)
}

But is there a more efficient solution?
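A more compact base-R alternative is to test the column types once and then summarize each group with vapply() and lapply(). This is a sketch, not a benchmarked speed-up: it mainly avoids growing a list inside a loop, and it assumes every non-numeric column is categorical.

```r
# Example data from the question, with v2 stored as a factor
df <- data.frame(v1 = c(1:3), v2 = c("a", "b", "b"), stringsAsFactors = TRUE)

# Split columns by type once instead of testing inside a loop
is_num <- vapply(df, is.numeric, logical(1))

# Numeric columns: one row per variable with mean and SD
num_summary <- data.frame(
  Variable = names(df)[is_num],
  Mean = vapply(df[is_num], mean, numeric(1), na.rm = TRUE),
  SD = vapply(df[is_num], sd, numeric(1), na.rm = TRUE),
  row.names = NULL
)

# Categorical columns (factor or character): a named list of count tables
cat_summary <- lapply(df[!is_num], table)

print(num_summary)
print(cat_summary)
```

Keeping the numeric summary as a single data frame makes it easy to export, while the categorical counts stay as one table per variable.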
