Function to return the mean of numeric columns

I have a data frame with several fields, each of a different data type: dates, numbers, factors, etc. For example:

ID<- c(1,2,3)
AGE <- c(25,32,28)
SEX <- c(1,0,0)
HEIGHT <- c(152,172, 163)
WEIGHT <-c(65,53,70)
DF<-data.frame(ID, AGE, SEX, HEIGHT, WEIGHT)

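I have several datasets like this, so I want to write a function that returns a summary. The summary should consist of the mean of each field (only when the field is numeric) and the number of levels (when the field is a factor).

Your constraint of "number of levels if the field is a factor" is a bit simplistic. What if a field is character? Or logical?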

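First example. I'll add a factor vector:

# explicit factor(): since R 4.0.0, data.frame() no longer converts strings to factors by default
FAC <- factor(c('abc','abc','def'))
DF <- data.frame(ID, AGE, SEX, HEIGHT, WEIGHT, FAC)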
lapply(DF, function(x) if (is.numeric(x)) mean(x) else length(levels(x)))
# $ID
# [1] 2
# $AGE
# [1] 28.33333
# $SEX
# [1] 0.3333333
# $HEIGHT
# [1] 162.3333
# $WEIGHT
# [1] 62.66667
# $FAC
# [1] 2
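To see why the factor-only fallback is limited, here is a small sketch with a character and a logical column. The data frame DF2 and the columns NAME and SMOKER are made up for illustration and are not part of the original data.

```r
# Hypothetical example data, not from the question
DF2 <- data.frame(
  AGE    = c(25, 32, 28),
  NAME   = c("ann", "bob", "ann"),
  SMOKER = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE          # keep NAME as character
)

# Same one-liner as above
res <- lapply(DF2, function(x) if (is.numeric(x)) mean(x) else length(levels(x)))

res$NAME    # 0, because a character vector has no levels
res$SMOKER  # 0, because a logical vector has no levels either
```

Both non-numeric columns silently report 0, which is probably not the summary you want for them.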

If you need more control, you can extend it:

lapply(DF, function(x) {
  if (is.logical(x)) x <- 1*x                    # turns into numeric, so the mean is the proportion of TRUE
  if (is.numeric(x)) return(mean(x))             # mean
  if (is.factor(x)) return(length(levels(x)))    # number of levels
  if (is.character(x)) return(length(unique(x))) # number of distinct strings, similar to levels
  if (inherits(x, "POSIXct")) return(min(x))     # min date/time
  return("oops")
})
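As a usage sketch, the same branches can be wrapped in a named helper and applied to a data frame with mixed column types. The helper name summarise_column, the data frame MIXED, and all of its values are assumptions made for this example only.

```r
# Hypothetical helper wrapping the branches shown above
summarise_column <- function(x) {
  if (is.logical(x)) x <- 1 * x                   # logical -> numeric 0/1
  if (is.numeric(x)) return(mean(x))              # mean (proportion of TRUE for logicals)
  if (is.factor(x)) return(length(levels(x)))     # number of levels
  if (is.character(x)) return(length(unique(x)))  # number of distinct strings
  if (inherits(x, "POSIXct")) return(min(x))      # earliest date/time
  "oops"                                          # any other type
}

# Made-up data with one column per branch
MIXED <- data.frame(
  HEIGHT = c(152, 172, 163),
  SEX    = factor(c("M", "F", "F")),
  NAME   = c("ann", "bob", "ann"),
  SMOKER = c(TRUE, FALSE, FALSE),
  SEEN   = as.POSIXct(c("2020-01-03", "2020-01-01", "2020-01-02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

out <- lapply(MIXED, summarise_column)
str(out)
```

lapply is used rather than sapply because the results have different types (numbers and a date/time), and a list keeps each one intact.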

You can of course combine some of these, e.g. is.numeric(x) || is.logical(x), or even is.numeric(x) || inherits(x, "POSIXct"), since mean() also works on date/time values.
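A sketch of the combined form, assuming a mean timestamp is an acceptable summary for a date/time column. The helper name column_mean_or_count and the sample values are hypothetical.

```r
# Hypothetical helper: one branch for everything that mean() can handle
column_mean_or_count <- function(x) {
  # numeric, logical and POSIXct columns all go through mean()
  if (is.numeric(x) || is.logical(x) || inherits(x, "POSIXct")) return(mean(x))
  if (is.factor(x)) return(nlevels(x))   # number of levels
  length(unique(x))                      # fallback: number of distinct values
}

column_mean_or_count(c(TRUE, FALSE, TRUE, TRUE))   # 0.75

when <- as.POSIXct(c("2020-01-01", "2020-01-03"), tz = "UTC")
column_mean_or_count(when)                          # midpoint: 2020-01-02 UTC
```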

ID<- factor(c(1,2,3))

AGE <- c(25,32,28)
SEX <- factor(c('Male','Female','Male'))
HEIGHT <- c(152,172, 163)
WEIGHT <-c(65,53,70)

DF<-data.frame(ID, AGE, SEX, HEIGHT, WEIGHT)

# df_summary <- function(df) {
#   col_types <- sapply(df,class)
#   return(col_types)
# }

df_summary <- function(df) {
  # Select columns by type; drop = FALSE keeps a data frame even if only one column matches
  numeric_cols <- df[, sapply(df, is.numeric), drop = FALSE]
  numeric_means_df <- data.frame('Mean' = sapply(numeric_cols, mean))
  factor_cols <- df[, sapply(df, is.factor), drop = FALSE]
  factor_cols_summary <- summary(factor_cols)   # counts per level for each factor column
  return(list('Numeric' = numeric_means_df, 'Factors' = factor_cols_summary))
}

print(df_summary(DF))
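One edge case worth checking when selecting columns by type: with a single matching column, `df[, j]` returns a bare vector rather than a data frame, which changes what sapply iterates over. A minimal sketch, using a made-up data frame ONE_NUM:

```r
# Made-up data with exactly one numeric column
ONE_NUM <- data.frame(
  AGE = c(25, 32, 28),
  SEX = factor(c("Male", "Female", "Male"))
)

is_num <- sapply(ONE_NUM, is.numeric)

dropped <- ONE_NUM[, is_num]                 # plain numeric vector
kept    <- ONE_NUM[, is_num, drop = FALSE]   # still a one-column data frame

sapply(dropped, mean)   # one value per element: 25 32 28
sapply(kept, mean)      # one value per column: AGE = 28.33333
```

Passing drop = FALSE (or using list-style indexing such as ONE_NUM[is_num]) keeps the per-column behaviour regardless of how many columns match.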