How to write a boxplot function using a list of data frames in R

Given the data frame below:

set.seed(123)
code <- c(5001,5001,5250,5250,5425,5425,5610,5610,5910,5910,5010,5010,6110,6110,6135,6135,6220,6220,6550,6550)
county <- c("A01","A01","A01","A02","A01","A02","A03","A03","A01","A02","A03","A04","A01","A01","A01","A01","A01","A01","A02","A02")
state <- c("PA","PA","NY","NY","DE","DE","PA","PA","NY","NY","PA","PA","NY","NY","DE","DE","PA","PA","NY","NY")
dept <- c("energy",'energy','edu','hous','hous','edu','energy','energy','hous','hous','edu','hous','hous','energy','energy',"energy",'energy','edu','hous','hous')
year <- c(2001,2001,2001,2001,2001,2002,2002,2002,2002,2002,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004)
corp_tax <- runif(20, min=5, max=200)
income_tax <- runif(20, min=4, max=175)
bonus <- runif(20, min=10, max=211)
length(dept)
df <- data.frame(code, state, county, year,dept,corp_tax,income_tax,bonus)
df

# for each year 
subset(df,year == 2001)

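I need help writing a user-defined function that takes a data frame and does the following:

(1) Select the columns year, state, dept, and corp_tax.

(2) For each unique state within each year, draw a boxplot of corp_tax grouped by dept. For example, for 2001 there would be separate boxplots for PA, NY, and DE.

(3) Export the plots to a PDF (2 figures per page).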

Below is my attempt:

library(ggplot2)
library(dplyr)

boxplotter <-function(data){

    #  select the columns
    new_data <-data%>%select(year,state,dept,corp_tax)
    
    #  split the data based on unique years 
    split_data <-split(new_data,new_data$year)
    
    # set the pdf for the plot
    pdf("boxplotter.pdf", 7, 5)
    #################This where I need help with the most, the looping process#####################
    # Looping through each state in each year 
    for (i in seq(1, length(unique(split_data$state)), 10)){
        
        # the actual plot
        ggplot(split_data, aes(x=dept, y=corp_tax)) + geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
        scale_y_continuous(limits=c(0, max(split_data$corp_tax, na.rm=TRUE))) +
        scale_x_continuous(limits=c(0, max(split_data$corp_tax))) 
    dev.off()
}
#testing my function
boxplotter(df)

The desired output should look like this:

I am open to other approaches, so please share your code. Thanks!!

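Here is a solution.
When splitting the data, split by year and state in the same call. Then loop over the resulting list and plot each data set. Save each plot with ggsave.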
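To see what splitting on two variables at once produces, here is a minimal sketch in base R. The data frame toy and its values are made up for illustration. It also shows that split creates empty groups for combinations that never occur, and that drop = TRUE is one way to avoid them.

```r
# Made-up toy data: the combination 2002/NY never occurs
toy <- data.frame(
  year     = c(2001, 2001, 2002, 2002),
  state    = c("PA", "NY", "PA", "PA"),
  corp_tax = c(10, 20, 30, 40)
)

# Split on year and state in one call; names are joined with "_"
all_groups <- split(toy, list(toy$year, toy$state), sep = "_")

# Same split, but unused year/state combinations are dropped
kept_groups <- split(toy, list(toy$year, toy$state), sep = "_", drop = TRUE)

# Number of rows per group; the empty combination shows up with 0 rows
print(sapply(all_groups, nrow))
print(names(kept_groups))
```

Filtering on nrow(...) > 0 after the split, as the function below does, should give the same groups as drop = TRUE.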

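In the function below, the output file name depends on the year/state combination. I have also included an argument, verbose, that prints each file name as it is written to disk.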

library(ggplot2)

boxplotter <- function(X, file = "boxplotter%s.pdf", width = 7, height = 5, verbose = FALSE){
  # create a list of data.frame's by year and state
  year_list <- split(X, list(X[["year"]], X[["state"]]), sep = "_")
  # remove from the list the empty sub-lists. This is needed
  # because there might be combinations of year/state not
  # present in the input data and 'split' will create them 
  # anyway
  year_list <- year_list[sapply(year_list, nrow) > 0L]
  
  # loop with an index into the list to make it possible
  # to get the data and also the names attribute, used
  # to form the output filenames
  for(i in seq_along(year_list)){
    # work with a copy, this just makes the code that
    # follows easier to read
    Y <- year_list[[i]]
    # plot and save the plot
    filename <- sprintf(file, names(year_list)[i])
    g <- ggplot(Y, aes(x=dept, y=corp_tax)) + 
      geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
      scale_y_continuous(limits=c(0, max(Y$corp_tax, na.rm=TRUE)))
    ggsave(filename, plot = g, device = "pdf", width = width, height = height)
    # want to see what was written to disk?
    if(verbose){
      msg <- paste("output file:", filename)
      message(msg)
    }
  }
  # return nothing
  invisible(NULL)
}

boxplotter(df, verbose = TRUE)
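
The function above writes one PDF per year/state combination, while the question asks for two plots per page. One possible way to get that is to open a single pdf device and place the plots side by side with a grid layout. This is only a sketch: the function name boxplotter_pages, the default file name, the page size, and the toy data are all hypothetical and not part of the answer above.

```r
library(ggplot2)
library(grid)  # ships with R; used here only for the page layout

# Hypothetical helper: one PDF file, two boxplots per page
boxplotter_pages <- function(X, file = "boxplotter_all.pdf", width = 10, height = 5) {
  # drop = TRUE skips year/state combinations that have no rows
  groups <- split(X, list(X[["year"]], X[["state"]]), sep = "_", drop = TRUE)

  pdf(file, width = width, height = height)
  on.exit(dev.off(), add = TRUE)  # close the device even if plotting fails

  for (i in seq_along(groups)) {
    left <- i %% 2 == 1
    if (left) {
      # every odd-numbered plot starts a new page with a 1 x 2 layout
      grid.newpage()
      pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
    }
    Y <- groups[[i]]
    g <- ggplot(Y, aes(x = dept, y = corp_tax)) +
      geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4) +
      ggtitle(names(groups)[i])
    # draw into the left or right cell of the current page
    print(g, vp = viewport(layout.pos.row = 1,
                           layout.pos.col = if (left) 1 else 2))
  }
  # return the names of the plotted groups, invisibly
  invisible(names(groups))
}

# Made-up toy data with the same column names as the question's df
set.seed(1)
toy <- data.frame(
  year     = rep(c(2001, 2002), each = 6),
  state    = rep(c("PA", "NY", "DE"), times = 4),
  dept     = rep(c("energy", "edu"), length.out = 12),
  corp_tax = runif(12, min = 5, max = 200)
)
out <- file.path(tempdir(), "boxplotter_all.pdf")
plotted <- boxplotter_pages(toy, file = out)
```

The same call should work on the question's data as boxplotter_pages(df). Packages such as gridExtra offer ready-made multi-page arrangements as an alternative to the manual layout used here.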