In R, how can I make boxplots for multiple csv files and export them as pdf files

I want to make boxplots for 80 csv files whose names look like this: NY_two.csv, CA_three.csv, FL_three.csv, ..., NY_ten.csv.

Desirables include

(I) Boxplots (exported as pdf, 2 plots per page)

3 of the 80 csv files are shown below


# All 80 files have the same column names - state, dept, year and revenue

# Copy and paste to generate 3 of the 80 csv files

# The datasets generated below represent 3 out of the 80 csv files

# Dataset 1
state <-c("NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY","NY")

dept <- c("energy","energy","energy","energy","works",'works','works','works','fin','fin','fin','fin','parks','parks','parks','parks','trans','trans','trans','trans')
year <- c("two","two","two","two","two","two","two","two","two","two","two","two","two","two","two","two","two","two","two","two")
revenue <-c(1212.9,1253,1244.4,5123.5,1312,3134,515.8,2449.9,3221.6,3132.5,2235.09,2239.01,3235.01,5223.01,4235.6,2204.5,2315.5,6114,4512,3514.2)

NY_two <-data.frame(state,dept,year,revenue)


# Dataset 2
state <- rep("FL",20)

dept <- c("energy","energy","energy","energy","works",'works','works','works','fin','fin','fin','fin','parks','parks','parks','parks','trans','trans','trans','trans')
year <- rep("three",20)
revenue <-c(112.9,123,124,523.5,112,334,55,449,221.6,332,235,239,235,223,235.6,204,315.5,614,512,514.2)

FL_three <- data.frame(state,dept,year,revenue)

# Dataset 3
state <- rep("CA",20)

dept <- c("energy","energy","energy","energy","works",'works','works','works','fin','fin','fin','fin','parks','parks','parks','parks','trans','trans','trans','trans')
year <- rep("three",20)
revenue <-c(1102.9,1023,1024,5203.5,1012,3034,505,4049,2021.6,3032,2035,2039,2035,2023,2035.6,2004,3015.5,6014,5012,5014.2)

CA_three <- data.frame(state,dept,year,revenue)

# exporting the above datasets as csv files (imagine them as 3 out of the 80 files)
# set the path in write.csv() to the folder where the datasets should be collected
# (forward slashes are used because a single backslash starts an escape sequence in R strings)

write.csv(NY_two,"C:/Path to export the DataFrame/NY_two.csv", row.names = FALSE)
write.csv(FL_three,"C:/Path to export the DataFrame/FL_three.csv", row.names = FALSE)
write.csv(CA_three,"C:/Path to export the DataFrame/CA_three.csv", row.names = FALSE)

My attempt

# Desirables include
#(I) plot the boxplot & export as pdf file (2 graphs per page)

######################################################################################

library(ggplot2)

# import all csv files in the folder
files <- list.files("C:/path to the files/", pattern="*.csv", full.names = T)
files

# set the pdf file path, I want two plots per page
pdf(file = "/Users/Desktop/boxplot_anova.pdf")

#specify to save plots in 2x2 grid
par(mfrow = c(2,2))

out <- lapply(1:length(files), function(idx) {
  # read the file
  this_data <- read.csv(files[idx], header = TRUE) # choose TRUE/FALSE accordingly
  # boxplot using ggplot
   p <-ggplot(this_data, aes(x = dept, y = revenue, fill = dept)) + 
       stat_boxplot(geom = "errorbar", width = 0.15) + geom_boxplot(alpha = 0.8,    # Fill transparency
                                            colour = "#474747",   # Border color
                                            outlier.colour = 1)+ theme(panel.background = element_blank())+ ggtitle("Title using each file name ")

  p
dev.off() 
})

out

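Please share your code. Thanks in advance.

There are several separate issues that may be causing problems with your code:

  1. Plots generated inside a function may not be exported correctly (use plot(p) or print(p) instead of p).
  2. You have to open the pdf device before the loop and close it afterwards, not inside the loop. For example, this works in principle: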
pdf(file = "boxplot_anova.pdf")
#specify to save plots in 2x2 grid
par(mfrow = c(2,2))
out <- lapply(1:length(files), function(idx) {
  # read the file
  this_data <- read.csv(files[idx], header = TRUE) # choose TRUE/FALSE accordingly
  # boxplot using ggplot
   p <-ggplot(this_data, aes(x = dept, y = revenue, fill = dept)) + 
       stat_boxplot(geom = "errorbar", width = 0.15) + geom_boxplot(alpha = 0.8,    # Fill transparency
                                            colour = "#474747",   # Border color
                                            outlier.colour = 1)+ theme(panel.background = element_blank())+ ggtitle("Title using each file name ")
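   plot(p)  # draw explicitly; a bare p is not auto-printed inside a function
})
# do not print `out` here: auto-printing the returned list would draw each plot again
dev.off() 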
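  3. The code above will not draw the plots on the same page (up to 4, as you would expect from mfrow = c(2,2)), because ggplot2 does not use base graphics. Use, for example, the plot_grid function from the cowplot package to achieve this. To generate multiple pages, split the list of plots into chunks of the matching size, e.g. 4 plots per page: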
library(ggplot2)
library(cowplot)  # provides plot_grid()

res <- lapply(files, function(x){
    this_data <- read.csv(x, header = TRUE) # choose TRUE/FALSE accordingly
  # boxplot using ggplot
   ggplot(this_data, aes(x = dept, y = revenue, fill = dept)) + 
       stat_boxplot(geom = "errorbar", width = 0.15) + 
       geom_boxplot(alpha = 0.8,    # Fill transparency
           colour = "#474747",   # Border color
           outlier.colour = 1)+ 
       theme(panel.background = element_blank()) + 
       # title = file name without directory and .csv extension
       ggtitle(gsub("(.*/)(.*)(\\.csv)$", "\\2", x))
})

# set the pdf file path; this example puts 4 plots on each page
pdf(file = "boxplot_anova.pdf")
# print() each page explicitly so it is also drawn when the script is source()d
invisible(lapply(split(res, ceiling(seq_along(res)/4)), 
    function(x) print(plot_grid(plotlist=x, ncol=2, nrow=2))))
dev.off()
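
Since the question asks for two plots per page rather than four, the same splitting idea can be adapted. The following is only a minimal sketch, assuming ggplot2 and cowplot are installed. The toy data and the names `plots`, `per_page`, `pages` and `out_file` are illustrative and not part of the original code; the toy plots stand in for the `res` list built from the csv files.

```r
library(ggplot2)
library(cowplot)

# Hypothetical stand-in for `res`: five small boxplots built from toy data
plots <- lapply(1:5, function(i) {
  toy <- data.frame(dept = rep(c("energy", "works"), each = 5),
                    revenue = (1:10) * i)
  ggplot(toy, aes(x = dept, y = revenue, fill = dept)) +
    geom_boxplot() +
    ggtitle(paste("toy plot", i))
})

# Split the list of plots into chunks of two, one chunk per pdf page
per_page <- 2
pages <- split(plots, ceiling(seq_along(plots) / per_page))

# Assumed output location (a temporary folder); change as needed
out_file <- file.path(tempdir(), "boxplots_two_per_page.pdf")

pdf(file = out_file)
for (pg in pages) {
  # one column, two rows: the two plots are stacked on the page
  print(plot_grid(plotlist = pg, ncol = 1, nrow = per_page))
}
dev.off()
```

With an odd number of plots, the last page simply holds a single plot and leaves the second slot empty. To place the two plots side by side instead of stacked, swap the values so that ncol = 2 and nrow = 1.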