How to write a function for boxplots using a list of data frames in R
Given the data frame below:
set.seed(123)
code <- c(5001,5001,5250,5250,5425,5425,5610,5610,5910,5910,5010,5010,6110,6110,6135,6135,6220,6220,6550,6550)
county <- c("A01","A01","A01","A02","A01","A02","A03","A03","A01","A02","A03","A04","A01","A01","A01","A01","A01","A01","A02","A02")
state <- c("PA","PA","NY","NY","DE","DE","PA","PA","NY","NY","PA","PA","NY","NY","DE","DE","PA","PA","NY","NY")
dept <- c("energy",'energy','edu','hous','hous','edu','energy','energy','hous','hous','edu','hous','hous','energy','energy',"energy",'energy','edu','hous','hous')
year <- c(2001,2001,2001,2001,2001,2002,2002,2002,2002,2002,2003,2003,2003,2003,2003,2004,2004,2004,2004,2004)
corp_tax <- runif(20, min=5, max=200)
income_tax <- runif(20, min=4, max=175)
bonus <- runif(20, min=10, max=211)
length(dept)
df <- data.frame(code, state, county, year,dept,corp_tax,income_tax,bonus)
df
# for each year
subset(df,year == 2001)
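I need help writing a user-defined function that takes a data frame and does the following:
(1) Selects the year, state, dept, and corp_tax columns.
(2) For each unique state within each year, draws a boxplot of corp_tax grouped by dept. For example, for 2001 there would be separate boxplots for PA, NY, and DE.
(3) Exports the plots to a PDF, with two figures per page.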
Below is my attempt:
library(ggplot2)
library(dplyr)
boxplotter <-function(data){
# select the columns
new_data <-data%>%select(year,state,dept,corp_tax)
# split the data based on unique years
split_data <-split(new_data,new_data$year)
# set the pdf for the plot
pdf("boxplotter.pdf", 7, 5)
#################This where I need help with the most, the looping process#####################
# Looping through each state in each year
for (i in seq(1, length(unique(split_data$state)), 10)){
# the actual plot
ggplot(split_data, aes(x=dept, y=corp_tax)) + geom_boxplot(outlier.colour="red", outlier.shape=8,outlier.size=4) +
scale_y_continuous(limits=c(0, max(split_data$corp_tax, na.rm=TRUE))) +
scale_x_continuous(limits=c(0, max(split_data$corp_tax)))
dev.off()
}
#testing my function
boxplotter(df)
The desired output should look like this:
I am open to other approaches, so please share your code. Thanks!
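Here is a solution.
When splitting the data, split by year and state in the same instruction. Then loop over the resulting list, plotting each data set, and save each plot with ggsave.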
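To illustrate the splitting step on its own, here is a minimal sketch. The data frame toy is made up for this example and is not the question's df; it only shows that split() with a list of two factors creates one element per year/state pair, including pairs that never occur, which is why the empty pieces are filtered out afterwards.

```r
# Hypothetical toy data: two years, two states, but no 2002/NY rows
toy <- data.frame(
  year     = c(2001, 2001, 2002),
  state    = c("PA", "NY", "PA"),
  corp_tax = c(10, 20, 30)
)

# Split on both columns at once; names are built as "<year>_<state>"
pieces <- split(toy, list(toy$year, toy$state), sep = "_")
print(sapply(pieces, nrow))  # one count per pair; the 2002_NY piece has 0 rows

# Keep only the pairs that actually have data
pieces <- pieces[sapply(pieces, nrow) > 0L]
print(names(pieces))
```

Passing drop = TRUE to split() is another way to leave out the unused combinations.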
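In the function below, the output file name depends on the year/state combination. I have included an argument verbose that prints each file name as it is written to disk.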
library(ggplot2)

boxplotter <- function(X, file = "boxplotter%s.pdf", width = 7, height = 5, verbose = FALSE){
  # create a list of data.frame's by year and state
  year_list <- split(X, list(X[["year"]], X[["state"]]), sep = "_")
  # remove from the list the empty sub-lists. This is needed
  # because there might be combinations of year/state not
  # present in the input data and 'split' will create them
  # anyway
  year_list <- year_list[sapply(year_list, nrow) > 0L]
  # loop with an index into the list to make it possible
  # to get the data and also the names attribute, used
  # to form the output filenames
  for(i in seq_along(year_list)){
    # work with a copy, this just makes the code that
    # follows easier to read
    Y <- year_list[[i]]
    # plot and save the plot
    filename <- sprintf(file, names(year_list)[i])
    g <- ggplot(Y, aes(x = dept, y = corp_tax)) +
      geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4) +
      scale_y_continuous(limits = c(0, max(Y$corp_tax, na.rm = TRUE)))
    ggsave(filename, plot = g, device = "pdf", width = width, height = height)
    # want to see what was written to disk?
    if(verbose){
      msg <- paste("output file:", filename)
      message(msg)
    }
  }
  # return nothing
  invisible(NULL)
}

boxplotter(df, verbose = TRUE)
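The function above writes one PDF per year/state combination. If a single file with two plots per page is wanted, as in point (3) of the question, one possible variant is sketched below. It is not part of the original answer: the name boxplotter_pdf, the 7 x 10 inch page, the plot titles, and the use of grid layout viewports are all assumptions made for illustration.

```r
library(ggplot2)
library(grid)  # ships with R; provides viewports and page layout

# Hypothetical variant: all year/state boxplots in ONE pdf, two per page
boxplotter_pdf <- function(X, file = "boxplotter.pdf", width = 7, height = 10){
  # same splitting step as above, dropping empty year/state pairs
  groups <- split(X, list(X[["year"]], X[["state"]]), sep = "_")
  groups <- groups[sapply(groups, nrow) > 0L]

  pdf(file, width = width, height = height)
  # close the device even if a plot fails
  on.exit(dev.off(), add = TRUE)

  for(i in seq_along(groups)){
    # 1 = top half of the page, 2 = bottom half
    slot <- (i - 1L) %% 2L + 1L
    if(slot == 1L){
      # start a new page with a 2-row, 1-column layout
      grid.newpage()
      pushViewport(viewport(layout = grid.layout(nrow = 2, ncol = 1)))
    }
    g <- ggplot(groups[[i]], aes(x = dept, y = corp_tax)) +
      geom_boxplot(outlier.colour = "red", outlier.shape = 8, outlier.size = 4) +
      ggtitle(names(groups)[i])
    # draw the plot into its row of the page layout
    print(g, vp = viewport(layout.pos.row = slot, layout.pos.col = 1))
  }
  invisible(file)
}
```

With the question's data, boxplotter_pdf(df) should produce a single boxplotter.pdf in the working directory, with each plot titled by its year/state pair.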