Use R and Openxlsx to output a list of dataframes as worksheets in a single Excel file
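I have a set of CSV files. I want to bundle them up and export the data to a single Excel file with multiple worksheets. I read the CSV files in as a set of data frames.
My question is how to construct the command in openxlsx. I can do it manually, but I'm running into a problem building the list. Specifically: how do I add the data frames as sub-components of a named list that can then be passed as an argument to write.xlsx()?
Example
OK, so first I list the CSV files on disk and then generate a set of data frames in memory...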
# Generate a list of csv files on disk and shorten names...
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "\\.csv$"   # list.files() takes a regular expression, not a glob
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)
datasets <- gsub("\\.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets
# Now generate the dataframes for each csv file...
list2env(
lapply(setNames(fileListwithPath, make.names(datasets)),
read.csv), envir = .GlobalEnv)
Sample output:
dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843",
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847",
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447",
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803",
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331",
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827",
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304",
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652",
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020",
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819",
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914",
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134",
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846",
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738",
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724",
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535",
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508",
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")
Now that we have a set of data frames, we want to create an Excel file with multiple worksheets...
wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')
lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))
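Problem:
The problem is that I can build the list structure manually and confirm that it works, but I can't seem to build the list automatically.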
myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
Which gives:
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
Specific question: how do I attach each data frame to the list...
myList <- lapply( myList, function(x) eval(x) )
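What am I doing wrong with lapply here? The lapply() call above does not iterate over the list and attach each data frame to its named list entry, i.e. do the equivalent of
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
for every entry. Instead I still get: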
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
What am I missing? Any help is much appreciated. Yes, I'm pretty sure I'm missing something obvious... but... I'm stumped.
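A minimal base-R sketch of what seems to be going wrong, using two made-up data frames in place of the CSV data: lapply() passes each element's value to the function, not its name, and every value in the placeholder list is NULL, so eval(NULL) just returns NULL. Looking the objects up by name, here with mget(), fills the list in one call.

```r
# Two made-up data frames standing in for the objects created by list2env()
S10000_R3.3.2_201703301839 <- data.frame(user = c(2.144, 0.263))
S10000_T4.3.0_201703301843 <- data.frame(user = c(1.5, 0.2))
datasets <- c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843")

# The placeholder list from the question: a named list whose elements are all NULL
myList <- sapply(datasets, function(x) NULL)

# lapply() hands each *value* (NULL) to the function, so eval(NULL) is NULL again
still_null <- lapply(myList, function(x) eval(x))

# mget() looks each object up by *name* and returns a named list of the objects
myList <- mget(datasets, envir = environment())
```

In practice the list2env() detour may not be needed at all: the result of lapply(..., read.csv) is already a named list of data frames.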
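I don't have your data frames, so I can't test this, but the code below is similar to the approach I use when I need to read and write Excel files. It uses the xlsx package, since that's the one I'm familiar with, but hopefully you can adapt it if you need to use openxlsx.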
library(xlsx)
First, read the files into a list, like this:
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "\\.csv$"   # list.files() takes a regular expression, not a glob
fileListwithPath <- list.files(path = filePath,
                               pattern = filePattern,
                               full.names = TRUE)
# Name each path by its file name so the names carry through lapply()
fileListwithPath <- setNames(fileListwithPath, basename(fileListwithPath))
df.list <- lapply(fileListwithPath, read.csv)
# Now we rename the list names for use as worksheet names...
# Remove the .csv extension and replace the sample_ prefix used in the file names...
# Resulting sheet names look like S<size>_<R version>_<date>
names(df.list) <- gsub("\\.csv$", "", names(df.list))
names(df.list) <- gsub("sample_", "S", names(df.list))
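The read-and-rename steps can be checked with a small self-contained sketch. It writes two made-up CSV files (the sample_ file names are invented to mimic the question's naming) to a temporary directory, then reads and renames them the same way. The regular expression is written as "\\.csv$" because a lone \. inside an R string literal is a parse error, and list.files() expects a regular expression rather than a glob.

```r
# Temporary directory with two made-up CSV files standing in for the results folder
filePath <- file.path(tempdir(), "csv_read_demo")
dir.create(filePath, showWarnings = FALSE)
write.csv(data.frame(user = c(2.1, 0.3)),
          file.path(filePath, "sample_10000_R3.3.2.csv"), row.names = FALSE)
write.csv(data.frame(user = c(1.5, 0.2)),
          file.path(filePath, "sample_20000_R3.3.2.csv"), row.names = FALSE)

# Escape the dot and anchor the match at the end of the file name
fileListwithPath <- list.files(path = filePath, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name so lapply() carries the names through
fileListwithPath <- setNames(fileListwithPath, basename(fileListwithPath))
df.list <- lapply(fileListwithPath, read.csv)

# Shorten the names so they can be used as sheet names
names(df.list) <- gsub("\\.csv$", "", names(df.list))
names(df.list) <- gsub("sample_", "S", names(df.list))
```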
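You now have a list in which each element is a data frame and each element's name is the (shortened) file name. Now let's write each data frame to a separate worksheet in the same Excel workbook, then save the file as an xlsx file: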
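# xlsx::createWorkbook() defaults to type = "xls"; request the xlsx format explicitly
wb <- createWorkbook(type = "xlsx")
# invisible() suppresses printing the list that lapply() returns
invisible(lapply(names(df.list),
                 function(df) {
                   sheet <- createSheet(wb, sheetName = df)
                   addDataFrame(df.list[[df]], sheet = sheet, row.names = FALSE)
                 }))
saveWorkbook(wb, "My_workbook.xlsx")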
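For illustration, I've kept reading the csv files and writing the workbook as separate steps, but you could combine them into one function that reads each csv file and writes it to a new sheet in a single workbook.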
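One way to combine the two steps is sketched below with openxlsx, since that is the package the question asks about. csv_folder_to_xlsx() is a hypothetical helper name, and the sheet-name cleaning mirrors the gsub() calls above; it assumes the resulting names are unique and within Excel's 31-character limit for sheet names.

```r
library(openxlsx)

# Hypothetical helper: read every CSV file in `dir` and write each one to its own sheet
csv_folder_to_xlsx <- function(dir, out_file) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  wb <- createWorkbook()
  for (path in paths) {
    # Sheet name from the file name: drop ".csv", shorten "sample_" to "S"
    sheet <- gsub("sample_", "S", sub("\\.csv$", "", basename(path)))
    addWorksheet(wb, sheet)
    writeData(wb, sheet, read.csv(path))
  }
  saveWorkbook(wb, out_file, overwrite = TRUE)
  invisible(out_file)
}

# Usage with two made-up CSV files in a temporary directory
demo_dir <- file.path(tempdir(), "csv_to_xlsx_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(user = c(2.1, 0.3)), file.path(demo_dir, "sample_10000.csv"), row.names = FALSE)
write.csv(data.frame(user = c(1.5, 0.2)), file.path(demo_dir, "sample_20000.csv"), row.names = FALSE)
out <- csv_folder_to_xlsx(demo_dir, file.path(tempdir(), "combined.xlsx"))
```

A plain for loop is used here because the work is done for its side effects on wb; lapply() would work too but builds a list nobody uses.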
Here is a solution with openxlsx:
## create data;
dataframes <- split(iris, iris$Species)
# create workbook
wb <- createWorkbook()
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, dataframes, names(dataframes))
## Save workbook to excel file
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)
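However, openxlsx can also do this with its own openxlsx::write.xlsx function: you just give it the list of data frames and a file path, and openxlsx is smart enough to write each list element to its own worksheet in the xlsx file. The Map() code I posted is for when you want to format the worksheets in a specific way.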
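A minimal sketch of that shortcut, reusing the iris split from the Map() example: passing a named list to openxlsx::write.xlsx() writes one worksheet per element, and the list names become the sheet names. The output path in the temporary directory is only for illustration.

```r
library(openxlsx)

# A named list of data frames: one element per species
dataframes <- split(iris, iris$Species)

# One call writes every list element to its own sheet, named after the element
out_file <- file.path(tempdir(), "iris_by_species.xlsx")
write.xlsx(dataframes, file = out_file, overwrite = TRUE)
```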
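I thought it might be worth adding a solution using the imap function from the purrr package, as it provides a convenient mechanism for accessing both the names and the indices of list elements in one call: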
imap_xxx(x, ...), an indexed map, is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not. This is useful if you need to compute on both the value and the position of an element.
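A tiny illustration of the quoted behaviour with made-up lists: for a named list the second argument passed to the function is the element's name, and for an unnamed list it is the element's position.

```r
library(purrr)

# A function that uses both the value and its key (name or index)
describe <- function(value, key) paste0(key, "=", value)

x_named <- list(a = 1, b = 2)
x_unnamed <- list(10, 20)

# Named input: the key is the element name, as with map2(x, names(x), ...)
res_named <- imap(x_named, describe)

# Unnamed input: the key is the position, as with map2(x, seq_along(x), ...)
res_unnamed <- imap(x_unnamed, describe)
```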
imap solution
Dummy data for reproducibility.
lst_data <- list(cars = mtcars, air = airmiles)
wb <- openxlsx::createWorkbook()
purrr::imap(
.x = lst_data,
.f = function(df, object_name) {
openxlsx::addWorksheet(wb = wb, sheetName = object_name)
openxlsx::writeData(wb = wb, sheet = object_name, x = df)
}
)
t_file <- tempfile(pattern = "test_df_export", fileext = ".xlsx")
openxlsx::saveWorkbook(wb = wb, file = t_file)
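Since addWorksheet() and writeData() are called only for their side effect on wb (an openxlsx Workbook is a reference object, so changes made inside the function persist), purrr::iwalk() may be a slightly better fit than imap(): it makes the same calls but returns its input invisibly instead of a list of results. A sketch with dummy data (mtcars and iris here, instead of airmiles):

```r
library(openxlsx)
library(purrr)

# Dummy data for reproducibility
lst_data <- list(cars = mtcars, flowers = iris)

wb <- createWorkbook()

# iwalk() calls .f(element, name) for each element and returns lst_data invisibly
iwalk(lst_data, function(df, object_name) {
  addWorksheet(wb, sheetName = object_name)
  writeData(wb, sheet = object_name, x = df)
})

t_file <- tempfile(pattern = "test_df_export", fileext = ".xlsx")
saveWorkbook(wb, file = t_file)
```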