Run R Markdown on many different datasets and save each knitted word document separately
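I have created an R Markdown document that checks a series of datasets for errors (e.g., are there any blanks in a given column? If so, print a statement saying that there are NAs and which rows contain them). The R Markdown is set up to output bookdown::word_document2. I have about 100 datasets that I need to run through this same R Markdown, producing a separate Word document for each.
Is there a way to run the same R Markdown across all of the datasets and get a new Word document for each one (so that they are not overwritten)? All of the datasets are in the same directory. I know the output is overwritten each time I knit the document, so I need to be able to save each Word document under the name of its dataset/file.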
Minimal Example
Create a directory containing 3 .xlsx files
library(openxlsx)
setwd("~/Desktop")
dir.create("data")
dataset <-
  structure(
    list(
      name = c("Andrew", "Max", "Sylvia", NA, "1"),
      number = c(1, 2, 2, NA, NA),
      category = c("cool", "amazing",
                   "wonderful", "okay", NA)
    ),
    class = "data.frame",
    row.names = c(NA,-5L)
  )
write.xlsx(dataset, './data/test.xlsx')
write.xlsx(dataset, './data/dataset.xlsx')
write.xlsx(dataset, './data/another.xlsx')
RMarkdown
---
title: Hello_World
author: "Somebody"
output:
  bookdown::word_document2:
    fig_caption: yes
    number_sections: FALSE
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
setwd("~/Desktop")
library(openxlsx)
# Load data for one .xlsx file. The other datasets are all in "/data".
dataset <- openxlsx::read.xlsx("./data/test.xlsx")
```
# Test for Errors
```{r test, echo=FALSE, comment=NA}
# Are there any NA values in the column?
suppressWarnings(if (TRUE %in% is.na(dataset$name)) {
  na.index <- which(is.na(dataset$name))
  cat(
    paste(
      "– There are NAs/blanks in the name column. There should be no blanks in this column. The following row numbers in this column need to be corrected:",
      paste(na.index, collapse = ', ')
    ),
    ".",
    sep = "",
    "\n",
    "\n"
  )
})
```
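So I would run this R Markdown with the first .xlsx dataset in the /data directory (test.xlsx) and save the Word document. Then I would like to do the same for every other dataset listed in the directory (i.e., list.files(path = "./data")) and save a new Word document each time. The only thing that changes in the R Markdown from run to run is this line: dataset <- openxlsx::read.xlsx("./data/test.xlsx"). I know I need to set up some parameters that I can use in rmarkdown::render, but I am not sure how.
I have looked at some other SO entries (e.g., How to combine two RMarkdown (.Rmd) files into a single output?), but most focus on combining .Rmd files, not on running different iterations of the same file.
I also tried the following approach from another post. All of the additions are made to the example R Markdown above.
Adding this to the YAML header: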
params:
  directory:
    value: x
Adding this to the setup code chunk:
# Pull in the data
dataset <- openxlsx::read.xlsx(file.path(params$directory))
Then, finally, I ran the following code to render the document.
rmarkdown::render(
input = 'Hello_World.Rmd'
, params = list(
directory = "./data"
)
)
However, I get the following error, even though I only have .xlsx files in /data:
Quitting from lines 14-24 (Hello_World.Rmd) Error: openxlsx can only
read .xlsx files
I also tried this on my full .Rmd file and got the following error, even though the path is exactly the same.
Quitting from lines 14-24 (Hello_World.Rmd) Error in file(con,
"rb") : cannot open the connection
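*Note: Lines 14-24 are essentially the setup chunk of the .Rmd.
I am not sure what I need to change. I also need to generate multiple output files named after the original files (e.g., "test" from test.xlsx, "another" from another.xlsx, etc.).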
You can call render in a loop, processing each file by passing it in as a parameter:
dir_in <- 'data'
dir_out <- 'result'
files <- file.path(getwd(),dir_in,list.files(dir_in))
for (file in files) {
  print(file)
  rmarkdown::render(
    input = 'Hello_World.Rmd',
    output_file = tools::file_path_sans_ext(basename(file)),
    output_dir = dir_out,
    params = list(file = file)
  )
}
R Markdown:
---
title: Hello_World
author: "Somebody"
output:
  bookdown::word_document2:
    fig_caption: yes
    number_sections: FALSE
params:
  file: ""
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(openxlsx)
# Load the .xlsx file passed in through the `file` parameter of render().
dataset <- openxlsx::read.xlsx(params$file)
```
# Test for Errors
```{r test, echo=FALSE, comment=NA}
# Are there any NA values in the column?
suppressWarnings(if (TRUE %in% is.na(dataset$name)) {
  na.index <- which(is.na(dataset$name))
  cat(
    paste(
      "– There are NAs/blanks in the name column. There should be no blanks in this column. The following row numbers in this column need to be corrected:",
      paste(na.index, collapse = ', ')
    ),
    ".",
    sep = "",
    "\n",
    "\n"
  )
})
```
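One caveat with the loop above: list.files(dir_in) returns everything in the folder, so a stray non-.xlsx file (or an Excel lock file such as ~$test.xlsx) would also be handed to read.xlsx. The error in the question was probably of the same kind, since params$directory pointed at the folder "./data" rather than at a single workbook. Below is a minimal sketch of a stricter file listing; list_inputs is a hypothetical helper name used only for illustration, not part of any package.

```r
# Hypothetical helper (illustration only): collect the .xlsx files in a folder
# and derive the output name that render() would use for each one.
list_inputs <- function(dir_in) {
  files <- list.files(
    dir_in,
    pattern = "\\.xlsx$",   # keep only names ending in .xlsx
    full.names = TRUE,      # return paths, not bare file names
    ignore.case = TRUE
  )
  # Drop Excel lock files such as "~$test.xlsx", which are not real workbooks.
  files <- files[!startsWith(basename(files), "~$")]
  data.frame(
    input = normalizePath(files),                          # absolute path for params$file
    output = tools::file_path_sans_ext(basename(files)),   # e.g. "test" from "test.xlsx"
    stringsAsFactors = FALSE
  )
}

# Usage (assumes the 'data' folder from the question exists):
# inputs <- list_inputs("data")
# inputs$input feeds params = list(file = ...), inputs$output feeds output_file
```

Absolute paths matter here because the document is knitted relative to the .Rmd file's directory, which is not necessarily the working directory of the calling session.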
An alternative that uses purrr instead of a for loop, with exactly the same setup as @Waldi's answer.
Render
dir_in <- 'data'
dir_out <- 'result'
files <- file.path(getwd(),dir_in,list.files(dir_in))
purrr::map(.x = files, .f = function(file){
  rmarkdown::render(
    input = 'Hello_World.Rmd',
    output_file = tools::file_path_sans_ext(basename(file)),
    output_dir = dir_out,
    params = list(file = file)
  )
})
Rmarkdown
---
title: Hello_World
author: "Somebody"
output:
  bookdown::word_document2:
    fig_caption: yes
    number_sections: FALSE
params:
  file: ""
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(openxlsx)
# Load the .xlsx file passed in through the `file` parameter of render().
dataset <- openxlsx::read.xlsx(params$file)
```
# Test for Errors
```{r test, echo=FALSE, comment=NA}
# Are there any NA values in the column?
suppressWarnings(if (TRUE %in% is.na(dataset$name)) {
  na.index <- which(is.na(dataset$name))
  cat(
    paste(
      "– There are NAs/blanks in the name column. There should be no blanks in this column. The following row numbers in this column need to be corrected:",
      paste(na.index, collapse = ', ')
    ),
    ".",
    sep = "",
    "\n",
    "\n"
  )
})
```
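With around 100 datasets, one unreadable workbook would stop either the for loop or purrr::map part-way through. A possible refinement, sketched here in base R, is to wrap each render in tryCatch and collect a status table. The names run_batch and render_one are hypothetical, and the input file name and output folder are the ones assumed in the answers above.

```r
# Hypothetical helper (illustration only): apply `render_one` to every file,
# recording failures instead of aborting the whole batch.
run_batch <- function(files, render_one) {
  status <- lapply(files, function(f) {
    tryCatch({
      render_one(f)
      data.frame(file = f, ok = TRUE, message = NA_character_,
                 stringsAsFactors = FALSE)
    }, error = function(e) {
      data.frame(file = f, ok = FALSE, message = conditionMessage(e),
                 stringsAsFactors = FALSE)
    })
  })
  do.call(rbind, status)  # one row per input file
}

# Renders one report; assumes 'Hello_World.Rmd' declares a `file` parameter.
render_one <- function(file) {
  rmarkdown::render(
    input = "Hello_World.Rmd",
    output_file = tools::file_path_sans_ext(basename(file)),
    output_dir = "result",
    params = list(file = file),
    envir = new.env()  # chunk objects go into a fresh environment on each run
  )
}

# Usage:
# status <- run_batch(files, render_one)
# status[!status$ok, ]   # files that could not be rendered, with the error text
```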