How to extract a List of Figures from knitr/latex pdf output and load it into R?

Question

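Is there a convenient way to extract the List of Figures that knitr and latex create in a PDF document and load it into R?

My PDF has dozens of figures; they badly need to be tracked and organized, and a List of Figures helps with that. But having the list available in R would help in many ways.

Cutting the list out of the PDF, pasting it into Excel, and working from that spreadsheet is an arduous route; it would be faster and smoother if the List of Figures could be located and loaded (more or less) directly into R. The knitting process creates many files; perhaps the List lurks in one of them?

Here is a small example that creates a List of Figures, borrowed from a question on hiding captions: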

\documentclass{article}

\usepackage{graphicx}
\setcounter{topnumber}{3}% Just for this example
\begin{document}

\listoffigures

\begin{figure}
  \addcontentsline{lof}{figure}{Example image A}%
  \centering
  \includegraphics[height=4\baselineskip]{example-image-a}

  Example image A
\end{figure}

\begin{figure}
  \addcontentsline{lof}{figure}{\protect\numberline{}Example image B}%
  \centering
  \includegraphics[height=4\baselineskip]{example-image-b}

  Example image B
\end{figure}

\begin{figure}
  \centering
  \includegraphics[height=4\baselineskip]{example-image-c}
  \caption{Example image C}
\end{figure}

\end{document}

Answer 1

You can do something like this:

---
title: "Untitled"
output: 
  pdf_document:
    fig_caption: true
---

```{r setup, include=FALSE}
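# TRUE: write the figure list to a CSV; FALSE: knit the normal PDF
gen_lof <- TRUE

if (gen_lof) {

  # Start from a clean file on every knit
  unlink("/tmp/figures.csv")

  # Header row
  cat("pdf_name,output_path,caption,subcaption\n",
      file="/tmp/figures.csv", append=TRUE)

  # Replace the default plot hook: x is the figure file path,
  # opt holds the chunk options of the chunk that drew the plot
  knitr::knit_hooks$set(plot=function(x, opt) {
    # Append one CSV row per figure (fields are written unquoted)
    cat(x, ",",
        opt$fig.path, ",",
        opt$fig.cap, ",",
        opt$fig.scap, "\n", 
        sep="", file="/tmp/figures.csv", append=TRUE)
  })

}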
```

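I slightly modified the default RStudio sample knitr document to add two figures with names and captions.

Set gen_lof to FALSE to create the normal PDF (using the hook means the document has to be knit once for the full output PDF and once for just the CSV of figures). Set it to TRUE and knit to get the list of figures written to a file (name it whatever you like; I just used that filename for convenience). The file looks like: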

pdf_name,output_path,caption,subcaption
Untitled_files/figure-latex/cars-1.pdf,Untitled_files/figure-latex/,lines cars,
Untitled_files/figure-latex/pressure-1.pdf,Untitled_files/figure-latex/,points cars,

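Although the output type of the figures may be pdf, doing a 1:1 comparison shouldn't be too much trouble.

You also have access to all the knitr chunk options this way, i.e.: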
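Loading the resulting file into R is then a single call. A minimal sketch, assuming the CSV has the four columns shown above and that captions contain no commas (the hook writes fields unquoted); the temporary file is only a stand-in for /tmp/figures.csv, and the `label` column is an illustrative addition:

```r
# Stand-in for /tmp/figures.csv, using the sample rows shown above
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "pdf_name,output_path,caption,subcaption",
  "Untitled_files/figure-latex/cars-1.pdf,Untitled_files/figure-latex/,lines cars,",
  "Untitled_files/figure-latex/pressure-1.pdf,Untitled_files/figure-latex/,points cars,"
), csv_path)

# Read the figure list; keep text columns as character, not factor
figures <- read.csv(csv_path, stringsAsFactors = FALSE)

# Derive the chunk label from the file name (e.g. "cars-1.pdf" -> "cars")
figures$label <- sub("-[0-9]+\\.[^.]+$", "", basename(figures$pdf_name))

print(figures[, c("label", "caption")])
```

If captions may contain commas, the hook would need to quote its fields before this approach is reliable.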

aniopts autodep background cache cache.lazy cache.path 
cache.rebuild cache.vars child code collapse comment 
crop dependson dev dev.args dpi echo engine error eval 
external 

fig.align fig.cap fig.cur fig.env fig.ext fig.height 
fig.keep fig.lp fig.num fig.path fig.pos fig.retina 
fig.scap fig.show fig.subcap fig.width 

highlight include interval label message out.extra 
out.height out.height.px out.width out.width.px 
params.src prompt purl ref.label render results 
sanitize size split strip.white tidy tidy.opts warning

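(I deliberately separated out the "fig"-specific options.)

Using a variable to trigger the generation means you can code up a parameterized knitr workflow that generates the figure list in one pass and the final PDF in another.

Others may have better ways.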
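One way to sketch such a two-pass workflow, assuming the document is saved as Untitled.Rmd, that its YAML header declares a `gen_lof` parameter, and that the setup chunk reads `params$gen_lof` instead of hard-coding the value. None of these are in the example above, and `knit_twice` is a hypothetical helper name:

```r
# Hypothetical helper: render the document twice via rmarkdown::render().
# Assumes the Rmd declares `params: gen_lof: false` in its YAML header and
# that the setup chunk uses `gen_lof <- params$gen_lof`.
knit_twice <- function(rmd = "Untitled.Rmd") {
  # Pass 1: custom plot hook active, writes the figure CSV;
  # the PDF produced by this pass is a throwaway
  rmarkdown::render(rmd, params = list(gen_lof = TRUE),
                    output_file = "figures-pass.pdf")
  # Pass 2: default plot hook, produces the normal PDF
  rmarkdown::render(rmd, params = list(gen_lof = FALSE))
}
```

Calling `knit_twice()` requires the rmarkdown package and a working LaTeX installation.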

Answer 2

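Since page numbers are not required, saving fig.cap for each chunk is sufficient.

This can be done with a chunk hook that stores options$fig.cap in a global variable, and by saving that variable to a file at the end of the knitting process.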

\documentclass{article}
\begin{document}

<<setup>>=
library(knitr)

figureCaptions <- c()

knit_hooks$set(listit = function(before, options, envir) {
  if (!before) figureCaptions <<- c(figureCaptions, options$fig.cap)
})
@

<<fig.cap = "First one", listit = TRUE>>=
plot(1)
@

<<fig.cap = "Second one", listit = TRUE>>=
plot(rnorm(10))
@

<<final>>=
save(figureCaptions, file = "figureCaptions.RData")
@

\end{document}

To avoid problems with eval.after, it is best to save the captions only after the chunk has been evaluated (if (!before)).

To access the captions afterwards, use load("figureCaptions.RData").
