Pandoc convert docx to markdown with embedded images

Question

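When converting a .docx file to markdown, the embedded images are not extracted from the docx archive, yet the output contains ![](media/image1.png){width="6.291666666666667in" height="3.1083333333333334in"}

Is there a parameter I need to set so that the embedded images are extracted?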

Answer 1

pandoc --extract-media ./myMediaFolder input.docx -o output.md

From the manual:

--extract-media=DIR

Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the images references in the document so they point to the extracted files. Media are downloaded, read from the file system, or extracted from a binary container (e.g. docx), as needed. The original file paths are used if they are relative paths not containing "..". Otherwise filenames are constructed from the SHA1 hash of the contents.
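To confirm that the image references in the converted file really point to extracted files, a rough check along these lines may help. This is only a sketch: list_image_refs and check_images are hypothetical helper names, not part of pandoc. It assumes a grep that supports -o (GNU and BSD grep do), inline image syntax of the form ![alt](path) without a title, and that it is run from the directory the paths are relative to.

```shell
# Print the path of every inline markdown image reference in a file, one per line.
# Assumption: references look like ![alt](path), with no title inside the parentheses.
list_image_refs() {
  grep -o '!\[[^]]*]([^)]*)' "$1" | sed -e 's/^!\[[^]]*](//' -e 's/)$//'
}

# Report each referenced image that does not exist on disk.
# Returns 0 if all referenced files exist, 1 otherwise.
# Assumption: paths are resolved relative to the current directory.
check_images() {
  md_file="$1"
  missing=0
  while IFS= read -r path; do
    # Skip the empty line produced when the file has no image references.
    [ -n "$path" ] || continue
    if [ ! -f "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done <<EOF
$(list_image_refs "$md_file")
EOF
  return "$missing"
}
```

Calling check_images output.md after the conversion lists any image that is referenced in the markdown but was not written to disk.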

Answer 2

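Following up on gridtrak's comment and the problem of the directory structure being too deep (e.g. media/media/image2.jpeg): use the current directory as the path DIR, and pandoc then creates a media folder under the current directory (e.g. media/image2.jpeg):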

pandoc --extract-media=. input.docx -o output.md
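If you run this conversion often, the command above can be wrapped in a small shell function. This is a minimal sketch, not anything pandoc provides: docx_to_md is a hypothetical name, and the default output name (the input name with .docx replaced by .md) is my own assumption.

```shell
# Convert a .docx file to markdown, extracting embedded images into ./media
# under the current directory.
# docx_to_md is a hypothetical helper name, not part of pandoc.
docx_to_md() {
  input="$1"
  # Assumed default: same base name with .md (report.docx -> report.md),
  # unless a second argument names the output file.
  output="${2:-${input%.docx}.md}"
  pandoc --extract-media=. "$input" -o "$output"
}
```

For example, docx_to_md input.docx output.md runs the same command as shown in this answer.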
