How to parse the entire gulp stream before making changes?

Question

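I'm trying to build a static site using Gulp. I've hit an interesting problem: translating a concept I wrote in a previous version and working out how to implement it using Gulp.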

One of the concepts is that I have files that dynamically include other files.

---
title: Table of Contents
include:
  key: book
  value: book-1
---

Introduction.

Then other files have that key.

---
title: Chapter 1
book: book-1
---
It was a dark and stormy night...

...and:

---
title: Chapter 2
book: book-1
---

The desired end result is:

---
title: Table of Contents
include:
  key: book
  value: book-1
  files:
    - path: chapters/chapter-01.markdown
      title: Chapter 1
      book: book-1
    - path: chapters/chapter-02.markdown
      title: Chapter 2
      book: book-1
---

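Basically, scan through the files and insert their data elements as a sequence into the pages that have an include element. I don't know all the categories or tags to include ahead of time (I'm merging 30-40 Git repositories together), so I don't want to create one task per category.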
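To make the transformation concrete, here is a minimal sketch in plain JavaScript, with no Gulp involved. It assumes the front matter of every file has already been parsed into an object. The names `attachIncludes` and `index.markdown`, and the `{ path, data }` shape, are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: `pages` is an array of { path, data } objects, where
// `data` is the already-parsed YAML front matter of one file.
function attachIncludes(pages) {
  for (const page of pages) {
    const include = page.data.include;
    if (!include) continue; // this page does not include anything

    // Collect every other page whose front matter has the requested key/value.
    include.files = pages
      .filter((other) => other !== page && other.data[include.key] === include.value)
      .map((other) => ({ path: other.path, ...other.data }));
  }
  return pages;
}

// Sample data mirroring the front matter above; index.markdown is made up.
const pages = [
  {
    path: 'index.markdown',
    data: { title: 'Table of Contents', include: { key: 'book', value: 'book-1' } },
  },
  { path: 'chapters/chapter-01.markdown', data: { title: 'Chapter 1', book: 'book-1' } },
  { path: 'chapters/chapter-02.markdown', data: { title: 'Chapter 2', book: 'book-1' } },
];

attachIncludes(pages);
console.log(JSON.stringify(pages[0].data.include.files, null, 2));
```

The function needs the complete list of pages before it can fill in a single `include.files`, which is exactly what makes this awkward to do one file at a time.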

What I'm hoping for is something like this:

return gulp.src("src/**/*.markdown")
  .pipe(magicHappens())
  .pipe(gulp.dest("build"));

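The problem seems to be how streams work. I can't just chain two methods together, because each file is passed from one pipe to the next individually. To insert the include.files element, I have to parse all of the input files (they aren't even in subdirectories) to figure out which ones are included before I can finish.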
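The per-item hand-off can be seen with Node's built-in `stream` module alone. This is only a sketch under assumptions: plain strings stand in for the files gulp would pass along, and `stage`, `first`, `second`, and `order` are made-up names. Each item typically reaches the second stage as soon as the first stage releases it, not after the whole set has been seen.

```javascript
const { Transform } = require('stream');

// Records the order in which the two stages see each item.
const order = [];

// Build an object-mode transform that logs an item and passes it on unchanged.
function stage(label) {
  return new Transform({
    objectMode: true,
    transform(item, encoding, callback) {
      order.push(label + ' ' + item);
      callback(null, item);
    },
  });
}

const first = stage('first');
const second = stage('second');

first.pipe(second);
second.on('data', () => {}); // consume the output so the pipeline keeps flowing
second.on('end', () => console.log(order.join(', ')));

// Strings stand in for files here.
for (const name of ['a', 'b', 'c']) {
  first.write(name);
}
first.end();
```

Nothing in this arrangement gives the second stage a point at which it knows the first stage has seen everything.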

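It seems like I have to "split the stream": parse the first one to get the data, chain the second to the end of the first, and then use the second to pass the results out of the method. I'm just not entirely sure how to do that and would appreciate some pointers or suggestions. My google-fu didn't really turn up good suggestions, or even a hint that I should restructure. Thank you.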

Answer 1

After a bit of fumbling, I came up with this:

var through = require('through2');
var pumpify = require("pumpify");

module.exports = function(params)
{
    // Set up the scanner as an inner pipe that goes through the files and
    // loads the metadata into memory.
    var scanPipe = through.obj(
        function(file, encoding, callback)
        {
            console.log("SCAN: ", file.path);
            return callback(null, file);
        });

    // We have a second pipe that does the actual manipulation to the files
    // before emitting.
    var updatePipe = through.obj(
        {
            // We need a highWaterMark larger than the total files being processed
            // to ensure everything is read into memory first before writing it out.
            // There is no way to disable the buffer entirely, so we just give it
            // the highest integer value.
            highWaterMark: 2147483647
        },
        function(file, encoding, callback)
        {
            console.log("UPDATE: ", file.path);
            return callback(null, file);
        });

    // We have to cork() updatePipe. What this does is prevent updatePipe
    // from getting any data until it is uncork()ed, which we won't do, or
    // the scanPipe gets to the end.
    updatePipe.cork();

    // We have to combine all of these pipes into a single one because
    // gulp needs a single pipe, but we have to treat these all as a unit.
    return pumpify.obj(scanPipe, updatePipe);
};

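I think the comments are pretty clear, but I had to combine the two pipes into a single one (using pumpify), then use cork to stop the second stream from processing anything until the first one was done (which automatically uncorks the second stream). Since I have a large number of files, I had to use a much higher water mark to avoid starving the first pipe: a corked stream stops accepting writes once its buffer is full, which would stall the scan before it reached the end.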
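The cork-until-end behavior can be checked in isolation with Node's built-in `stream` module. The sketch below is a stand-in under assumptions, not the plugin itself: it uses plain strings instead of gulp's file objects, `pipe` instead of pumpify, and an arbitrary `highWaterMark` of 1000. The names `scan`, `update`, and `log` are hypothetical.

```javascript
const { Transform } = require('stream');

// Records the order in which the two stages see each item.
const log = [];

// First stage: stands in for the metadata scan.
const scan = new Transform({
  objectMode: true,
  transform(item, encoding, callback) {
    log.push('SCAN ' + item);
    callback(null, item);
  },
});

// Second stage: stands in for the update step. The highWaterMark only needs
// to exceed the number of items written; 1000 is an arbitrary choice here.
const update = new Transform({
  objectMode: true,
  highWaterMark: 1000,
  transform(item, encoding, callback) {
    log.push('UPDATE ' + item);
    callback(null, item);
  },
});

// Hold back every write to `update` until it is ended. pipe() calls
// update.end() when `scan` finishes, and end() releases the corked writes.
update.cork();

scan.pipe(update);
update.on('data', () => {}); // consume the output so the stream can end
update.on('end', () => console.log(log.join('\n')));

for (const name of ['a', 'b', 'c']) {
  scan.write(name);
}
scan.end();
// Prints all three SCAN lines before any UPDATE line.
```

If more items are written than the corked stream's `highWaterMark` allows, `pipe` pauses the source while waiting for a drain that never comes, which is the stall the large water mark in the plugin is meant to avoid.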


gulp

node-streams