Buffer and stream: how are they related?

I've put some code here:

const { createReadStream } = require('fs');

const readStream = createReadStream('./data.txt');

// Fires once per chunk read from the file (a Buffer, since no encoding is set)
readStream.on('data', chunk => {
  console.log('---------------------------------');
  console.log(chunk);
  console.log('---------------------------------');
});

// Fires once the underlying file has been opened
readStream.on('open', () => {
  console.log('Stream opened...');
});

// Fires when there is no more data left to read
readStream.on('end', () => {
  console.log('Stream Closed...');
});

So, a stream is the movement of data from one place to another. In this case, from the data.txt file to my eyes, since I have to read it.

I read something like this on Google:

Typically, the movement of data is usually with the intention to process it, or read it, and make decisions based on it. But there is a minimum and a maximum amount of data a process could take over time. So if the rate the data arrives is faster than the rate the process consumes the data, the excess data need to wait somewhere for its turn to be processed.

On the other hand, if the process is consuming the data faster than it arrives, the few data that arrive earlier need to wait for a certain amount of data to arrive before being sent out for processing.

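My question is: which line of code is "consuming the data, processing the data"? Is it console.log(chunk)? If I had a huge, time-consuming line of code instead of console.log(chunk), how would my code not grab more data from the buffer and wait until my processing is done? In the code above, it seems like it would still keep entering the readStream.on('data') callback.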

My question is: which line of code is "consuming the data, processing the data"

The readStream.on('data', ...) event handler is the code that "consumes" or "processes" the data.

if I had a huge time consuming line of code instead of console.log(chunk), how would my code not grab more data from buffer and wait until my processing is done ?

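If the time-consuming code is synchronous (i.e. blocking), then no more data events will occur until your synchronous code finishes, because only your event handler is running (node.js has a single-threaded, event-loop-driven architecture). No further data events are generated until you return control from the event handler callback function.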
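To see the synchronous case in action, here is a minimal sketch. It assumes an in-memory stream built with Readable.from() in place of data.txt, and blockFor() and runBlockingDemo() are hypothetical helpers made up for this illustration. Each data handler busy-waits, and the recorded events show that one handler always finishes before the next one starts.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for heavy synchronous work: spins for roughly `ms` milliseconds.
function blockFor(ms) {
  const stop = Date.now() + ms;
  while (Date.now() < stop) { /* busy-wait, the event loop cannot run */ }
}

// Hypothetical helper: streams the given chunks and records when each handler starts and finishes.
function runBlockingDemo(chunks) {
  return new Promise((resolve, reject) => {
    const events = [];
    const readStream = Readable.from(chunks); // in-memory stand-in for a file stream

    readStream.on('data', chunk => {
      events.push(`enter ${chunk}`);
      blockFor(30);                 // nothing else can happen until this returns
      events.push(`leave ${chunk}`);
    });
    readStream.on('end', () => resolve(events));
    readStream.on('error', reject);
  });
}

runBlockingDemo(['a', 'b', 'c']).then(events => console.log(events.join(', ')));
```

Every "enter" is immediately followed by its matching "leave", so the handlers never overlap. The price is that timers, network I/O and everything else on the event loop are also stalled while blockFor() spins.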

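If the time-consuming code is asynchronous (i.e. non-blocking, so it returns control to the event loop), then more data events can certainly occur, even though a prior data event handler has not yet finished its asynchronous work. It is sometimes appropriate to call readStream.pause() while the asynchronous work is in progress, to tell the readStream not to generate any more data events until you are ready, and then call readStream.resume().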
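The pause()/resume() pattern could be sketched roughly as follows. This is an illustration under assumptions, not the only way to do it: slowAsyncWork() and processSequentially() are hypothetical helpers, and Readable.from() again stands in for the file stream. Because the end event may arrive while the last chunk is still being processed, the sketch waits for both conditions before resolving.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for slow asynchronous work (e.g. a database write).
function slowAsyncWork(chunk) {
  return new Promise(resolve => setTimeout(() => resolve(chunk), 20));
}

// Hypothetical helper: handles one chunk at a time by pausing the stream during async work.
function processSequentially(readStream, work) {
  return new Promise((resolve, reject) => {
    const log = [];
    let busy = false;   // true while a chunk is still being processed
    let ended = false;  // true once the stream has emitted 'end'
    const finishIfDone = () => { if (ended && !busy) resolve(log); };

    readStream.on('data', chunk => {
      readStream.pause();           // ask for no more 'data' events for now
      busy = true;
      log.push(`start ${chunk}`);
      work(chunk).then(() => {
        log.push(`done ${chunk}`);
        busy = false;
        readStream.resume();        // ready for the next chunk
        finishIfDone();
      }, reject);
    });
    readStream.on('end', () => { ended = true; finishIfDone(); });
    readStream.on('error', reject);
  });
}

processSequentially(Readable.from(['a', 'b', 'c']), slowAsyncWork)
  .then(log => console.log(log.join(', ')));
```

Without the pause() call, the handler for the next chunk could start while the work for the previous chunk is still pending, so the "start" and "done" entries could interleave.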