Node: pass in strings for grep -f --fixed-strings

I'd like to use grep --count --fixed-strings --file=needles.txt < haystack.txt from within a Node.js environment.

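Instead of a needles.txt file I have an array of strings to search for, and instead of haystack.txt I have a large string/buffer of text.

What is the best combination of child_process methods to use?

Something like: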

import {spawn} from "child_process";
import {Readable} from "stream";

// haystack to search within
const haystack = "I am \n such a big string, do you\n see me?";
const readable = new Readable();
readable.push(haystack);
readable.push(null);

// the list of needles that would normally go in `--file=needles.txt`
const needles = ["find", "me", "or", "me"];

// spawn `fgrep`
// Q: How do I pass in `needles` as a string?
const fgrep = spawn(`fgrep`, [needles])

// pipe my haystack to fgrep
readable.pipe(fgrep.stdin);

grep documentation

As for the grep arguments, -e lets you specify multiple patterns:

grep -e 1 -e 2

The JS to generate the args would look something like:

const needles = ["find", "me", "or", "me"];
const grep_pattern_args = needles.reduce((res, pattern) => {
    res.push('-e', pattern)
    return res
}, [])
const grep_args = [ '--count', '--fixed-strings', ...grep_pattern_args ]
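
As a variation on the reduce above, the same argument list could be built with flatMap, dropping duplicate needles first (the example array lists "me" twice). This is only a sketch, and `buildGrepArgs` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: turn an array of needles into grep arguments.
function buildGrepArgs(needles) {
  // A Set removes duplicates; grep accepts them, but they only add to the argv size.
  const unique = [...new Set(needles)];
  // Each remaining needle becomes its own `-e <pattern>` pair.
  return ["--count", "--fixed-strings", ...unique.flatMap((needle) => ["-e", needle])];
}

const exampleArgs = buildGrepArgs(["find", "me", "or", "me"]);
// exampleArgs is ["--count", "--fixed-strings", "-e", "find", "-e", "me", "-e", "or"]
```

Because every needle travels as its own argv entry, no shell quoting is involved as long as the process is started with spawn and without the `shell` option.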

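With around 3000 needles you are getting into the range where execve's argument size limits on Linux start to matter: a single argument string is capped at MAX_ARG_STRLEN (128 KiB), and the combined size of all arguments and the environment is limited as well. If your needles are long, you may want to write them to a file to be safe.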
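
To make the size concern concrete, one could estimate how many bytes the pattern arguments occupy and fall back to a pattern file once that gets large. This is a rough sketch: the helper names and the 100 KiB threshold are assumptions chosen for illustration, not values taken from grep or the kernel.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumed safety threshold for illustration; real limits depend on the system.
const ARG_BYTES_THRESHOLD = 100 * 1024;

// Approximate bytes the strings occupy in argv: each one is stored with a trailing NUL.
function argvBytes(args) {
  return args.reduce((sum, arg) => sum + Buffer.byteLength(arg, "utf8") + 1, 0);
}

// Write one needle per line into a fresh temp directory and return the file path.
function writeNeedlesFile(needles) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "needles-"));
  const file = path.join(dir, "needles.txt");
  fs.writeFileSync(file, needles.join("\n") + "\n");
  return file;
}

// Use `-e` arguments while they are small, otherwise switch to `--file=`.
function patternArgs(needles) {
  const inline = needles.flatMap((needle) => ["-e", needle]);
  if (argvBytes(inline) < ARG_BYTES_THRESHOLD) return inline;
  return [`--file=${writeNeedlesFile(needles)}`];
}
```

The caller would still need to delete the temp file once grep has exited. Note also that a pattern file is split on newlines, so a needle that itself contains a newline would be treated as several patterns.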

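spawn is a good fit because it gives you back a writable stream for stdin, which you can write to as the haystack is read or generated (assuming the Readable stream setup in your example is contrived).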

const stdout = []
const stderr = []
const fgrep = spawn('/usr/bin/fgrep', grep_args, { stdio: ['pipe', 'pipe', 'pipe'] })
fgrep.on('error', console.error)

// For larger output you can process more on the stream. 
fgrep.stdout.on('data', chunk => stdout.push(chunk))
fgrep.stderr.on('data', chunk => {
  process.stderr.write(chunk)
  stderr.push(chunk)
})

fgrep.on('close', (code) => {
  // grep exits with 1 when nothing matched; only 2 and above signal an error
  if (code !== 0 && code !== 1) console.error(`grep process exited with code ${code}`)
  stdout.forEach(chunk => process.stdout.write(chunk))
})

// `haystream` is the Readable holding the haystack (`readable` in the question)
haystream.pipe(fgrep.stdin)
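
Putting the pieces together, the whole exchange could be wrapped in a promise. This is a sketch under a few assumptions: `grep` is on the PATH, the haystack is already in memory as a string or Buffer, and `countMatchingLines` and `parseCount` are hypothetical names. Keep in mind that --count reports the number of matching lines, not the number of individual matches.

```javascript
const { spawn } = require("node:child_process");

// Parse the single number that `grep --count` prints when it reads from stdin.
function parseCount(chunks) {
  return Number.parseInt(Buffer.concat(chunks).toString("utf8").trim(), 10);
}

// Resolve with the number of haystack lines that contain at least one needle.
function countMatchingLines(needles, haystack) {
  return new Promise((resolve, reject) => {
    const args = ["--count", "--fixed-strings", ...needles.flatMap((n) => ["-e", n])];
    // stderr is inherited so grep's own error messages stay visible.
    const grep = spawn("grep", args, { stdio: ["pipe", "pipe", "inherit"] });
    const stdout = [];

    grep.on("error", reject); // e.g. grep is not installed
    grep.stdin.on("error", reject); // e.g. EPIPE if grep exits early
    grep.stdout.on("data", (chunk) => stdout.push(chunk));
    grep.on("close", (code) => {
      // grep exits 0 on a match, 1 on no match, and 2 or more on an error.
      if (code === 0 || code === 1) resolve(parseCount(stdout));
      else reject(new Error(`grep exited with code ${code}`));
    });

    // Write the haystack and close stdin so grep sees end-of-input.
    grep.stdin.end(haystack);
  });
}
```

It would be called as `countMatchingLines(needles, haystack).then(console.log)`. For a haystack that is produced incrementally, the final `end(haystack)` call would be replaced by piping the source stream into `grep.stdin`.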