Regex for TODO keyword when passing through a list of directories to get a list of files with TODO keyword (e.g. //TODO) but not as variable / string

Question

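I am trying to write an application that looks through a directory and flags all files (whether in the directory itself or in its subdirectories) that contain the TODO keyword (the one that flashes/highlights in color in our code editor whenever we code; I am using Visual Studio Code).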

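I have gotten most of the code running; only the last bit is puzzling me: because my RegEx accepts 'TODO' as a word block, it also picks up files that contain TODO as a variable name or as string content, e.g.

var todo = 'TODO' or var TODO = 'abcdefg'

So it is messing up my test cases. How can we write a robust TODO regex/expression that picks up only the TODO keyword (e.g. //TODO or // TODO) and ignores the other use cases (in variables, strings, etc.)? I also don't want to hard-code // or anything similar in the regex, because I would like it to work across languages as far as possible (e.g. // (single-line) or /* (multi-line) in JavaScript, # in Python, etc.).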

Here is my code:

import * as fs from 'fs'; 
import * as path from 'path';

const args = process.argv.slice(2);
const directory = args[0];

// Using recursion, we find every file with the desired extension, even if it's deeply nested in subfolders.
// Returns a list of file paths
const getFilesInDirectory = (dir, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return []; // return an empty list so callers can still concat/iterate
  }

  let files = [];
  fs.readdirSync(dir).forEach(file => {
    const filePath = path.join(dir, file);
    const stat = fs.lstatSync(filePath); // lstat reports on a symbolic link itself rather than following it

    // If we hit a directory, recurse into it. If we hit a file (base case), add it to the array of files
    if (stat.isDirectory()) {
      const nestedFiles = getFilesInDirectory(filePath, ext);
      files = files.concat(nestedFiles);
    } else {
      if (path.extname(file) === ext) {
        files.push(filePath);
      }
    }
  });

  return files;
};



const checkFilesWithKeyword = (dir, keyword, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return []; // keep the return type consistent: always an array
  }

  const allFiles = getFilesInDirectory(dir, ext);
  const checkedFiles = [];

  allFiles.forEach(file => {
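    const fileContent = fs.readFileSync(file, 'utf8'); // read as text, not as a Buffer

    // We want full words, so we use word boundaries in the regex.
    // The backslash is doubled because '\b' in a string literal is a backspace character.
    const regex = new RegExp('\\b' + keyword + '\\b');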
    if (regex.test(fileContent)) {
      // console.log(`Your word was found in file: ${file}`);
      checkedFiles.push(file);
    }
  });

  console.log(checkedFiles);
  return checkedFiles;
};

checkFilesWithKeyword(directory, 'TODO', '.js');

Any help is greatly appreciated!

Answer 1

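I don't think there is a reliable way to exclude TODO in variable names or string values across languages. You would need to parse each language properly and scan for TODO in comments.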
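For JavaScript-like syntax only, the "scan comments" idea could be approximated with a small hand-written scanner instead of a full parser. The sketch below is an illustration under stated assumptions, not a complete solution: the helper names extractComments and hasKeywordInComment are hypothetical, and regex literals and ${...} interpolation inside template strings are deliberately not handled.

```javascript
// Hypothetical helper: collects the text of // and /* */ comments from
// JavaScript-like source while skipping over string literals.
// Not handled: regex literals and ${...} inside template strings.
function extractComments(source) {
  const comments = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: runs to the end of the line.
      let end = source.indexOf('\n', i);
      if (end === -1) end = source.length;
      comments.push(source.slice(i + 2, end));
      i = end;
    } else if (ch === '/' && next === '*') {
      // Block comment: runs to the closing */ (or to the end of the input).
      let end = source.indexOf('*/', i + 2);
      if (end === -1) end = source.length;
      comments.push(source.slice(i + 2, end));
      i = end + 2;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      // String literal: skip to the matching unescaped quote.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === '\\') i++; // skip the escaped character
        i++;
      }
      i++; // step past the closing quote
    } else {
      i++;
    }
  }
  return comments;
}

// Hypothetical helper: true if the keyword appears as a whole word in any comment.
function hasKeywordInComment(source, keyword) {
  const regex = new RegExp('\\b' + keyword + '\\b');
  return extractComments(source).some((text) => regex.test(text));
}

console.log(hasKeywordInComment("let a = 'TODO'; // nothing here", 'TODO')); // false
console.log(hasKeywordInComment('let url = "http://x"; // TODO fix', 'TODO')); // true
```

Each additional language (for example # comments in Python) would need its own rules for comment and string delimiters, which is why this does not generalize without per-language work.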

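You can do an approximation that you can tune over time:

- For variable names, you would need to exclude TODO = assignments and any kind of use, such as TODO.length
- For string values, you could exclude 'TODO' and "TODO", and even "Something TODO today", by looking for matching quotes. What about multi-line strings with backticks?

Here is a start using a bunch of negative lookaheads: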

const input = `Test Case:
// TODO blah
// TODO do "stuff"
/* stuff
 * TODO
 */
let a = 'TODO';
let b = 'Something TODO today';
let c = "TODO";
let d = "More stuff TODO today";
let TODO = 'stuff';
let l = TODO.length;
let e = "Even more " + TODO + " to do today";
let f = 'Nothing to do';
`;
let keyword = 'TODO';
// Backslashes are doubled because the pattern is built from string literals.
const regex = new RegExp(
  // exclude TODO in string value with matching quotes:
  '^(?!.*([\'"]).*\\b' + keyword + '\\b.*\\1)' +
  // exclude TODO.property access:
  '(?!.*\\b' + keyword + '\\.\\w)' +
  // exclude TODO = assignment
  '(?!.*\\b' + keyword + '\\s*=)' +
  // final TODO match
  '.*\\b' + keyword + '\\b'
);
input.split('\n').forEach((line) => {
  let m = regex.test(line);
  console.log(m + ': ' + line);
});

Output:

false: Test Case:
true: // TODO blah
true: // TODO do "stuff"
false: /* stuff
true:  * TODO
false:  */
false: let a = 'TODO';
false: let b = 'Something TODO today';
false: let c = "TODO";
false: let d = "More stuff TODO today";
false: let TODO = 'stuff';
false: let l = TODO.length;
false: let e = "Even more " + TODO + " to do today";
false: let f = 'Nothing to do';
false:
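
Because this pattern is anchored with ^ and meant to run on one line at a time, it cannot be applied to a whole file's content directly. One way to plug it into the question's checkFilesWithKeyword might look like the sketch below; the helper names escapeRegExp, buildCommentKeywordRegex and contentHasKeyword are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: escape regex metacharacters in the keyword.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: builds the line-based pattern with the negative lookaheads.
const buildCommentKeywordRegex = (keyword) => {
  const k = escapeRegExp(keyword);
  return new RegExp(
    '^(?!.*([\'"]).*\\b' + k + '\\b.*\\1)' + // not inside matching quotes
    '(?!.*\\b' + k + '\\.\\w)' +             // not a property access
    '(?!.*\\b' + k + '\\s*=)' +              // not an assignment
    '.*\\b' + k + '\\b'                      // the keyword itself
  );
};

// Hypothetical helper: true if at least one line of the content passes the pattern.
const contentHasKeyword = (content, keyword) => {
  const regex = buildCommentKeywordRegex(keyword);
  return content.split(/\r?\n/).some((line) => regex.test(line));
};

console.log(contentHasKeyword("let a = 'TODO';\nlet b = 1;", 'TODO')); // false
console.log(contentHasKeyword('let a = 1;\n// TODO fix', 'TODO'));     // true
```

Inside the forEach loop of checkFilesWithKeyword, the condition could then become contentHasKeyword(fs.readFileSync(file, 'utf8'), keyword).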

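Explanation of the regex composition (tokens are shown as the regex engine sees them; inside a JavaScript string literal each backslash is written twice, e.g. '\\b'):

^ - start of string (in our case, start of line because of the split)
Exclude TODO in a string value with matching quotes:
- (?! - negative lookahead start
- .* - greedy scan (scans over all characters, but still allows what follows to match)
- (['"]) - capture group for a single or double quote
- .* - greedy scan
- \b - word boundary before the keyword (expects the keyword to be enclosed in non-word characters)
- the keyword is added here
- \b - word boundary after the keyword
- .* - greedy scan
- \1 - back reference to the capture group (a single or double quote, whichever was captured above)
- ) - negative lookahead end
Exclude TODO.property access:
- (?! - negative lookahead start
- .* - greedy scan
- \b - word boundary before the keyword
- the keyword is added here
- \.\w - a dot followed by a word character, such as .x
- ) - negative lookahead end
Exclude TODO = assignment:
- (?! - negative lookahead start
- .* - greedy scan
- \b - word boundary before the keyword
- the keyword is added here
- \s*= - optional whitespace followed by =
- ) - negative lookahead end
Final TODO match:
- .* - greedy scan
- \b - word boundary (expects the keyword to be enclosed in non-word characters)
- the keyword is added here
- \b - word boundary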
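
Since this is only an approximation, it may help to see where it still goes wrong. The short sketch below rebuilds the same pattern and runs it against three hand-picked lines (the sample lines are made up for illustration): a genuine comment containing two apostrophes is rejected by the matching-quotes lookahead, and a bare use of TODO as a function argument slips through.

```javascript
const keyword = 'TODO';
const regex = new RegExp(
  '^(?!.*([\'"]).*\\b' + keyword + '\\b.*\\1)' +
  '(?!.*\\b' + keyword + '\\.\\w)' +
  '(?!.*\\b' + keyword + '\\s*=)' +
  '.*\\b' + keyword + '\\b'
);

const samples = [
  "// TODO blah",           // plain comment: matched, as intended
  "// it's TODO, isn't it", // comment with two apostrophes: missed (false negative)
  "foo(TODO);",             // variable passed as an argument: matched (false positive)
];

const results = samples.map((line) => regex.test(line));
console.log(results); // [ true, false, true ]
```

Cases like these are the ones to add as further lookaheads, or to hand over to a real parser, when tuning the pattern over time.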

Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex

Tags: javascript, regex, testing, node.js, todo