How do I ignore specific lines of a file using a regex and the diff utility ("-I regex" option)?

Question

I'm writing automated tests that compare HTML files. For the comparison I use the Linux diff utility.

So, the first HTML file, 1.html:

<!-- just example -->
<html>
  <div id="userdata_hidden">bla bla bla</div>
  <div id="something else" >bla bla bla</div>
  <div id="waiver_id"      >bla bla bla</div>
<html>

The second HTML file, 2.html:

<!-- just example -->
<html>
  <div id="userdata_hidden">bla bla bla DIFFERENCE </div>
  <div id="something else" >bla bla bla</div>
  <div id="waiver_id"      >bla bla bla DIFFERENCE </div>
<html>

The command to compare the files:

diff -biw 1.html 2.html

Result:

3c3
<   <div id="userdata_hidden">bla bla bla</div>
---
>   <div id="userdata_hidden">bla bla bla DIFFERENCE </div>
5c5
<   <div id="waiver_id"      >bla bla bla</div>
---
>   <div id="waiver_id"      >bla bla bla DIFFERENCE </div>

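The comparison works, but I need to ignore differences on lines that contain specific words: waiver_id and userdata_hidden.

The diff command has an -I option for ignoring lines that match a regular expression. From the manual: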

To ignore insertions and deletions of lines that match a grep-style regular expression, use the --ignore-matching-lines=regexp (-I regexp) option. You should escape regular expressions that contain shell metacharacters to prevent the shell from expanding them. For example, ‘diff -I '^[[:digit:]]'’ ignores all changes to lines beginning with a digit.

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk—every insertion and every deletion—matches the regular expression. In other words, for each nonignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones.

You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression.
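The hunk rule in the passage above can be tried out with a small experiment. This is only a sketch, assuming GNU diff; the files a.txt, b.txt, and c.txt and their contents are made up for illustration and are not part of the question.

```shell
# Scratch directory so the experiment does not touch real files.
tmp=$(mktemp -d)

printf '%s\n' '1 alpha' 'keep' 'tail' > "$tmp/a.txt"
printf '%s\n' '2 alpha' 'keep' 'tail' > "$tmp/b.txt"
printf '%s\n' '2 alpha' 'KEEP' 'tail' > "$tmp/c.txt"

# a.txt vs b.txt: the only changed lines start with a digit, so every
# changed line in the hunk matches the regex and nothing is printed.
# (diff exits non-zero when it reports differences; "|| true" keeps the
# sketch going if it is run under "set -e".)
only_ignorable=$(diff -I '^[[:digit:]]' "$tmp/a.txt" "$tmp/b.txt" || true)
printf 'all changes ignorable: [%s]\n' "$only_ignorable"

# a.txt vs c.txt: the neighbouring line "keep" also changed and does not
# match the regex, so diff prints the whole hunk, digit lines included.
mixed=$(diff -I '^[[:digit:]]' "$tmp/a.txt" "$tmp/c.txt" || true)
printf 'mixed hunk:\n%s\n' "$mixed"

rm -rf "$tmp"
```

In the second comparison the two changed lines are adjacent and form a single hunk, which is why the ignorable line is reported together with the non-ignorable one.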

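So I can use a regex to skip the comparison for lines containing waiver_id or userdata_hidden. If the files have no differences, diff prints nothing (an empty string) to the console.

Questions:

How do I write a regex that excludes strings containing the words waiver_id or userdata_hidden?
What should the diff command look like when using the -I option with that regex?

P.S. Unfortunately, this variant does not work: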

diff -biw -I '^(?!.*(?:userdata_hidden|waiver_id))' 1.html 2.html

Answer 1

I need to check that a string does not contain the words waiver_id and userdata_hidden.

^(?!.*\bwaiver_id\b)(?!.*\buserdata_hidden\b)

Or, if you don't want either of the strings to appear:

^(?!.*\b(?:userdata_hidden|waiver_id)\b)

Rubular
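The patterns above use Perl-style lookahead, while the manual passage quoted in the question describes -I as taking a grep-style regular expression, which has no lookahead. Also, -I names the lines to ignore, not the lines to keep, so a plain pattern per word may be all that is needed. The following is a minimal sketch of that idea, assuming GNU diff; it recreates the two files from the question in a scratch directory, and 3.html is a hypothetical third file added only to show that other changes are still reported.

```shell
# Scratch directory holding copies of the question's example files.
tmp=$(mktemp -d)

cat > "$tmp/1.html" <<'EOF'
<!-- just example -->
<html>
  <div id="userdata_hidden">bla bla bla</div>
  <div id="something else" >bla bla bla</div>
  <div id="waiver_id"      >bla bla bla</div>
<html>
EOF

cat > "$tmp/2.html" <<'EOF'
<!-- just example -->
<html>
  <div id="userdata_hidden">bla bla bla DIFFERENCE </div>
  <div id="something else" >bla bla bla</div>
  <div id="waiver_id"      >bla bla bla DIFFERENCE </div>
<html>
EOF

# One -I per word: a hunk is skipped when all of its changed lines
# match at least one of the patterns.
# ("|| true" only guards against "set -e"; diff exits non-zero on differences.)
ignored=$(diff -biw -I 'userdata_hidden' -I 'waiver_id' \
  "$tmp/1.html" "$tmp/2.html" || true)
printf 'with -I: [%s]\n' "$ignored"

# Hypothetical 3.html: additionally change line 4, which matches neither pattern.
sed '4s/bla bla bla/CHANGED/' "$tmp/2.html" > "$tmp/3.html"
reported=$(diff -biw -I 'userdata_hidden' -I 'waiver_id' \
  "$tmp/1.html" "$tmp/3.html" || true)
printf 'line 4 also changed:\n%s\n' "$reported"

rm -rf "$tmp"
```

Because lines 3 to 5 are adjacent, changing line 4 merges everything into one hunk, and diff then prints the userdata_hidden and waiver_id lines as well. That is the hunk behaviour described in the quoted manual text, so a test that relies on -I should expect it.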
