How to ignore specific lines of a file using a regex and the diff utility ("-I regex" option)?
I am writing automated tests that compare HTML files. For the comparison I use the Linux diff utility.
So, the first HTML file, 1.html:
<!-- just example -->
<html>
<div id="userdata_hidden">bla bla bla</div>
<div id="something else" >bla bla bla</div>
<div id="waiver_id" >bla bla bla</div>
<html>
The second HTML file, 2.html:
<!-- just example -->
<html>
<div id="userdata_hidden">bla bla bla DIFFERENCE </div>
<div id="something else" >bla bla bla</div>
<div id="waiver_id" >bla bla bla DIFFERENCE </div>
<html>
Command to compare the files:
diff -biw 1.html 2.html
Result:
3c3
< <div id="userdata_hidden">bla bla bla</div>
---
> <div id="userdata_hidden">bla bla bla DIFFERENCE </div>
5c5
< <div id="waiver_id" >bla bla bla</div>
---
> <div id="waiver_id" >bla bla bla DIFFERENCE </div>
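The comparison works fine, but I need to ignore differences on lines that contain the special words waiver_id and userdata_hidden.
The diff command has the -I option for ignoring lines that match a regular expression: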
To ignore insertions and deletions of lines that match a grep-style
regular expression, use the --ignore-matching-lines=regexp (-I regexp)
option. You should escape regular expressions that contain shell
metacharacters to prevent the shell from expanding them. For example,
‘diff -I '^[[:digit:]]'’ ignores all changes to lines beginning with a
digit.
However, -I only ignores the insertion or deletion of lines that
contain the regular expression if every changed line in the hunk—every
insertion and every deletion—matches the regular expression. In other
words, for each nonignorable change, diff prints the complete set of
changes in its vicinity, including the ignorable ones.
You can specify more than one regular expression for lines to ignore
by using more than one -I option. diff tries to match each line
against each regular expression.
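So I can use a regular expression to ignore the comparison of lines that contain waiver_id or userdata_hidden. If the files have no differences, diff prints nothing (an empty string) to the console.
Question:
How do I write a regular expression that excludes strings containing the words waiver_id or userdata_hidden?
What should the diff command look like when using the -I option with that regular expression?
P.S. Unfortunately, this variant does not work: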
diff -biw -I '^(?!.*(?:userdata_hidden|waiver_id))' 1.html 2.html
I need to check that the string does not contain the words waiver_id and userdata_hidden.
^(?!.*\bwaiver_id\b)(?!.*\buserdata_hidden\b)
If you do not want either of the strings to appear:
^(?!.*\b(?:userdata_hidden|waiver_id)\b)
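A note on using these patterns with diff itself: the manual excerpt quoted in the question describes -I as taking a grep-style regular expression, and (?!...) is Perl-style lookahead syntax that grep-style expressions do not have. Also, -I ignores changes on lines that match the pattern, so for this use case the pattern would need to match the words rather than exclude them. The sketch below assumes GNU diff and rebuilds the two sample files in a scratch directory; the variable names (tmp, plain, ignored) are only illustrative.

```shell
#!/bin/sh
# Recreate the sample files from the question in a temporary directory.
# (tmp, plain and ignored are illustrative names, not from the question.)
tmp=$(mktemp -d)

cat > "$tmp/1.html" <<'EOF'
<!-- just example -->
<html>
<div id="userdata_hidden">bla bla bla</div>
<div id="something else" >bla bla bla</div>
<div id="waiver_id" >bla bla bla</div>
<html>
EOF

cat > "$tmp/2.html" <<'EOF'
<!-- just example -->
<html>
<div id="userdata_hidden">bla bla bla DIFFERENCE </div>
<div id="something else" >bla bla bla</div>
<div id="waiver_id" >bla bla bla DIFFERENCE </div>
<html>
EOF

# Plain comparison: both changed lines are reported.
# "|| true" keeps the script going, since diff exits non-zero on differences.
plain=$(diff -biw "$tmp/1.html" "$tmp/2.html" || true)

# One -I per word: a hunk is skipped when every changed line in it
# matches at least one of the given expressions.
ignored=$(diff -biw -I 'userdata_hidden' -I 'waiver_id' \
    "$tmp/1.html" "$tmp/2.html" || true)

printf 'plain diff:\n%s\n' "$plain"
printf 'with -I options:\n%s\n' "$ignored"

rm -rf "$tmp"
```

As the quoted manual text warns, a hunk is only skipped when every inserted and deleted line in it matches, so a change on a waiver_id line that sits directly next to an unrelated changed line would still be printed together with it.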