How to compare two files line by line and output the whole line if different
I have two sorted files in question:
1) One is a control file (ctrl.txt), which is generated by an external process.
2) The other is a line-count file (count.txt) that I generate using `wc -l`.
$ more ctrl.txt
Thunderbird|1000
Mustang|2000
Hurricane|3000
$ more count.txt
Thunder_bird|1000
MUSTANG|2000
Hurricane|3001
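I want to compare these two files while ignoring wrinkles in column 1 (the file name), such as the "_" in Thunder_bird or the upper case in MUSTANG, so that my output shows only the line below: the one file that is truly different because its count does not match.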
Hurricane|3000
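My idea is to compare only the second column of the two files and output the whole line if they differ.
I have seen other examples in AWK, but I could not get any of them to work.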
Could you please try the following awk command and let me know if it helps you:
awk -F"|" 'FNR==NR{gsub(/_/,"");a[tolower($1)]=$2;next} {gsub(/_/,"")} ((tolower($1) in a) && $2!=a[tolower($1)])' ctrl.txt count.txt
Adding a non-one-liner form of the solution too:
awk -F"|" '
FNR==NR{
  gsub(/_/,"");
  a[tolower($1)]=$2;
  next}
{ gsub(/_/,"") }
((tolower($1) in a) && $2!=a[tolower($1)])
' ctrl.txt count.txt
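To try this on the sample data from the question, here is a minimal sketch. It assumes the first field (`$1`) is the key and the second field (`$2`) is the compared value, as the explanation describes. The scratch directory and the `result` variable are only there for illustration.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Thunderbird|1000' 'Mustang|2000' 'Hurricane|3000' > "$tmpdir/ctrl.txt"
printf '%s\n' 'Thunder_bird|1000' 'MUSTANG|2000' 'Hurricane|3001' > "$tmpdir/count.txt"

# First file: remember each count under the normalized name.
# Second file: print the line when the name is known but the count differs.
result=$(awk -F'|' '
FNR==NR { gsub(/_/,""); a[tolower($1)]=$2; next }
{ gsub(/_/,"") }
(tolower($1) in a) && $2!=a[tolower($1)]
' "$tmpdir/ctrl.txt" "$tmpdir/count.txt")

echo "$result"   # prints: Hurricane|3001
rm -rf "$tmpdir"
```

Note that the printed line comes from count.txt, after the underscores have been removed, so the count shown is 3001 rather than the 3000 stored in ctrl.txt.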
Explanation: the same code as above, with an explanation added on each line.
awk -F"|" '                                ##Set the field separator to | (pipe) for all lines of the input files.
FNR==NR{                                   ##FNR==NR is TRUE only while the first input file (ctrl.txt) is being read; the following statements run in that case.
  gsub(/_/,"");                            ##Use the awk gsub function to globally substitute _ with the empty string in the current line.
  a[tolower($1)]=$2;                       ##Create an array named a whose index is the first field in lower case (to avoid case mismatches) and whose value is $2 of the current line.
  next}                                    ##next is a built-in awk keyword that skips all further statements for this line, so they run only while the second input file, count.txt, is being read.
{ gsub(/_/,"") }                           ##Statements from here on run while the second input file is being read; gsub removes every _ from the line.
((tolower($1) in a) && $2!=a[tolower($1)]) ##Check whether the lower-case form of $1 is present in array a and $2 of the current line is NOT equal to the value stored in a. If so, the current line is printed; since no action is given, awk prints the current line (from count.txt) by default.
' ctrl.txt count.txt                       ##The input file names passed to awk.
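The question's expected output shows the control-file line (Hurricane|3000), while the command above prints the line from count.txt. If the ctrl.txt line is what is wanted, one possible variation is sketched below. It is not part of the original answer: the `normalize` helper and the array names `count` and `line` are hypothetical names chosen for this example, and the original line is stored untouched so it can be printed as-is.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Thunderbird|1000' 'Mustang|2000' 'Hurricane|3000' > "$tmpdir/ctrl.txt"
printf '%s\n' 'Thunder_bird|1000' 'MUSTANG|2000' 'Hurricane|3001' > "$tmpdir/count.txt"

ctrl_line=$(awk -F'|' '
# normalize(): hypothetical helper that strips "_" and lower-cases a name.
# The argument is passed by value, so the input line itself is not modified.
function normalize(name) { gsub(/_/, "", name); return tolower(name) }

# First file (ctrl.txt): store the count and the unmodified line per key.
FNR==NR { key = normalize($1); count[key] = $2; line[key] = $0; next }

# Second file (count.txt): look up the same normalized key.
{ key = normalize($1) }

# Known name but different count: print the stored ctrl.txt line.
(key in count) && $2 != count[key] { print line[key] }
' "$tmpdir/ctrl.txt" "$tmpdir/count.txt")

echo "$ctrl_line"   # prints: Hurricane|3000
rm -rf "$tmpdir"
```

Names that appear in only one of the two files are silently skipped in both versions; reporting them would need an extra rule.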