Compare two files and output a file with markers
I've been working on this all day and just can't get it to work. I've tried so many diff, awk, and sed snippets that I can't even remember everything I've already tried.
So here is my problem: I have 2 files (file1 and file2).
File1:
#4 and a row (2)
+1 hello post (5)
10 Years After (6)
21 & Over (8)
50_50 (1)
Almost Christmas (3)
File2:
#4 and a row (2) http://example.com/post1
+1 hello post (5) http://example.com/post2
Not over yet (3) http://example.com/post12
10 Years After (6) http://example.com/post3
Can get it done (2) http://example.com/post24
21 & Over (8) http://example.com/post9
50_50 (1) http://example.com/post7
hear me loud (5) http://example.com/post258
Almost Christmas (3) http://example.com/post5
My question is: how do I compare these two files and produce a File3 with output like this?
#4 and a row (2) http://example.com/post1
+1 hello post (5) http://example.com/post2
----> Not over yet (3) http://example.com/post12
10 Years After (6) http://example.com/post3
----> Can get it done (2) http://example.com/post24
21 & Over (8) http://example.com/post9
50_50 (1) http://example.com/post7
----> hear me loud (5) http://example.com/post258
Almost Christmas (3) http://example.com/post5
----> means that this line of text is not in file1.
I hope I've explained it well enough. Please help me if you can, because my Linux skills are lacking. Thanks in advance! I hope someone can help me figure this out.
~Cheers~
The solution from @RavinderSingh13:
awk -v s1="---->" 'FNR==NR{a[$0]=$0;next} {val=$0;sub(/ http.*/,"",val);printf("%s\n",val in a?$0:s1 OFS $0)}' file1 file2
And it works great.
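Since the question asks for a File3, the output of that command can simply be redirected into it (the same one-liner as above, only with a redirection added; File3 is the filename from the question):
awk -v s1="---->" 'FNR==NR{a[$0]=$0;next} {val=$0;sub(/ http.*/,"",val);printf("%s\n",val in a?$0:s1 OFS $0)}' file1 file2 > File3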
How about a unified diff? For example:
diff -u file1 <(awk 'NF--' file2)
Output:
--- file1 2018-03-26 14:59:49.569347677 +0200
+++ /proc/self/fd/11 2018-03-26 15:01:34.117800718 +0200
@@ -1,6 +1,9 @@
#4 and a row (2)
+1 hello post (5)
+Not over yet (3)
10 Years After (6)
+Can get it done (2)
21 & Over (8)
50_50 (1)
+hear me loud (5)
Almost Christmas (3)
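The <(awk 'NF--' file2) part strips the last whitespace-separated field (the URL) from every line of file2 before diffing it against file1. If you prefer sed, an equivalent would be (a sketch, assuming the URL is always the last space-separated field on the line):
diff -u file1 <(sed 's/ [^ ]*$//' file2)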
Awk solution:
awk 'NR==FNR{ a[$0]; next }
     {
       r = $0; m = "";
       sub(/ http:.*/, "");
       if ($0 in a) delete a[$0]; else m = "----> ";
       print m r
     }' file1 file2
r = $0 - variable holding the record currently being processed
m - variable meant to hold the marker
Output:
#4 and a row (2) http://example.com/post1
+1 hello post (5) http://example.com/post2
----> Not over yet (3) http://example.com/post12
10 Years After (6) http://example.com/post3
----> Can get it done (2) http://example.com/post24
21 & Over (8) http://example.com/post9
50_50 (1) http://example.com/post7
----> hear me loud (5) http://example.com/post258
Almost Christmas (3) http://example.com/post5
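Note that sub(/ http:.*/, "") only matches links starting with http:; if any of the links were https, the pattern could be widened slightly (a small variation, not part of the original answer):
sub(/ https?:.*/, "")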
Could you please try the following awk and let me know if this helps you.
awk -v s1="---->" 'FNR==NR{a[$0]=$0;next} {val=$0;sub(/ http.*/,"",val);printf("%s\n",val in a?$0:s1 OFS $0)}' Input_file1 Input_file2
Now also adding a non-one-liner form of the solution.
awk -v s1="---->" '
FNR==NR{ a[$0]=$0; next }
{
  val=$0;
  sub(/ http.*/,"",val);
  printf("%s\n",val in a?$0:s1 OFS $0)
}
' Input_file1 Input_file2
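Here s1 OFS $0 concatenates the marker, the output field separator (a single space by default), and the whole input line. A quick way to see that piece in isolation (an illustrative one-liner, not part of the answer itself):
echo 'hear me loud (5) http://example.com/post258' | awk -v s1="---->" '{ print s1 OFS $0 }'
This prints: ----> hear me loud (5) http://example.com/post258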
$ cat tst.awk
NR==FNR {
    keys[$0]
    next
}
{
    key = $0
    sub(/ [^ ]+$/,"",key)
    print (key in keys ? "" : "----> ") $0
}
$ awk -f tst.awk file1 file2
#4 and a row (2) http://example.com/post1
+1 hello post (5) http://example.com/post2
----> Not over yet (3) http://example.com/post12
10 Years After (6) http://example.com/post3
----> Can get it done (2) http://example.com/post24
21 & Over (8) http://example.com/post9
50_50 (1) http://example.com/post7
----> hear me loud (5) http://example.com/post258
Almost Christmas (3) http://example.com/post5
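In tst.awk, sub(/ [^ ]+$/,"",key) removes the last space-separated field (the URL), so the comparison does not depend on the link starting with http. To preview the keys it derives from file2 (a quick sanity check, not part of the answer):
awk '{ key = $0; sub(/ [^ ]+$/, "", key); print key }' file2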