Find duplicate words in two text files using command line
I have two text files:
f1.txt
boom Boom pow
Lazy dog runs.
The Grass is Green
This is TEST
Welcome
and
f2.txt
Welcome
I am lazy
Welcome, Green
This is my room
Welcome
bye
I tried this on the Ubuntu command line:
awk 'BEGIN {RS=" "}FNR==NR {a[$1]=NR; next} $1 in a' f1.txt f2.txt
and got this output:
Green
This
is
The output I want is:
lazy
Green
This is
Welcome
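Description: I want to compare the two txt files line by line and output every word that appears in both. Matching should be case-insensitive. Comparing line by line is also preferable to searching all of f2.txt for matches from f1.txt. For example, the word "Welcome" should not appear in the desired output if it were on line 6 of f2.txt instead of line 5.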
All right then. Using awk:
awk 'NR == FNR { for(i = 1; i <= NF; ++i) { a[NR,tolower($i)] = 1 }; next } { flag = 0; for(i = 1; i <= NF; ++i) { if(a[FNR,tolower($i)]) { printf("%s%s", flag ? OFS : "", $i); flag = 1 } } if(flag) print "" }' f1.txt f2.txt
This works as follows:
NR == FNR {                                 # While processing the first file:
  for(i = 1; i <= NF; ++i) {                # Remember which fields were in
    a[NR,tolower($i)] = 1                   # each line (lower-cased)
  }
  next                                      # Do nothing else.
}
{                                           # After that (when processing the
                                            # second file)
  flag = 0                                  # reset flag so we know we haven't
                                            # printed anything yet
  for(i = 1; i <= NF; ++i) {                # wade through fields (words)
    if(a[FNR,tolower($i)]) {                # if this field was in the
                                            # corresponding line in the first
                                            # file, then
      printf("%s%s", flag ? OFS : "", $i)   # print it (with a separator if it
                                            # isn't the first)
      flag = 1                              # raise flag
    }
  }
  if(flag) {                                # and if we printed anything
    print ""                                # add a newline at the end.
  }
}
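To check this against the sample data from the question, here is a minimal sketch that recreates f1.txt and f2.txt in a scratch directory and runs the same awk program on them. The variable names workdir and result and the use of mktemp -d are my own choices for illustration, not part of the answer above.

```shell
#!/bin/sh
# Recreate the two sample files from the question in a temporary directory.
# (workdir and result are illustrative names, assumed for this sketch.)
workdir=$(mktemp -d)

printf '%s\n' 'boom Boom pow' 'Lazy dog runs.' 'The Grass is Green' \
              'This is TEST' 'Welcome' > "$workdir/f1.txt"
printf '%s\n' 'Welcome' 'I am lazy' 'Welcome, Green' 'This is my room' \
              'Welcome' 'bye' > "$workdir/f2.txt"

# Same program as above: remember the lower-cased words of each line of
# the first file, then print the words of the second file that occur on
# the same line number of the first file.
result=$(awk '
NR == FNR {
  for (i = 1; i <= NF; ++i) {
    a[NR, tolower($i)] = 1
  }
  next
}
{
  flag = 0
  for (i = 1; i <= NF; ++i) {
    if (a[FNR, tolower($i)]) {
      printf("%s%s", flag ? OFS : "", $i)
      flag = 1
    }
  }
  if (flag) {
    print ""
  }
}' "$workdir/f1.txt" "$workdir/f2.txt")

printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

With the sample files this prints lazy, Green, This is and Welcome, one per line, which is the output asked for. Fields are split on whitespace only, so punctuation stays attached to the word: "Welcome," on line 3 of f2.txt is treated as a different word from "Welcome".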