Find duplicate words in two text files using the command line

Question

I have two text files:

f1.txt

boom Boom pow
Lazy dog runs.
The Grass is Green
This is TEST
Welcome

and

f2.txt

Welcome
I am lazy
Welcome, Green
This is my room
Welcome
bye

On the Ubuntu command line I tried:

awk 'BEGIN {RS=" "}FNR==NR {a[$1]=NR; next} $1 in a' f1.txt f2.txt

and got this output:

Green
This
is

The output I want is:

lazy
Green
This is
Welcome

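Description: I want to compare the two text files line by line and print every word that appears on the same line number in both files. Matching should be case-insensitive. Comparing line by line is preferable to searching all of f2.txt for matches from f1.txt. For example, the word "Welcome" should not appear in the desired output if it is on line 6 of f2.txt rather than line 5.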

Answer 1

All right, then. With awk:

awk 'NR == FNR { for(i = 1; i <= NF; ++i) { a[NR,tolower($i)] = 1 }; next } { flag = 0; for(i = 1; i <= NF; ++i) { if(a[FNR,tolower($i)]) { printf("%s%s", flag ? OFS : "", $i); flag = 1 } } if(flag) print "" }' f1.txt f2.txt

It works as follows:

NR == FNR {                                 # While processing the first file:
  for(i = 1; i <= NF; ++i) {                # Remember which fields were in
    a[NR,tolower($i)] = 1                   # each line (lower-cased)
  }
  next                                      # Do nothing else.
}
{                                           # After that (when processing the
                                            # second file)
  flag = 0                                  # reset flag so we know we haven't
                                            # printed anything yet
  for(i = 1; i <= NF; ++i) {                # wade through fields (words)
    if(a[FNR,tolower($i)]) {                # if this field was in the
                                            # corresponding line in the first
                                            # file, then
      printf("%s%s", flag ? OFS : "", $i)   # print it (with a separator if it
                                            # isn't the first)
      flag = 1                              # raise flag
    }
  }
  if(flag) {                                # and if we printed anything
    print ""                                # add a newline at the end.
  }
}
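
To check this against the sample data from the question, here is a minimal sketch that recreates both files in a scratch directory and runs the same program on them. The `dir` and `result` variable names and the use of `mktemp -d` are assumptions made for this illustration, not part of the original command.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory
# (the directory and variable names are illustrative assumptions).
dir=$(mktemp -d)
printf '%s\n' 'boom Boom pow' 'Lazy dog runs.' 'The Grass is Green' \
              'This is TEST' 'Welcome' > "$dir/f1.txt"
printf '%s\n' 'Welcome' 'I am lazy' 'Welcome, Green' 'This is my room' \
              'Welcome' 'bye' > "$dir/f2.txt"

# Same logic as the one-liner above, spread over several lines.
result=$(awk '
  NR == FNR {                          # first file: remember each word per line
    for (i = 1; i <= NF; ++i) a[NR, tolower($i)] = 1
    next
  }
  {                                    # second file: print words seen on the same line
    flag = 0
    for (i = 1; i <= NF; ++i) {
      if (a[FNR, tolower($i)]) {
        printf("%s%s", flag ? OFS : "", $i)
        flag = 1
      }
    }
    if (flag) print ""
  }
' "$dir/f1.txt" "$dir/f2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample files this should print the four desired lines: lazy, Green, This is, Welcome. Note that awk splits fields on whitespace by default, so punctuation stays attached to a word: "Welcome," and "Welcome" would count as different words unless the punctuation is stripped first.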

