How can I get the common rows that exist in at least two files?

Question

I have seven test files. Each of them looks like the following:

File 1

chr     start   end     strand
chr1    10525   10525   +
chr1    10542   10542   +
chr1    10571   10571   +
chr1    10577   10577   +
chr2    10589   10589   +
chr2    565262  565262  +
chr2    565397  565397  +
chr3    567239  567239  +
chr3    567312  567312  +
chr4    567348  567348  +

How can I get the rows that are common to at least two of the files, in the following format?

chr     start   end     strand  File1   File2   File3   File4   File5   File6   File7
chr1    10525   10525   +   0   1   0   0   0   1   1
chr1    10542   10542   +   1   1   1   1   1   0   0
chr1    10571   10571   +   0   1   0   1   1   0   0
chr3    10577   10577   +   1   1   0   0   0   1   0
chr3    10589   10589   +   0   0   1   0   1   0   1
chr4    565262  565262  +   1   0   0   1   1   1   1

"1" means the row is present in the given file, and "0" means the row is absent from it. I do not want to show rows that appear in only one file.

Answer 1

Using awk:

awk '
    FNR==1{ #Header line:
        fn[++i]=FILENAME; # record filenames 
        fn[0]=$0; # & file header
    }

    (FNR>1){ # For lines other than header lines
        if (!file_list[$0 FILENAME]++) # First time this line is seen in this file
            list[$0]++; # Count how many files contain this line
    }

    END{
        for(t=0;t<=i;t++) printf "%s\t", fn[t]; # Print header & file names
        print ""; # Quick hack for printing newline.
        for(t in list){ # For every line that occurred in any of the files
            if (list[t]>=2){ # If count is >= 2
                printf "%s\t", t; # Print line
                for(j=1;j<=i;j++) {
                    printf "%d\t", file_list[t fn[j]]; # Print per file occurrence count.
                }
                print "" # Print newline.
            }
        }
    }' File{1..7}
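
To try the idea on a small scale, here is a minimal sketch that assumes three tiny tab-separated files created in a temporary directory. The sample rows, the helper name count_common, and the arrays seen and count are made up for illustration and are not part of the answer above. Unlike the script above, it omits the header line, prints a strict 0/1 flag per file, and sorts the output, because the order of a for (row in count) loop in awk is unspecified.

```shell
#!/bin/sh
# Hypothetical demo data: three small files with the same header.
tmp=$(mktemp -d)
printf 'chr\tstart\tend\tstrand\nchr1\t10525\t10525\t+\nchr1\t10542\t10542\t+\n' > "$tmp/File1"
printf 'chr\tstart\tend\tstrand\nchr1\t10542\t10542\t+\nchr2\t10589\t10589\t+\n' > "$tmp/File2"
printf 'chr\tstart\tend\tstrand\nchr1\t10542\t10542\t+\nchr2\t10589\t10589\t+\n' > "$tmp/File3"

# count_common FILE...: print every data row found in at least two files,
# followed by one 0/1 presence flag per file, in argument order.
count_common() {
    awk '
        FNR == 1 { nfiles++; next }      # new file: bump the file index, skip its header
        !(($0, nfiles) in seen) {        # first time this row shows up in this file
            seen[$0, nfiles] = 1
            count[$0]++                  # number of distinct files containing the row
        }
        END {
            for (row in count) {
                if (count[row] < 2) continue
                line = row
                for (j = 1; j <= nfiles; j++)
                    line = line "\t" (((row, j) in seen) ? 1 : 0)
                print line
            }
        }' "$@" | sort
}

result=$(cd "$tmp" && count_common File1 File2 File3)
printf '%s\n' "$result"
rm -rf "$tmp"
```

With these sample files, the row chr1 10542 is reported with flags 1 1 1 and the row chr2 10589 with flags 0 1 1, while chr1 10525 is dropped because it occurs only in File1.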

bash

text

text-files