Add values from columns in two files when rows do not match

Question

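I have two files that share most of their first-column keys, but a few keys appear in only one of them. I am trying to add up the values associated with the common keys. Side by side, the files look like this: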

 paste file1 file2
M246_0.6.motif_CBS_count    15023   M246_0.6.motif_CBS_count    15767
M247_0.6.motif_CBS_count    15023   M247_0.6.motif_CBS_count    15767
M250_0.6.motif_CBS_count    8483    M250_0.6.motif_CBS_count    8815
M254_0.6.motif_CBS_count    12921   M254_0.6.motif_CBS_count    13435
M256_0.6.motif_CBS_count    36045   M256_0.6.motif_CBS_count    39390
M261_0.6.motif_CBS_count    6339    M260_0.6.motif_CBS_count    2
M262_0.6.motif_CBS_count    1026    M261_0.6.motif_CBS_count    6523
M269_0.6.motif_CBS_count    47      M262_0.6.motif_CBS_count    863
M271_0.6.motif_CBS_count    7162    M269_0.6.motif_CBS_count    57
M272_0.6.motif_CBS_count    2245    M271_0.6.motif_CBS_count    8218
M273_0.6.motif_CBS_count    159     M272_0.6.motif_CBS_count    2459

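Note that file2 contains M260, which file1 does not. What I want is to sum column 2 across the two files wherever column 1 matches, and keep the rows whose key is not common to both files as they are: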

 M246_0.6.motif_CBS_count   30790
 M247_0.6.motif_CBS_count   30790
 M250_0.6.motif_CBS_count   17298
 M254_0.6.motif_CBS_count   26356
 M256_0.6.motif_CBS_count   75435
 M260_0.6.motif_CBS_count   2
 M261_0.6.motif_CBS_count   12862
 M262_0.6.motif_CBS_count   1889    
 M269_0.6.motif_CBS_count   104    
 M271_0.6.motif_CBS_count   15380
 M272_0.6.motif_CBS_count   4704
 M273_0.6.motif_CBS_count   159

Answer 1

You can try this with gawk. PROCINFO["sorted_in"] is a gawk-specific feature; remove that line if the output order does not matter:

awk '{d[$1]+=$2}
     END{
         PROCINFO["sorted_in"] = "@ind_str_asc"; 
         for(k in d){ print k, d[k] }
     }' file1 file2

This gives:

M246_0.6.motif_CBS_count 30790
M247_0.6.motif_CBS_count 30790
M250_0.6.motif_CBS_count 17298
M254_0.6.motif_CBS_count 26356
M256_0.6.motif_CBS_count 75435
M260_0.6.motif_CBS_count 2
M261_0.6.motif_CBS_count 12862
M262_0.6.motif_CBS_count 1889
M269_0.6.motif_CBS_count 104
M271_0.6.motif_CBS_count 15380
M272_0.6.motif_CBS_count 4704
M273_0.6.motif_CBS_count 159
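
If gawk is not available, one possible workaround is to drop the PROCINFO line and let sort(1) order the output instead. The following is only a minimal sketch, not part of the answer above: the sample files f1 and f2, the temporary directory, and the names sum and result are made up for illustration, and it assumes whitespace-separated fields with a numeric second column.

```shell
# Hypothetical sample data in a temporary directory (not the real file1/file2).
tmp=$(mktemp -d)
printf '%s\n' 'A 1' 'C 3' > "$tmp/f1"
printf '%s\n' 'A 10' 'B 2' 'C 30' > "$tmp/f2"

# Sum column 2 per key in column 1 across both files.
# In POSIX awk the iteration order of "for (k in sum)" is unspecified,
# so the output is piped through sort rather than relying on PROCINFO.
result=$(awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
    "$tmp/f1" "$tmp/f2" | sort)

printf '%s\n' "$result"
# prints:
# A 11
# B 2
# C 33

rm -rf "$tmp"
```

Key B, which exists only in f2, passes through unchanged, just as M260 does in the output above. To run this against the files from the question, replace the two temporary paths with file1 file2.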

shell

awk

paste