Join 2 files with different numbers of rows

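Good morning,

I have 2 files and I want to join them.

I am using awk, but I can use other commands in bash. The problem is that when I try awk, records that are not present in both files do not appear in the final file.

File 1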

supply_DBReplication, 27336
test_after_upgrade, 0
test_describe_topic, 0
teste2e_funcional, 0
test_latency, 0
test_replication, 0
ticket_dl, 90010356798
ticket_dl.replica_cloudera, 0
traza_auditoria_eventos, 0
Ezequiel1,473789563
Ezequiel2,526210437
Ezequiel3,1000000000

File 2

Domimio2,supply_bdsupply-stock-valorado-sherpa
Domimio8,supply_DBReplication
Domimio9,test_after_upgrade
Domimio7,test_describe_topic
Domimio3,teste2e_funcional
,test_latency
,test_replication
,ticket_dl
,ticket_dl.replica_cloudera
,traza_auditoria_eventos

I want:

File 3

Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000

I am running this:

awk 'NR==FNR {T[$1]=FS $2; next} {print $1 T[$2]}' FS="," file1 file2

But I get:

Domimio2, 0
Domimio8, 27336
Domimio9, 0
Domimio7, 0
Domimio3, 0
, 0
, 0
, 90010356798
, 0
, 23034
, 0

How can I do this?

Thanks

Assumptions:

  • join condition: file1.field#1 == file2.field#2
  • output format: file2.field#1,file1.field#2
  • file2.field#1 - if empty, replace with NoDomain
  • file2.field#2 - if there is no match in file1.field#1, output file2.field#1 + 0
  • file1.field#1 - if there is no match in file2.field#2, output NoDomain + file1.field#2 (sorted by field#2 value)
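
The last assumption concerns file1 records that have no partner in file2. As a quick check before joining, a sketch like the following lists those keys. The helper name unmatched_keys is hypothetical, and the sketch assumes both files are comma-separated as in the samples above.

```shell
# unmatched_keys FILE1 FILE2
# Hypothetical helper: prints every file1 name (field 1) that never
# appears as field 2 of file2, in file1 order.
unmatched_keys() {
    awk -F, '
    NR==FNR       { seen[$2]; next }   # first input (file2): remember its names
    !($1 in seen) { print $1 }         # second input (file1): names file2 lacks
    ' "$2" "$1"
}
```

Called as unmatched_keys file1 file2. Note that file2 is passed to awk first so that its names are loaded before file1 is scanned.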

One GNU awk idea:

awk '
BEGIN   { FS=OFS="," }

NR==FNR { gsub(" ","",$2)                        # strip blanks from field #2
          a[$1]=$2                               # save file1 value, keyed by name
          next
        }

        { $1 = ($1 == "") ? "NoDomain" : $1      # if file2.field#1 is missing then set to "NoDomain"
          print $1,a[$2]+0                       # "+0" turns a missing file1 value into 0
          delete a[$2]                           # delete file1 entry so we do not print again in the END{} block
        }

END     { PROCINFO["sorted_in"]="@val_num_asc"   # any entries leftover from file1 (ie, no matches) then sort by value and ...
          for (i in a)
              print "NoDomain",a[i]              # print to stdout
        }
' file1 file2

NOTE: using PROCINFO["sorted_in"] requires GNU awk; if the leftover file1 entries do not need to be sorted, PROCINFO["sorted_in"]="@val_num_asc" can be removed from the code.

This generates:

Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000
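
Since PROCINFO["sorted_in"] is GNU awk only, here is a rough portable sketch of the same join under the assumptions listed earlier. It is not part of the original answer: the function name join_files is hypothetical, and the ordering is delegated to sort by tagging each line with a sort key that cut removes afterwards.

```shell
# join_files FILE1 FILE2
# Hypothetical helper. Each output line is prefixed with two sort keys:
#   0,<file2 line number>  for lines driven by file2 (keeps file2 order)
#   1,<value>              for leftover file1 entries (sorted by value)
join_files() {
    awk '
    BEGIN   { FS = OFS = "," }
    NR==FNR { gsub(/ /, "", $2)          # strip blanks from file1 field 2
              a[$1] = $2                 # name -> value
              next
            }
            { dom = ($1 == "") ? "NoDomain" : $1
              val = ($2 in a) ? a[$2] : 0     # no match in file1: use 0
              print 0, FNR, dom, val
              delete a[$2]                    # matched, so not a leftover
            }
    END     { for (k in a) print 1, a[k], "NoDomain", a[k] }
    ' "$1" "$2" |
    sort -t, -k1,1n -k2,2n |             # file2 lines first, then leftovers by value
    cut -d, -f3-                         # drop the two sort keys
}
```

Called as join_files file1 file2 > file3. The values are kept as strings rather than forced through "+0", which sidesteps awk implementations that print very large integers in exponent notation. Like the original, this relies on NR==FNR, so it assumes file1 is not empty.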