Join 2 files with a different number of rows
Good morning,
I have 2 files and I want to join them.
I am using awk, but I can use other commands in bash.
The problem is that when I try it with awk, some records that are not in both files do not appear in the final file.
File 1
supply_DBReplication, 27336
test_after_upgrade, 0
test_describe_topic, 0
teste2e_funcional, 0
test_latency, 0
test_replication, 0
ticket_dl, 90010356798
ticket_dl.replica_cloudera, 0
traza_auditoria_eventos, 0
Ezequiel1,473789563
Ezequiel2,526210437
Ezequiel3,1000000000
File 2
Domimio2,supply_bdsupply-stock-valorado-sherpa
Domimio8,supply_DBReplication
Domimio9,test_after_upgrade
Domimio7,test_describe_topic
Domimio3,teste2e_funcional
,test_latency
,test_replication
,ticket_dl
,ticket_dl.replica_cloudera
,traza_auditoria_eventos
I want to obtain:
File 3
Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000
I am executing this:
awk 'NR==FNR {T[$1]=FS $2 ; next} {print $1 T[$2]}' FS="," file1 file2
But I am getting:
Domimio2, 0
Domimio8, 27336
Domimio9, 0
Domimio7, 0
Domimio3, 0
, 0
, 0
, 90010356798
, 0
, 23034
, 0
How can I do it?
Thank you
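The one-liner above only prints while it is reading file2, so rows that exist only in file1 (the Ezequiel* lines) are never visited, and a file2 key with no entry in T prints an empty count. Handling those two cases explicitly gives the result you want.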
Assumptions:
- join criteria: file1.field#1 == file2.field#2
- output format: file2.field#1 , file1.field#2
- if file2.field#1 is empty then replace it with NoDomain
- if file2.field#2 has no match in file1.field#1 then output file2.field#1 + 0
- if file1.field#1 has no match in file2.field#2 then output NoDomain + file1.field#2 (sorted by field#2 value)
One GNU awk idea:
awk '
BEGIN   { FS=OFS="," }
NR==FNR { gsub(" ","",$2)                       # strip blanks from field #2
          a[$1]=$2
          next
        }
        { $1 = ($1 == "") ? "NoDomain" : $1     # if file2.field#1 is missing then set it to "NoDomain"
          print $1,a[$2]+0
          delete a[$2]                          # delete the file1 entry so we do not print it again in the END{} block
        }
END     { PROCINFO["sorted_in"]="@val_num_asc"  # any entries left over from file1 (ie, no matches): sort by value and ...
          for (i in a)
              print "NoDomain",a[i]             # print to stdout
        }
' file1 file2
NOTE: the use of PROCINFO["sorted_in"] requires GNU awk; if sorting the file1 leftovers is not needed, the line PROCINFO["sorted_in"]="@val_num_asc" can be removed from the code.
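If GNU awk is not available, a minimal portable sketch of the same logic is to hand the leftover file1 rows to an external sort instead of PROCINFO (fflush() keeps the matched rows ahead of the sorted block; it is in newer POSIX and the common awks):

awk '
BEGIN   { FS=OFS="," }
NR==FNR { gsub(" ","",$2); a[$1]=$2; next }
        { $1 = ($1 == "") ? "NoDomain" : $1
          print $1,a[$2]+0
          delete a[$2]
        }
END     { cmd = "sort -t, -k2,2n"       # numeric sort on field #2
          fflush()                      # flush the matched rows before the sorted block
          for (i in a)
              print "NoDomain",a[i] | cmd
          close(cmd)
        }
' file1 file2

For the sample files this should produce the same output as the GNU awk version.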
The GNU awk script generates:
Domimio2,0
Domimio8,27336
Domimio9,0
Domimio7,0
Domimio3,0
NoDomain,0
NoDomain,0
NoDomain,90010356798
NoDomain,0
NoDomain,0
NoDomain,473789563
NoDomain,526210437
NoDomain,1000000000
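Since other bash commands are allowed: a sketch with join(1) is also possible, assuming bash process substitution, that file1 keys contain no spaces (so tr can strip the blanks after the commas), and that row order does not have to match the output above (join emits rows in join-key order):

# sort both files on the join key (file1 field #1, file2 field #2),
# keep unpairable lines from both sides (-a1 -a2),
# and emit "file2.field#1,file1.field#2"
join -t, -a1 -a2 -o 2.1,1.2 -1 1 -2 2 \
     <(tr -d ' ' <file1 | sort -t, -k1,1) \
     <(sort -t, -k2,2 file2) |
awk 'BEGIN { FS=OFS="," } { print ($1=="" ? "NoDomain" : $1), $2+0 }'

The trailing awk fills the two gaps join leaves behind: an empty field #1 becomes NoDomain and a missing count becomes 0.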