Comparing two files and adding the missing values to a file
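I have a large file (file_new.txt) in which a set of attributes and their values occurs several times. In some of those sets, certain attributes and their values are missing compared with the attributes in a sample file (sample.txt).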
Sample.txt
apple = 0
black = 0
cat = 0
dog = 0
elephant = 0
file_next.txt
apple = 6
black = 7
elephant = 8
==============
apple=9
cat = 10
elephant =11
I am looking for the output below (attributes from sample.txt that are missing should be added to file_new.txt with a value of zero):
file_output.txt
apple = 6
black = 7
cat = 0
dog = 0
elephant = 8
=============
apple = 9
black = 0
cat = 10
dog = 0
elephant = 11
Note: the first and last attributes are always present (here, apple and elephant).
Thanks
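awk -F '[[:blank:]]*=[[:blank:]]*' '
# Print the sample line of every attribute not seen in the current set,
# then reset the flags for the next set.
function Feed() {
   for( Key in ToAdd){
      if( ToAdd[ Key] == 1) print Sample[ Key]
      else ToAdd[ Key] = 1
      }
   return
   }
# First file: remember each sample line and flag its attribute as missing.
FNR == NR { Sample[$1]=$0;ToAdd[$1]=1}
# Second file: an attribute that appears is no longer missing.
FNR != NR && $0 !~ /^=====/ { ToAdd[ $1]=0; print }
# Separator or end of input: add whatever is still missing.
$0 ~ /^=====/ { Feed(); print }
END { Feed() }
' Sample.txt file_new.txt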
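This uses:
- an associative array for the sample data, plus a flag array that records whether each attribute still has to be printed
- a function, to avoid writing the same code twice (for the set before a ===== line and the set after it)
The order of the files on the command line is mandatory: the sample file must come first.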
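To see why the file order matters, here is a minimal sketch of the FNR == NR test on its own. The function name demo_nr_fnr, the temporary directory, and the file name data.txt are made up for this illustration and are not part of the answer above.

```shell
#!/bin/sh
# Hypothetical demo: label each input line with the file it came from.
demo_nr_fnr() {
    dir=$(mktemp -d) || return 1
    printf '%s\n' 'apple = 0' 'black = 0' > "$dir/sample.txt"
    printf '%s\n' 'apple = 6' > "$dir/data.txt"
    # NR counts lines across all input files; FNR restarts at 1 for each file.
    # NR == FNR is therefore true only while the first file is being read.
    awk 'NR == FNR { print "first: " $0; next }
                   { print "second: " $0 }' "$dir/sample.txt" "$dir/data.txt"
    rm -rf "$dir"
}

demo_nr_fnr
```

If the two files were swapped, the data lines would be treated as the sample. Note also that the test misfires when the first file is empty, because NR and FNR then stay equal while the second file is read.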
$ cat tst.awk
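BEGIN { FS="[[:space:]]*=[[:space:]]*"; OFS=" = " }
# First file: remember the attribute names in order, and their default values.
NR==FNR { names[++numNames] = $1; dflt[$1] = $2; next }
# Separator line: print the completed record, then the separator itself.
/^=+$/ { prtRec(); print; next }
# Any other line of the second file: store the value for the current record.
{ curr[$1] = $2 }
END { prtRec() }
function prtRec() {
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        print name, (name in curr ? curr[name] : dflt[name])
    }
    delete curr
}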
$ awk -f tst.awk sample.txt file_next.txt
apple = 6
black = 7
cat = 0
dog = 0
elephant = 8
==============
apple = 9
black = 0
cat = 10
dog = 0
elephant = 11
Or, if you don't care about the order of the lines within each output record, it is even simpler:
$ cat tst2.awk
BEGIN { FS="[[:space:]]*=[[:space:]]*"; OFS=" = " }
# First file: remember the default value of each attribute.
NR==FNR { dflt[$1] = $2; next }
/^=+$/ { prtRec(); print; next }
{ curr[$1] = $2 }
END { prtRec() }
function prtRec() {
    for (name in dflt) {
        print name, (name in curr ? curr[name] : dflt[name])
    }
    delete curr
}
$ awk -f tst2.awk sample.txt file_next.txt
apple = 6
elephant = 8
cat = 0
black = 7
dog = 0
==============
apple = 9
elephant = 11
cat = 10
black = 0
dog = 0
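The shuffled order in this last output comes from the for (name in dflt) loop: awk does not define the order in which a for-in loop visits array keys, so it can differ between implementations. As a minimal sketch of the remedy that tst.awk relies on, the snippet below stores the keys under a running numeric index and loops over that index instead. The function name keys_in_input_order and the inline three-line input are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical demo: print attribute names in the order they were read.
keys_in_input_order() {
    printf '%s\n' 'apple = 0' 'black = 0' 'cat = 0' |
    awk -F '[[:space:]]*=[[:space:]]*' '
        { names[++n] = $1 }    # the numeric index records the order of arrival
        END { for (i = 1; i <= n; i++) print names[i] }
    '
}

keys_in_input_order
```

The cost is one extra array, which is the names array in tst.awk.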