Extract a list of data from multiple files
I'd like some help with this. Thanks a lot!
I have thousands of files, each with 5 columns; the first column contains names.
$ cat file1
name math eng hist sci
Kyle 56 45 68 97
Angela 88 86 59 30
June 48 87 85 98
I also have a file containing a list of names that can be found in the 5-column files.
$ cat list.txt
June
Isa
Angela
Manny
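Specifically, I want to extract, in a structured way, the column-3 values for the names in my list file: one column per file (for the thousands of files) and one row per name. If a name from the list file does not exist in a 5-column file, it should be shown as 0. The header row should also start each column with the file name.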
$ cat output.txt
names file1 file2 file3 file4
June 87 65 67 87
Isa 0 0 0 54
Angela 86 75 78 78
Manny 39 46 0 38
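To reproduce the question locally, the two inputs it shows can be recreated with `printf` in a scratch directory. Only `file1` and `list.txt` are given in the question, so the other data files behind `output.txt` are not reproduced here. The last command prints column 3 for each name in `file1`, which is the value that should end up in the `file1` column.

```shell
#!/bin/sh
# Recreate the sample inputs shown in the question inside a scratch directory.
# Only file1 and list.txt appear in the question; file2..file4 are not reproduced.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'name math eng hist sci' \
              'Kyle 56 45 68 97' \
              'Angela 88 86 59 30' \
              'June 48 87 85 98' > file1
printf '%s\n' June Isa Angela Manny > list.txt

# Column 3 ("eng") of every data row is the value wanted per name:
awk 'NR > 1 { print $1, $3 }' file1
```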
$ cat awk-script
BEGIN{f_name="names"}                     # the header starts with "names"
NR==FNR{                                  # first file (list.txt):
    a[$1]=$1;b[$1];next                   # create arrays a & b whose keys are the names in list.txt
}
FNR==1{                                   # a new data file is scanned
    f_name=f_name"\t"FILENAME;            # append the FILENAME to the header
    for(i in a){
        a[i]=b[i]==""?a[i]:a[i]"\t"b[i];  # append the previous file's value b[i] to row a[i]
        b[i]=0                            # reset b[i] to 0 for the new file
    }
}
{ if($1 in b){b[$1]=$3} }                 # store $3 in b[$1] if $1 is one of the keys of b
END{
    print f_name;                         # print the header
    for(i in a){
        a[i]=b[i]==""?a[i]:a[i]"\t"b[i];  # flush the values of the last file into a[i]
        print a[i]                        # print the row
    }
}
Assuming an additional data file (i.e. file1, file2, etc.), you can get the result with:
$ awk -f awk-script list.txt file*
names file1 file2
Manny 0 46
Isa 0 0
Angela 86 75
June 87 65
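The script above (and the second answer below) relies on the common `NR==FNR` idiom: `NR` counts records across all input files while `FNR` restarts at 1 for each file, so the two are equal only while the first file (`list.txt`) is being read. The small trace below, using two tiny made-up stand-in files, shows which rule fires for each line. One caveat: if `list.txt` were empty, `NR==FNR` would also be true for the first data file.

```shell
#!/bin/sh
# Trace which awk rule handles each line when list.txt and a data file are read together.
# The two files are tiny hypothetical stand-ins for the question's inputs.
cd "$(mktemp -d)" || exit 1
printf '%s\n' June Isa > list.txt
printf '%s\n' 'name eng' 'June 87' > file1

trace() {
    awk 'NR == FNR { print "list:", $1; next }   # true only while reading the first file
                   { print FILENAME, FNR, $1 }   # every later file; FNR restarts at 1
        ' list.txt file1
}
trace
```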
Tested with your test files list.txt and file1 (twice). First, awk:
$ cat program.awk
function isEmpty(arr, idx) {         # using @EdMorton's test for array emptiness
    for (idx in arr)                 # for figuring out the first data file
        return 0
    return 1
}
function process(n, a,    i) {       # append the grades of the chosen ones
    if(!isEmpty(a)) {                # if a is not empty
        for(i in n)                  # iterate through all chosen ones
            n[i]=n[i] (n[i]==""?"":OFS) (i in a?a[i]:0)  # and append, 0 if missing
        delete a                     # empty a so values don't leak into the next file
    }
}
FNR==1 {                             # for each new file
    h=h (h==""?"":OFS) FILENAME      # build the header
    process(n,a)                     # and process the previous file in hash a
}
NR==FNR {                            # chosen ones to hash n
    n[$1]
    next
}
$1 in n {                            # add chosen ones to a
    a[$1]=$3                         # column 3 of the current file
}
END {
    process(n,a)                     # process the last file
    print h                          # print the header
    for(i in n)                      # and the names with grades
        print i,n[i]
}
Run it:
$ awk -f program.awk list.txt file1 file1
list.txt file1 file1
Manny 0 0
Isa 0 0
Angela 86 86
June 87 87
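The second script never appends a column for a data file that contains none of the listed names (its `isEmpty(a)` check is true both before the first data file and after a file with no matches), and both scripts print rows in the unspecified `for (i in ...)` order. The following is a minimal sketch of an alternative, not either answerer's code: it stores column 3 per name and per file index, then fills in 0 at the end, keeping the `list.txt` order and the `names` header from the question. It assumes whitespace-separated fields with the name in column 1; `extract_col3` is a hypothetical helper name, and an empty data file still gets no column because its `FNR == 1` rule never fires.

```shell
#!/bin/sh
# Sketch: one row per name in list.txt (in list order), one column per data file,
# holding column 3 of the matching line, or 0 when the name is absent from that file.
# Assumes whitespace-separated fields with the name in column 1.
extract_col3() {
    awk '
    BEGIN { OFS = "\t" }
    NR == FNR {                                   # first file: the wanted names
        if (NF && !($1 in want)) { want[$1]; order[++nnames] = $1 }
        next
    }
    FNR == 1 { files[++nfiles] = FILENAME }       # a new data file starts
    $1 in want { val[$1, nfiles] = $3 }           # remember column 3 for this file
    END {
        line = "names"
        for (f = 1; f <= nfiles; f++) line = line OFS files[f]
        print line
        for (i = 1; i <= nnames; i++) {
            line = order[i]
            for (f = 1; f <= nfiles; f++)         # 0 when the name was not in file f
                line = line OFS (((order[i], f) in val) ? val[order[i], f] : 0)
            print line
        }
    }' "$@"
}

# Usage: extract_col3 list.txt file* > output.txt
```

Only the wanted names are stored, so memory grows with (names in `list.txt`) × (number of files) rather than with the size of the data files.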