How to sum up all other columns based on column 1?
I have a sample CSV file like the following (but with more columns, numbered up to Sample100, and several rows):
Genus,Sample1,Sample2,Sample3
Unclassified,0,1,44
Unclassified,0,0,392
Unclassified,0,0,0
Woeseia,0,0,76
I would like a summed CSV file like the following, where all rows with the same entry in column 1 are added together:
Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76
I tried the following awk code, but it didn't work:
awk -F "," 'function SP() {n=split ($0, T); ID=$1}
function PR() {printf "%s", ID; for (i=2; i<=n; i++) printf "\t%s", T[i]; printf "\n"}
NR==1 {SP();next}
$1 != ID {PR(); SP(); next}
{for (i=2; i<=NF; i++) T[i]+=$i}
END {PR()}
' Filename.csv
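I also know I could do something like the following, but it is impractical when there are hundreds of columns. Any help would be appreciated.
awk -F "," ' NR==1 {print; next} NF {a[$1]+=$2; b[$1]+=$3; c[$1]+=$4; d[$1]+=$5; e[$1]+=$6; f[$1]++} END {for(i in a)print i, a[i], b[i], c[i], d[i], e[i], f[i]} ' Filename.csv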
With your shown samples, please try the following awk program. You don't need to create that many arrays; 1 or 2 arrays are enough here.
awk '
BEGIN { FS=OFS="," }
FNR==1{
print
next
}
{
for(i=2;i<=NF;i++){
arr1[$1]
arr2[$1,i]+=$i
}
}
END{
for(i in arr1){
printf("%s,",i)
for(j=2;j<=NF;j++){
printf("%s%s",arr2[i,j],j==NF?ORS:OFS)
}
}
}
' Input_file
The output will be as follows:
Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76
Explanation: a detailed, line-by-line explanation of the above code.
awk ' ##Starting awk program from here.
BEGIN { FS=OFS="," } ##In BEGIN section setting FS and OFS as comma here.
FNR==1{ ##Checking if this is first line then do following.
print ##Printing current line.
next ##next will skip further statements from here.
}
{
for(i=2;i<=NF;i++){ ##Running for loop from 2nd field to till NF here.
arr1[$1] ##Creating arr1 array with index of 1st field.
arr2[$1,i]+=$i ##Creating arr2 with index of 1st field and current field number; the current field value keeps being added to it.
}
}
END{ ##Starting END block for this program from here.
for(i in arr1){ ##Traversing through arr1 all elements here one by one.
printf("%s,",i) ##Printing its current index here.
for(j=2;j<=NF;j++){ ##Running for loop from 2nd field to till NF here.
printf("%s%s",arr2[i,j],j==NF?ORS:OFS) ##Printing value of arr2 with index of i and j, printing new line if its last field.
}
}
}
' Input_file ##Mentioning Input_file here.
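One caveat about the `for(i in arr1)` loop above: awk does not specify the traversal order of `for (... in ...)`, so the output rows may not come out in the same order as the input. Below is a minimal sketch of a variant that keeps each group in order of first appearance. The function name `sum_by_first_col` and the names `seen`, `order`, `sum`, and `ncols` are made up for this illustration and are not part of the answer above.

```shell
# Hypothetical helper: sum columns 2..N grouped by column 1,
# printing groups in the order they first appear in the input.
sum_by_first_col() {
  awk '
    BEGIN { FS = OFS = "," }
    FNR == 1 { print; next }        # pass the header line through unchanged
    !($1 in seen) {                 # first time this key is seen
      seen[$1]
      order[++n] = $1               # remember its position
    }
    {
      if (NF > ncols) ncols = NF    # track the widest data row
      for (i = 2; i <= NF; i++)
        sum[$1, i] += $i
    }
    END {
      for (k = 1; k <= n; k++) {
        line = order[k]
        for (i = 2; i <= ncols; i++)
          line = line OFS (sum[order[k], i] + 0)
        print line
      }
    }
  ' "$@"
}
```

It can be called as `sum_by_first_col Filename.csv`, or fed from a pipe. Tracking `ncols` while reading avoids relying on the value of `NF` inside the END block.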
Here's another awk:
awk -v FS=',' -v OFS=',' '
NR == 1 {
print
next
}
{
ids[$1]
for (i = 2; i <= NF; i++)
sums[i "," $1] += $i
}
END {
for (id in ids) {
out = id
for (i = 2; i <= NF; i++)
out = out OFS sums[i "," id]
print out
}
}
' Filename.csv
Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76
You can also use a CSV-aware program that provides data analysis tools.
Here's an example with Miller, which is available as a stand-alone executable:
IFS='' read -r csv_header < Filename.csv
mlr --csv \
stats1 -a sum -g "${csv_header%%,*}" -f "${csv_header#*,}" \
then rename -r '(.*)_sum,\1' \
Filename.csv
Genus,Sample1,Sample2,Sample3
Unclassified,0,1,436
Woeseia,0,0,76