groupby columns in awk
Hello, I want to convert a Python script to awk. How can I group the rows of a data frame by a column?
import pandas as pd
df = pd.read_csv("data.csv")
res0 = df.groupby("genes").agg({'start':'count'}).reset_index()
res0
How can I do this with awk or sh?
It's hard to help without more details; does this solve your problem?
Minimal reproducible example:
cat test.csv
genes,timepoint,value
P53,1,3.1
P53,2,3.2
P53,3,4.5
P53,4,5.1
P53,5,6.6
TRIM43,1,44
TRIM43,2,50
TRIM43,3,55
TRIM43,4,60
TRIM43,5,67
GAPDH,1,0.1
GAPDH,2,0.1
GAPDH,3,0.1
GAPDH,4,0.1
GAPDH,5,0.1
Run the Python script:
cat test.py
#!/usr/bin/env python3
import pandas as pd
df = pd.read_csv("test.csv")
res0 = df.groupby("genes").agg({'value':'count'}).reset_index()
print(res0)
./test.py
genes value
0 GAPDH 5
1 P53 5
2 TRIM43 5
Replicate it with awk:
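awk 'BEGIN{FS=","; OFS="\t"}
NR==1 {print "genes","value"}   # header row: print the output header
NR>1 {genes[$1]++}              # count rows per value of column 1
END {for (i in genes)           # iteration order is unspecified in awk
  print i, genes[i]
}' test.csv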
genes value
GAPDH 5
TRIM43 5
P53 5
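The awk output above comes out in arbitrary order because `for (i in genes)` does not guarantee any ordering, while pandas sorts the group keys. As a rough sketch (not the only way to do this), the helper below looks up the grouping column by its header name, the way `groupby("genes")` does, and sorts the keys. The function name `count_by_column` and the `count` header label are made up for illustration. It assumes a simple comma-separated file with no quoted fields, and it counts rows, so it matches pandas' `count` only when the aggregated column has no missing values.

```shell
# Hypothetical helper: count_by_column FILE COLUMN_NAME
# Counts data rows per distinct value of the named CSV column.
# Assumes plain comma-separated input without quoted or embedded commas.
count_by_column() {
  awk -F, -v OFS='\t' -v col="$2" '
    NR == 1 {
      # Header row: find the index of the requested column.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "no such column: " col > "/dev/stderr"; exit 1 }
      print col, "count"
      next
    }
    { n[$c]++ }                          # one counter per distinct key
    END { for (k in n) print k, n[k] }   # unsorted at this point
  ' "$1" | {
    # Pass the header line through untouched, then sort the data rows.
    IFS= read -r header && printf '%s\n' "$header" && LC_ALL=C sort
  }
}
```

With the `test.csv` from above, `count_by_column test.csv genes` should list GAPDH, P53 and TRIM43 in that order, each with a count of 5. Sorting is done outside awk so the sketch does not depend on gawk-only features.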