Extract substring from a column, and add a new column using bash
Apologies if this is a duplicate question.
My data (in a tab-delimited text file):
Number Description
1 YR=2020 Country=country_A ID=QWE
2 YR=2020 ID=ASD
3 YR=2019 Country=country_B ID=ZXC
4 Country=country_C ID=POI
I want to use bash to extract information from the Description column.
Desired output:
Number YR ID
1 2020 QWE
2 2020 ASD
3 2019 ZXC
4 - POI
One awk idea:
awk '
BEGIN  { OFS="\t" }                               # set output field delimiter to tab
FNR==1 { print "Number","YR","ID"; next }         # print header, skip to next input line
       { yr=id="-"                                # init variables for each new line
         for (i=2;i<=NF;i++) {                    # loop through whitespace-delimited fields
             if ($i ~ /^YR=/) yr=substr($i,4)     # save everything after the "="
             if ($i ~ /^ID=/) id=substr($i,4)     # save everything after the "="
         }
         print $1,yr,id                           # print Number plus the extracted values
       }
' input
NOTE: remove the # comments ... to declutter the code.
This generates:
Number YR ID
1 2020 QWE
2 2020 ASD
3 2019 ZXC
4 - POI
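A possible variation, in case more keys (e.g. Country) need to be pulled out later: split each key=value token on the first "=" and store it in an array, then print whichever keys are wanted. This is only a sketch. The function name extract_cols is made up for illustration, and it assumes the layout shown in the sample data (one header line, Number in the first field, whitespace-separated key=value tokens with no spaces inside the values).

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is illustrative, not from the original answer):
# print Number, YR and ID for files laid out like the sample data.
extract_cols() {
    awk '
    BEGIN  { OFS = "\t" }                          # tab-separated output
    FNR==1 { print "Number", "YR", "ID"; next }    # replace the header line
    {
        split("", kv)                              # portable way to clear the array
        for (i = 2; i <= NF; i++) {                # every field after Number
            eq = index($i, "=")                    # position of the first "="
            if (eq > 1)
                kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        # fall back to "-" when a key is missing on this line
        print $1, (("YR" in kv) ? kv["YR"] : "-"), (("ID" in kv) ? kv["ID"] : "-")
    }
    ' "$@"
}

# Usage (assuming the data file is named "input", as in the answer above):
#   extract_cols input
```

Adding another output column then only means adding one more header name and one more kv[...] lookup in the print statement.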