Extract substring from a column, and add a new column using bash

Apologies if this is a repost.

My data (in a tab-delimited text file):

Number  Description
1   YR=2020 Country=country_A ID=QWE
2   YR=2020 ID=ASD
3   YR=2019 Country=country_B ID=ZXC
4   Country=country_C ID=POI

I would like to extract information from the Description column using bash.

Desired output:

Number  YR  ID
1   2020    QWE
2   2020    ASD
3   2019    ZXC
4   -   POI

One awk idea:

awk '
BEGIN  { OFS="\t" }                                  # set output field delimiter as tab
FNR==1 { print "Number","YR", "ID"; next }           # print header, skip to next input line
       { yr=id="-"                                   # init variables for new line
         for (i=2;i<=NF;i++) {                       # loop through space-delimited fields
             if ($i ~ /^YR=/) yr=substr($i,4)        # save everything after the "="
             if ($i ~ /^ID=/) id=substr($i,4)        # save everything after the "="
         }
         print $1,yr,id                            # print new line
       }
' input

NOTE: remove the # comments ... to declutter the code

This generates:

Number  YR      ID
1       2020    QWE
2       2020    ASD
3       2019    ZXC
4       -       POI
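
For anyone who wants to try this end to end, here is a minimal sketch that recreates the sample data in a scratch file and runs a slightly generalized variant of the same awk idea. Instead of hard-coding the offset in substr($i,4), it splits each field at its first "=", so adding another key is a one-line change. The shell variables input and result and the awk variables p, key and val are illustrative names, not part of the original answer, and the sketch assumes that values never contain whitespace.

```shell
#!/usr/bin/env bash
# Recreate the sample data (Number <TAB> Description) in a scratch file.
input=$(mktemp)
{
  printf 'Number\tDescription\n'
  printf '1\tYR=2020 Country=country_A ID=QWE\n'
  printf '2\tYR=2020 ID=ASD\n'
  printf '3\tYR=2019 Country=country_B ID=ZXC\n'
  printf '4\tCountry=country_C ID=POI\n'
} > "$input"

# awk's default field splitting treats tabs and spaces alike, so $1 is the
# Number and every later field is one key=value pair.
result=$(awk '
BEGIN  { OFS="\t" }                          # tab-separated output
FNR==1 { print "Number","YR","ID"; next }    # replace the header line
       { yr=id="-"                           # defaults when a key is missing
         for (i=2;i<=NF;i++) {
             p=index($i,"=")                 # position of the first "="
             key=substr($i,1,p-1)            # text before it
             val=substr($i,p+1)              # text after it
             if (key=="YR") yr=val
             if (key=="ID") id=val
         }
         print $1,yr,id
       }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

With the sample data this should print the same five lines as the output shown above, with "-" standing in for the missing YR on row 4.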