How to append name and values of each set of records into multiple records

I am using the code below, but it takes too long to read the input.

while read TAG
do
    TAGNAME=$(echo $TAG | cut -d '>' -f1)
    TAGVALUE=$(echo $TAG | cut -d '>' -f2)
    if [ "$TAGNAME" =  "START_OF_REC" ]
    then
        CNT_VAR=`expr $CNT_VAR + 1`
        DERIVED_ID=${DATE_VAR}${CNT_VAR}

        CUST_ID_VAR="NULL_CUST_ID"
        OPPOR_ID_VAR="NULL_OPPOR_ID"

    elif [ "$TAGNAME" = "bd-cust-id" ]
    then
        CUST_ID_VAR=$TAGVALUE
        sed -i 's/NULL_CUST_ID/'$CUST_ID_VAR'/g' $FLAT_FILE 
        echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE 

    elif [ "$TAGNAME" = "mars-opportunity-id" ]
    then
        OPPOR_ID_VAR=$TAGVALUE
        if [ "$OPPOR_ID_VAR" = "EMPTY_VAL" ]
        then
            sed -i 's/NULL_OPPOR_ID//g' $FLAT_FILE 
        else
            sed -i 's/NULL_OPPOR_ID/'$OPPOR_ID_VAR'/g' $FLAT_FILE 
            echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE 
        fi

    else
        if [ "$OPPOR_ID_VAR" = "EMPTY_VAL" ]
        then
            echo ${CUST_ID_VAR}${PIPE}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE
        else
            echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE 
        fi
    fi
done < INPUT_FILE

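I am using the code above to read 50K records like the 2 sample records shown below. Each record begins with START_OF_REC.

The script works, but it takes a very long time to get through 50K records.

I am looking for a bash script that runs faster.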

INPUT_FILE

START_OF_REC>START 
trigger>SalesLeadCreated 
message-sent-at-ts>2015-01-27T00:00.08
bd-cust-id>01234 
mars-opportunity-id>2-BFGCMQ5
mars-activity-id>2-BFGCMPZ
lead-type>AccountOpen 
media-ad-code>WWW 
lead-action-code>completed 
START_OF_REC>START  
trigger>SalesLeadCreated  
message-sent-at-ts>2015-01-27T00:00.10 
bd-cust-id>054671 
mars-opportunity-id>2-BFGC39C
mars-activity-id>2-BFGC396
lead-type>AccountOpen    
media-ad-code>WWW    `enter code here`
lead-action-code>saved    

Expected output

bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value    
01234|2-BFGCMQ5|1|trigger|SalesLeadCreated    
01234|2-BFGCMQ5|1|message-sent-at-ts|2015-01-27T00:00.08   
01234|2-BFGCMQ5|1|bd-cust-id|01234    
01234|2-BFGCMQ5|1|mars-opportunity-id|2-BFGCMQ5   
01234|2-BFGCMQ5|1|mars-activity-id|2-BFGCMPZ    
01234|2-BFGCMQ5|1|lead-type|AccountOpen
01234|2-BFGCMQ5|1|media-ad-code|WWW
01234|2-BFGCMQ5|1|lead-action-code|completed
054671|2-BFGC39C|2|trigger|SalesLeadCreated
054671|2-BFGC39C|2|message-sent-at-ts|2015-01-27T00:00.10
054671|2-BFGC39C|2|bd-cust-id|054671 
054671|2-BFGC39C|2|mars-opportunity-id|2-BFGC39C 
054671|2-BFGC39C|2|mars-activity-id|2-BFGC396
054671|2-BFGC39C|2|lead-type|AccountOpen
054671|2-BFGC39C|2|media-ad-code|WWW
054671|2-BFGC39C|2|lead-action-code|saved
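awk -F ">" -v OFS="|" '
    BEGIN { print "bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value" }
    # Print every tag of the record collected so far, then reset for the next one
    function output() {
        sqlid++
        custid = data["bd-cust-id"]
        oppid  = data["mars-opportunity-id"]
        for (key in data)
            print custid, oppid, sqlid, key, data[key]
        delete data
    }
    # A new record starts: flush the previous one (there is none before line 1)
    $1 == "START_OF_REC" { if (NR > 1) output(); next }
    # Any other line: remember the tag name and its value
    { data[$1] = $2 }
    END { output() }
' INPUT_FILE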
bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value
01234 |2-BFGCMQ5|1|bd-cust-id|01234 
01234 |2-BFGCMQ5|1|trigger|SalesLeadCreated 
01234 |2-BFGCMQ5|1|mars-activity-id|2-BFGCMPZ
01234 |2-BFGCMQ5|1|lead-action-code|completed 
01234 |2-BFGCMQ5|1|lead-type|AccountOpen 
01234 |2-BFGCMQ5|1|media-ad-code|WWW 
01234 |2-BFGCMQ5|1|message-sent-at-ts|2015-01-27T00:00.08
01234 |2-BFGCMQ5|1|mars-opportunity-id|2-BFGCMQ5
054671 |2-BFGC39C|2|bd-cust-id|054671 
054671 |2-BFGC39C|2|trigger|SalesLeadCreated  
054671 |2-BFGC39C|2|mars-activity-id|2-BFGC396
054671 |2-BFGC39C|2|lead-action-code|saved    
054671 |2-BFGC39C|2|lead-type|AccountOpen    
054671 |2-BFGC39C|2|media-ad-code|WWW    `enter code here`
054671 |2-BFGC39C|2|message-sent-at-ts|2015-01-27T00:00.10 
054671 |2-BFGC39C|2|mars-opportunity-id|2-BFGC39C

The stray spaces come from trailing whitespace in the input file.
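
If the trailing whitespace and the row order matter, here is a rough sketch of a variant, not a drop-in replacement. It strips trailing blanks from each line before splitting, and it stores the tags in numbered arrays so rows come out in input order (the order of `for (key in data)` is unspecified in awk). The function name `flatten_records` and the array names are made up for illustration. It does not reproduce the question's special handling of `EMPTY_VAL`.

```shell
#!/bin/sh
# Hypothetical helper: flatten_records INPUT_FILE > FLAT_FILE
flatten_records() {
    awk -F '>' -v OFS='|' '
        BEGIN { print "bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value" }

        # Print the buffered tags of the current record in input order, then reset
        function emit(   i) {
            for (i = 1; i <= n; i++)
                print cust, opp, sqlid, name[i], value[i]
            n = 0; cust = ""; opp = ""
        }

        # Strip trailing blanks (and a stray CR); changing $0 re-splits the fields
        { sub(/[ \t\r]+$/, "") }

        # Record boundary: flush the previous record and bump the running count
        $1 == "START_OF_REC" { emit(); sqlid++; next }

        # Skip blank or malformed lines that have no ">" separator
        NF < 2 { next }

        # Buffer the tag and remember the two ids that every row repeats
        {
            n++; name[n] = $1; value[n] = $2
            if ($1 == "bd-cust-id")          cust = $2
            if ($1 == "mars-opportunity-id") opp  = $2
        }

        END { emit() }
    ' "$1"
}
```

This is still a single pass over the file with no `sed -i` rewrites, so it should scale the same way as the version above. The only extra cost is buffering one record at a time.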

I'm assuming SQL_ID is just a running count of the records.
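
The question's script builds the id as `${DATE_VAR}${CNT_VAR}` rather than a bare counter. If that prefix is wanted, one way to sketch it is to pass the shell variable into awk with `-v`. The helper name `derived_ids` is hypothetical, and the `date` format used as a fallback is only a guess, since the question does not show how `DATE_VAR` is set.

```shell
#!/bin/sh
# Assumption: DATE_VAR looks like YYYYMMDD; the question does not show its format.
DATE_VAR=${DATE_VAR:-$(date +%Y%m%d)}

# Hypothetical helper: print one derived id per record found in the file
derived_ids() {
    awk -F '>' -v prefix="$DATE_VAR" '
        # Each START_OF_REC line bumps the running count; the id is prefix + count
        $1 == "START_OF_REC" { print prefix (++cnt) }
    ' "$1"
}
```

The same `prefix (++cnt)` expression could replace the plain counter in the awk script if the derived id belongs in the third output column.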