How to append name and values of each set of records into multiple records
I am using the code below, but it takes a long time to read the file.
while read TAG
do
TAGNAME=$(echo $TAG | cut -d '>' -f1)
TAGVALUE=$(echo $TAG | cut -d '>' -f2)
if [ "$TAGNAME" = "START_OF_REC" ]
then
CNT_VAR=`expr $CNT_VAR + 1`
DERIVED_ID=${DATE_VAR}${CNT_VAR}
CUST_ID_VAR="NULL_CUST_ID"
OPPOR_ID_VAR="NULL_OPPOR_ID"
elif [ "$TAGNAME" = "bd-cust-id" ]
then
CUST_ID_VAR=$TAGVALUE
sed -i 's/NULL_CUST_ID/'$CUST_ID_VAR'/g' $FLAT_FILE
echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE
elif [ "$TAGNAME" = "mars-opportunity-id" ]
then
OPPOR_ID_VAR=$TAGVALUE
if [ "$OPPOR_ID_VAR" = "EMPTY_VAL" ]
then
sed -i 's/NULL_OPPOR_ID//g' $FLAT_FILE
else
sed -i 's/NULL_OPPOR_ID/'$OPPOR_ID_VAR'/g' $FLAT_FILE
echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE
fi
else
if [ "$OPPOR_ID_VAR" = "EMPTY_VAL" ]
then
echo ${CUST_ID_VAR}${PIPE}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE
else
echo ${CUST_ID_VAR}${PIPE}${OPPOR_ID_VAR}${PIPE}${DERIVED_ID}${PIPE}${TAGNAME}${PIPE}${TAGVALUE} >> $FLAT_FILE
fi
fi
done < INPUT_FILE
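I am using the code above to read 50K records. Two sample records are shown below; each record starts with START_OF_REC.
I wrote this script, but it takes a very long time to finish 50K records.
I am looking for a bash script that runs faster.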
INPUT_FILE
START_OF_REC>START
trigger>SalesLeadCreated
message-sent-at-ts>2015-01-27T00:00.08
bd-cust-id>01234
mars-opportunity-id>2-BFGCMQ5
mars-activity-id>2-BFGCMPZ
lead-type>AccountOpen
media-ad-code>WWW
lead-action-code>completed
START_OF_REC>START
trigger>SalesLeadCreated
message-sent-at-ts>2015-01-27T00:00.10
bd-cust-id>054671
mars-opportunity-id>2-BFGC39C
mars-activity-id>2-BFGC396
lead-type>AccountOpen
media-ad-code>WWW
lead-action-code>saved
Expected output
bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value
01234|2-BFGCMQ5|1|trigger|SalesLeadCreated
01234|2-BFGCMQ5|1|message-sent-at-ts|2015-01-27T00:00.08
01234|2-BFGCMQ5|1|bd-cust-id|01234
01234|2-BFGCMQ5|1|mars-opportunity-id|2-BFGCMQ5
01234|2-BFGCMQ5|1|mars-activity-id|2-BFGCMPZ
01234|2-BFGCMQ5|1|lead-type|AccountOpen
01234|2-BFGCMQ5|1|media-ad-code|WWW
01234|2-BFGCMQ5|1|lead-action-code|completed
054671|2-BFGC39C|2|trigger|SalesLeadCreated
054671|2-BFGC39C|2|message-sent-at-ts|2015-01-27T00:00.10
054671|2-BFGC39C|2|bd-cust-id|054671
054671|2-BFGC39C|2|mars-opportunity-id|2-BFGC39C
054671|2-BFGC39C|2|mars-activity-id|2-BFGC396
054671|2-BFGC39C|2|lead-type|AccountOpen
054671|2-BFGC39C|2|media-ad-code|WWW
054671|2-BFGC39C|2|lead-action-code|saved
awk -F ">" -v OFS="|" '
BEGIN { print "bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value" }
function output() {
sqlid++
custid = data["bd-cust-id"]
oppid = data["mars-opportunity-id"]
for (key in data)
print custid, oppid, sqlid, key, data[key]
delete data
}
$1 == "START_OF_REC" { if (NR > 1) output(); next }
{ data[$1] = $2 }
END { output() }
' INPUT_FILE
bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value
01234 |2-BFGCMQ5|1|bd-cust-id|01234
01234 |2-BFGCMQ5|1|trigger|SalesLeadCreated
01234 |2-BFGCMQ5|1|mars-activity-id|2-BFGCMPZ
01234 |2-BFGCMQ5|1|lead-action-code|completed
01234 |2-BFGCMQ5|1|lead-type|AccountOpen
01234 |2-BFGCMQ5|1|media-ad-code|WWW
01234 |2-BFGCMQ5|1|message-sent-at-ts|2015-01-27T00:00.08
01234 |2-BFGCMQ5|1|mars-opportunity-id|2-BFGCMQ5
054671 |2-BFGC39C|2|bd-cust-id|054671
054671 |2-BFGC39C|2|trigger|SalesLeadCreated
054671 |2-BFGC39C|2|mars-activity-id|2-BFGC396
054671 |2-BFGC39C|2|lead-action-code|saved
054671 |2-BFGC39C|2|lead-type|AccountOpen
054671 |2-BFGC39C|2|media-ad-code|WWW
054671 |2-BFGC39C|2|message-sent-at-ts|2015-01-27T00:00.10
054671 |2-BFGC39C|2|mars-opportunity-id|2-BFGC39C
The spaces are caused by trailing whitespace in the input file.
I'm assuming SQL_ID is just a running count of the records.
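Note that `for (key in data)` visits keys in an unspecified order, which is why the rows above do not follow the input order. If the order and the trailing spaces matter, a variant along these lines might help. It is only a sketch under a few assumptions: the temp file and the names `emit`, `name`, `value`, `cust` and `opp` are made up for illustration, the sample input is shortened, and mapping `EMPTY_VAL` to an empty column is my reading of the loop in the question (unlike that loop, this version still prints the `mars-opportunity-id` row itself).

```shell
#!/bin/sh
# Hypothetical sample input; the first bd-cust-id value has a trailing space.
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' \
    'START_OF_REC>START' \
    'trigger>SalesLeadCreated' \
    'bd-cust-id>01234 ' \
    'mars-opportunity-id>2-BFGCMQ5' \
    'lead-type>AccountOpen' \
    'START_OF_REC>START' \
    'trigger>SalesLeadCreated' \
    'bd-cust-id>054671' \
    'mars-opportunity-id>EMPTY_VAL' \
    'lead-action-code>saved' > "$input"

result=$(awk -F'>' -v OFS='|' '
# Print the buffered tags of the current record in input order, then reset.
function emit(   i) {
    for (i = 1; i <= n; i++)
        print cust, opp, sqlid, name[i], value[i]
    n = 0; cust = ""; opp = ""
}
BEGIN { print "bd-cust-id|mars-opportunity-id|SQL_ID|Tag_name|Tag_Value" }
# Strip trailing whitespace; changing $0 makes awk re-split the fields.
{ sub(/[ \t\r]+$/, "") }
# A new record starts: flush the previous one and bump the running count.
$1 == "START_OF_REC" { emit(); sqlid++; next }
{
    n++; name[n] = $1; value[n] = $2
    if ($1 == "bd-cust-id") cust = $2
    # Assumption: EMPTY_VAL means "leave the opportunity column empty".
    if ($1 == "mars-opportunity-id") opp = ($2 == "EMPTY_VAL") ? "" : $2
}
END { emit() }
' "$input")

printf '%s\n' "$result"
```

Like the answer above, this reads the file once and never calls `sed -i` or `cut` per line, so it avoids rewriting the output file and forking processes for every tag.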