GAWK - Finding a corresponding Action

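I have a large file that continuously has lines of data appended to it, relating to parcels passing through a system; lines are added throughout the day as each action occurs. Every minute I need to check the file for items that have not yet arrived at their destination, i.e. items that have no "DISCHARGE_VERIFIED" event. The example below is one complete record, but because thousands of items are processed at the same time, its lines can be spread throughout the file.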

170209 043314 0887 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, inductionId=<3: IU04>, position=<sorter#0.induction#3: IU04>, itemRevisionNumber=<0> ##[
170209 043314 0888 DE(N) ItemHandler.ItemLog event=<SET_ITEM_ID>, ***************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, itemRevisionNumber=<0> ##[
170209 043317 0314 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodeCount=<3>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600004054211864>, type=<C0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<1910456693>, type=<A0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<2LAU2000+52000000>, type=<C0>, result=<OK>, ccType=<>)]>, codeSource=<ohscan>, scannerId=<4001: IU04-SCAN02>, scannerStatus=<0>, position=<sorter#0.scanner#4001: IU04-SCAN02>, itemRevisionNumber=<2> ##[
170209 043317 0315 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600004054211864>, type=<C0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<1910456693>, type=<A0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<2LAU2000+52000000>, type=<C0>, result=<OK>, ccType=<>)]>, ccReason=<SCANNER_DATA_ADDED>, PreviousccResult=<>, sortSchemeId=<-1>, sortSchemeName=<>, logicalDestination=<>, BatchCountItem=<true>, collectionId=<-1>, goodsId=<>, position=<sorter#0.scanner#4001: IU04-SCAN02>, dynamicDataCount=<0>, dynamicData=<{}>, carrierId=<159>, carrierCount=<-1>, itemRevisionNumber=<2> ##[
170209 043317 0322 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, PendingccResult=<OK>, Pendingstrategy=<notSpecified>, PendingchuteGroup=<[3000]: Parked0>, PendingNotChutedestinationId=<-1>, PendingsortSchemeId=<-1>, PendingsortSchemeName=<>, PendinglogicalDestination=<>, PendinggoodsId=<>, PendingBatchCountItem=<true>, PendingcollectionId=<-1>, position=<sorter#0.scanner#4001: IU04-SCAN02>, dynamicDataCount=<0>, dynamicData=<{}>, itemRevisionNumber=<4> ##[
170209 043317 0322 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[3000]: Parked0>, Pendingstrategy=<notSpecified>, CscdestinationId=<-1: UnDef>, CmcdestinationId=<-1: UnDef>, position=<sorter#0.scanner#4001: IU04-SCAN02>, itemRevisionNumber=<4> ##[
170209 043317 0484 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4001: IU04-SCAN02>, chuteListStartPoint=<-1>, itemRevisionNumber=<6> ##[
170209 043317 0486 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForData>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4001: IU04-SCAN02>, chuteListStartPoint=<-1>, itemRevisionNumber=<7> ##[
170209 043317 0486 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600004054211864>, type=<C0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<1910456693>, type=<A0>, result=<OK>, ccType=<>), ProxyWrapperBarcode(barcode=<2LAU2000+52000000>, type=<C0>, result=<OK>, ccType=<>)]>, ccReason=<SCANNER_DATA_ADDED>, PreviousccResult=<OK>, sortSchemeId=<-1>, sortSchemeName=<>, logicalDestination=<>, BatchCountItem=<true>, collectionId=<-1>, goodsId=<>, position=<sorter#0.scanner#4001: IU04-SCAN02>, dynamicDataCount=<0>, dynamicData=<{}>, carrierId=<159>, carrierCount=<-1>, itemRevisionNumber=<7> ##[
170209 043317 0492 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, PendingccResult=<OK>, Pendingstrategy=<priority>, PendingchuteGroup=<[121]: FLY425>, PendingNotChutedestinationId=<-1>, PendingsortSchemeId=<-1>, PendingsortSchemeName=<>, PendinglogicalDestination=<FLY425>, PendinggoodsId=<>, PendingBatchCountItem=<true>, PendingcollectionId=<-1>, position=<sorter#0.scanner#4001: IU04-SCAN02>, dynamicDataCount=<0>, dynamicData=<{}>, itemRevisionNumber=<9> ##[
170209 043317 0492 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[121]: FLY425>, Pendingstrategy=<priority>, CscdestinationId=<3000: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4001: IU04-SCAN02>, itemRevisionNumber=<9> ##[
170209 043317 0666 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<121: FLY425>, chuteGroup=<[121]: FLY425>, CmcdestinationId=<121: FLY425>, position=<sorter#0.scanner#4001: IU04-SCAN02>, chuteListStartPoint=<121>, itemRevisionNumber=<11> ##[
170209 043317 0667 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectHeadingForChute>, inductionId=<3: IU04>, inductionMode=<SCANNER>, inductStatus=<NORMAL_ITEM>, carrierId=<159>, carrierCount=<1>, CmcdestinationId=<121: FLY425>, position=<sorter#0: MS01>, itemRevisionNumber=<12> ##[
170209 043327 0379 DE(N) ItemHandler.ItemLog event=<ITEM_DISCHARGED>, ***********************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectHeadingForChute>, CscdestinationId=<121: FLY425>, chuteGroup=<[121]: FLY425>, CmcdestinationId=<121: FLY425>, ccResult=<OK>, sortSchemeId=<-1>, logicalDestination=<FLY425>, goodsId=<>, carrierId=<159>, length=<-1 mm>, width=<-1 mm>, height=<-1 mm>, volume=<-1 mm3>, position=<sorter#0.chute#121: FLY425>, itemRevisionNumber=<13> ##[
170209 043339 0765 DE(N) ItemHandler.ItemLog event=<DISCHARGE_VERIFIED>, ********************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectWaitForVerification>, CscdestinationId=<121: FLY425>, chuteGroup=<[121]: FLY425>, CmcdestinationId=<121: FLY425>, position=<sorter#0.chute#121: FLY425>, itemRevisionNumber=<14> ##[

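Once an item reaches its chute it is marked "DISCHARGE_VERIFIED", so I need to extract from the log file every "itemId" that has no corresponding "DISCHARGE_VERIFIED" at that point in time, and show the chute it is heading for. The chute is given by the element "CmcdestinationId=<121: FLY425>" on the "ITEM_INDUCTED" line, see below: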

170209 043317 0667 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12562305>, globalId=<12562305>, cmcIndex=<750>, sorter=<0: MS01>, state=<CSC: ProjectHeadingForChute>, inductionId=<3: IU04>, inductionMode=<SCANNER>, inductStatus=<NORMAL_ITEM>, carrierId=<159>, carrierCount=<1>, CmcdestinationId=<121: FLY425>, position=<sorter#0: MS01>, itemRevisionNumber=<12> ##[

I am doing this with gawk on a Windows machine, so all the usual quoting problems apply.

Any help would be much appreciated.

Thanks

Phil

I think I have a working script. Save it as, for example, inducted.awk; that way you avoid all the Windows quoting problems you describe.

inducted.awk:

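/ITEM_INDUCTED/ {
    # Remember the destination chute of every inducted item, keyed by itemId.
    match($0, /itemId=<([^>]+)>/, ary1)
    match($0, /CmcdestinationId=<([^>]+)>/, ary2)
    dest[ary1[1]] = ary2[1]
}

/DISCHARGE_VERIFIED/ {
    # The item reached its chute, so forget it.
    match($0, /itemId=<([^>]+)>/, ary1)
    delete dest[ary1[1]]
}

END {
    # Whatever is left was inducted but never verified as discharged.
    for (id in dest) {
        print id " -- " dest[id]
    }
}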

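Basically, when a line containing ITEM_INDUCTED is found, the itemId and its destination are added to an array.

When a line contains DISCHARGE_VERIFIED, that entry is removed from the array.

At the end of the script, the IDs that were inducted but never discharge-verified are printed along with their destinations.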

To run it:

gawk -f .\inducted.awk large_log_file

If the awk file is not in the same folder as large_log_file, give its full path.
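Since the check is meant to run every minute, it may also help to see when each outstanding item was inducted. The following is only a sketch of one possible variant, not part of the script above: the file name inducted_with_time.awk, the helper function field() and the array seen are made up for illustration, and it assumes every log line starts with a date and a time column, as in the sample lines. It also matches on event=<...> rather than on the bare event name, and skips lines where no itemId is found.

```awk
# inducted_with_time.awk (hypothetical file name)
# Assumption: each log line starts with "YYMMDD HHMMSS", as in the sample.

# Return the text inside name=<...> on the current line, or "" if absent.
# m is a local array used to capture the parenthesised group (gawk only).
function field(name,    m) {
    if (match($0, name "=<([^>]+)>", m))
        return m[1]
    return ""
}

/event=<ITEM_INDUCTED>/ {
    id = field("itemId")
    if (id != "") {
        dest[id] = field("CmcdestinationId")
        seen[id] = $1 " " $2      # date and time of induction
    }
}

/event=<DISCHARGE_VERIFIED>/ {
    id = field("itemId")
    if (id != "") {
        delete dest[id]
        delete seen[id]
    }
}

END {
    # Items inducted but not yet verified, with chute and induction time.
    for (id in dest)
        print id " -- " dest[id] " -- inducted " seen[id]
}
```

It is run the same way as the original, with gawk -f and the log file as the argument. Like the original, it relies on the three-argument form of match(), which is a gawk extension, and the order in which for (id in dest) visits the items is not guaranteed.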