.txt Log File Data Extraction Output to CSV with REGEX
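I asked this question before and LotPings gave a perfect result. After talking to the users, it turns out I was only given half the information at the start!
Now that I know exactly what is needed, I'll explain the scenario again...
Notes:
The terminal is always A followed by 3 digits, e.g. A123.
The user ID is at the top of the log file and appears only once. It always starts with 89 and is six digits long. That line always starts with SELECTED FOR OPERATOR 89XXXX.
There are two date patterns in the file (one is the search date, the other is the DOB), and each needs to be extracted into its own column. Not all records have a DOB, and some have only the year.
The enquirer does not always start with 'C'; the whole rest of the line is needed.
The search result line always contains 'ENQUIRY'; extract the text that follows it.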
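Taken one at a time, these rules map onto small regex fragments. A quick sketch to check them, assuming the three sample lines below (copied from the log that follows) are representative; the variable names are illustrative only:

```powershell
# Sample lines copied from the log; the variable names are illustrative only.
$userLine   = 'SELECTED FOR OPERATOR 891234'
$recordLine = '24/05/18 1015 A555 CERT15473 19625 CBRS DDS SERVICES NM/18/0199'
$dobLine    = 'DATE OF BIRTH : / /1957'

# User ID: six digits starting with 89, on the SELECTED FOR OPERATOR line
$null = $userLine -match 'SELECTED FOR OPERATOR\s+(?<UserID>89\d{4})\b'
$userId = $Matches.UserID

# Terminal: A followed by exactly three digits, as a separate token
$null = $recordLine -match '\b(?<Terminal>A\d{3})\b'
$terminal = $Matches.Terminal

# Enquirer: the whole rest of the line after the terminal, whatever its first letter
$null = $recordLine -match '\bA\d{3}\s+(?<Enquirer>.+)$'
$enquirer = $Matches.Enquirer

# DOB: day and month may be blank, so only the four-digit year is required
$null = $dobLine -match 'DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'
$dob = $Matches.DOB

"$userId | $terminal | $enquirer | $dob"
# -> 891234 | A555 | CERT15473 19625 CBRS DDS SERVICES NM/18/0199 | / /1957
```

Each `-match` overwrites `$Matches`, which is why every capture is copied into its own variable straight away.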
Here is the log file:
L TRANSACTIONS LOGGED FROM 01/05/2018 0001 TO 31/05/2018 2359
SELECTED FOR OPERATOR 891234
START TERMINAL USER ENQUIRER TERMINAL IP
========================================================================================================================
01/05/18 1603 A555 CART87565 46573 RBCO NPC SERVICES GW/10/0043
SEARCH ENQUIRY RECORD NO : S48456/06P CHAPTER CODE =
RECORD DISPLAYED : S48853/98D
PRINT REQUESTED : SINGLE RECORD
========================================================================================================================
03/05/18 1107 A555 CERT16574 BTD/54/1786 16475
REF ENQUIRY DHF ID : 58/94710W CHAPTER CODE =
RECORD DISPLAYED : S585988/84H
========================================================================================================================
24/05/18 1015 A555 CERT15473 19625 CBRS DDS SERVICES NM/18/0199
IMAGE ENQUIRY NAME : TREVOR SMITH CHAPTER CODE =
DATE OF BIRTH : / /1957
========================================================================================================================
24/05/18 1025 A555 CERT15473 15325 CBRS DDS SERVICES NM/12/0999
REF ENQUIRY DDS ID : 04/102578R CHAPTER CODE =
========================================================================================================================
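That is an example of the log file; below is what needs to be extracted and which header each value goes under.
The output should be a CSV like this.
LotPings' PowerShell script works perfectly. I just need it to also pull the user ID from the top line, and to allow for the fact that not every record has a DOB and that there is more than one type of enquiry, i.e. REF ENQUIRY, SEARCH ENQUIRY and IMAGE ENQUIRY.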
$FileIn = '.\SO_51209341_data.txt'
$TodayCsv = '.\SO_51209341_data.csv'
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<User>C[A-Z0-9]+) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+SEARCH REF\s+NAME : (?<Enquiry>.+?) (PAGE|CHAPTER) CODE ='
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'
# One section per record, split at the ===== separator lines
$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''
$Csv = ForEach($Section in $Sections){
    $Row = @{} | Select-Object Date, Time, Terminal, User, Enquirer, Enquiry, DOB
    $Cnt = 0    # number of patterns that matched in this section
    if ($Section -match $RE1) {
        ++$Cnt
        $Row.Date = $Matches.Date
        $Row.Time = $Matches.Time
        $Row.Terminal = $Matches.Terminal
        $Row.User = $Matches.User
        $Row.Enquirer = $Matches.Enquirer.Trim()
    }
    if ($Section -match $RE2) {
        ++$Cnt
        $Row.Enquiry = $Matches.Enquiry
    }
    if ($Section -match $RE3){
        ++$Cnt
        $Row.DOB = $Matches.DOB
    }
    # Emit the row only if all three patterns matched
    if ($Cnt -eq 3) {$Row}
}
$Csv | Format-Table
$Csv | Export-Csv $TodayCsv -NoTypeInformation
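Against the new log, the script above runs into three limits: `$RE1` only accepts an enquirer starting with C (`(?<User>C[A-Z0-9]+)`), `$RE2` only accepts the literal `SEARCH REF ... NAME :`, which none of the SEARCH/REF/IMAGE ENQUIRY lines contain, and `if ($Cnt -eq 3)` discards every record without a DOB. A small check of the `$RE2` point, using the enquiry lines from the sample log:

```powershell
# $RE2 exactly as in the script above
$RE2 = [RegEx]'\s+SEARCH REF\s+NAME : (?<Enquiry>.+?) (PAGE|CHAPTER) CODE ='

# Enquiry lines from the sample log, each prefixed with a newline as they would
# appear inside a section (the pattern expects leading whitespace)
$enquiryLines = @(
    "`nSEARCH ENQUIRY RECORD NO : S48456/06P CHAPTER CODE =",
    "`nREF ENQUIRY DHF ID : 58/94710W CHAPTER CODE =",
    "`nIMAGE ENQUIRY NAME : TREVOR SMITH CHAPTER CODE ="
)

# Count how many of them the old pattern matches
$re2Hits = @($enquiryLines | Where-Object { $_ -match $RE2 }).Count
"RE2 matched $re2Hits of $($enquiryLines.Count) enquiry lines"
# -> RE2 matched 0 of 3 enquiry lines
# With RE2 never matching, $Cnt reaches at most 2, so no row passes 'if ($Cnt -eq 3)'.
```

So, as far as the sample goes, the old script writes an empty CSV for this log rather than a partial one.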
With data this precise, a first answer could be:
## Q:\Test18\SO_51311417.ps1
$FileIn = '.\SO_51311417_data.txt'
$TodayCsv = '.\SO_51311417_data.csv'
# User ID from the header, then one pattern per group of fields
$RE0 = [RegEx]'SELECTED FOR OPERATOR\s+(?<UserID>\d{6})'
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+(SEARCH|REF|IMAGE) ENQUIRY\s+(?<SearchResult>.+?)\s+(PAGE|CHAPTER) CODE'
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'
# Split into sections at the ===== separator lines, dropping empty ones
$Sections = (Get-Content $FileIn -Raw) -split "={30,}`r?`n" -ne ''
$UserID = "n/a"
$Csv = ForEach($Section in $Sections){
    If ($Section -match $RE0){
        # Header section: remember the operator ID for all following records
        $UserID = $Matches.UserID
    } Else {
        $Row = @{} | Select-Object Date,Time,Terminal,UserID,Enquirer,SearchResult,DOB
        If ($Section -match $RE1){
            $Row.Date = $Matches.Date
            $Row.Time = $Matches.Time
            $Row.Terminal = $Matches.Terminal
            $Row.Enquirer = $Matches.Enquirer.Trim()   # Trim() also drops a trailing CR on CRLF files
            $Row.UserID = $UserID
        }
        If ($Section -match $RE2){
            $Row.SearchResult = $Matches.SearchResult
        }
        # DOB is optional, so the row is emitted whether or not it matches
        If ($Section -match $RE3){
            $Row.DOB = $Matches.DOB
        }
        $Row
    }
}
$Csv | Format-Table
$Csv | Export-Csv $TodayCsv -NoTypeInformation
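To try the answer without creating SO_51311417_data.txt, the same logic can be run on a here-string. This sketch assumes the log text as posted in the question, with the separator lines shortened (the split only needs 30 or more `=`); `$log` and `$Rows` are illustrative names:

```powershell
# Log text from the question; separator lines shortened to 40 '='
$log = @'
L TRANSACTIONS LOGGED FROM 01/05/2018 0001 TO 31/05/2018 2359
SELECTED FOR OPERATOR 891234
START TERMINAL USER ENQUIRER TERMINAL IP
========================================
01/05/18 1603 A555 CART87565 46573 RBCO NPC SERVICES GW/10/0043
SEARCH ENQUIRY RECORD NO : S48456/06P CHAPTER CODE =
RECORD DISPLAYED : S48853/98D
PRINT REQUESTED : SINGLE RECORD
========================================
03/05/18 1107 A555 CERT16574 BTD/54/1786 16475
REF ENQUIRY DHF ID : 58/94710W CHAPTER CODE =
RECORD DISPLAYED : S585988/84H
========================================
24/05/18 1015 A555 CERT15473 19625 CBRS DDS SERVICES NM/18/0199
IMAGE ENQUIRY NAME : TREVOR SMITH CHAPTER CODE =
DATE OF BIRTH : / /1957
========================================
24/05/18 1025 A555 CERT15473 15325 CBRS DDS SERVICES NM/12/0999
REF ENQUIRY DDS ID : 04/102578R CHAPTER CODE =
========================================
'@

$RE0 = [RegEx]'SELECTED FOR OPERATOR\s+(?<UserID>\d{6})'
$RE1 = [RegEx]'(?m)(?<Date>\d{2}\/\d{2}\/\d{2}) (?<Time>\d{4}) +(?<Terminal>A\d{3}) +(?<Enquirer>.*)$'
$RE2 = [RegEx]'\s+(SEARCH|REF|IMAGE) ENQUIRY\s+(?<SearchResult>.+?)\s+(PAGE|CHAPTER) CODE'
$RE3 = [RegEx]'\s+DATE OF BIRTH : (?<DOB>[0-9 /]+?/\d{4})'

# A here-string drops its final newline, so add one back before splitting
$Sections = ($log + "`n") -split "={30,}`r?`n" -ne ''

$UserID = 'n/a'
$Rows = foreach ($Section in $Sections) {
    if ($Section -match $RE0) {
        $UserID = $Matches.UserID        # header section
    } else {
        $Row = @{} | Select-Object Date, Time, Terminal, UserID, Enquirer, SearchResult, DOB
        if ($Section -match $RE1) {
            $Row.Date     = $Matches.Date
            $Row.Time     = $Matches.Time
            $Row.Terminal = $Matches.Terminal
            $Row.Enquirer = $Matches.Enquirer.Trim()
            $Row.UserID   = $UserID
        }
        if ($Section -match $RE2) { $Row.SearchResult = $Matches.SearchResult }
        if ($Section -match $RE3) { $Row.DOB = $Matches.DOB }
        $Row
    }
}
$Rows | Format-Table
```

Piping `$Rows` to `Export-Csv` with `-NoTypeInformation`, as the script does, would write the CSV. Splitting on the separator first keeps each regex confined to a single record, which is what lets an optional field like DOB simply stay empty.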
Sample output
Date     Time Terminal UserID Enquirer                                     SearchResult           DOB
----     ---- -------- ------ --------                                     ------------           ---
01/05/18 1603 A555     891234 CART87565 46573 RBCO NPC SERVICES GW/10/0043 RECORD NO : S48456/06P
03/05/18 1107 A555     891234 CERT16574 BTD/54/1786 16475                  DHF ID : 58/94710W
24/05/18 1015 A555     891234 CERT15473 19625 CBRS DDS SERVICES NM/18/0199 NAME : TREVOR SMITH    / /1957
24/05/18 1025 A555     891234 CERT15473 15325 CBRS DDS SERVICES NM/12/0999 DDS ID : 04/102578R
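The answer's `$RE2` names the three enquiry types explicitly. If other types can appear (the question only guarantees the word ENQUIRY), one possible variation, an assumption not tested against the real file, is to accept any single word before ENQUIRY and capture it as a hypothetical `EnquiryType` column:

```powershell
# Hypothetical looser variant of $RE2: any single word before ENQUIRY,
# captured as EnquiryType so the kind of enquiry can become its own column
$RE2Any = [RegEx]'\s+(?<EnquiryType>\S+) ENQUIRY\s+(?<SearchResult>.+?)\s+(PAGE|CHAPTER) CODE'

# Enquiry lines from the sample log
$lines = @(
    'SEARCH ENQUIRY RECORD NO : S48456/06P CHAPTER CODE =',
    'REF ENQUIRY DHF ID : 58/94710W CHAPTER CODE =',
    'IMAGE ENQUIRY NAME : TREVOR SMITH CHAPTER CODE ='
)

$parsed = foreach ($line in $lines) {
    # Prefix a newline: like the original, the pattern expects whitespace before the type
    if ("`n$line" -match $RE2Any) {
        [pscustomobject]@{
            EnquiryType  = $Matches.EnquiryType
            SearchResult = $Matches.SearchResult
        }
    }
}
$parsed | Format-Table
```

The header line is not a false positive here, since its word is ENQUIRER rather than ENQUIRY; whether the real files contain other words directly followed by " ENQUIRY" is something to check before relying on it.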