Multi-line *non* match with attoparsec
I am trying to parse (PostgreSQL) logs, which can contain multi-line entries.
2016-01-01 01:01:01 entry1
2016-01-01 01:01:02 entry2a
entry2b
2016-01-01 01:01:03 entry3
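With a Perl or Python script I would just grab the next line and, if it did not start with a timestamp, append it to the previous log entry. What is a sensible way to approach this with attoparsec hooked up to io-streams? I clearly want to do something with lookAhead and a failure to match a timestamp, but my brain is just missing something.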
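Nope, still not seeing it. I've stripped down what I have to the code below. Parsing a single line is easy. What I can't figure out is how to parse "up to" another parser's pattern: I can see there is a lookAhead function I could use, but I can't see how it fits in with applying a "not" condition.
I also can't see how to make the match work. It's entirely possible that my brain is just broken.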
{-# LANGUAGE OverloadedStrings #-}
module DummyParser (
    LogStatement (..), parseLogLine
    -- and, so we can test it...
    , LogTimestamp, parseTimestamp
    , parseSqlStmt
    , newLineAndTimestamp
    ) where

{- we want to parse...
TIME001 statement: SELECT true;
TIME002 statement: SELECT 'b',
 'c';
TIME003 statement: SELECT 3;
-}

import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

type LogTimestamp = Int

data LogStatement = LogStatement {
    l_ts :: LogTimestamp
    ,l_sql :: String
    } deriving (Eq, Show)

restOfLine :: Parser B.ByteString
restOfLine = do
    rest <- takeTill (== '\n')
    isEOF <- atEnd
    if isEOF then
        return rest
    else
        (char '\n') >> return rest

-- e.g. TIME001
parseTimestamp :: Parser LogTimestamp
parseTimestamp = do
    string "TIME"
    digits <- count 3 digit
    return (read digits)

-- e.g. statement: SELECT 1
parseSqlStmt :: Parser String
parseSqlStmt = do
    string "statement: "
    -- How can I match until the next timestamp?
    sql <- restOfLine
    return (B.unpack sql)

newLineAndTimestamp :: Parser LogTimestamp
newLineAndTimestamp = (char '\n') *> parseTimestamp

spaces :: Parser ()
spaces = do
    skipWhile (== ' ')

-- e.g. TIME001 statement: SELECT * FROM schema.table;
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    let ls = LogStatement log_ts log_sql
    return ls
Edit: So, this is what I ended up with, thanks to arrowd's help.
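-- (<|>) needs: import Control.Applicative ((<|>))

-- Succeeds, without consuming input, if a timestamp comes next.
isTimestampNext :: Parser ()
isTimestampNext = lookAhead parseTimestamp *> pure ()

parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    -- Collect further lines until end of input or the next timestamp.
    extraLines <- manyTill restOfLine (endOfInput <|> isTimestampNext)
    let ls = LogStatement log_ts (log_sql ++ (B.unpack $ B.concat extraLines))
    return ls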
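The edit above settles the parsing side; the io-streams half of the original question could be wired up roughly as follows. This is only a sketch under some assumptions: it relies on parserToInputStream from System.IO.Streams.Attoparsec in the io-streams package, and the names Entry, timestamp, entry, entryOrEnd and parseChunks are made up for illustration (Entry is a simplified stand-in for LogStatement). Unlike the edit, it joins continuation lines with a newline rather than concatenating them directly.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B
import qualified System.IO.Streams as Streams
import System.IO.Streams.Attoparsec (parserToInputStream)

-- Hypothetical, simplified stand-in for LogStatement.
data Entry = Entry Int B.ByteString deriving (Eq, Show)

-- "TIME" followed by exactly three digits, e.g. TIME002.
timestamp :: Parser Int
timestamp = string "TIME" *> (read <$> count 3 digit)

-- Everything up to the next newline; the newline itself is dropped.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* (endOfLine <|> endOfInput)

-- One entry: a timestamped first line plus any following lines that
-- do not start with a timestamp.
entry :: Parser Entry
entry = do
    ts    <- timestamp
    skipWhile (== ' ')
    first <- restOfLine
    more  <- manyTill restOfLine (endOfInput <|> (() <$ lookAhead timestamp))
    pure (Entry ts (B.intercalate "\n" (first : more)))

-- parserToInputStream expects Nothing to signal the end of the stream.
entryOrEnd :: Parser (Maybe Entry)
entryOrEnd = (Nothing <$ endOfInput) <|> (Just <$> entry)

-- Turn a stream of raw chunks into a list of parsed entries.
parseChunks :: [B.ByteString] -> IO [Entry]
parseChunks chunks = do
    raw     <- Streams.fromList chunks
    entries <- parserToInputStream entryOrEnd raw
    Streams.toList entries
```

Because attoparsec parses incrementally, the chunk boundaries do not have to line up with line or entry boundaries. In a real program the list-backed stream would presumably be replaced by a file or socket InputStream.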
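Here is a combinator I share across many attoparsec questions (it needs optional from Control.Applicative); it succeeds, consuming nothing, only when p does not match:
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = optional (lookAhead p) >>= maybe (pure ()) (const (fail "not followed by"))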
Your solution would then look something like this:
parseLogLine :: Parser LogStatement
parseLogLine = do
    log_ts <- parseTimestamp
    spaces
    log_sql <- parseSqlStmt
    -- If the next line is not a new entry, treat it as a continuation.
    newlineLeftover <- ((notFollowedBy parseTimestamp) *> parseSqlStmt) <|> pure ""
    let ls = LogStatement log_ts (log_sql ++ newlineLeftover)
    return ls
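The right-hand side of *> in the newlineLeftover expression needs more work, I guess (parseSqlStmt expects the "statement: " prefix, which a continuation line does not have), but that is the general idea.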
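One way that remaining work might be done is sketched below: the continuation parser takes the raw rest of the line instead of reusing parseSqlStmt, and it is repeated with many so that an entry can span more than two lines. The names Entry, timestamp, continuation, entry and parseLog are hypothetical and only serve the example, Entry is a simplified stand-in for LogStatement, and joining the lines with a newline is an assumption about the desired output.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, optional, (<|>))
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B

-- Hypothetical, simplified stand-in for LogStatement.
data Entry = Entry Int B.ByteString deriving (Eq, Show)

-- "TIME" followed by exactly three digits, e.g. TIME002.
timestamp :: Parser Int
timestamp = string "TIME" *> (read <$> count 3 digit)

-- Succeeds, consuming nothing, only when p would fail at this position.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p =
    optional (lookAhead p) >>= maybe (pure ()) (const (fail "not followed by"))

-- Everything up to the next newline; the newline itself is dropped.
restOfLine :: Parser B.ByteString
restOfLine = takeTill (== '\n') <* (endOfLine <|> endOfInput)

-- A continuation line: we are not at end of input (otherwise `many`
-- would loop on an empty match) and the line does not start a new entry.
continuation :: Parser B.ByteString
continuation =
    notFollowedBy endOfInput *> notFollowedBy timestamp *> restOfLine

-- A timestamped first line followed by zero or more continuation lines.
entry :: Parser Entry
entry = do
    ts    <- timestamp
    skipWhile (== ' ')
    first <- restOfLine
    more  <- many continuation
    pure (Entry ts (B.intercalate "\n" (first : more)))

-- Parse a complete log held in memory.
parseLog :: B.ByteString -> Either String [Entry]
parseLog = parseOnly (many entry <* endOfInput)
```

Compared with the manyTill version in the question's edit, the "not a timestamp" test lives inside the line parser here rather than in the terminator; which reads better is largely a matter of taste.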