My Megaparsec parser gets stuck and ghci debugging isn't helping either
I have worked through this Megaparsec tutorial and am now trying to write my own parser based on it. I want to write a simple parser for an assembly language:
Label: lda $0ffe
sta %10100110
push , ,
Here are the simple data types I use:
-- Syntax.hs
module Syntax where

import Data.Int

-- | A program is made up of one or more source lines
type Program = [SourceLine]

data SourceLine = SourceLine
  { label   :: Maybe String -- ^ Each line may contain a label
  , instr   :: Maybe String -- ^ This can either be an opcode or an assembler directive
  , operand :: Maybe String -- ^ The opcode/instruction may need operand(s)
  }
  deriving (Show, Eq)
The parser code is as follows:
-- Parser.hs
module Parser where

import Syntax
import Control.Applicative (empty)
import Control.Monad (void)
import Control.Monad.Combinators.Expr
-- import Data.Scientific (toRealFloat)
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- | Parse a single source code line
sourceline :: Parser SourceLine
sourceline = do
  l <- optional labelfield
  i <- optional instrfield
  o <- optional oprfield
  return $ SourceLine l i o

-- TODO: forbid double underscores
-- | Parse the label field of a source line
labelfield :: Parser String
labelfield = (lexeme . try) $ do
  l <- identifier
  symbol ":"
  return l

-- TODO: parse assembler directives starting with a dot (.)
-- | Parse the instruction field of a source line
instrfield :: Parser String
instrfield = (lexeme . try) $ do
  i <- some letterChar
  return i

-- | Parse the operand field of a source line
oprfield :: Parser String
oprfield = (lexeme . try) $ do
  o <- try identifier
       <|> datalist
       <|> number
  return o

-- | Parses a legal identifier; identifiers must start with a letter
-- and may contain underscores or numbers
identifier :: Parser String
identifier = ((:) <$> letterChar <*> many (alphaNumChar <|> char '_'))

-- | Parse a list of values separated by commas (,)
datalist :: Parser String
datalist = do
  x <- some datalist'
  y <- number
  return $ filter (/='\n') $ unlines x ++ y

datalist' :: Parser String
datalist' = try ((++) <$> number <*> (symbol ","))

-- | Parse numbers
number :: Parser String
number = try binnumber
  <|> decnumber
  <|> hexnumber

binnumber :: Parser String
binnumber = lexeme ((:) <$> char '%' <*> (some $ binDigitChar))

decnumber :: Parser String
decnumber = lexeme $ some digitChar

hexnumber :: Parser String
hexnumber = lexeme ((:) <$> char '$' <*> (some $ hexDigitChar))

----- Helper Function ----------------------------------------------------------
lineComment :: Parser ()
lineComment = L.skipLineComment "#"

-- eats all whitespace and newline
scn :: Parser ()
scn = L.space space1 lineComment empty

-- eats all whitespace but newline
sc :: Parser ()
sc = L.space (void $ takeWhile1P Nothing f) lineComment empty
  where
    f x = x == ' ' || x == '\t'

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: String -> Parser String
symbol = L.symbol sc

-- this is giving me trouble
prog :: Parser Program
prog = between scn eof (sepEndBy sourceline scn)
I have marked the function that is giving me trouble with a comment. I have also written some tests for these functions; here they are:
-- file Spec.hs
import Syntax
import Parser
import Text.Megaparsec
import Test.Hspec
import Test.Hspec.Megaparsec
import Test.QuickCheck
import Control.Exception (evaluate)

main :: IO ()
main = hspec $ do
  describe "Label parsing" $ do
    it "Parse empty label field" $
      parse sourceline "" " " `shouldParse` SourceLine Nothing Nothing Nothing
    it "Parse single character lower-case label" $
      parse sourceline "" "x:" `shouldParse` SourceLine (Just "x") Nothing Nothing
    it "Parse multi-character label" $
      parse sourceline "" "label:" `shouldParse` SourceLine (Just "label") Nothing Nothing
    it "Parse multi-character label with trailing whitespace" $
      parse sourceline "" "label: " `shouldParse` SourceLine (Just "label") Nothing Nothing
    it "Parse label with underscore" $
      parse sourceline "" "la_bel: " `shouldParse` SourceLine (Just "la_bel") Nothing Nothing
    it "Parse label with underscores and numbers" $
      parse sourceline "" "l4_b3l: " `shouldParse` SourceLine (Just "l4_b3l") Nothing Nothing
  describe "Label and opcode parsing" $ do
    it "Parse line with label and opcode" $
      parse sourceline "" "label: lda" `shouldParse` SourceLine (Just "label") (Just "lda") Nothing
    it "Parse line opcode only" $
      parse sourceline "" "lda" `shouldParse` SourceLine Nothing (Just "lda") Nothing
  describe "Opcodes and operands parsing" $ do
    it "Parse an opcode with symbol operand" $
      parse sourceline "" "lda label_2" `shouldParse` SourceLine Nothing (Just "lda") (Just "label_2")
    it "Parse an opcode with binary operand" $
      parse sourceline "" "lda %01101" `shouldParse` SourceLine Nothing (Just "lda") (Just "%01101")
    it "Parse an opcode with decimal operand" $
      parse sourceline "" "lda 1234" `shouldParse` SourceLine Nothing (Just "lda") (Just "1234")
    it "Parse an opcode with hexdecimal operand" $
      parse sourceline "" "lda $affe34" `shouldParse` SourceLine Nothing (Just "lda") (Just "$affe34")
    it "Parse a labeled opcode with symbol operand" $
      parse sourceline "" "label: lda label_2" `shouldParse` SourceLine (Just "label") (Just "lda") (Just "label_2")
    it "Parse a labeled opcode with binary operand" $
      parse sourceline "" "labe_l: lda %01101" `shouldParse` SourceLine (Just "labe_l") (Just "lda") (Just "%01101")
    it "Parse a labeled opcode with decimal operand" $
      parse sourceline "" "label_2: lda 1234" `shouldParse` SourceLine (Just "label_2") (Just "lda") (Just "1234")
    it "Parse a labeled opcode with hexdecimal operand" $
      parse sourceline "" "l4b3l: lda $affe34" `shouldParse` SourceLine (Just "l4b3l") (Just "lda") (Just "$affe34")
  describe "Operand parsing" $ do
    it "Parse a value/data list with decimal values" $
      parse sourceline "" "lda 12,23,23,43 " `shouldParse` SourceLine Nothing (Just "lda") (Just "12,23,23,43")
    it "Parse a value/data list with binary values" $
      parse sourceline "" "lda %101,%111,%000,%001 " `shouldParse` SourceLine Nothing (Just "lda") (Just "%101,%111,%000,%001")
    it "Parse a value/data list with hexdecimal values" $
      parse sourceline "" "lda 1,$affe,$AfF3,$c3D4 " `shouldParse` SourceLine Nothing (Just "lda") (Just "1,$affe,$AfF3,$c3D4")
    it "Parse a value/data list with spaces" $
      parse sourceline "" "lda 1, $affe , $AfF3,$c3D4" `shouldParse` SourceLine Nothing (Just "lda") (Just "1,$affe,$AfF3,$c3D4")
    it "Parse a value/data list with spaces and mixed values" $
      parse sourceline "" "lda %101, 1234 , $AfF3,$c3D4" `shouldParse` SourceLine Nothing (Just "lda") (Just "%101,1234,$AfF3,$c3D4")
  -- describe "Parse multiple lines" $ do
  --   it "Parse a 3-line program" $
  --     parse prog "" "label1: \n lda \nsta %10011001" `shouldParse` [SourceLine (Just "label1") Nothing Nothing,
  --       SourceLine Nothing (Just "lda") (Just ""),
  --       SourceLine Nothing (Just "sta") (Just "%10011001")]
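As is usual for assembly files, I want to parse the source code line by line. All of the tests above pass except the commented-out one. Running prog in ghci with parseTest gives the same result: it returns nothing and eventually crashes: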
*Main Parser Syntax Text.Megaparsec> parseTest sourceline "lda # comment ignored"
SourceLine {label = Nothing, instr = Just "lda", operand = Just ""}
*Main Parser Syntax Text.Megaparsec> parseTest prog "lda \nsta %1010"
-- crashes
I assume I am somehow abusing or overusing lexeme in my code to strip trailing whitespace from the parsed strings. What am I missing?
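sepEndBy sourceline scn keeps iterating for as long as the sourceline and scn parsers match. However, both parsers can succeed without consuming any input, so they always match. Because every branch of sourceline is wrapped in try, any parse error makes the parser backtrack and match an infinite number of empty source lines. Even without a parse error, reaching eof produces infinitely many source lines.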
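One possible way out, sketched here under some assumptions, is to make the line separator a parser that must consume input. The sketch below replaces scn with eol from Text.Megaparsec.Char as the separator and uses sepBy, so every repetition has to eat a newline before another sourceline is attempted. The field parsers are deliberately condensed from the question (no comma-separated data lists), and dropping blank lines with filter is an illustrative choice, not something the question asks for.

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

data SourceLine = SourceLine
  { label   :: Maybe String
  , instr   :: Maybe String
  , operand :: Maybe String
  } deriving (Show, Eq)

-- Skips spaces, tabs and '#' comments, but never a newline.
sc :: Parser ()
sc = L.space (void $ takeWhile1P Nothing isBlank) (L.skipLineComment "#") empty
  where isBlank c = c == ' ' || c == '\t'

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

identifier :: Parser String
identifier = (:) <$> letterChar <*> many (alphaNumChar <|> char '_')

-- Simplified for this sketch: a single number, no comma-separated lists.
number :: Parser String
number = ((:) <$> char '%' <*> some binDigitChar)
     <|> ((:) <$> char '$' <*> some hexDigitChar)
     <|> some digitChar

-- Every field is optional, so this parser can still succeed on empty input.
sourceline :: Parser SourceLine
sourceline = SourceLine
  <$> optional (try (lexeme (identifier <* char ':')))
  <*> optional (lexeme (some letterChar))
  <*> optional (lexeme (identifier <|> number))

-- The separator eol must consume a newline, so each repetition makes
-- progress and the loop stops once no newline is left.
prog :: Parser [SourceLine]
prog = filter (/= blank) <$> sepBy (sc *> sourceline) eol <* eof
  where blank = SourceLine Nothing Nothing Nothing

main :: IO ()
main = parseTest prog "label1: \n lda \nsta %10011001"
```

With this shape, the three-line input from the commented-out test parses into three source lines and the parser terminates. Note that the operand of the bare lda line comes out as Nothing rather than Just "", so the expectation in that test would need adjusting.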