Operator precedence in megaparsec
I'm having trouble using Megaparsec 6's makeExprParser helper. I can't figure out how to bind binary ^ and unary - at the precedence levels I expect.
Using this makeExprParser expression parser:
    expressionParser :: Parser Expression
    expressionParser =
      makeExprParser termParser
        [
          [InfixR $ BinOp BinaryExp <$ symbol "^"],
          [
            Prefix $ MonOp MonoMinus <$ symbol "-",
            Prefix $ MonOp MonoPlus <$ symbol "+"
          ],
          [
            InfixL $ BinOp BinaryMult <$ symbol "*",
            InfixL $ BinOp BinaryDiv <$ symbol "/"
          ],
          [
            InfixL $ BinOp BinaryPlus <$ symbol "+",
            InfixL $ BinOp BinaryMinus <$ symbol "-"
          ]
        ]
I would expect these tests to pass:
    testEqual expressionParser "1^2" "(1)^(2)"
    testEqual expressionParser "-1^2" "-(1^2)"
    testEqual expressionParser "1^-2" "1^(-2)"
    testEqual expressionParser "-1^-2" "-(1^(-2))"
That is, -1^-2 should parse the same as -(1^(-2)). This is how Python, for example, parses it:
    >>> 2**-2
    0.25
    >>> -2**-2
    -0.25
    >>> -2**2
    -4
and Ruby:
    irb(main):004:0> 2**-2
    => (1/4)
    irb(main):005:0> -2**-2
    => (-1/4)
    irb(main):006:0> -2**2
    => -4
But this Megaparsec parser fails to parse 1^-2 at all, instead giving me the helpful error:
    (TrivialError (SourcePos {sourceName = \"test.txt\", sourceLine = Pos 1, sourceColumn = Pos 3} :| []) (Just (Tokens ('-' :| \"\"))) (fromList [Tokens ('(' :| \"\"),Label ('i' :| \"nteger\")]))")
which I read to say "I could have taken any of these characters here, but that - has me flummoxed".
If I adjust the precedence order in the operator table like this (moving exponentiation after unary -):
    expressionParser =
      makeExprParser termParser
        [
          [
            Prefix $ MonOp MonoMinus <$ symbol "-",
            Prefix $ MonOp MonoPlus <$ symbol "+"
          ],
          [InfixR $ BinOp BinaryExp <$ symbol "^"],
          [
            InfixL $ BinOp BinaryMult <$ symbol "*",
            InfixL $ BinOp BinaryDiv <$ symbol "/"
          ],
          [
            InfixL $ BinOp BinaryPlus <$ symbol "+",
            InfixL $ BinOp BinaryMinus <$ symbol "-"
          ]
        ]
then I no longer get a parse failure, but -1^2 incorrectly parses as (-1)^2 (instead of the correct -(1^2)).
Here is a complete, self-contained parser that shows the problem (it requires HUnit and, of course, megaparsec):
    module Hascas.Minimal where

    import Data.Void (Void)
    import Test.HUnit hiding (test)
    import Text.Megaparsec hiding (ParseError)
    import Text.Megaparsec.Char
    import Text.Megaparsec.Expr
    import qualified Text.Megaparsec as MP
    import qualified Text.Megaparsec.Char.Lexer as L

    data Expression
      = Literal Integer
      | MonOp MonoOperator Expression
      | BinOp BinaryOperator Expression Expression
      deriving (Read, Show, Eq, Ord)

    data BinaryOperator
      = BinaryPlus
      | BinaryMinus
      | BinaryDiv
      | BinaryMult
      | BinaryExp
      deriving (Read, Show, Eq, Ord)

    data MonoOperator
      = MonoPlus
      | MonoMinus
      deriving (Read, Show, Eq, Ord)

    type Parser a = Parsec Void String a
    type ParseError = MP.ParseError (Token String) Void

    spaceConsumer :: Parser ()
    spaceConsumer = L.space space1 lineComment blockComment
      where
        lineComment = L.skipLineComment "//"
        blockComment = L.skipBlockComment "/*" "*/"

    lexeme :: Parser a -> Parser a
    lexeme = L.lexeme spaceConsumer

    symbol :: String -> Parser String
    symbol = L.symbol spaceConsumer

    expressionParser :: Parser Expression
    expressionParser =
      makeExprParser termParser
        [
          [InfixR $ BinOp BinaryExp <$ symbol "^"],
          [
            Prefix $ MonOp MonoMinus <$ symbol "-",
            Prefix $ MonOp MonoPlus <$ symbol "+"
          ],
          [
            InfixL $ BinOp BinaryMult <$ symbol "*",
            InfixL $ BinOp BinaryDiv <$ symbol "/"
          ],
          [
            InfixL $ BinOp BinaryPlus <$ symbol "+",
            InfixL $ BinOp BinaryMinus <$ symbol "-"
          ]
        ]

    termParser :: Parser Expression
    termParser = (
      (try $ Literal <$> L.decimal)
      <|> (try $ parens expressionParser))

    parens :: Parser a -> Parser a
    parens x = between (symbol "(") (symbol ")") x

    main :: IO ()
    main = do
      -- just to show that it does work in the + case:
      test expressionParser "1+(-2)" $
        BinOp BinaryPlus (Literal 1) (MonOp MonoMinus $ Literal 2)
      test expressionParser "1+-2" $
        BinOp BinaryPlus (Literal 1) (MonOp MonoMinus $ Literal 2)
      -- but not in the ^ case
      test expressionParser "1^-2" $
        BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)
      test expressionParser "-1^2" $
        MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (Literal 2)
      test expressionParser "-1^-2" $
        MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)
      -- exponent precedence is weird
      testEqual expressionParser "1^2" "(1)^(2)"
      testEqual expressionParser "-1^2" "-(1^2)"
      testEqual expressionParser "1^-2" "1^(-2)"
      testEqual expressionParser "-1^-2" "-(1^(-2))"
      -- parentheses balanced here (the original string had one extra ")")
      testEqual expressionParser "1^2^3^4" "1^(2^(3^(4)))"
      where
        test :: (Eq a, Show a) => Parser a -> String -> a -> IO ()
        test parser input expected = do
          assertEqual input (Right expected) $ parse (spaceConsumer >> parser <* eof) "test.txt" input
        testEqual :: (Eq a, Show a) => Parser a -> String -> String -> IO ()
        testEqual parser input expected = do
          assertEqual input (p expected) (p input)
          where
            p i = parse (spaceConsumer >> parser <* eof) "test.txt" i
Is it possible to get Megaparsec to parse these operators at the precedence levels that other languages use?
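makeExprParser termParser [precN, ..., prec1] produces an expression parser in which each precedence level calls the next-higher precedence level for its operands. So if you defined it by hand, you would have a rule for infix + and - that uses the mult-and-div rule for its operands. That rule in turn would use the prefix rule for its operands, and the prefix rule would use the ^ rule for its operand. Finally, the ^ rule uses termParser for its operands.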
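The important thing to note here is that the ^ rule (or more generally, any rule with higher precedence than the prefix operators) calls a parser that does not accept a prefix operator at the start of its input. A prefix operator therefore cannot appear as the right operand of such an operator, except inside parentheses.
This basically means that makeExprParser does not support your use case.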
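To make the layering concrete, here is a minimal hand-written sketch of the same precedence chain. It is an illustration under stated assumptions, not how makeExprParser is implemented: it uses ReadP from base instead of Megaparsec so that it runs without extra packages, it ignores whitespace, it only covers -, ^, * and +, and the names Expr, addLevel, mulLevel, prefixLevel, powLevel, term and parseAll are made up for this example.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Simplified, hypothetical AST (not the question's Expression type).
data Expr
  = Lit Integer
  | Neg Expr
  | Pow Expr Expr
  | Mul Expr Expr
  | Add Expr Expr
  deriving (Show, Eq)

-- Lowest precedence: infix +, operands come from the next level up.
addLevel :: ReadP Expr
addLevel = chainl1 mulLevel (Add <$ char '+')

-- Infix *, operands come from the prefix level.
mulLevel :: ReadP Expr
mulLevel = chainl1 prefixLevel (Mul <$ char '*')

-- At most one prefix operator, then the ^ level.
prefixLevel :: ReadP Expr
prefixLevel = (Neg <$> (char '-' *> powLevel)) +++ powLevel

-- Highest precedence: both operands are plain terms,
-- so nothing on the right of ^ can start with '-'.
powLevel :: ReadP Expr
powLevel = chainr1 term (Pow <$ char '^')

-- A literal or a parenthesised expression.
term :: ReadP Expr
term =
  (Lit . read <$> munch1 isDigit)
    +++ between (char '(') (char ')') addLevel

-- Succeeds only if the whole input has exactly one parse.
parseAll :: String -> Maybe Expr
parseAll s = case readP_to_S (addLevel <* eof) s of
  [(e, "")] -> Just e
  _ -> Nothing

main :: IO ()
main = do
  print (parseAll "-1^2") -- Just (Neg (Pow (Lit 1) (Lit 2)))
  print (parseAll "1^-2") -- Nothing
```

In this sketch -1^2 gets the desired shape, but 1^-2 is rejected because powLevel only ever asks term for its right operand, which mirrors the failure reported in the question.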
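To work around this, you can use makeExprParser only for the infix operators with lower precedence than the prefix operators, and handle the prefix operators and ^ by hand, so that the right operand of ^ will "loop back" to the prefix operators. Something like this: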
    expressionParser =
      makeExprParser prefixParser
        [
          [
            InfixL $ BinOp BinaryMult <$ symbol "*",
            InfixL $ BinOp BinaryDiv <$ symbol "/"
          ],
          [
            InfixL $ BinOp BinaryPlus <$ symbol "+",
            InfixL $ BinOp BinaryMinus <$ symbol "-"
          ]
        ]

    prefixParser =
      do
        prefixOps <- many prefixOp
        exp <- exponentiationParser
        return $ foldr ($) exp prefixOps
      where
        prefixOp = MonOp MonoMinus <$ symbol "-" <|> MonOp MonoPlus <$ symbol "+"

    exponentiationParser =
      do
        lhs <- termParser
        -- Loop back up to prefix instead of going down to term
        rhs <- optional (symbol "^" >> prefixParser)
        return $ maybe lhs (BinOp BinaryExp lhs) rhs
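Note that, unlike makeExprParser, this also allows multiple consecutive prefix operators (such as --x for double negation). If you don't want that, replace many with optional in the definition of prefixParser.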
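The effect of the workaround, and of the many versus optional choice, can be checked with a small self-contained sketch. The same caveats apply: it uses ReadP from base rather than Megaparsec, ignores whitespace, handles only -, ^, * and +, and Expr, exprWith, atMostOnce and parseWith are hypothetical names introduced for illustration. The argument to exprWith plays the role of many or optional in prefixParser.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Simplified, hypothetical AST (not the question's Expression type).
data Expr
  = Lit Integer
  | Neg Expr
  | Pow Expr Expr
  | Mul Expr Expr
  | Add Expr Expr
  deriving (Show, Eq)

-- Build the expression parser. The argument decides how many
-- prefix operators may appear in a row.
exprWith :: (ReadP (Expr -> Expr) -> ReadP [Expr -> Expr]) -> ReadP Expr
exprWith rep = addLevel
  where
    addLevel = chainl1 mulLevel (Add <$ char '+')
    mulLevel = chainl1 prefixLevel (Mul <$ char '*')
    -- Prefix operators first, then exponentiation.
    prefixLevel = do
      ops <- rep (Neg <$ char '-')
      e <- powLevel
      return (foldr ($) e ops)
    -- The right operand of ^ loops back up to prefixLevel,
    -- which also makes ^ right-associative.
    powLevel = do
      lhs <- term
      rhs <- option Nothing (Just <$> (char '^' *> prefixLevel))
      return (maybe lhs (Pow lhs) rhs)
    term =
      (Lit . read <$> munch1 isDigit)
        +++ between (char '(') (char ')') addLevel

-- Zero or one occurrence, returned as a list (the "optional" variant).
atMostOnce :: ReadP a -> ReadP [a]
atMostOnce p = option [] ((: []) <$> p)

-- Succeeds only if the whole input has exactly one parse.
parseWith :: ReadP Expr -> String -> Maybe Expr
parseWith p s = case readP_to_S (p <* eof) s of
  [(e, "")] -> Just e
  _ -> Nothing

main :: IO ()
main = do
  let lenient = parseWith (exprWith many)
      strict = parseWith (exprWith atMostOnce)
  print (lenient "1^-2") -- Just (Pow (Lit 1) (Neg (Lit 2)))
  print (lenient "-1^-2" == lenient "-(1^(-2))") -- True
  print (lenient "--1") -- Just (Neg (Neg (Lit 1)))
  print (strict "--1") -- Nothing
```

With many, stacked prefix operators such as --1 are accepted. With the at-most-once variant they are rejected, while 1^-2 and -1^-2 still parse as intended.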