Is there a more maintainable way to process my datatype?

I have the following data type, which is the output of a recursive descent parser:

data CST 
    = Program CST CST
    | Block CST CST CST 
    | StatementList CST CST
    | EmptyStatementList
    | Statement CST
    | PrintStatement CST CST CST CST
    | AssignmentStatement CST CST CST
    | VarDecl CST CST
    | WhileStatement CST CST CST 
    | IfStatement CST CST CST 
    | Expr CST
    | IntExpr1 CST CST CST 
    | IntExpr2 CST
    | StringExpr CST CST CST
    | BooleanExpr1 CST CST CST CST CST
    | BooleanExpr2 CST 
    | Id CST
    | CharList CST CST 
    | EmptyCharList
    | Type CST 
    | Character CST
    | Space CST
    | Digit CST
    | BoolOp CST
    | BoolVal CST
    | IntOp CST
    | TermComponent Token
    | ErrorTermComponent (Token, Int)
    | NoInput

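As the name of the data type suggests, it represents a concrete syntax tree. I would like to know whether there is a more maintainable way to pattern match on this type. For example, to trace the execution of the parse calls, I have the following: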

checkAndPrintParse :: CST -> IO ()
checkAndPrintParse (Program c1 c2) = do
    putStrLn "Parser: parseProgram" 
    checkAndPrintParse c1
    checkAndPrintParse c2
checkAndPrintParse (Block c1 c2 c3) = do
    putStrLn "Parser: parseBlock"
    checkAndPrintParse c1
    checkAndPrintParse c2
    checkAndPrintParse c3
checkAndPrintParse (StatementList c1 c2) = do
    putStrLn "Parser: parseStatementList"
    checkAndPrintParse c1
    checkAndPrintParse c2

And so on. I looked at the fix function/pattern, but I am not sure whether it applies here.

Use generic-deriving to get the names of constructors:

  • Derive Generic (from GHC.Generics)
  • Call conNameOf :: CSTF a -> String (from Generics.Deriving)
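
As a minimal sketch of those two steps on a smaller type, assuming the generic-deriving package is installed: Shape and shapeName are hypothetical names used only for illustration, while conNameOf is the function mentioned above.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Generics.Deriving (conNameOf)

-- Hypothetical type used only for illustration.
data Shape
  = Circle Double
  | Square Double
  deriving Generic

-- conNameOf inspects the Generic representation of the value and
-- returns the name of its outermost constructor.
shapeName :: Shape -> String
shapeName = conNameOf
```

Loaded in GHCi, shapeName (Circle 1.0) evaluates to "Circle". No per-constructor code is needed, which is what makes the approach attractive for a type as large as CST.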

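Use recursion-schemes to traverse a recursive type:

  • makeBaseFunctor derives the base functor of a recursive type. The base functor of CST, called CSTF, is a parameterized type with the same shape as CST, but with the recursive occurrences of CST replaced by a type parameter.
  • Learn to use cata (it may be a bit mind-bending at the beginning). In this case we want to recursively construct an IO () action from a CST, i.e., a function CST -> IO (). For that, the type of cata becomes (CSTF (IO ()) -> IO ()) -> CST -> IO () (with t ~ CST and a ~ IO ()), where the first argument defines the body of the resulting recursive function, and the results of the recursive calls are placed in the fields of the base functor.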
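
To make the base functor and cata less abstract, here is a hand-written sketch that uses only base. Tree, TreeF, layerName, traceTree and traceNames are hypothetical stand-ins for illustration; project and cata are simplified, single-type versions of what recursion-schemes provides, not the library's actual definitions.

```haskell
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}

-- Hypothetical two-constructor tree standing in for CST.
data Tree = Leaf | Node Tree Tree

-- Hand-written base functor: same shape as Tree, but each recursive
-- field is replaced by the type parameter r.
data TreeF r = LeafF | NodeF r r
  deriving (Functor, Foldable)

-- Unwrap one layer of Tree into the base functor.
project :: Tree -> TreeF Tree
project Leaf       = LeafF
project (Node l r) = NodeF l r

-- cata specialised to Tree: apply cata f to every child (fmap), then
-- hand the layer holding those results to the algebra f.
cata :: (TreeF a -> a) -> Tree -> a
cata f = f . fmap (cata f) . project

-- Name of the constructor at the top of a layer.
layerName :: TreeF r -> String
layerName LeafF       = "Leaf"
layerName (NodeF _ _) = "Node"

-- With a ~ IO (): each field already holds the action for that subtree.
traceTree :: Tree -> IO ()
traceTree = cata $ \layer -> do
  putStrLn ("Parser: parse" ++ layerName layer)
  sequence_ layer

-- With a ~ [String]: the same traversal, collected purely.
traceNames :: Tree -> [String]
traceNames = cata $ \layer -> layerName layer : concat layer
```

In GHCi, traceNames (Node Leaf (Node Leaf Leaf)) evaluates to ["Node","Leaf","Node","Leaf","Leaf"], the same pre-order that traceTree prints line by line.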

So, if your goal is to write the recursive function checkAndPrintParse, one case of which is:

checkAndPrintParse (Program c1 c2) = do
  putStrLn "Parser: parseProgram" 
  checkAndPrintParse c1
  checkAndPrintParse c2

cata will place the results of its recursive calls on c1 and c2 in place of those fields:

-- goal: find f such that   cata f = checkAndPrintParse

-- By definition of cata
cata f (Program c1 c2) = f (ProgramF (cata f c1) (cata f c2))

-- By the goal and the definition of checkAndPrintParse
cata f (Program c1 c2) = checkAndPrintParse (Program c1 c2) = do
  putStrLn "Parser: parseProgram" 
  checkAndPrintParse c1
  checkAndPrintParse c2

Therefore:

f (ProgramF (cata f c1) (cata f c2)) = do
  putStrLn "Parser: parseProgram"
  cata f c1
  cata f c2

Abstract over cata f c1 and cata f c2:

f (ProgramF x1 x2) = do
  putStrLn "Parser: parseProgram"
  x1 >> x2

Recognize a fold (in the Foldable sense):

f t@(ProgramF _ _) = do
  putStrLn "Parser: parseProgram"
  sequence_ t

Generalize again:

f t = do
  putStrLn $ "Parser: " ++ conNameOf t  -- Prints "ProgramF" instead of "parseProgram"... *shrugs*
  sequence_ t

This is the argument we give to cata. The complete program:


{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TemplateHaskell #-}

import GHC.Generics
import Generics.Deriving (conNameOf)
import Data.Functor.Foldable
import Data.Functor.Foldable.TH (makeBaseFunctor)

data CST 
    = Program CST CST
    | Block CST CST CST 
    | StatementList CST CST
    | EmptyStatementList
    | Statement CST
    | PrintStatement CST CST CST CST
    | AssignmentStatement CST CST CST
    | VarDecl CST CST
    | WhileStatement CST CST CST 
    | IfStatement CST CST CST 
    | Expr CST
    | IntExpr1 CST CST CST 
    | IntExpr2 CST
    | StringExpr CST CST CST
    | BooleanExpr1 CST CST CST CST CST
    | BooleanExpr2 CST 
    | Id CST
    | CharList CST CST 
    | EmptyCharList
    | Type CST 
    | Character CST
    | Space CST
    | Digit CST
    | BoolOp CST
    | BoolVal CST
    | IntOp CST
    | TermComponent Token
    | ErrorTermComponent (Token, Int)
    | NoInput
    deriving Generic

data Token = Token

makeBaseFunctor ''CST

deriving instance Generic (CSTF a)

checkAndPrintParse :: CST -> IO ()
checkAndPrintParse = cata $ \t -> do
  putStrLn $ "Parser: " ++ conNameOf t
  sequence_ t

main :: IO ()
main = checkAndPrintParse $
  Program (Block NoInput NoInput NoInput) (Id NoInput)

Output:

Parser: ProgramF
Parser: BlockF
Parser: NoInputF
Parser: NoInputF
Parser: NoInputF
Parser: IdF
Parser: NoInputF
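
The output shows the base functor's constructor names (ProgramF) rather than the parser function names from the question (parseProgram). One possible way to close that gap is sketched below. It assumes that every parser function is named "parse" followed by the constructor name, as in the three cases shown in the question; dropSuffixF and parserLabel are hypothetical helpers, not part of either library.

```haskell
import Data.List (stripPrefix)

-- Drop a single trailing 'F' if present; otherwise return the name unchanged.
dropSuffixF :: String -> String
dropSuffixF name = maybe name reverse (stripPrefix "F" (reverse name))

-- Assumption: each parser function is named "parse" followed by the
-- constructor name, as in the three cases shown in the question.
parserLabel :: String -> String
parserLabel name = "Parser: parse" ++ dropSuffixF name
```

With these helpers, putStrLn (parserLabel (conNameOf t)) could replace the putStrLn line in checkAndPrintParse. For example, parserLabel "ProgramF" evaluates to "Parser: parseProgram".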