以 Haskell do-notation 生成唯一值

Question

为了生成 x86 汇编代码，我定义了一个名为 X86:

的自定义类型

data X86 a = X86 { code :: String, counter :: Integer, value :: (X86 a -> a) }

这种类型用在如下的do-notation中。这使得编写用于生成 if 语句、for 循环等的模板变得容易...

generateCode :: X86 ()
generateCode = do
  label1 <- allocateUniqueLabel
  label2 <- allocateUniqueLabel
  jmp label1
  label label1
  jmp label2
  label label2

指令定义如下：

jmp :: String -> X86 ()
jmp l = X86 { code = "jmp " ++ l ++ ";\n", counter = 0, value = const () }

label :: String -> X86 ()
label l = X86 { code = l ++ ":\n", counter = 0, value = const () }

完成的汇编文件打印如下：

printAsm :: X86 a -> String
printAsm X86{code=code} = code

main = do
  putStrLn (printAsm generateCode)

我按以下方式实现了 X86 monad。本质上，序列运算符按顺序连接汇编代码块并确保计数器递增。

instance Monad X86 where
  x >> y = X86 { code = code x ++ code y, counter = counter x + counter y, value = value y }
  x >>= f = x >> y
    where y = f (value x x)

问题是标签没有正确递增，所以它们不是唯一的！以下是输出：

jmp Label1;
Label1:
jmp Label1;
Label1:

我希望每个标签的输出具有唯一值：

jmp Label1;
Label1:
jmp Label2;
Label2:

为了完成示例，这里是 allocatedUniqueLabel 函数的实现：

allocateUniqueId :: X86 Integer
allocateUniqueId = X86 { code = "", counter = 1, value = counter }

allocateUniqueLabel :: X86 String
allocateUniqueLabel = do
  id <- allocateUniqueId
  return ("Label" ++ show id)

如何修复我的 X86 monad 以使标签唯一？

这是我试过的：

递增全局计数器。 => Haskell 不安全地允许 IO monad 之外的全局状态。
使用 State monad。 => 我研究了很多示例，但不明白如何将它们集成到我现有的 X86 monad 中。
在 monad 之外跟踪计数器。 => 我宁愿更新计数器 "behind the scenes"；否则，很多不使用标签的代码模板将需要手动传播计数器。

Answer 1

我们可以用mtl classes来形容X86代码是有效的程序。我们想要：

生成代码，这是Writer效果；
保持计数器，这是State效果。

我们担心最后实例化这些效果，并且在我们使用 MonadWriter 和 MonadState 约束的程序描述中。

import Control.Monad.State  -- mtl
import Control.Monad.Writer

分配新的标识符会使计数器递增，但不会生成任何代码。这仅使用 State 效果。

type Id = Integer

allocateUniqueLabel :: MonadState Id m => m String
allocateUniqueLabel = do
  i <- get
  put (i+1)  -- increment
  return ("Label" ++ show (i+1))

当然，我们有生成代码的操作，不需要关心当前状态。所以他们使用Writer效果。

jmp :: MonadWriter String m => String -> m ()
jmp l = tell ("jmp " ++ l ++ ";\n")

label :: MonadWriter String m => String -> m ()
label l = tell (l ++ ":\n")

实际程序看起来和原来的一样，但类型更通用。

generateCode :: (MonadState Id m, MonadWriter String m) => m ()
generateCode = do
  label1 <- allocateUniqueLabel
  label2 <- allocateUniqueLabel
  jmp label1
  label label1
  jmp label2
  label label2

效果是在我们运行这个程序时实例化的，这里使用runWriterT/runWriter和runStateT/runState（顺序不重要的是，这两种影响相互影响）。

type X86 = WriterT String (State Id)

runX86 :: X86 () -> String
runX86 gen = evalState (execWriterT gen) 1 -- start counting from 1
-- evalState and execWriterT are wrappers around `runStateT` and `runWriterT`:
-- - execWriterT: discards the result (of type ()), only keeping the generated code.
-- - evalState: discards the final state, only keeping the generated code,
--   and does some unwrapping after there are no effects to handle.

Answer 2

您可能想使用这个 monad 堆栈：

type X86 a = StateT Integer (Writer String) a

既然你有一个国家和一个作家，你也可以考虑使用RWS（reader-writer-state合二为一）：

type X86 a = RWS () String Integer a

我们挑第一个玩玩吧。我首先定义一个辅助函数来增加计数器 (monads cannot lawfully increment a counter "automatically"):

instr :: X86 a -> X86 a
instr i = do
    x <- i
    modify (+1)
    return x

那么您可以将 jmp 定义为：

jmp :: String -> X86 ()
jmp l = instr $ do
    lift (tell ("jmp " ++ l ++ ";\n"))
       -- 'tell' is one of Writer's operations, and then we 'lift'
       -- it into StateT

（那里的do是多余的，但是我怀疑会有一种用instr $ do开始指令定义的模式）

我不会为此推出我自己的 monad —— 这样做可能很有启发性，但我认为使用标准库你会得到更多的里程.

Answer 3

正如您现在可能从其他答案中了解到的那样，您的问题方法是，即使你在使用柜台，你仍然在本地生成标签。特别是

label1 <- allocateUniqueLabel
label label1

相当于

X86 { code = "Label1:\n", counter = 1, value = const () }

我们需要先assemble整个代码，生成标签，并且只之后（在某种意义上）使用标签生成实际代码。这就是其他答案通过存储计数器所暗示的在 State（或 RWS）单子中。

我们还可以解决另一个问题：您希望能够同时跳跃向前和向后。这很可能是你有单独的原因 allocateUniqueLabel 和 label 函数。但这允许设置相同标签两次。

实际上可以使用 do 符号与 "backwards" 绑定使用 MonadFix, 它定义了这个 monadic 操作：

mfix :: (a -> m a) -> m a

由于State和RWS都有MonadFix个实例，我们确实可以写代码像这样：

{-# LANGUAGE GeneralizedNewtypeDeriving, RecursiveDo #-}
module X86
    ( X86()
    , runX86
    , label
    , jmp
    ) where

import Control.Monad.RWS

-- In production code it'll be much faster if we replace String with
-- ByteString.
newtype X86 a = X86 (RWS () String Int a)
    deriving (Functor, Applicative, Monad, MonadFix)

runX86 :: X86 a -> String
runX86 (X86 k) = snd (execRWS k () 1)

newtype Label = Label { getLabel :: String }

label :: X86 Label
label = X86 $ do
    counter <- get
    let l = "Label" ++ show counter
    tell (l ++ ":\n")
    modify (+1)
    return (Label l)

jmp :: Label -> X86 ()
jmp (Label l) = X86 . tell $ "jmp " ++ l ++ ";\n"

并像这样使用它：

example :: X86 ()
example = do
    rec l1 <- label
        jmp l2
        l2 <- label
    jmp l1

有几点需要注意：

我们需要使用 RecursiveDo 扩展来启用 rec 关键字。
关键字rec 分隔了一个相互递归的定义块。在我们的例子中它也可以晚一行开始 (rec jmp l2)。 GHC 然后将其翻译成在内部使用 mfix。（使用已弃用的 mdo 关键字而不是 rec 会使代码更自然。）
我们将内部结构包装在 X86 新类型中。首先，隐藏总是好的内部实现，它允许以后轻松重构。二、mfix 要求传递给它的函数 a -> m a 不严格争论。效果不能依赖于参数，否则 mfix 分歧。这是满足我们功能的条件，但是如果内部结构暴露了，有人可以像这样定义一个人为的函数：
```
-- | Reset the counter to the specified label.
evilReset :: Label -> X86 ()
evilReset = X86 . put . read . drop 5 . getLabel
```
不仅破坏了labels的唯一性，还会导致如下代码挂起：
```
diverge :: X86 ()
diverge = do
    rec evilReset l2
        l2 <- label
    return ()
```

另一个非常相似的替代方法是使用 Rand monad 并生成标签 Random 实例 UUID。类似于 WriterT String Rand a，它也有一个 MonadFix 实例。

（从纯粹的学术角度来看，构建一个箭头而不是 monad 是可能的，这将实现 ArrowLoop, 但不允许依赖于值的状态修改，例如 evilReset。但是 X86 的封装实现了相同的目标，保持更友好的 do 语法。）

以 Haskell do-notation 生成唯一值

Generating a unique value in Haskell do-notation

monads

haskell

state-monad

do-notation