为什么这个收缩树看起来像使用过滤器时的样子

Question

我试图了解在使用 hedgehog 集成收缩.

时过滤器对生成器收缩树的影响

考虑以下函数：

{-# LANGUAGE OverloadedStrings #-}

import Hedgehog
import qualified Hedgehog.Gen as Gen

aFilteredchar:: Gen Char
aFilteredchar =
  Gen.filter (`elem` ("x" :: String)) (Gen.element "yx")

当打印收缩树时：

>>>  Gen.printTree aFilteredchar

我会得到如下所示的收缩树：

'x'
 └╼'x'
    └╼'x'
       └╼'x'
               ...

                   └╼<discard>

这是一棵非常深的树，仅包含 x，最后是 discard。

为什么收缩函数一直返回 x 而不是一个空列表，这表明没有进一步收缩的可能？

Answer 1

Gen 本质上是概率单子和树单子的组合，您观察到的行为主要来自树单子和 Gen.filter.

的定义

基本上，Gen.filter p g 是一个简单的单子循环，try 0 其中：

-- simplified body of filter
try k =
  if k > 100 then
    discard  -- empty tree
  else do
    x <- g
    if p x then
      pure x  -- singleton tree
    else
      try (k + 1)  -- keep looping

所以要理解你得到的树，你必须在这里理解 do 符号下的树 monad。

树单子

Gen内部使用的Tree type in hedgehog大致如下（如果你看hedgehog中的链接实现，设置m ~ Maybe）：

data Tree a = Empty | Node a [Tree a]  -- node label and children

还有许多其他类似 Tree 的类型是 monad，monadic 绑定 (>>=) 通常采用树替换的形式。

假设你有一棵树 t = Node x [t1, t2, ...] :: Tree a，还有一个 continuation/substitution k :: a -> Tree b，它将每个 node/variable x :: a 替换为树 k x :: Tree b .我们可以分两步描述t >>= k，fmap然后join，如下。首先 fmap 在每个节点标签上应用替换。所以我们得到一棵树，其中每个节点都被另一棵树标记。具体来说，说 k x = Node y [u1, u2, ...]:

fmap k t
=
Node
  (k x)                        -- node label
  [fmap k t1, fmap k t2, ...]  -- node children
=
Node
  (Node y [u1, u2, ...])       -- node label
  [fmap k t1, fmap k t2, ...]  -- node children

然后 join 步骤展平嵌套树结构，将标签内部的子项与外部的子项连接起来：

t >>= k
=
join (fmap k t)
=
Node
  y
  ([join (fmap k t1), join (fmap k t2), ...] ++ [u1, u2, ...])

要完成 Monad 实例，请注意我们有 pure x = Node x []。

try 循环

现在我们对树 monad 有了一些直觉，我们可以转向您的特定生成器。我们要计算上面的 try k，其中 p = (== 'x') 和 g = elements "yx"。我在这里挥手，但你应该想象 g 随机评估树 Node 'y' [] （生成 'y' 没有收缩），又名。 pure 'y'，或 Node 'x' [Node 'y' []]（生成 'x' 并缩小到 'y'；实际上，“elements 向左缩小”），并且每次出现 g是独立于其他的，所以我们重试时得到不同的结果。

让我们分别检查每个案例。如果 g = pure 'y' 会发生什么？假设 k <= 100 所以我们在顶层 if 的 else 分支中，下面已经简化了：

-- simplified body of filter
try k = do
    c <- pure 'y'     -- g = pure 'y'
    if c == 'x' then  -- p c = (c == 'x')
      pure c
    else
      try (k + 1)

-- since   (do c <- pure 'y' ; s c) = s 'y'  (monad law)   and   ('y' == 'x') = False

try k = try (k + 1)

所以 g 计算为 pure 'y' 的所有时间最终都简化为递归项 try (k + 1)，我们只剩下 g 计算的情况到另一棵树 Node 'x' [Node 'y' []]:

try k = do
  c <- Node 'x' [Node 'y' []]  -- g
  if c == 'x' then
    pure c
  else
    try (k + 1)

如前一节所述，单子绑定等价于以下内容，我们以一些等式推理结束。

try k = join (Node (s 'x') [Node (s 'y') []])
  where
    s c = if c == 'x' then pure c else try (k + 1)

try k = join (Node (pure 'x') [Node (try (k + 1)) []])
try k = join (Node (pure 'x') [pure (try (k + 1))]  -- simplifying join
try k = Node 'x' [join (pure (try (k + 1)))]        -- join . pure = id
try k = Node 'x' [try (k + 1)]

综上所述，从try 0开始，一半概率try k = try (k + 1)，另一半概率try k = Node 'x' [try (k + 1)]，最后停在try 100。这解释了您观察到的树。

try 0 = Node 'x' [Node 'x' [ ... ]]  -- about 50 nodes

（我相信这至少也为提供了部分答案，因为这表明缩小 Gen.filter 通常相当于从头开始重新运行生成器。）

Answer 2

虽然 Li-yao Xia's detailed answer correctly describes how this happens, it doesn't address the why; why does it re-run the generator after each shrink? The answer is that it shouldn't; this is a bug. See bug report Improve Filter 在 GitHub。

为什么这个收缩树看起来像使用过滤器时的样子

Why the does this shrink tree looks the way it does when using filter

haskell

quickcheck

property-based-testing

haskell-hedgehog

树单子

try 循环