为什么在枚举所有情况时通配符匹配不起作用？

Question

考虑这段代码：

{-# LANGUAGE GADTs #-}

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA _ = False

它编译并且工作正常。现在考虑这段代码：

{-# LANGUAGE GADTs #-}

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA PB = False
isA PC = False

编译失败：

Main.hs:8:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Int
          bound by a pattern with constructor: PA :: P Int,
                   in an equation for ‘isA’
          at Main.hs:8:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: True
      In an equation for ‘isA’: isA PA = True
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
  |
8 | isA PA = True
  |          ^^^^

Main.hs:9:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Double
          bound by a pattern with constructor: PB :: P Double,
                   in an equation for ‘isA’
          at Main.hs:9:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: False
      In an equation for ‘isA’: isA PB = False
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
  |
9 | isA PB = False
  |          ^^^^^

Main.hs:10:10: error:
    • Couldn't match expected type ‘p’ with actual type ‘Bool’
        ‘p’ is untouchable
          inside the constraints: t ~ Char
          bound by a pattern with constructor: PC :: P Char,
                   in an equation for ‘isA’
          at Main.hs:10:5-6
      ‘p’ is a rigid type variable bound by
        the inferred type of isA :: P t -> p
        at Main.hs:(8,1)-(10,14)
      Possible fix: add a type signature for ‘isA’
    • In the expression: False
      In an equation for ‘isA’: isA PC = False
    • Relevant bindings include isA :: P t -> p (bound at Main.hs:8:1)
   |
10 | isA PC = False
   |          ^^^^^

为什么？这是怎么回事？

编辑：添加类型签名 isA :: P t -> Bool 使其起作用，所以我现在的问题是：为什么类型推断在第二种情况下不起作用，而在第一种情况下起作用？

Answer 1

免责声明：我把它写成一个答案，因为它不适合在评论中。但我可能错了

GADTs 上的模式匹配时，此行为是预期的。最多 GHC 个 User manual：

type refinement is only carried out based on user-supplied type annotations. So if no type signature is supplied for eval, no type refinement happens, and lots of obscure error messages will occur

也来自用户手册：

When pattern-matching against data constructors drawn from a GADT, for example in a case expression, the following rules apply:

The type of the scrutinee must be rigid.
The type of the entire case expression must be rigid.
The type of any free variable mentioned in any of the case alternatives must be rigid.

注意：类型变量是刚性的iff它是由用户指定的。

到目前为止，当针对 GADT 进行模式匹配时，您必须提供类型签名（原因是 GADTs 上的类型推断很困难).因此，显然 isA 的第一个定义应该无法编译，但在 paper 中解释了 GADTs 的类型推断（第 6.4 节）：

We remarked in Section 4.3 that in PCON-R it would be unsound to use any unifier other than a most-general one. But must the refinement be a unifier at all? For example, even though the case expression could do refinement, no refinement is necessary to typecheck this function:

f :: Term a -> Int
f (Lit i) = i
f _       = 0

上面的例子正是你的情况！。在论文中，这被称为 pre-unifier 并且有一个关于它如何工作的非常技术性的解释但据我所知，在写作时：

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA PB = False
isA PC = False

编译器首先推导 isA :: P t -> p 并拒绝继续，因为类型变量不是严格的（即不是用户指定的）

而写作时：

data P t where
 PA :: P Int
 PB :: P Double
 PC :: P Char

isA PA = True
isA _  = False

编译器可以推断出任何类型推断都没有将 Bool 推断为返回类型那么普遍，因此它可以安全地推断出 isA :: P t -> Bool.

可能这对你来说和我一样晦涩难懂，但可以肯定的是，你要求的两个案例实际上已经记录在案，所以这可能是 GHC 开发人员期望的行为，而不是一个奇怪的错误。

Answer 2

在 GADT 不存在 中输入 case 构造（无论是显式 case 语句还是隐式 pattern-based 函数定义）时，各个备选方案:

pattern -> body

可以统一，把所有的模式都打出来，统一到被检查者的类型上，然后把所有的体打完，再和case表达式的类型统一起来。所以，在一个简单的例子中：

data U = UA | UB | UC
isA1 u = case u of
  UA -> True
  UB -> False
  x  -> False

我们可以初步键入模式 UA :: U、UB :: U、x :: a，使用类型相等 a ~ U 统一它们以推断受检者的类型 u :: U，并类似地将 True :: Bool 和两个 False :: Bool 统一为整个案例表达式的类型 Bool，将其与 isA 的类型统一以获得 isA :: U -> Bool.

请注意，统一过程可以专门化类型。在这里，模式的类型 x :: a 是通用的，但是在统一过程结束时，它已经专门化为 x :: U。这也可能发生在身体上。例如：

len mstr = case mstr of
  Nothing -> 0
  Just str -> length str

在这里，0 :: Num a => a 是多态的，但是因为 length return 是一个 Int，在过程结束时，主体（以及整个案例表达式) 已统一为类型 Int.

一般来说，通过统一，所有主体的通用、统一类型（以及整个 case 表达式的类型）将是 "most general" / "least restrictive" 类型，其中类型身体的所有概括。在某些情况下，这种类型可能是其中一个body的类型，但总的来说，所有的body都可以比"most general"统一类型更通用，但没有body 可以更严格。

在 GADT 的存在下，情况发生了变化。当 type-checking 用 GADT 构造 case 时，备选方案中的模式可以引入 "type refinement"，一组附加的类型变量绑定，用于 type-checking 备选方案的主体。（这就是 GADT 之所以有用的首要原因。）

因为不同备选方案的主体是在不同的改进下键入的，所以不可能进行简单的统一。例如，考虑微型 DSL 及其解释器：

data Term a where
  Lit :: Int -> Term Int
  IsZ :: Term Int -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a

eval :: Term a -> a
eval t = case t of
  Lit n -> n
  IsZ t -> eval t == 0
  If b t e -> if eval b then eval t else eval e

如果我们天真地统一主体 n :: Int、eval t == 0 :: Bool 和 if eval b then eval t else eval e :: a，程序将不会进行类型检查（最明显的是，因为 Int 和Bool不要统一！）。

一般来说，因为类型优化允许替代体的计算类型比最终类型更具体，所以没有明显的"most general" / "least restrictive" 所有主体都可以统一到的类型，就像没有 GADT 的 case 表达式一样。

相反，我们通常需要为整个 case 表达式提供一个 "target" 类型（例如，对于 eval，类型中的 return 类型 a签名），然后确定在构造函数引入的每个细化下（例如，IsZ 引入细化 a ~ Bool），body eval t == 0 :: Bool 是否具有关联的 a.

的细化

如果没有明确提供目标类型，那么我们能做的最好的——一般来说——是使用一个新的类型变量p作为目标，并尝试检查每个改进的类型。

这意味着，给定以下 isA2 没有类型签名的定义：

data P t where
  PA :: P Int
  PB :: P Double
  PC :: P Char

isA2 = \p -> case p of
  PA -> True
  PB -> False
  PC -> False

GHC 试图做的是键入 isA2 :: P t -> p。对于备选方案：

PA -> True

它键入 PA :: P t 给出细化 t ~ Int，并且在此细化下，它尝试键入 True :: p。不幸的是，在涉及无关类型变量 a 的任何细化下，p 都不是 Bool，我们得到一个错误。其他替代方案也会产生类似的错误。

实际上，我们还可以做一件事。如果存在不引入类型细化的替代方案，那么它们主体的计算类型 NOT 比最终类型更具体。因此，如果我们统一 "unrefined" 备选方案的 body 类型，则生成的类型会为改进的备选方案提供合法的统一目标。

这意味着，例如：

isA3 = \p -> case p of
  PA -> True
  x  -> False

第二个选择：

x -> False

是通过匹配模式 x :: P t 来键入的，它没有引入类型细化。 body 的未精化类型是 Bool，这个类型是统一其他备选方案的合适目标。

具体来说，第一种选择：

PA -> True

匹配类型优化 a ~ Int。在这种细化下，body True :: Bool 的实际类型与目标类型 Bool 的 "refinement" 匹配（即 "refined" 到 Bool），并且备选方案被确定为具有有效类型。

因此，直觉是，如果没有通配符替代方案，case 表达式的推断类型是任意类型变量 p，这太笼统了，无法与类型优化替代方案统一。但是，当您添加通配符 _ -> False 时，它会在 t 中引入更具限制性的 body 类型 Bool统一过程在没有任何类型细化的情况下由模式 _ 推导出来，可以通过提供更具限制性的类型 Bool 来通知统一算法，另一个类型细化的替代方案可以统一到该类型。

在上面，我听起来好像有一些 two-phase 方法，其中首先检查 "non-refining" 备选方案以确定目标类型，然后根据它检查精炼备选方案。

事实上，细化过程将新变量引入统一过程，即使它们已统一，也不会影响更大的类型上下文。因此，所有备选方案一次统一，但未精炼备选方案的统一会影响更大的类型上下文，而精炼备选方案的统一会影响一堆新变量，从而产生相同的最终结果，就好像未精炼备选方案和精炼备选方案是分开处理的一样。

为什么在枚举所有情况时通配符匹配不起作用？

Why does a wildcard match work when enumerating all cases doesn't?

haskell

type-inference

pattern-matching

gadt