Strictness of dataToTag argument

Question

In GHC.Prim, we find a magical function named dataToTag#:

dataToTag# :: a -> Int#

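It turns a value of any type into an integer based on the data constructor the value was built with. This is used to speed up the derived implementations of Eq, Ord, and Enum. In the GHC source, the docs for dataToTag# explain that the argument should already be evaluated: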

The dataToTag# primop should always be applied to an evaluated argument. The way to ensure this is to invoke it via the 'getTag' wrapper in GHC.Base:
getTag :: a -> Int#
getTag !x = dataToTag# x
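
To make the tag mapping concrete, here is a minimal sketch of a boxed wrapper in the style of getTag, plus tag-based Eq and Ord instances similar in spirit to what deriving can generate for an enumeration. The Color type and the colorTag helper are hypothetical names invented for illustration. The wrapper is kept monomorphic on purpose: as far as I know, newer GHC releases expose dataToTag# as a method of a DataToTag class, and a concrete argument type should compile under either arrangement.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MagicHash #-}

module Main where

import GHC.Exts (Int (I#), dataToTag#)

-- A hypothetical enumeration used only for illustration.
data Color = Red | Green | Blue

-- Boxed, monomorphic wrapper (hypothetical name). The bang pattern forces
-- the argument, as the quoted note requires, before the primop reads the tag.
colorTag :: Color -> Int
colorTag !c = I# (dataToTag# c)

-- Tag-based comparisons: compare constructor numbers instead of
-- pattern matching on every pair of constructors.
instance Eq Color where
  a == b = colorTag a == colorTag b

instance Ord Color where
  compare a b = compare (colorTag a) (colorTag b)

main :: IO ()
main = do
  -- Constructors are numbered from 0 in declaration order.
  print (map colorTag [Red, Green, Blue])
  print (Green == Green, Red < Blue)
```

The explicit instances are only a sketch of the idea; the code GHC actually derives may differ in detail.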

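I think it makes total sense that we need to force the evaluation of x before calling dataToTag#. What I don't get is why the bang pattern is sufficient. The definition of getTag is just syntactic sugar for: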

getTag :: a -> Int#
getTag x = x `seq` dataToTag# x

But let's turn to the docs for seq:

A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that the both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. If you need to guarantee a specific order of evaluation, you must use the function pseq from the "parallel" package.

In the Control.Parallel module of the parallel package, the documentation elaborates further:

... seq is strict in both its arguments, so the compiler may, for example, rearrange a `seq` b into b `seq` a `seq` b ...

Given that seq is not sufficient to control evaluation order, how is getTag guaranteed to work?

Answer 1

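GHC tracks certain information about each primop. One key piece of data is whether the primop "can_fail". The original meaning of this flag is that a primop can fail if it can cause a hard fault. For example, array indexing can cause a segmentation fault if the index is out of range, so indexing operations can fail.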

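If a primop can fail, GHC restricts certain transformations around it, and in particular will not float it out of any case expression. It would be rather bad, for example, if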

if n < bound
then unsafeIndex v n
else error "out of range"

were compiled to

case unsafeIndex v n of
  !x -> if n < bound
        then x
        else error "out of range"

One of these bottoms is an exception; the other is a segfault.
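
The guarded read can be tried out with a small runnable sketch. It assumes the array package that ships with GHC, and uses unsafeAt and numElements from Data.Array.Base as stand-ins for the answer's unsafeIndex and bound; that substitution, the extra lower-bound check, and the checkedIndex name are my own choices for illustration. The point is only that the unchecked read sits inside the branch that has already passed the bounds test, so an out-of-range index yields an ordinary exception.

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)
import Data.Array (Array, listArray)
import Data.Array.Base (numElements, unsafeAt)

-- Hypothetical helper: test the bounds first, then do the unchecked read.
-- unsafeAt takes a zero-based offset and performs no bounds check itself.
checkedIndex :: Array Int a -> Int -> a
checkedIndex v n
  | n >= 0 && n < numElements v = unsafeAt v n
  | otherwise = error "out of range"

main :: IO ()
main = do
  let v = listArray (0, 2) "abc" :: Array Int Char
  -- In range: the guard succeeds and the unchecked read is safe.
  print (checkedIndex v 1)
  -- Out of range: the guard fails, so we get an ErrorCall rather than
  -- an unchecked read past the end of the array.
  r <- try (evaluate (checkedIndex v 7)) :: IO (Either ErrorCall Char)
  putStrLn (either (\_ -> "caught: out of range") show r)
```

If the read were hoisted above the guard, the second call would touch memory outside the array before the check ever ran, which is the situation the can_fail flag is meant to rule out.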

dataToTag# is marked can_fail. So GHC sees (in Core) something like

getTag = \x -> case x of
           y -> dataToTag# y

(Note that case is strict in Core.) Because dataToTag# is marked can_fail, it will not be floated out of any case expression.
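
One way to check this on a given compiler is to write a small wrapper with seq and look at the Core GHC produces for it. The module below is a minimal sketch: maybeTag is a hypothetical name, and the wrapper is boxed and monomorphic so that it can be printed and so that it should compile across GHC versions. Compiling with something like ghc -O -ddump-simpl -dsuppress-all Main.hs prints the simplified Core; the exact output varies between GHC versions, so treat it as something to inspect rather than a fixed result.

```haskell
{-# LANGUAGE MagicHash #-}

module Main where

import GHC.Exts (Int (I#), dataToTag#)

-- Hypothetical wrapper written with seq instead of a bang pattern.
-- NOINLINE keeps it as a separate binding that is easy to find in the dump.
maybeTag :: Maybe Int -> Int
maybeTag m = m `seq` I# (dataToTag# m)
{-# NOINLINE maybeTag #-}

main :: IO ()
main = print (maybeTag Nothing, maybeTag (Just 3))
```

In the dump, the thing to look for is whether the call to dataToTag# stays inside the case expression that evaluates the argument.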


haskell

ghc