Semantics of show w.r.t. escape characters

Question

Consider the following examples (λ> = ghci, $ = shell):

λ> writeFile "d" $ show "d"
$ cat d
"d"

λ> writeFile "d" "d"
$ cat d
d

λ> writeFile "backslash" $ show "\\"
$ cat backslash
"\\"

λ> writeFile "backslash" "\\"
$ cat backslash
\

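λ> writeFile "cat" $ show "🐈" -- U+1F408
$ cat cat
"\128008"

λ> writeFile "cat" "🐈"
$ cat cat
🐈

I understand that "\128008" is just another way of writing "🐈" in Haskell source code. My question is: why does the "🐈" example behave like the backslash rather than like "d"? Since it is a printable character, shouldn't it behave like a letter?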

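More generally, what is the rule that determines whether a character is shown as a printable character or as an escape code? I looked at Section 6.3 of the Haskell 2010 Language Report, but it does not specify the exact behavior.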

Answer 1

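TL;DR: Printable characters in the ASCII range (0-127) will be `show`n as graphic characters.* Everything else will be escaped.

* Except double quotes (since they are used as string delimiters) and backslashes (since they are needed for escaping).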
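
The rule can be checked mechanically. The sketch below compares `show` on every one-character string against the rule as stated; `shownVerbatim`, `keptByShow` and `ruleHolds` are names made up for this illustration and are not part of base.

```haskell
-- Hypothetical helper (not part of base): the TL;DR rule as a predicate.
-- A character is shown verbatim inside a String iff it is printable ASCII
-- (space through '~') and is neither a double quote nor a backslash.
shownVerbatim :: Char -> Bool
shownVerbatim c = c >= ' ' && c < '\DEL' && c /= '"' && c /= '\\'

-- True when `show` left the character untouched between the quotes.
keptByShow :: Char -> Bool
keptByShow c = show [c] == ['"', c, '"']

-- Compare the predicate with what `show` actually does, for every Char.
ruleHolds :: Bool
ruleHolds = all (\c -> shownVerbatim c == keptByShow c) [minBound .. maxBound]
```

Loading this in ghci and evaluating `ruleHolds` walks the whole `Char` range, so it takes a moment; it only tests the behavior of the `show` in the base library you run it against.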

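Let's dig into the source code to figure this out!

Since String = [Char], we should look for the source of instance Show Char. It can be found in GHC.Show (part of base), where it is defined as: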

-- | @since 2.01
instance  Show Char  where
    showsPrec _ '\'' = showString "'\\''"
    showsPrec _ c    = showChar '\'' . showLitChar c . showChar '\''

    showList cs = showChar '"' . showLitString cs . showChar '"'

So showing a String (using showList) is basically a wrapper around showLitString, and showing a Char is a wrapper around showLitChar. Let's look at those functions.

showLitString :: String -> ShowS
-- | Same as 'showLitChar', but for strings
-- It converts the string to a string using Haskell escape conventions
-- for non-printable characters. Does not add double-quotes around the
-- whole thing; the caller should do that.
-- The main difference from showLitChar (apart from the fact that the
-- argument is a string not a list) is that we must escape double-quotes
showLitString []         s = s
showLitString ('"' : cs) s = showString "\\\"" (showLitString cs s)
showLitString (c   : cs) s = showLitChar c (showLitString cs s)
   -- [explanatory comments ...]
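
One consequence of the two definitions above is that quote characters are escaped differently depending on whether they sit inside a Char or a String. The following small sketch records this as (actual, expected) pairs; `quotingExamples` and `allMatch` are illustrative names, not library functions.

```haskell
-- Each pair is (actual, expected). The expected strings are written as
-- Haskell literals, so every backslash in the output appears doubled here.
quotingExamples :: [(String, String)]
quotingExamples =
  [ (show '"',  "'\"'")      -- a Char: the double quote needs no escape
  , (show "\"", "\"\\\"\"")  -- a String: the double quote is escaped
  , (show '\'', "'\\''")     -- a Char: the single quote is escaped
  , (show "'",  "\"'\"")     -- a String: the single quote needs no escape
  ]

-- True when every actual result equals its expected rendering.
allMatch :: Bool
allMatch = all (uncurry (==)) quotingExamples
```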

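As you might expect, showLitString is mostly a wrapper around showLitChar. [Note: if you are unfamiliar with the ShowS type, a good answer elsewhere explains why it can be useful.] That is not what we are looking for, so let's move on to showLitChar (I have omitted the parts of the definition that are not relevant to the question).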

-- | Convert a character to a string using only printable characters,
-- using Haskell source-language escape conventions.  For example:
-- [...]
showLitChar                :: Char -> ShowS
showLitChar c s | c > '\DEL' =  showChar '\\' (protectEsc isDec (shows (ord c)) s)
-- ^ Pattern matched for cat
showLitChar '\DEL'         s =  showString "\\DEL" s
showLitChar '\\'           s =  showString "\\\\" s
-- ^ Pattern matched for backslash
showLitChar c s | c >= ' '   =  showChar c s
-- ^ Pattern matched for d
-- Some more escape codes
showLitChar '\a'           s =  showString "\\a" s
-- similarly for '\b', '\f', '\n', '\r', '\t', '\v' etc.
-- showLitChar ... = ...

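Now you can see where the issue lies. ord c is an Int, and the first clause matches every non-ASCII character (ord '\DEL' == 127). Within the ASCII range, printable characters are printed as they are and the rest are escaped. Characters outside that range are all escaped.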
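
The clauses can be exercised directly, since showLitChar is exported from Data.Char. The sketch below pairs each call with the output it should produce; the top-level names (`clauseExamples`, `catCodePoint`, `digitAfterEscape`, `roundTrips`) are made up for this illustration.

```haskell
import Data.Char (ord, showLitChar)

-- (actual, expected) pairs, one per clause discussed above.
clauseExamples :: [(String, String)]
clauseExamples =
  [ (showLitChar 'd' "",       "d")         -- printable ASCII: kept as-is
  , (showLitChar '\\' "",      "\\\\")      -- backslash: doubled
  , (showLitChar '\DEL' "",    "\\DEL")     -- code point 127: named escape
  , (showLitChar '\128008' "", "\\128008")  -- above 127: decimal escape
  , (showLitChar '\n' "",      "\\n")       -- control character: named escape
  ]

-- U+1F408 written in decimal is the number that appears in the escape.
catCodePoint :: Bool
catCodePoint = ord '\128008' == 0x1F408

-- protectEsc isDec: a digit right after a numeric escape is separated by
-- "\&", so the result still reads back as two characters.
digitAfterEscape :: Bool
digitAfterEscape = show "\128008\&1" == "\"\\128008\\&1\""

-- Escaping loses no information: `read` inverts `show`.
roundTrips :: Bool
roundTrips = read (show "d\\\128008") == "d\\\128008"
```

If the goal is simply to see the cat on the terminal or in a file, writing the String itself (as in the question's second variant) skips `show` and therefore skips the escaping.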

The code does not answer the "why" part of the question. That answer (I think) lies in the first comment we saw:

-- | @since 2.01
instance  Show Char  where

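~~If I had to guess, this behavior has stuck around to maintain backwards compatibility.~~ I don't need to guess: see the comments for some good answers.

Bonus

We can do a git blame online using GHC's GitHub mirror ;). Let's see when this code was written (blame link). The relevant commit is 15 years old (!). However, it does mention Unicode.

The functionality for distinguishing between different kinds of Unicode characters lives in the Data.Char module. Looking at the source: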

isPrint    c = iswprint (ord c) /= 0

foreign import ccall unsafe "u_iswprint"
  iswprint :: Int -> Int

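If you trace the commit that introduced iswprint, you will land on a commit made 13 years ago. ~~Maybe enough code had been written in those two years that they did not want to break it? I don't know. If some GHC devs could shed more light on this, that would be awesome :).~~ Daniel Wagner and Paul Johnson pointed out a very good reason in the comments: working with non-Unicode systems must have been a high priority (~15 years ago), since Unicode was still relatively new at the time.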
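
For readers who want printable non-ASCII characters left alone, isPrint can be combined with showLitChar. This is only a minimal sketch under that assumption: `showPrintable` is a hypothetical helper, not something base provides, and it is not what `show` does.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical alternative to `show` for Strings (not part of base):
-- printable non-ASCII characters are kept, everything else is escaped
-- the same way `show` escapes it.
showPrintable :: String -> String
showPrintable s = '"' : foldr step "\"" s
  where
    -- Double quotes are escaped, as in showLitString.
    step '"' rest = "\\\"" ++ rest
    step c rest
      -- Printable and outside ASCII: keep the character itself.
      | c > '\DEL' && isPrint c = c : rest
      -- Otherwise defer to the standard escaping. Passing `rest` lets
      -- showLitChar insert "\&" where the next character requires it.
      | otherwise = showLitChar c rest
```

On ASCII-only input this agrees with `show`; for example, `showPrintable "caf\233"` keeps the é between the quotes, while the control character in `showPrintable "\128"` is still written as a numeric escape. The foldr is used instead of concatMap so that each escape can see the text that follows it.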
