在类型系统中建模串行格式,如 Servant

Model a serial format in the type system, like Servant

我正在开发一个 API 集成,它会忽略 XML 或 JSON 的存在,而只支持附加字符数据。 (Metro2 格式,如果有兴趣)

我在简化,但是想象一下,一个人需要这样连载:

数字字段始终右对齐并用零填充。文本字段始终左对齐并填充 space。

例如:

"0032PERSONDAVID WILCOX        19820711035"

我可以在类型系统中表达这个吗?就像 servant 那样?是这样的吗?

newtype ByteLength = ByteLength Int
newtype Age = Age Int
-- etc

type PersonMessage
     = Field ByteLength '0
    :| Field "PERSON" '5
    :| Field Name '11
    :| Field Date '21
    :| Field Age '29

-- :| is a theoretical type operator, like :> in servant
-- the number is the expected offset
-- the length of the field is implicit in the type

我可以静态检查我的序列化实现是否与类型匹配吗?

我可以静态检查第 3 个字段 (Name) 的偏移量是 11 吗?前面字段的长度加起来是 11?我假设没有,因为这似乎需要完全依赖类型支持。

这是在正确的轨道上吗?

instance ToMetro Age where
   -- get the length into the type system using a type family?
   field = Numeric '3

   -- express how this is encoded. Would need to use the length from the type family. Or if that doesn't work, put it in the constructor.
   toMetro age = Numeric age

更新:我想静态验证的函数示例:

personToMetro :: Person -> PersonMessage
personToMetro p = error "Make sure that what I return is a PersonMessage"

只是为了给你一些灵感,就照 Servant 做的那样,为你支持的不同组合器设置不同的类型:

{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators, ScopedTypeVariables #-}

module Seriavant where

import GHC.TypeLits
import Data.Proxy
import Data.List (stripPrefix)

data Skip (n :: Nat) = Skip deriving Show
data Token (n :: Nat) = Token String deriving Show
data Lit (s :: Symbol) = Lit deriving Show

data (:>>) a b = a :>> b deriving Show
infixr :>>

class Deserialize a where
    deserialize :: String -> Maybe (a, String)

instance (KnownNat n) => Deserialize (Skip n) where
    deserialize s = do
        (_, s') <- trySplit (natVal (Proxy :: Proxy n)) s
        return (Skip, s')

instance (KnownNat n) => Deserialize (Token n) where
    deserialize s = do
        (t, s') <- trySplit (natVal (Proxy :: Proxy n)) s
        return (Token t, s')

instance (KnownSymbol lit) => Deserialize (Lit lit) where
    deserialize s = do
        s' <- stripPrefix (symbolVal (Proxy :: Proxy lit)) s
        return (Lit, s')

instance (Deserialize a, Deserialize b) => Deserialize (a :>> b) where
    deserialize s = do
        (x, s') <- deserialize s
        (y, s'') <- deserialize s'
        return (x :>> y, s'')

trySplit :: Integer -> [a] -> Maybe ([a], [a])
trySplit 0 xs = return ([], xs)
trySplit n (x:xs) = do
    (xs', ys) <- trySplit (n-1) xs
    return (x:xs', ys)
trySplit _ _ = Nothing

是的,这非常简单,但它已经允许您这样做

type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4

testDeserialize :: String -> Maybe MyFormat
testDeserialize = fmap fst . deserialize

它是这样工作的:

*Seriavant> testDeserialize "1"
Nothing
*Seriavant> testDeserialize "1234PERSON Foo "
Just (Token "1234" :>> (Lit :>> (Skip :>> Token "Foo ")))

编辑:结果我完全误读了这个问题,Sean 要求序列化,而不是反序列化......但是我们当然也可以这样做:

class Serialize a where
    serialize :: a -> String

instance (KnownNat n) => Serialize (Skip n) where
    serialize Skip = replicate (fromIntegral $ natVal (Proxy :: Proxy n)) ' '

instance (KnownNat n) => Serialize (Token n) where
    serialize (Token t) = pad (fromIntegral $ natVal (Proxy :: Proxy n)) ' ' t

instance (KnownSymbol lit) => Serialize (Lit lit) where
    serialize Lit = symbolVal (Proxy :: Proxy lit)

instance (Serialize a, Serialize b) => Serialize (a :>> b) where
    serialize (x :>> y) = serialize x ++ serialize y

pad :: Int -> a -> [a] -> [a]
pad 0 _x0 xs = xs
pad n x0 (x:xs) = x : pad (n-1) x0 xs
pad n x0 [] = replicate n x0

(当然,所有这些 String 串联等的性能都很糟糕,但这不是这里的重点)

*Seriavant> serialize ((Token "1234" :: Token 4) :>> (Lit :: Lit "FOO") :>> (Skip :: Skip 2) :>> (Token "Bar" :: Token 10))
"1234FOO  Bar       "

当然,如果我们知道格式,我们就可以避免那些讨厌的类型注释:

type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4

testSerialize :: MyFormat -> String
testSerialize = serialize
*Seriavant> testSerialize (Token "1234" :>> Lit :>> Skip :>> Token "Bar")
"1234PERSON Bar "