在类型系统中建模串行格式,如 Servant
Model a serial format in the type system, like Servant
我正在开发一个 API 集成,它会忽略 XML 或 JSON 的存在,而只支持附加字符数据。 (Metro2 格式,如果有兴趣)
我在简化,但是想象一下,一个人需要这样连载:
- 在位置 0,4 个字符:消息中的字节数
- 位置 5:6 个字符:
"PERSON"
硬编码
- 在位置 11:20 个字符:名称,左对齐和 space-填充
- 位置 21:8 个字符:生日,
YYYYMMDD
- 在位置 29:3 个字符:年龄,右对齐和零填充
数字字段始终右对齐并用零填充。文本字段始终左对齐并填充 space。
例如:
"0032PERSONDAVID WILCOX 19820711035"
我可以在类型系统中表达这个吗?就像 servant 那样?是这样的吗?
newtype ByteLength = ByteLength Int
newtype Age = Age Int
-- etc
type PersonMessage
= Field ByteLength '0
:| Field "PERSON" '5
:| Field Name '11
:| Field Date '21
:| Field Age '29
-- :| is a theoretical type operator, like :> in servant
-- the number is the expected offset
-- the length of the field is implicit in the type
我可以静态检查我的序列化实现是否与类型匹配吗?
我可以静态检查第 3 个字段 (Name
) 的偏移量是 11
吗?前面字段的长度加起来是 11?我假设没有,因为这似乎需要完全依赖类型支持。
这是在正确的轨道上吗?
instance ToMetro Age where
-- get the length into the type system using a type family?
field = Numeric '3
-- express how this is encoded. Would need to use the length from the type family. Or if that doesn't work, put it in the constructor.
toMetro age = Numeric age
更新:我想静态验证的函数示例:
personToMetro :: Person -> PersonMessage
personToMetro p = error "Make sure that what I return is a PersonMessage"
只是为了给你一些灵感,就照 Servant 做的那样,为你支持的不同组合器设置不同的类型:
{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators, ScopedTypeVariables #-}
module Seriavant where
import GHC.TypeLits
import Data.Proxy
import Data.List (stripPrefix)
data Skip (n :: Nat) = Skip deriving Show
data Token (n :: Nat) = Token String deriving Show
data Lit (s :: Symbol) = Lit deriving Show
data (:>>) a b = a :>> b deriving Show
infixr :>>
class Deserialize a where
deserialize :: String -> Maybe (a, String)
instance (KnownNat n) => Deserialize (Skip n) where
deserialize s = do
(_, s') <- trySplit (natVal (Proxy :: Proxy n)) s
return (Skip, s')
instance (KnownNat n) => Deserialize (Token n) where
deserialize s = do
(t, s') <- trySplit (natVal (Proxy :: Proxy n)) s
return (Token t, s')
instance (KnownSymbol lit) => Deserialize (Lit lit) where
deserialize s = do
s' <- stripPrefix (symbolVal (Proxy :: Proxy lit)) s
return (Lit, s')
instance (Deserialize a, Deserialize b) => Deserialize (a :>> b) where
deserialize s = do
(x, s') <- deserialize s
(y, s'') <- deserialize s'
return (x :>> y, s'')
trySplit :: Integer -> [a] -> Maybe ([a], [a])
trySplit 0 xs = return ([], xs)
trySplit n (x:xs) = do
(xs', ys) <- trySplit (n-1) xs
return (x:xs', ys)
trySplit _ _ = Nothing
是的,这非常简单,但它已经允许您这样做
type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4
testDeserialize :: String -> Maybe MyFormat
testDeserialize = fmap fst . deserialize
它是这样工作的:
*Seriavant> testDeserialize "1"
Nothing
*Seriavant> testDeserialize "1234PERSON Foo "
Just (Token "1234" :>> (Lit :>> (Skip :>> Token "Foo ")))
编辑:结果我完全误读了这个问题,Sean 要求序列化,而不是反序列化......但是我们当然也可以这样做:
class Serialize a where
serialize :: a -> String
instance (KnownNat n) => Serialize (Skip n) where
serialize Skip = replicate (fromIntegral $ natVal (Proxy :: Proxy n)) ' '
instance (KnownNat n) => Serialize (Token n) where
serialize (Token t) = pad (fromIntegral $ natVal (Proxy :: Proxy n)) ' ' t
instance (KnownSymbol lit) => Serialize (Lit lit) where
serialize Lit = symbolVal (Proxy :: Proxy lit)
instance (Serialize a, Serialize b) => Serialize (a :>> b) where
serialize (x :>> y) = serialize x ++ serialize y
pad :: Int -> a -> [a] -> [a]
pad 0 _x0 xs = xs
pad n x0 (x:xs) = x : pad (n-1) x0 xs
pad n x0 [] = replicate n x0
(当然,所有这些 String
串联等的性能都很糟糕,但这不是这里的重点)
*Seriavant> serialize ((Token "1234" :: Token 4) :>> (Lit :: Lit "FOO") :>> (Skip :: Skip 2) :>> (Token "Bar" :: Token 10))
"1234FOO Bar "
当然,如果我们知道格式,我们就可以避免那些讨厌的类型注释:
type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4
testSerialize :: MyFormat -> String
testSerialize = serialize
*Seriavant> testSerialize (Token "1234" :>> Lit :>> Skip :>> Token "Bar")
"1234PERSON Bar "
我正在开发一个 API 集成,它会忽略 XML 或 JSON 的存在,而只支持附加字符数据。 (Metro2 格式,如果有兴趣)
我在简化,但是想象一下,一个人需要这样连载:
- 在位置 0,4 个字符:消息中的字节数
- 位置 5:6 个字符:
"PERSON"
硬编码 - 在位置 11:20 个字符:名称,左对齐和 space-填充
- 位置 21:8 个字符:生日,
YYYYMMDD
- 在位置 29:3 个字符:年龄,右对齐和零填充
数字字段始终右对齐并用零填充。文本字段始终左对齐并填充 space。
例如:
"0032PERSONDAVID WILCOX 19820711035"
我可以在类型系统中表达这个吗?就像 servant 那样?是这样的吗?
newtype ByteLength = ByteLength Int
newtype Age = Age Int
-- etc
type PersonMessage
= Field ByteLength '0
:| Field "PERSON" '5
:| Field Name '11
:| Field Date '21
:| Field Age '29
-- :| is a theoretical type operator, like :> in servant
-- the number is the expected offset
-- the length of the field is implicit in the type
我可以静态检查我的序列化实现是否与类型匹配吗?
我可以静态检查第 3 个字段 (Name
) 的偏移量是 11
吗?前面字段的长度加起来是 11?我假设没有,因为这似乎需要完全依赖类型支持。
这是在正确的轨道上吗?
instance ToMetro Age where
-- get the length into the type system using a type family?
field = Numeric '3
-- express how this is encoded. Would need to use the length from the type family. Or if that doesn't work, put it in the constructor.
toMetro age = Numeric age
更新:我想静态验证的函数示例:
personToMetro :: Person -> PersonMessage
personToMetro p = error "Make sure that what I return is a PersonMessage"
只是为了给你一些灵感,就照 Servant 做的那样,为你支持的不同组合器设置不同的类型:
{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators, ScopedTypeVariables #-}
module Seriavant where
import GHC.TypeLits
import Data.Proxy
import Data.List (stripPrefix)
data Skip (n :: Nat) = Skip deriving Show
data Token (n :: Nat) = Token String deriving Show
data Lit (s :: Symbol) = Lit deriving Show
data (:>>) a b = a :>> b deriving Show
infixr :>>
class Deserialize a where
deserialize :: String -> Maybe (a, String)
instance (KnownNat n) => Deserialize (Skip n) where
deserialize s = do
(_, s') <- trySplit (natVal (Proxy :: Proxy n)) s
return (Skip, s')
instance (KnownNat n) => Deserialize (Token n) where
deserialize s = do
(t, s') <- trySplit (natVal (Proxy :: Proxy n)) s
return (Token t, s')
instance (KnownSymbol lit) => Deserialize (Lit lit) where
deserialize s = do
s' <- stripPrefix (symbolVal (Proxy :: Proxy lit)) s
return (Lit, s')
instance (Deserialize a, Deserialize b) => Deserialize (a :>> b) where
deserialize s = do
(x, s') <- deserialize s
(y, s'') <- deserialize s'
return (x :>> y, s'')
trySplit :: Integer -> [a] -> Maybe ([a], [a])
trySplit 0 xs = return ([], xs)
trySplit n (x:xs) = do
(xs', ys) <- trySplit (n-1) xs
return (x:xs', ys)
trySplit _ _ = Nothing
是的,这非常简单,但它已经允许您这样做
type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4
testDeserialize :: String -> Maybe MyFormat
testDeserialize = fmap fst . deserialize
它是这样工作的:
*Seriavant> testDeserialize "1" Nothing *Seriavant> testDeserialize "1234PERSON Foo " Just (Token "1234" :>> (Lit :>> (Skip :>> Token "Foo ")))
编辑:结果我完全误读了这个问题,Sean 要求序列化,而不是反序列化......但是我们当然也可以这样做:
class Serialize a where
serialize :: a -> String
instance (KnownNat n) => Serialize (Skip n) where
serialize Skip = replicate (fromIntegral $ natVal (Proxy :: Proxy n)) ' '
instance (KnownNat n) => Serialize (Token n) where
serialize (Token t) = pad (fromIntegral $ natVal (Proxy :: Proxy n)) ' ' t
instance (KnownSymbol lit) => Serialize (Lit lit) where
serialize Lit = symbolVal (Proxy :: Proxy lit)
instance (Serialize a, Serialize b) => Serialize (a :>> b) where
serialize (x :>> y) = serialize x ++ serialize y
pad :: Int -> a -> [a] -> [a]
pad 0 _x0 xs = xs
pad n x0 (x:xs) = x : pad (n-1) x0 xs
pad n x0 [] = replicate n x0
(当然,所有这些 String
串联等的性能都很糟糕,但这不是这里的重点)
*Seriavant> serialize ((Token "1234" :: Token 4) :>> (Lit :: Lit "FOO") :>> (Skip :: Skip 2) :>> (Token "Bar" :: Token 10)) "1234FOO Bar "
当然,如果我们知道格式,我们就可以避免那些讨厌的类型注释:
type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4
testSerialize :: MyFormat -> String
testSerialize = serialize
*Seriavant> testSerialize (Token "1234" :>> Lit :>> Skip :>> Token "Bar") "1234PERSON Bar "