How can we write a generic function for checking Serde serialization and deserialization?

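In a project involving custom Serde (1.0) serialization and deserialization methods, I have relied on this test routine to check whether serializing an object and deserializing it back yields an equivalent object.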

// let o: T = ...;
let buf: Vec<u8> = to_vec(&o).unwrap();
let o2: T = from_slice(&buf).unwrap();
assert_eq!(o, o2);

Doing this inline works pretty well. My next step towards reusability was to make a function check_serde for this purpose:

pub fn check_serde<T>(o: T)
where
    T: Debug + PartialEq<T> + Serialize + DeserializeOwned,
{
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: T = from_slice(&buf).unwrap();
    assert_eq!(o, o2);
}

This works well for owning types, but not for types with lifetime bounds (Playground):

check_serde(5);
check_serde(vec![1, 2, 5]);
check_serde("five".to_string());
check_serde("wait"); // [E0279]

The error:

error[E0279]: the requirement `for<'de> 'de : ` is not satisfied (`expected bound lifetime parameter 'de, found concrete lifetime`)
  --> src/main.rs:24:5
   |
24 |     check_serde("wait"); // [E0279]
   |     ^^^^^^^^^^^
   |
   = note: required because of the requirements on the impl of `for<'de> serde::Deserialize<'de>` for `&str`
   = note: required because of the requirements on the impl of `serde::de::DeserializeOwned` for `&str`
   = note: required by `check_serde`

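Since I want the function to work for these cases too (including structs with string slices), I tried a new version with an explicit lifetime for deserializing the object: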

pub fn check_serde<'a, T>(o: &'a T)
where
    T: Debug + PartialEq<T> + Serialize + Deserialize<'a>,
{
    let buf: Vec<u8> = to_vec(o).unwrap();
    let o2: T = from_slice(&buf).unwrap();
    assert_eq!(o, &o2);
}

check_serde(&5);
check_serde(&vec![1, 2, 5]);
check_serde(&"five".to_string());
check_serde(&"wait"); // [E0597]

This implementation leads to another issue and does not compile (Playground).

error[E0597]: `buf` does not live long enough
  --> src/main.rs:14:29
   |
14 |     let o2: T = from_slice(&buf).unwrap();
   |                             ^^^ does not live long enough
15 |     assert_eq!(o, &o2);
16 | }
   | - borrowed value only lives until here
   |
note: borrowed value must be valid for the lifetime 'a as defined on the function body at 10:1...
  --> src/main.rs:10:1
   |
10 | / pub fn check_serde<'a, T>(o: &'a T)
11 | |     where T: Debug + PartialEq<T> + Serialize + Deserialize<'a>
12 | | {
13 | |     let buf: Vec<u8> = to_vec(o).unwrap();
14 | |     let o2: T = from_slice(&buf).unwrap();
15 | |     assert_eq!(o, &o2);
16 | | }
   | |_^

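I had already expected this one: this version implies that the serialized content (and therefore the deserialized object) lives as long as the input object, which is not true. The buffer only lives as long as the function's scope.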

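My third attempt seeks to build an owned version of the original input, thus avoiding the problem of a deserialized object with different lifetime bounds. The ToOwned trait appears to suit this use case.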

pub fn check_serde<'a, T: ?Sized>(o: &'a T)
where
    T: Debug + ToOwned + PartialEq<<T as ToOwned>::Owned> + Serialize,
    <T as ToOwned>::Owned: Debug + DeserializeOwned,
{
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: T::Owned = from_slice(&buf).unwrap();
    assert_eq!(o, &o2);
}

This makes the function work for plain string slices, but not for composite objects containing them (Playground):

check_serde(&5);
check_serde(&vec![1, 2, 5]);
check_serde(&"five".to_string());
check_serde("wait");
check_serde(&("There's more!", 36)); // [E0279]

Again, we stumble upon the same kind of error as in the first version:

error[E0279]: the requirement `for<'de> 'de : ` is not satisfied (`expected bound lifetime parameter 'de, found concrete lifetime`)
  --> src/main.rs:25:5
   |
25 |     check_serde(&("There's more!", 36)); // [E0279]
   |     ^^^^^^^^^^^
   |
   = note: required because of the requirements on the impl of `for<'de> serde::Deserialize<'de>` for `&str`
   = note: required because of the requirements on the impl of `for<'de> serde::Deserialize<'de>` for `(&str, {integer})`
   = note: required because of the requirements on the impl of `serde::de::DeserializeOwned` for `(&str, {integer})`
   = note: required by `check_serde`

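Naturally, I'm at a loss. How can we build a generic function that, using Serde, serializes an object and deserializes it back into a new object? In particular, can this function be written in Rust (stable or nightly), and if so, what adjustments is my implementation missing?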

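It has been shown that we cannot do this efficiently without generic associated types. Establishing deep ownership of the deserialized object is a possible workaround, which I describe here.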

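The third attempt is very close to a flexible solution, but it falls short because of how std::borrow::ToOwned works. The trait is not suitable for retrieving a deeply owned version of an object. Attempting to use the implementation of ToOwned for &str, for instance, gives you another string slice.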

let a: &str = "hello";
let b: String = (&a).to_owned(); // expected String, got &str
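
The difference becomes visible once the type parameter is spelled out. The following is a minimal sketch using only the standard library; the helper shallow_owned is a hypothetical name introduced for illustration and is not part of the question's code.

```rust
// Illustrative helper (not from the question): returns whatever `ToOwned`
// designates as the owned form of `T`. `ToOwned` is in the standard prelude.
fn shallow_owned<T: ToOwned + ?Sized>(value: &T) -> T::Owned {
    value.to_owned()
}

fn main() {
    let a: &str = "hello";

    // T = str: the owned form is a String with its own heap buffer.
    let owned: String = shallow_owned::<str>(a);
    assert_eq!(owned, "hello");
    assert_ne!(owned.as_ptr(), a.as_ptr());

    // T = &str: references are Clone, so the "owned" form is just another
    // &str that points at the very same bytes.
    let borrowed: &str = shallow_owned::<&str>(&a);
    assert_eq!(borrowed.as_ptr(), a.as_ptr());
}
```

With T = &str nothing is copied except the reference itself, which is why ToOwned alone cannot give check_serde an independent value to deserialize into.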

Likewise, the Owned type for a struct containing string slices cannot be a struct containing Strings. In code:

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Foo<'a>(&'a str, i32);

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct FooOwned(String, i32);

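We cannot implement ToOwned for Foo to provide FooOwned because:

  • If we derive Clone, the blanket implementation of ToOwned for T: Clone applies, and it only allows Owned = Self.
  • Even with a custom implementation of ToOwned, the trait requires that the owned type can be borrowed as the original type (due to the constraint Owned: Borrow<Self>). That is, we would have to be able to retrieve a &Foo(&str, i32) out of a FooOwned, but their internal structures differ, so this is not attainable.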
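
The first point can be checked with the standard library alone. In this sketch, owned_is_self is a hypothetical helper that only accepts types whose ToOwned::Owned is the type itself; Foo mirrors the struct above, with Clone derived.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Foo<'a>(&'a str, i32);

// Hypothetical helper: the bound `Owned = T` is only satisfied when the
// owned form of `T` is `T` itself, which is what the blanket impl of
// `ToOwned` for `T: Clone` provides.
fn owned_is_self<T: ToOwned<Owned = T>>(value: &T) -> T {
    value.to_owned()
}

fn main() {
    let text = String::from("There's more!");
    let foo = Foo(&text, 36);

    // The "owned" value is another Foo<'_>, not a FooOwned.
    let copy: Foo<'_> = owned_is_self(&foo);
    assert_eq!(copy, foo);
    assert_eq!(copy.1, 36);

    // Its string slice still borrows from `text`.
    assert_eq!(copy.0.as_ptr(), text.as_ptr());
}
```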

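This means that, in order to follow the third approach, we need a different trait. Let's have a new trait ToDeeplyOwned, which turns an object into a fully owned one, with no slices or references involved.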

pub trait ToDeeplyOwned {
    type Owned;
    fn to_deeply_owned(&self) -> Self::Owned;
}

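The intent here is to produce a deep copy out of anything. There doesn't seem to be an easy catch-all implementation, but some tricks are possible. First, we can implement it for all reference types where T: ToDeeplyOwned.

impl<'a, T: ?Sized + ToDeeplyOwned> ToDeeplyOwned for &'a T {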
    type Owned = T::Owned;
    fn to_deeply_owned(&self) -> Self::Owned {
        (**self).to_deeply_owned()
    }
}

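At this point, we have to selectively implement it for the non-reference types where we know it is fine to do so. I wrote a macro to make this process less verbose; it uses to_owned() internally.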

macro_rules! impl_deeply_owned {
    ($t: ty, $t2: ty) => { // turn $t into $t2
        impl ToDeeplyOwned for $t {
            type Owned = $t2;
            fn to_deeply_owned(&self) -> Self::Owned {
                self.to_owned()
            }
        }
    };
    ($t: ty) => { // turn $t into itself, self-contained type
        impl ToDeeplyOwned for $t {
            type Owned = $t;
            fn to_deeply_owned(&self) -> Self::Owned {
                self.to_owned()
            }
        }
    };
}

For the examples in the question to work, we need at least these:

impl_deeply_owned!(i32);
impl_deeply_owned!(String);
impl_deeply_owned!(Vec<i32>);
impl_deeply_owned!(str, String);
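
The same trait can in principle be extended to composite types. The sketch below is an assumption on my part rather than part of the original solution: it repeats the trait, writes the str and i32 implementations by hand instead of through the macro, and adds a hypothetical implementation for pairs.

```rust
pub trait ToDeeplyOwned {
    type Owned;
    fn to_deeply_owned(&self) -> Self::Owned;
}

// References delegate to the type they point to.
impl<'a, T: ?Sized + ToDeeplyOwned> ToDeeplyOwned for &'a T {
    type Owned = T::Owned;
    fn to_deeply_owned(&self) -> Self::Owned {
        (**self).to_deeply_owned()
    }
}

impl ToDeeplyOwned for str {
    type Owned = String;
    fn to_deeply_owned(&self) -> String {
        self.to_owned()
    }
}

impl ToDeeplyOwned for i32 {
    type Owned = i32;
    fn to_deeply_owned(&self) -> i32 {
        *self
    }
}

// Hypothetical extension: a pair is deeply owned element by element.
impl<A: ToDeeplyOwned, B: ToDeeplyOwned> ToDeeplyOwned for (A, B) {
    type Owned = (A::Owned, B::Owned);
    fn to_deeply_owned(&self) -> Self::Owned {
        (self.0.to_deeply_owned(), self.1.to_deeply_owned())
    }
}

fn main() {
    let pair: (&str, i32) = ("There's more!", 36);
    let owned: (String, i32) = pair.to_deeply_owned();
    assert_eq!(owned, (String::from("There's more!"), 36));
}
```

Note that this alone would not make a tuple usable with the check function: the standard library only compares a tuple with another tuple of the same type, so (&str, i32) has no PartialEq<(String, i32)> implementation, and a wrapper type such as Foo with a hand-written PartialEq is still needed.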

Once we implement the necessary traits on Foo/FooOwned and adapt check_serde to use the new trait, the code compiles and runs successfully (Playground):

#[derive(Debug, PartialEq, Serialize)]
struct Foo<'a>(&'a str, i32);

#[derive(Debug, PartialEq, Clone, Deserialize)]
struct FooOwned(String, i32);

impl<'a> ToDeeplyOwned for Foo<'a> {
    type Owned = FooOwned;

    fn to_deeply_owned(&self) -> FooOwned {
        FooOwned(self.0.to_string(), self.1)
    }
}

impl<'a> PartialEq<FooOwned> for Foo<'a> {
    fn eq(&self, o: &FooOwned) -> bool {
        self.0 == o.0 && self.1 == o.1
    }
}

pub fn check_serde<'a, T: ?Sized>(o: &'a T)
where
    T: Debug + ToDeeplyOwned + PartialEq<<T as ToDeeplyOwned>::Owned> + Serialize,
    <T as ToDeeplyOwned>::Owned: Debug + DeserializeOwned,
{
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: T::Owned = from_slice(&buf).unwrap();
    assert_eq!(o, &o2);
}

// all of these are ok
check_serde(&5);
check_serde(&vec![1, 2, 5]);
check_serde(&"five".to_string());
check_serde("wait");
check_serde(&"wait");
check_serde(&Foo("There's more!", 36));

Unfortunately, what you need is a feature that is not yet implemented in Rust: generic associated types.

Let's look at a different variant of check_serde:

pub fn check_serde<T>(o: T)
where
    for<'a> T: Debug + PartialEq<T> + Serialize + Deserialize<'a>,
{
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: T = from_slice(&buf).unwrap();
    assert_eq!(o, o2);
}

fn main() {
    check_serde("wait"); // [E0279]
}

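The problem here is that o2 cannot be of type T: o2 refers to buf, which is a local variable, but a type parameter cannot be inferred to be a type constrained by a lifetime that is restricted to the function's body. We would like T to be something like &str without a specific lifetime attached to it.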
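
The constraint can be reproduced without Serde. The sketch below uses a hypothetical Decode<'de> trait as a stand-in for Deserialize<'de>; the trait and both helper functions are illustrative names, not Serde APIs. An owned type satisfies the bound for every 'de, whereas &str only satisfies it for lifetimes that outlive the slice being returned.

```rust
// Hypothetical stand-in for serde's `Deserialize<'de>`.
trait Decode<'de>: Sized {
    fn decode(input: &'de [u8]) -> Self;
}

// An owned type copies the data, so it can be decoded from input of any lifetime.
impl<'de> Decode<'de> for String {
    fn decode(input: &'de [u8]) -> Self {
        String::from_utf8_lossy(input).into_owned()
    }
}

// A borrowed type can only be decoded from input that outlives it.
impl<'de: 'a, 'a> Decode<'de> for &'a str {
    fn decode(input: &'de [u8]) -> Self {
        std::str::from_utf8(input).expect("valid UTF-8")
    }
}

// Mirrors the `for<'a> Deserialize<'a>` bound: the buffer is local, so `T`
// must be decodable from a lifetime the caller can never name.
fn decode_owned<T>(bytes: Vec<u8>) -> T
where
    T: for<'de> Decode<'de>,
{
    T::decode(bytes.as_slice())
}

// When the caller owns the buffer, the lifetime can be named and `T` may borrow.
fn decode_borrowed<'de, T: Decode<'de>>(buf: &'de [u8]) -> T {
    T::decode(buf)
}

fn main() {
    let s: String = decode_owned(b"wait".to_vec());
    assert_eq!(s, "wait");

    // let t: &str = decode_owned(b"wait".to_vec());
    // would be rejected: `&str` is not `Decode<'de>` for every `'de`.

    let buf = b"wait".to_vec();
    let t: &str = decode_borrowed(&buf);
    assert_eq!(t, "wait");
}
```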

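With generic associated types, this could be solved with something like the following (obviously I can't test it, since the feature is not implemented yet):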

trait SerdeFamily {
    type Member<'a>: Debug + for<'b> PartialEq<Self::Member<'b>> + Serialize + Deserialize<'a>;
}

struct I32Family;
struct StrFamily;

impl SerdeFamily for I32Family {
    type Member<'a> = i32; // ignoring a parameter is allowed
}

impl SerdeFamily for StrFamily {
    type Member<'a> = &'a str;
}

pub fn check_serde<'a, Family>(o: Family::Member<'a>)
where
    Family: SerdeFamily,
{
    let buf: Vec<u8> = to_vec(&o).unwrap();
    // `o2` is of type `Family::Member<'b>`
    // with a lifetime 'b different from 'a
    let o2: Family::Member<'_> = from_slice(&buf).unwrap();
    assert_eq!(o, o2);
}

fn main() {
    check_serde::<I32Family>(5);
    check_serde::<StrFamily>("wait");
}

A simple (but slightly awkward) solution: provide buf from outside the function.

pub fn check_serde<'a, T>(o: &'a T, buf: &'a mut Vec<u8>)
where
    T: Debug + PartialEq<T> + Serialize + Deserialize<'a>,
{
    *buf = to_vec(o).unwrap();
    let o2: T = from_slice(buf).unwrap();
    assert_eq!(o, &o2);
}

buf can be reused with a Cursor:

pub fn check_serde_with_cursor<'a, T>(o: &'a T, buf: &'a mut Vec<u8>)
where
    T: Debug + PartialEq<T> + Serialize + Deserialize<'a>,
{
    buf.clear();
    let mut cursor = Cursor::new(buf);
    to_writer(&mut cursor, o).unwrap();
    let o2: T = from_slice(cursor.into_inner()).unwrap();
    assert_eq!(o, &o2);
}

Update (04.09.2021):

The latest nightly builds have some fixes regarding GATs, which basically allow the original example:

#![feature(generic_associated_types)]

use serde::{Deserialize, Serialize};
use serde_json::{from_slice, to_vec};
use std::fmt::Debug;

trait SerdeFamily {
    type Member<'a>:
        Debug +
        for<'b> PartialEq<Self::Member<'b>> +
        Serialize +
        Deserialize<'a>;
}

struct I32Family;
struct StrFamily;

impl SerdeFamily for I32Family {
    type Member<'a> = i32;
}

impl SerdeFamily for StrFamily {
    type Member<'a> = &'a str;
}

fn check_serde<F: SerdeFamily>(o: F::Member<'_>) {
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: F::Member<'_> = from_slice(&buf).unwrap();
    assert_eq!(o, o2);
}

fn main() {
    check_serde::<I32Family>(5);
    check_serde::<StrFamily>("wait");
}

The above example now compiles: playground.


As of now, this can be implemented on Rust nightly (with an explicit variance workaround):

#![feature(generic_associated_types)]

use serde::{Deserialize, Serialize};
use serde_json::{from_slice, to_vec};
use std::fmt::Debug;

trait SerdeFamily {
    type Member<'a>: Debug + PartialEq + Serialize + Deserialize<'a>;
    
    // https://internals.rust-lang.org/t/variance-of-lifetime-arguments-in-gats/14769/19
    fn upcast_gat<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short>;
}

struct I32Family;
struct StrFamily;

impl SerdeFamily for I32Family {
    type Member<'a> = i32; // we can ignore parameters

    fn upcast_gat<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short> {
        long
    }
}

impl SerdeFamily for StrFamily {
    type Member<'a> = &'a str;

    fn upcast_gat<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short> {
        long
    }
}

fn check_serde<F: SerdeFamily>(o: F::Member<'_>) {
    let buf: Vec<u8> = to_vec(&o).unwrap();
    let o2: F::Member<'_> = from_slice(&buf).unwrap();
    assert_eq!(F::upcast_gat(o), o2);
}

fn main() {
    check_serde::<I32Family>(5);
    check_serde::<StrFamily>("wait");
}
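
Generic associated types have been stable since Rust 1.65, so on such toolchains the feature attribute should no longer be needed. As a rough illustration of the same family pattern without any external crates, the sketch below swaps Serde for two hypothetical traits, Encode and Decode<'de>; the names Family, shorten and check_round_trip are likewise made up for this example.

```rust
use std::fmt::Debug;

// Hypothetical stand-ins for serde's `Serialize` and `Deserialize<'de>`.
trait Encode {
    fn encode(&self) -> Vec<u8>;
}
trait Decode<'de>: Sized {
    fn decode(input: &'de [u8]) -> Self;
}

impl Encode for i32 {
    fn encode(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
}
impl<'de> Decode<'de> for i32 {
    fn decode(input: &'de [u8]) -> Self {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(input); // panics unless exactly four bytes
        i32::from_le_bytes(bytes)
    }
}

impl<'a> Encode for &'a str {
    fn encode(&self) -> Vec<u8> {
        self.as_bytes().to_vec()
    }
}
impl<'de: 'a, 'a> Decode<'de> for &'a str {
    fn decode(input: &'de [u8]) -> Self {
        std::str::from_utf8(input).expect("valid UTF-8")
    }
}

trait Family {
    type Member<'a>: Debug + PartialEq + Encode + Decode<'a>;

    // Same variance workaround as above: shrink the lifetime explicitly.
    fn shorten<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short>;
}

struct I32Family;
struct StrFamily;

impl Family for I32Family {
    type Member<'a> = i32;
    fn shorten<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short> {
        long
    }
}

impl Family for StrFamily {
    type Member<'a> = &'a str;
    fn shorten<'short, 'long: 'short>(long: Self::Member<'long>) -> Self::Member<'short> {
        long
    }
}

fn check_round_trip<F: Family>(o: F::Member<'_>) {
    let buf: Vec<u8> = o.encode();
    // `o2` borrows from the local `buf`, so its lifetime differs from that of `o`.
    let o2: F::Member<'_> = Decode::decode(buf.as_slice());
    assert_eq!(F::shorten(o), o2);
}

fn main() {
    check_round_trip::<I32Family>(5);
    check_round_trip::<StrFamily>("wait");
}
```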
