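Serialization in Rust: Serde
- What is Serde?
- What are serialization and deserialization?
- How Serde works
- Serde Data Model
- Visitor API
- Serializer API
- Deserializer API
- Walking through a concrete example
- Steps:
- How do we know the fields are visited in order?
- What is 'de?
- Summary
What is Serde?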
Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.
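The name is short for serialization and deserialization: Serde is an efficient, generic framework for serializing and deserializing Rust data structures.
What are serialization and deserialization?
Serialization converts structured data into a form that is easier to store or transmit, such as a byte stream. Deserialization does the reverse: it restores that stream to its original structure so that developers can parse it and apply their logic. The two are most commonly used in network communication; protobuf is a familiar example.
How Serde works
As shown in the figure below: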
Serde Data Model
The Serde data model is the API by which data structures and data formats interact. You can think of it as Serde’s type system.
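The Serde data model is the API through which a data type (data structure) and a data format interact; you can think of it as Serde's type system.
It comprises the Serializer and Deserializer APIs as well as the Visitor API. Each of these APIs exposes a family of methods, and each method corresponds to one type in the data model.
In other words, the Serde data model is the intermediate layer of the whole conversion. A data type and a data format know nothing about each other; each side only has to map its own data to and from the types of the Serde data model through the data model API.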
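To make the "intermediate layer" idea concrete, here is a minimal sketch that uses only the standard library. It is not Serde's real API: `ToySerializer`, `ToySerialize`, `CsvFormat`, `BinaryFormat`, and `Point` are hypothetical names invented for this illustration. The data type describes itself with two data-model calls, and two unrelated formats consume those calls without knowing anything about the type.

```rust
// Toy stand-in for the Serde data model; every name here is hypothetical.

/// The shared vocabulary: the only thing data types and formats both know.
trait ToySerializer {
    fn put_u64(&mut self, v: u64);
    fn put_str(&mut self, v: &str);
}

/// Implemented by data types: describe yourself in data-model terms.
trait ToySerialize {
    fn serialize<S: ToySerializer>(&self, s: &mut S);
}

struct Point {
    id: u64,
    label: String,
}

impl ToySerialize for Point {
    fn serialize<S: ToySerializer>(&self, s: &mut S) {
        s.put_u64(self.id);
        s.put_str(&self.label);
    }
}

/// Format 1: comma-separated text.
#[derive(Default)]
struct CsvFormat {
    out: String,
}

impl ToySerializer for CsvFormat {
    fn put_u64(&mut self, v: u64) {
        if !self.out.is_empty() {
            self.out.push(',');
        }
        self.out.push_str(&v.to_string());
    }
    fn put_str(&mut self, v: &str) {
        if !self.out.is_empty() {
            self.out.push(',');
        }
        self.out.push_str(v);
    }
}

/// Format 2: little-endian integers and length-prefixed strings.
#[derive(Default)]
struct BinaryFormat {
    out: Vec<u8>,
}

impl ToySerializer for BinaryFormat {
    fn put_u64(&mut self, v: u64) {
        self.out.extend_from_slice(&v.to_le_bytes());
    }
    fn put_str(&mut self, v: &str) {
        // Toy assumption: strings are shorter than 256 bytes.
        self.out.push(v.len() as u8);
        self.out.extend_from_slice(v.as_bytes());
    }
}

fn to_csv<T: ToySerialize>(value: &T) -> String {
    let mut format = CsvFormat::default();
    value.serialize(&mut format);
    format.out
}

fn to_binary<T: ToySerialize>(value: &T) -> Vec<u8> {
    let mut format = BinaryFormat::default();
    value.serialize(&mut format);
    format.out
}

fn main() {
    let p = Point { id: 7, label: "hi".to_string() };
    println!("{}", to_csv(&p));
    println!("{:?}", to_binary(&p));
}
```

Adding a third format means writing one more `ToySerializer` implementation; `Point` does not change. Serde's real data model applies the same separation with a much richer set of types.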
Visitor API
fn expecting(&self, formatter: &mut Formatter<'_>) -> Result;
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_i8<E>(self, v: i8) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_i16<E>(self, v: i16) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_i32<E>(self, v: i32) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_i64<E>(self, v: i64) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_i128<E>(self, v: i128) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_u16<E>(self, v: u16) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_u32<E>(self, v: u32) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_u128<E>(self, v: u128) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_f32<E>(self, v: f32) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_f64<E>(self, v: f64) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_char<E>(self, v: char) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_borrowed_str<E>(self, v: &'de str) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_borrowed_bytes<E>(self, v: &'de [u8]) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_byte_buf<E>(self, v: Vec<u8>) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_none<E>(self) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_some<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
where D: Deserializer<'de> { ... }
fn visit_unit<E>(self) -> Result<Self::Value, E>
where E: Error { ... }
fn visit_newtype_struct<D>(
self,
deserializer: D,
) -> Result<Self::Value, D::Error>
where D: Deserializer<'de> { ... }
fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
where A: SeqAccess<'de> { ... }
fn visit_map<A>(self, map: A) -> Result<Self::Value, A::Error>
where A: MapAccess<'de> { ... }
fn visit_enum<A>(self, data: A) -> Result<Self::Value, A::Error>
where A: EnumAccess<'de> { ... }
Serializer API
// Provided methods
fn serialize_i128(self, v: i128) -> Result<Self::Ok, Self::Error> { ... }
fn serialize_u128(self, v: u128) -> Result<Self::Ok, Self::Error> { ... }
fn collect_seq<I>(self, iter: I) -> Result<Self::Ok, Self::Error>
where I: IntoIterator,
<I as IntoIterator>::Item: Serialize { ... }
fn collect_map<K, V, I>(self, iter: I) -> Result<Self::Ok, Self::Error>
where K: Serialize,
V: Serialize,
I: IntoIterator<Item = (K, V)> { ... }
fn collect_str<T>(self, value: &T) -> Result<Self::Ok, Self::Error>
where T: ?Sized + Display { ... }
fn is_human_readable(&self) -> bool { ... }
// Required methods (excerpt)
fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error>;
fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error>;
fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error>;
fn serialize_i32(self, v: i32) -> Result<Self::Ok, Self::Error>;
fn serialize_i64(self, v: i64) -> Result<Self::Ok, Self::Error>;
fn serialize_u8(self, v: u8) -> Result<Self::Ok, Self::Error>;
fn serialize_u16(self, v: u16) -> Result<Self::Ok, Self::Error>;
fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error>;
fn serialize_u64(self, v: u64) -> Result<Self::Ok, Self::Error>;
fn serialize_f32(self, v: f32) -> Result<Self::Ok, Self::Error>;
fn serialize_f64(self, v: f64) -> Result<Self::Ok, Self::Error>;
fn serialize_char(self, v: char) -> Result<Self::Ok, Self::Error>;
fn serialize_str(self, v: &str) -> Result<Self::Ok, Self::Error>;
fn serialize_bytes(self, v: &[u8]) -> Result<Self::Ok, Self::Error>;
fn serialize_none(self) -> Result<Self::Ok, Self::Error>;
fn serialize_some<T>(self, value: &T) -> Result<Self::Ok, Self::Error>
where T: ?Sized + Serialize;
fn serialize_unit(self) -> Result<Self::Ok, Self::Error>;
...
Deserializer API
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_i8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_i16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_i64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_u8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_u16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_u32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_u64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_f32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_f64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_char<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_bytes<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_byte_buf<V>(
self,
visitor: V,
) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_unit_struct<V>(
self,
name: &'static str,
visitor: V,
) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
fn deserialize_newtype_struct<V>(
self,
name: &'static str,
visitor: V,
) -> Result<V::Value, Self::Error>
where V: Visitor<'de>;
...
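The three listings above fit together roughly like this: the Deserializer inspects the input and calls whichever `visit_*` method matches what it found, and the Visitor decides what value to build from that call. The following is a minimal standard-library sketch of that handshake. `ToyVisitor`, `ToyDeserializer`, `U64Visitor`, and `TextVisitor` are hypothetical names, and the error type is simplified to `String`; Serde's real traits are considerably more general.

```rust
// Toy sketch of the Deserializer/Visitor handshake; all names are hypothetical.

/// Data-type side: says what to do with each kind of value it is handed.
trait ToyVisitor {
    type Value;
    fn visit_u64(self, v: u64) -> Result<Self::Value, String>;
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;
}

/// Format side: looks at the raw input and picks the visit_* method to call.
struct ToyDeserializer<'a> {
    input: &'a str,
}

impl<'a> ToyDeserializer<'a> {
    /// Similar in spirit to `deserialize_any`: the input decides the type.
    fn deserialize_any<V: ToyVisitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.input.parse::<u64>() {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.input),
        }
    }
}

/// Builds a u64 and rejects anything that is not a number.
struct U64Visitor;

impl ToyVisitor for U64Visitor {
    type Value = u64;
    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
    fn visit_str(self, v: &str) -> Result<u64, String> {
        Err(format!("expected a u64, found string {:?}", v))
    }
}

/// Builds a String and accepts both numbers and text.
struct TextVisitor;

impl ToyVisitor for TextVisitor {
    type Value = String;
    fn visit_u64(self, v: u64) -> Result<String, String> {
        Ok(v.to_string())
    }
    fn visit_str(self, v: &str) -> Result<String, String> {
        Ok(v.to_owned())
    }
}

fn parse_u64(input: &str) -> Result<u64, String> {
    ToyDeserializer { input }.deserialize_any(U64Visitor)
}

fn parse_text(input: &str) -> Result<String, String> {
    ToyDeserializer { input }.deserialize_any(TextVisitor)
}

fn main() {
    println!("{:?}", parse_u64("42"));
    println!("{:?}", parse_u64("abc"));
    println!("{:?}", parse_text("abc"));
}
```

The same deserializer serves both visitors: it never learns which Rust type is being built, and the visitors never learn how the input was parsed.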
Walking through a concrete example
Steps:
- Initialize the project:
cargo init whatserde
- Add serde to Cargo.toml:
serde = { version = "1", features = ["derive"] }
- main.rs
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
struct MyTestData {
a: u64,
b: String,
}
fn main() {
println!("Hello, world!");
}
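- Expand the code with cargo expand
If you do not have cargo-expand installed, follow the instructions at the link to install it.
Link
cargo expand > expand.rs
Running this produces the expanded code shown below (excerpt):
1) Serialization code: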
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
#[allow(unused_extern_crates, clippy::useless_attribute)]
extern crate serde as _serde;
#[automatically_derived]
impl _serde::Serialize for MyTestData {
fn serialize<__S>(
&self,
__serializer: __S,
) -> _serde::__private::Result<__S::Ok, __S::Error>
where
__S: _serde::Serializer,
{
let mut __serde_state = _serde::Serializer::serialize_struct(
__serializer,
"MyTestData",
false as usize + 1 + 1,
)?;
_serde::ser::SerializeStruct::serialize_field(
&mut __serde_state,
"a",
&self.a,
)?;
_serde::ser::SerializeStruct::serialize_field(
&mut __serde_state,
"b",
&self.b,
)?;
_serde::ser::SerializeStruct::end(__serde_state)
}
}
};
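As you can see, the generated code first calls serialize_struct, then serializes fields a and b in turn with serialize_field, and finally calls end. Nested data types work the same way: each level is serialized recursively and closed with end, which marks the end of that struct.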
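The begin / field / end sequence, including the nested case, can be imitated with a small standard-library sketch. `StructState`, `begin_struct`, `ToySerialize`, and `Wrapper` are hypothetical names, and the text output is an invented toy format; only the call order mirrors the expanded code above.

```rust
// Toy imitation of the serialize_struct / serialize_field / end sequence.
// All names and the output format are hypothetical.

trait ToySerialize {
    fn serialize(&self, out: &mut String);
}

impl ToySerialize for u64 {
    fn serialize(&self, out: &mut String) {
        out.push_str(&self.to_string());
    }
}

impl ToySerialize for String {
    fn serialize(&self, out: &mut String) {
        out.push('"');
        out.push_str(self);
        out.push('"');
    }
}

/// Plays the role of the state value returned by serialize_struct.
struct StructState<'a> {
    out: &'a mut String,
    first: bool,
}

fn begin_struct<'a>(out: &'a mut String, name: &str) -> StructState<'a> {
    out.push_str(name);
    out.push('{');
    StructState { out, first: true }
}

impl<'a> StructState<'a> {
    /// Writes one field; a nested struct simply recurses through `serialize`.
    fn field<T: ToySerialize>(&mut self, key: &str, value: &T) {
        if !self.first {
            self.out.push(',');
        }
        self.first = false;
        self.out.push_str(key);
        self.out.push(':');
        value.serialize(self.out);
    }

    /// Consumes the state, so no field can be written after the end marker.
    fn end(self) {
        self.out.push('}');
    }
}

struct MyTestData {
    a: u64,
    b: String,
}

struct Wrapper {
    id: u64,
    inner: MyTestData,
}

impl ToySerialize for MyTestData {
    fn serialize(&self, out: &mut String) {
        let mut state = begin_struct(out, "MyTestData");
        state.field("a", &self.a);
        state.field("b", &self.b);
        state.end();
    }
}

impl ToySerialize for Wrapper {
    fn serialize(&self, out: &mut String) {
        let mut state = begin_struct(out, "Wrapper");
        state.field("id", &self.id);
        state.field("inner", &self.inner);
        state.end();
    }
}

fn to_text<T: ToySerialize>(value: &T) -> String {
    let mut out = String::new();
    value.serialize(&mut out);
    out
}

fn main() {
    let w = Wrapper { id: 1, inner: MyTestData { a: 2, b: "x".to_string() } };
    println!("{}", to_text(&w));
}
```

Because `end` takes the state by value, the compiler rejects any attempt to write a field after the struct has been closed.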
2) Deserialization code:
#[doc(hidden)]
const FIELDS: &'static [&'static str] = &["a", "b"];
_serde::Deserializer::deserialize_struct(
__deserializer,
"MyTestData",
FIELDS,
__Visitor {
marker: _serde::__private::PhantomData::<MyTestData>,
lifetime: _serde::__private::PhantomData,
},
)
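Note that deserialize_struct is a method of the Deserializer trait, called here with four arguments: the deserializer __deserializer, the struct name, the field list FIELDS, and a Visitor. Of these,
FIELDS:
const FIELDS: &'static [&'static str] = &["a", "b"];
tells the deserializer which fields the struct expects, in declaration order. For a format that stores fields by position, the visitor reads them one by one in this order, with the deserializer invoking the matching deserialization method for each field until none remain. For a self-describing format such as JSON, the visitor receives key/value pairs and matches each key by name instead.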
How do we know the fields are visited in order?
The code is as follows:
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
#[allow(unused_extern_crates, clippy::useless_attribute)]
extern crate serde as _serde;
#[automatically_derived]
impl<'de> _serde::Deserialize<'de> for MyTestData {
fn deserialize<__D>(
__deserializer: __D,
) -> _serde::__private::Result<Self, __D::Error>
where
__D: _serde::Deserializer<'de>,
{
#[allow(non_camel_case_types)]
#[doc(hidden)]
enum __Field {
__field0,
__field1,
__ignore,
}
#[doc(hidden)]
struct __FieldVisitor;
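Here we can see the enum variants __field0, __field1, and __ignore. This indirectly supports the point above: serde uses __FieldVisitor to identify each field listed in FIELDS (__field0 for a, __field1 for b) and then deserializes it.
What is __ignore?
By default, serde tolerates extra fields in the data sent by the serializing side, but not missing ones: a field it does not recognize is mapped to __ignore and skipped. This greatly improves compatibility (it is somewhat like optional fields in protobuf). As long as every field the deserializing side needs is present, deserialization succeeds.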
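The effect of an ignore variant can be sketched with the standard library alone. The `key=value;key=value` input format, the `Field` enum, and the `identify` and `parse` functions are hypothetical and exist only for this illustration; the real generated code goes through `__FieldVisitor` and the format's Deserializer.

```rust
// Toy sketch of "unknown fields are ignored, missing fields are errors".
// The input format and all helper names are hypothetical.

#[derive(Debug, PartialEq)]
struct MyTestData {
    a: u64,
    b: String,
}

/// Mirrors the role of the generated __Field enum.
enum Field {
    A,
    B,
    Ignore,
}

/// Mirrors the role of __FieldVisitor: map a key to a known field or Ignore.
fn identify(key: &str) -> Field {
    match key {
        "a" => Field::A,
        "b" => Field::B,
        _ => Field::Ignore,
    }
}

fn parse(input: &str) -> Result<MyTestData, String> {
    let mut a: Option<u64> = None;
    let mut b: Option<String> = None;

    for pair in input.split(';').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("malformed pair: {}", pair))?;
        match identify(key) {
            Field::A => a = Some(value.parse::<u64>().map_err(|e| e.to_string())?),
            Field::B => b = Some(value.to_string()),
            // Extra fields from a newer sender are skipped, not rejected.
            Field::Ignore => {}
        }
    }

    // A field the receiver needs but did not get is an error.
    Ok(MyTestData {
        a: a.ok_or("missing field `a`")?,
        b: b.ok_or("missing field `b`")?,
    })
}

fn main() {
    println!("{:?}", parse("a=1;extra=zzz;b=hi"));
    println!("{:?}", parse("a=1;extra=zzz"));
}
```

The first call succeeds even though the input carries a field the struct does not declare; the second fails because `b` is absent.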
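Serde supports many attributes for restricting or extending this behavior:
#[serde(rename = "?")] renames a field.
#[serde(bound = "T: MyTrait")] restricts serialization and deserialization to types that implement a given trait.
#[serde(default)] gives the field a default value when it is absent from the input, instead of returning an error.
#[serde(crate = "...")] specifies the path under which the generated code refers to the serde crate, for cases where serde is re-exported by another crate or imported under a different name.
For details, see the Serde documentation on attributes.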
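What is 'de?
Notice that deserialization introduces a lifetime named 'de. The lifetimes we usually encounter are either 'static or a single letter such as 'a.
Here is the official explanation: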
This lifetime is what enables Serde to safely perform efficient zero-copy deserialization across a variety of data formats, something that would be impossible or recklessly unsafe in languages other than Rust.
Zero-copy deserialization means deserializing into a data structure, like the User struct below, that borrows string or byte array data from the string or byte array holding the input. This avoids allocating memory to store a string for each individual field and then copying string data out of the input over to the newly allocated field. Rust guarantees that the input data outlives the period during which the output data structure is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output data structure still refers to it.
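In other words, thanks to Rust's lifetime rules, Serde can perform zero-copy deserialization safely and efficiently, something that would almost inevitably be unsafe in other languages.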
#[derive(Deserialize)]
struct User<'a> {
id: u32,
name: &'a str,
screen_name: &'a str,
location: &'a str,
}
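Rust guarantees that the input data outlives the output data structure for as long as the latter is in scope. A dangling pointer therefore cannot arise while the output structure still refers to the input, which keeps the program both safe and efficient.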
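The borrowing relationship itself can be demonstrated without Serde. The sketch below is a hand-written parser over a hypothetical `id,name,location` input, with a simplified `User` that has fewer fields than the one above. `parse_user` and `borrows_from` are illustrative helpers, not part of any library.

```rust
// Zero-copy in miniature: the output borrows its strings from the input.
// The comma-separated input format and helper names are hypothetical.

#[derive(Debug, PartialEq)]
struct User<'de> {
    id: u32,
    name: &'de str,
    location: &'de str,
}

/// The returned User is tied to the lifetime of `input`; no String is allocated.
fn parse_user<'de>(input: &'de str) -> Option<User<'de>> {
    let mut parts = input.splitn(3, ',');
    let id = parts.next()?.parse::<u32>().ok()?;
    let name = parts.next()?;
    let location = parts.next()?;
    Some(User { id, name, location })
}

/// True if `inner` points into the memory occupied by `outer`.
fn borrows_from(outer: &str, inner: &str) -> bool {
    let start = outer.as_ptr() as usize;
    let end = start + outer.len();
    let p = inner.as_ptr() as usize;
    p >= start && p + inner.len() <= end
}

fn main() {
    let input = String::from("42,alice,Berlin");
    let user = parse_user(&input).expect("well-formed input");
    println!("{:?}", user);
    println!("name borrows from input: {}", borrows_from(&input, user.name));

    // drop(input); // Uncommenting this line is a compile error:
    //              // `user` still borrows from `input` and is used below.
    println!("{}", user.location);
}
```

The commented-out `drop` is the point of the exercise: the borrow checker rejects any program in which the input buffer is freed while the deserialized value can still be used.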
Summary
That covers the basic principles of Serde. A follow-up will discuss how to implement custom serialization and deserialization.