Fast Binary Serializer with Compile-Time Member and Version Checking

Alexander Lednev

October 28, 2019

CPOL

This is a fast binary serializer with compile-time member and version checking.

Download source code - 6.1 KB

Introduction

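Let's think about binary serialization for a moment. The problem is common enough that everyone has solved it at least once. But what if we add some strict constraints?

1. Our binary serializer must support versioning and try to fall back to older, compatible versions.
2. We want to know when the current version is not compatible with any serializer version.
3. We do not want to describe the serialization of every field for POD structures.
4. We do want to support recursion when serializing structure members.
5. Each serialized version of a structure can have different fields, which may not exist in other versions of the structure.
6. And of course we want *amazing* speed, what else :), so everything above should be checked at compile time.

The task is clear, and nothing about it looks complicated :).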

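Background

Let's solve this problem. Our serializer needs to support 4 general operations:

Serialization
Deserialization (who needs our data buffer if no useful data can be restored from it)
Size calculation (the space required by the operations above)
Moving intermediate objects

How do we serialize data? By copying all serializable members one after another, of course.
What about arrays and strings? Simple: we first write the number of elements, then serialize the elements one by one.

That is, an int32 is serialized as 4 bytes:
XXXX

Arrays and strings are represented as follows (for x64):
SSSSSSSS DDDDDDDDDDD, where S is the element count (8 bytes) and D is the element data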
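The layout above can be sketched in a few lines. This is only an illustration under my own assumptions, not the code from the attachment: the helper name write_int32_array is hypothetical, and the count is written as a 64-bit value in host byte order to match the x64 picture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of the attached serializer).
// Writes an 8-byte element count followed by the raw int32 elements,
// i.e. the "SSSSSSSS DDDD..." layout. On success the buffer pointer is
// advanced and the remaining size is reduced.
inline bool write_int32_array(const std::vector<std::int32_t>& v,
                              std::uint8_t*& buffer, std::uint32_t& size) {
    assert(buffer != nullptr);
    const std::uint64_t count  = v.size();
    const std::uint64_t needed = sizeof(count) + count * sizeof(std::int32_t);
    if (size < needed) {
        return false;  // not enough room, nothing is written
    }
    std::memcpy(buffer, &count, sizeof(count));            // SSSSSSSS
    buffer += sizeof(count);
    if (count != 0) {
        const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(std::int32_t);
        std::memcpy(buffer, v.data(), bytes);              // DDDD...
        buffer += bytes;
    }
    size -= static_cast<std::uint32_t>(needed);
    return true;
}
```

A vector of three int32 values therefore takes 8 + 3 * 4 = 20 bytes in the buffer.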

Using the Code

Let's define the structure version.

#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 30

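The compiler must check the current version of our structure and then try to find the closest implementation. We also have to keep the POD check in mind for later, because every POD structure can be serialized by a simple copy. This is also the place where you can take the platform and the byte order of the data in the buffer into account. The general shape of our serialization code will be: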

template<typename __Type, uint32_t __IsPOD> struct Core {
    template<int32_t __Maj, int32_t __Min> struct Serializer {
        static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
            return false;
        }
    };
};

template<typename __Type> struct Core<__Type, 1> {
    template<int32_t __Maj, int32_t __Min> struct Serializer {
        static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
            const uint32_t typeSz = bsr_size(t);
            if (size >= typeSz) {
                ::memcpy(buffer, &t, typeSz);
                buffer += typeSz;
                size   -= typeSz;
                return true;
            }
            SERIALIZER_ASSERT(!"Buffer is too small for serialization!");
            return false;
        }
    };
};
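The POD specialization copies raw bytes with memcpy, so the buffer simply inherits the byte order of the host. If buffers have to travel between platforms with different endianness, one option is to write integers byte by byte in a fixed order. The sketch below is an assumption of mine, not something the attached code does; store_le32 and load_le32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of the attached serializer).
// Store a 32-bit value in little-endian order regardless of the host.
inline void store_le32(std::uint32_t value, std::uint8_t* out) {
    assert(out != nullptr);
    out[0] = static_cast<std::uint8_t>(value & 0xFFu);
    out[1] = static_cast<std::uint8_t>((value >> 8) & 0xFFu);
    out[2] = static_cast<std::uint8_t>((value >> 16) & 0xFFu);
    out[3] = static_cast<std::uint8_t>((value >> 24) & 0xFFu);
}

// Load a 32-bit value that was stored in little-endian order.
inline std::uint32_t load_le32(const std::uint8_t* in) {
    assert(in != nullptr);
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}
```

The price is that a whole structure can no longer be copied in one memcpy, so this trades some of the speed for portability.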

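The standard trait std::is_pod tells us whether a type is POD. C++14 (and C++11) defines POD as follows.
A POD struct is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Roughly speaking, a POD type has no non-trivial constructors, no non-trivial copy and move constructors, no non-trivial destructor, no inheritance, no private or protected members, no non-trivial copy and move assignment operators, no virtual functions, and no non-POD members.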
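A quick way to see which types qualify is to ask the compiler directly. Note that std::is_pod is deprecated since C++20, so the sketch below swaps in std::is_trivially_copyable combined with std::is_standard_layout. That is my substitution, not what the attached code uses, and the types Point and Account as well as the helper can_copy_bytes are made up for the example.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example types.
struct Point   { int x; int y; };              // plain data, safe to memcpy
struct Account { std::string login; int id; }; // owns heap memory, not safe to memcpy

// Hypothetical helper: true when a byte-wise copy of T is well defined
// and the layout is predictable.
template<typename T>
constexpr bool can_copy_bytes() {
    return std::is_trivially_copyable<T>::value &&
           std::is_standard_layout<T>::value;
}

static_assert(can_copy_bytes<Point>(),    "Point can be serialized by a simple copy");
static_assert(!can_copy_bytes<Account>(), "Account needs member-wise serialization");
```

This matches the split used here: types like Point go through the memcpy path, while types like Account need a serializer that visits each member.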

The function that calls the serialization routine will be:

template<typename __Type> SERIALIZER_INLINE bool bsr_serialize
                          (__Type& t, uint8_t*& buffer, uint32_t& size) {
    SERIALIZER_ASSERT(buffer != nullptr);
    SERIALIZER_ASSERT(size >= bsr_size(t));
    return Core<__Type, std::is_pod<__Type>::value>::Serializer
               <API_VERSION_MAJOR, API_VERSION_MINOR>::proc(t, buffer, size);
}

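Now we need to look at the serializer template specialization for each type. First of all, we need a default implementation, because the version has to be decremented while the compiler looks for a compatible one. Something like this: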

template<> struct Core<Test, 0> {
    template<int32_t __Maj, int32_t __Min>  struct Serializer {
        static bool proc(Test& t, uint8_t*& buffer, uint32_t& size) {
            // Just in case: we want to be good programmers, and therefore
            // we should not trust even ourselves.
            static_assert(__Min >= 0, __FUNCTION__ " is not defined for this version.");
            return Serializer<__Maj, __Min - 1>::proc(t, buffer, size);
        }
    };
};

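It is useful to put the struct Serializer code into a macro, because it must be declared in every template specialization for every type.

This is just a recursive call that decrements the minor version and is stopped by a static_assert. So if nothing is found, the static_assert will tell us. The overall scheme is clear: there is the Core class, and there is a specialization for each serializable non-POD type that decrements the version while looking for a suitable candidate. If no candidate is found, it ends up in the default implementation.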
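To make the fallback easier to follow, here is a stripped-down standalone sketch of the same idea in standard C++. It is not the code from the attachment: Sample and VersionedWriter are hypothetical names, and proc returns the minor version that was actually selected only so that the lookup result is visible.

```cpp
#include <cassert>

// Hypothetical type to serialize.
struct Sample { int id; };

// Primary template: there is no implementation for (Maj, Min),
// so forward the call to the previous minor version.
template<int Maj, int Min>
struct VersionedWriter {
    static_assert(Min >= 0, "no serializer is defined for this version");
    static int proc(const Sample& s) {
        return VersionedWriter<Maj, Min - 1>::proc(s);
    }
};

// Real implementations exist only for versions 1.2 and 1.1.
template<> struct VersionedWriter<1, 2> {
    static int proc(const Sample&) { return 2; }  // selected minor version
};
template<> struct VersionedWriter<1, 1> {
    static int proc(const Sample&) { return 1; }  // selected minor version
};
```

Asking for version 1.30 walks down 1.29, 1.28 and so on until it reaches the 1.2 specialization, and the whole chain is resolved at compile time. Asking for 1.0 is rejected by the compiler, because the lookup reaches a negative minor version and trips the static_assert.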

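Now, according to point 5, we must account for the fact that some class members may not exist in different versions of the structure. The SFINAE ("Substitution Failure Is Not An Error") principle helps us here. In short, when function overloads are being resolved, a template whose substitution fails does not cause a compile error; it is simply dropped from the set of overload candidates. See the documentation for more information. The following macro defines a structure that lets us check at compile time whether a member exists.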

#define SFINAE_DECLARE_MEMBER(parent,type,name) \
    template<typename T> struct __sfiname_has_mem_ ## parent ## name { \
        struct Fallback { type name; }; \
        struct Derived : T, Fallback { }; \
        template<typename C, C> struct ChT; \
        template<typename C> static char(&f(ChT<type Fallback::*, &C::name>*))[1]; \
        template<typename C> static char(&f(...))[2]; \
        static bool const value = sizeof(f<Derived>(0)) == 2; \
    };

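The value produced by this structure can be used as a template argument. The following code calls the serialization function only when the template argument is not 0, that is, when the class/struct member exists.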

template<int enabled> struct InternalSerialize { 
    template<typename __Type> static bool proc(__Type& t, uint8_t*& buffer, uint32_t& size) {
        bool res = false;
        DEFINE_INIT_SIZE;
        res = binary_serialization::bsr_serialize(t.name, buffer, size);
        CHECK_BEC_SIZE(unique,type,name);
        return res;
    }
};
template<> struct InternalSerialize<0> {
    template<typename __Type> static bool proc
            (__Type& /*t*/, uint8_t*& /*buffer*/, uint32_t& /*size*/) {
        SERIALIZER_ASSERT(!"Unexpected serialize routine!");
        return false;
    }
};
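The two pieces above, the member check and the dispatch on its result, can also be tried in isolation. The sketch below is a simplified stand-in of my own and not the attached implementation: it detects the member with decltype instead of the Fallback/Derived trick, all names (has_extra, WriteExtra, RecordV1, RecordV2, write_extra_if_present) are hypothetical, and the "member is absent" branch is a silent no-op rather than an assertion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

// C++11-compatible void_t helper.
template<typename...> struct make_void { using type = void; };
template<typename... Ts> using void_t = typename make_void<Ts...>::type;

// has_extra<T>::value is true only if T has an accessible data member `extra`.
template<typename T, typename = void>
struct has_extra : std::false_type {};
template<typename T>
struct has_extra<T, void_t<decltype(std::declval<T&>().extra)>> : std::true_type {};

// Member exists: copy it into the buffer.
template<bool Enabled> struct WriteExtra {
    template<typename T>
    static bool proc(const T& t, std::uint8_t*& buffer, std::uint32_t& size) {
        if (size < sizeof(t.extra)) return false;
        std::memcpy(buffer, &t.extra, sizeof(t.extra));
        buffer += sizeof(t.extra);
        size   -= static_cast<std::uint32_t>(sizeof(t.extra));
        return true;
    }
};
// Member does not exist: t.extra is never mentioned, so this compiles.
template<> struct WriteExtra<false> {
    template<typename T>
    static bool proc(const T&, std::uint8_t*&, std::uint32_t&) { return true; }
};

// Two hypothetical versions of one record.
struct RecordV1 { std::int32_t id; std::int32_t extra; };
struct RecordV2 { std::int32_t id; };

template<typename T>
bool write_extra_if_present(const T& t, std::uint8_t*& buffer, std::uint32_t& size) {
    return WriteExtra<has_extra<T>::value>::proc(t, buffer, size);
}
```

The same call compiles for both record versions: for RecordV1 it writes 4 bytes, for RecordV2 it writes nothing, and the choice costs nothing at run time.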

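Note that this code (InternalSerialize) must be defined for each serializable __Type, which is why a macro is useful.
Some commonly used serializable types need additional function implementations, for example (I do not consider custom allocators here, because you can easily modify the code to use them):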

template<typename __Type> SERIALIZER_INLINE 
      bool bsr_serialize(__Type& t, uint8_t*& buffer, uint32_t& size);
template<typename __Type> SERIALIZER_INLINE 
      bool bsr_serialize(std::vector<__Type>& t, uint8_t*& buffer, uint32_t& size);
template<uint32_t __Sz>   SERIALIZER_INLINE 
      bool bsr_serialize(wchar_t(&t)[__Sz], uint8_t*& buffer, uint32_t& size);
template<typename __Type> SERIALIZER_INLINE 
      bool bsr_serialize(std::basic_string<__Type>& t, uint8_t*& buffer, uint32_t& size);

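As a result, we can verify that a member exists and, if it does, run its serialization; the serializer takes POD types into account and checks the version. The deserialization, size, and move functions are defined in the same way. You can find the final code in the attachment.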
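For the reading side, the same count-then-elements layout has to be parsed back, and this is where input validation matters most. Here is a minimal sketch under my own assumptions (an 8-byte count in host byte order followed by int32 elements); read_int32_array is a hypothetical name and not the function from the attachment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of the attached serializer).
// Reads an 8-byte element count followed by that many int32 values.
// On success the buffer pointer is advanced and the remaining size reduced.
inline bool read_int32_array(std::vector<std::int32_t>& out,
                             const std::uint8_t*& buffer, std::uint32_t& size) {
    assert(buffer != nullptr);
    std::uint64_t count = 0;
    if (size < sizeof(count)) return false;
    std::memcpy(&count, buffer, sizeof(count));

    // Reject a count that the remaining bytes cannot possibly hold,
    // so truncated or corrupt input never triggers a huge allocation.
    const std::uint64_t remaining = size - sizeof(count);
    if (count > remaining / sizeof(std::int32_t)) return false;

    const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(std::int32_t);
    out.resize(static_cast<std::size_t>(count));
    if (count != 0) {
        std::memcpy(out.data(), buffer + sizeof(count), bytes);
    }
    buffer += sizeof(count) + bytes;
    size   -= static_cast<std::uint32_t>(sizeof(count) + bytes);
    return true;
}
```

Checking the count against the remaining size before resizing the vector is the important design choice: the count comes from the buffer and must not be trusted.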

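Example Usage

Let's talk about usage. I defined several macros in the final code to make it easier to use.
For example, we can use the following structure: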

// Now we have version 1.30
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 30

struct Test {
    std::vector<int> id;
    // std::vector<int> id_2;     <--- This field was removed in version 1.2, 
    // but it exists in version 1.1 (different versions of struct can be declared 
    // in different namespaces).
    std::string      login;
};

namespace binary_serialization {
#   include "binary_serializer.hpp"

    // We declare each serializable member of this struct.
    DECLARE_SERIALIZABLE_MEMBER(Test, std::vector<int>, id);
    DECLARE_SERIALIZABLE_MEMBER(Test, std::vector<int>, id_2);    // <- This too.
    DECLARE_SERIALIZABLE_MEMBER(Test, std::string,      login);

    // Now we must declare template specialization with Test structure, 
    // non-POD, or even POD...as you wish
    template<> struct Core<Test, 0> {
        // Put default implementation of serialization
        typedef Test Type_t;
        DEFAULT_IMPLEMENTATION(Type_t);

        // And declare all operations for desired versions.
        template<> struct Serializer<1, 2> {                       // <--- version is here
            static bool proc(Type_t& t, uint8_t*& buffer, uint32_t& size) {
                return
                    _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_SERIALIZE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Serializer<1, 1> {
            static bool proc(Type_t& t, uint8_t*& buffer, uint32_t& size) {
                return
                    _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_SERIALIZE(Test, Type_t, std::vector<int>, id_2) &&  // <- we have
                                                        // id_2 in 1.1 version, remember that?
                    _INTERNAL_SERIALIZE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Move<1, 1> {
            static bool proc(Type_t& src, Type_t& dst) {
                return
                    _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id_2) &&
                    _INTERNAL_MOVE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Move<1, 2> {
            static bool proc(Type_t& src, Type_t& dst) {
                return
                    _INTERNAL_MOVE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_MOVE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Deserializer<1, 1> {
            static bool proc(Type_t& t, const uint8_t*& buffer, uint32_t& size) {
                return
                    _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id_2) &&
                    _INTERNAL_DESERIALIZE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Deserializer<1, 2> {
            static bool proc(Type_t& t, const uint8_t*& buffer, uint32_t& size) {
                return
                    _INTERNAL_DESERIALIZE(Test, Type_t, std::vector<int>, id) &&
                    _INTERNAL_DESERIALIZE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Size<1, 1> {
            static uint32_t proc(Type_t& t) {
                return
                    _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id) +
                    _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id_2) +
                    _INTERNAL_SIZE(Test, Type_t, std::string, login);
            }
        };
        template<> struct Size<1, 2> {
            static uint32_t proc(Type_t& t) {
                return
                    _INTERNAL_SIZE(Test, Type_t, std::vector<int>, id) +
                    _INTERNAL_SIZE(Test, Type_t, std::string, login);
            }
        };
    };
}

The main function of our test looks like this:

int main() {
    uint8_t  buffer[1024];
    {
        Test t_1 = { { 1, 2, 3, 4, 5 }, "test_login" };
        uint8_t* buffer_ptr = buffer;
        uint32_t buffer_size = sizeof(buffer);
        binary_serialization::bsr_serialize(t_1, buffer_ptr, buffer_size);
    }
    {
        Test t_1;
        uint8_t const* buffer_ptr = buffer;
        uint32_t buffer_size = sizeof(buffer);
        binary_serialization::bsr_deserialize(t_1, buffer_ptr, buffer_size);

        printf("%s", t_1.login.c_str());
    }
    return 0;
}

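This program simply creates and initializes an object, serializes it into a buffer, and then deserializes it into another object.

In the disassembly (Visual Studio 2017, /O2 optimization), you can see: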

00000000011710FC  mov         ecx,8  
0000000001171101  xor         eax,eax  
0000000001171103  mov         r10,qword ptr [rsp+58h]  
0000000001171108  mov         r8,qword ptr [t_1]  
000000000117110D  sub         r10,r8  
0000000001171110  sar         r10,2  
0000000001171114  test        r10,r10  
0000000001171117  je          main+0BCh (0117112Ch)  
0000000001171119  nop         dword ptr [rax]  
0000000001171120  add         rcx,4                            ; <- here we calculate the size for each element of 'id'
0000000001171124  inc         rax  
0000000001171127  cmp         rax,r10  
000000000117112A  jb          main+0B0h (01171120h)  
000000000117112C  cmp         ecx,400h  
0000000001171132  ja          main+156h (011711C6h)  
0000000001171138  mov         qword ptr [rsp+20h],r10  
000000000117113D  movsd       xmm0,mmword ptr [rsp+20h]  
0000000001171143  movsd       mmword ptr [rbp-70h],xmm0        ; <- size serialization
0000000001171148  lea         r9,[rbp-68h]  
000000000117114C  mov         ecx,3F8h

00000000003B1151  xor         edx,edx  
00000000003B1153  test        r10,r10  
00000000003B1156  je          main+114h (03B1184h)  
00000000003B1158  cmp         ecx,4                            ; <- the loop begins, where vector elements are serialized
00000000003B115B  jb          main+156h (03B11C6h)  
00000000003B115D  mov         eax,dword ptr [r8+rdx*4]  
00000000003B1161  mov         dword ptr [r9],eax               ; <- store id[i] to the buffer
00000000003B1164  add         r9,4  
00000000003B1168  add         ecx,0FFFFFFFCh  
00000000003B116B  inc         rdx                              ; <- loop counter
00000000003B116E  mov         rax,qword ptr [rsp+58h]  
00000000003B1173  mov         r8,qword ptr [t_1]  
00000000003B1178  sub         rax,r8  
00000000003B117B  sar         rax,2  
00000000003B117F  cmp         rdx,rax
00000000003B1182  jb          main+0E8h (03B1158h)             ; <- jump to the beginning of the loop if the counter is below the element count
00000000003B1184  mov         r8,qword ptr [rsp+78h]  
00000000003B1189  mov         qword ptr [rsp+20h],r8  
00000000003B118E  cmp         ecx,8  
00000000003B1191  jb          main+156h (03B11C6h)  

00000000003B1193  movsd       xmm0,mmword ptr [rsp+20h]        ; <- move the string size to the buffer
00000000003B1199  movsd       mmword ptr [r9],xmm0  
00000000003B119E  add         ecx,0FFFFFFF8h  
00000000003B11A1  mov         eax,ecx  
00000000003B11A3  cmp         r8,rax  
00000000003B11A6  ja          main+156h (03B11C6h)  
00000000003B11A8  test        r8,r8  
00000000003B11AB  je          main+156h (03B11C6h)  
00000000003B11AD  lea         rdx,[rsp+68h]  
00000000003B11B2  cmp         qword ptr [rbp-80h],10h  
00000000003B11B7  cmovae      rdx,qword ptr [rsp+68h]  
00000000003B11BD  lea         rcx,[r9+8]  
00000000003B11C1  call        memcpy (03B2B23h)                ; <- and copy the string data

00000000003B11C6  lea         rcx,[t_1]  
00000000003B11CB  call        Test::~Test (03B1280h)  
00000000003B11D0  xorps       xmm0,xmm0

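You can move the type serializers into a separate header, define the different versions of a type in their own namespaces, and use a single serializer for those structures like this: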

namespace ver_2 {
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 2
    
    struct Test {
        std::vector<int> id;
        std::string      login;
    };
#   include "serializer_base.hpp"

#undef API_VERSION_MAJOR 
#undef API_VERSION_MINOR 
}

namespace ver_1 {
#define API_VERSION_MAJOR 1
#define API_VERSION_MINOR 1

    struct Test {
        std::vector<int> id;
        std::vector<int> id_2;
        std::string      login;
    };
#   include "serializer_base.hpp"

#undef API_VERSION_MAJOR 
#undef API_VERSION_MINOR 
}

The code was tested in Visual Studio 2017.

Happy coding!

History

October 29, 2019: Initial version