The Mystery of the Missing Bytes

Coral Kashri

April 22, 2023

CPOL

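In this article, you will see how standard library implementations save bytes and alignment when a struct contains certain types.

It was a long and dark (mode) night. The last thing I remember is the pointer I was working with. I had only a tiny space left to use, just 8 bytes: exactly the size of a pointer. I wanted to protect it, so I wrapped it in a std::unique_ptr. Everything seemed to work, and then it hit me: std::unique_ptr has to store its deleter too, and that should take extra space. This is where our adventure begins.

Chapter 1: The Phenomenon

Before investigating a phenomenon, we have to dig in and understand exactly what we are facing.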

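It turns out that this phenomenon (a std::unique_ptr no bigger than a raw pointer) occurs only when the deleter is a capture-less lambda or an empty functor stored by value. In every other case the size of std::unique_ptr grows, whether the functor is passed by reference or the deleter is a std::function or a function pointer.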
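
The sizes involved are easy to check. The following is a minimal sketch and not part of the original investigation: the deleter names (EmptyFunctor, captureless_lambda) and the type aliases are made up for illustration, and the expected results are an assumption about mainstream standard library implementations, not something the C++ standard guarantees.

```cpp
#include <functional>
#include <memory>

// Hypothetical deleters, named only for this illustration.
struct EmptyFunctor {
    void operator()(int* p) const noexcept { delete p; }
};
auto captureless_lambda = [](int* p) { delete p; };

// Empty deleters stored by value.
using DefaultPtr = std::unique_ptr<int>;
using FunctorPtr = std::unique_ptr<int, EmptyFunctor>;
using LambdaPtr  = std::unique_ptr<int, decltype(captureless_lambda)>;

// Deleters that need storage of their own.
using FunctorRefPtr  = std::unique_ptr<int, EmptyFunctor&>;
using FnPointerPtr   = std::unique_ptr<int, void (*)(int*)>;
using StdFunctionPtr = std::unique_ptr<int, std::function<void(int*)>>;

// Assumed to hold on mainstream implementations: an empty deleter
// stored by value adds nothing to the size of the smart pointer.
constexpr bool empty_deleters_cost_nothing =
    sizeof(DefaultPtr) == sizeof(int*) &&
    sizeof(FunctorPtr) == sizeof(int*) &&
    sizeof(LambdaPtr)  == sizeof(int*);

// A reference, a function pointer, or a std::function has to be stored
// somewhere, so the smart pointer grows beyond a single raw pointer.
constexpr bool other_deleters_cost_space =
    sizeof(FunctorRefPtr)  > sizeof(int*) &&
    sizeof(FnPointerPtr)   > sizeof(int*) &&
    sizeof(StdFunctionPtr) > sizeof(int*);
```

Checking both constants, for instance with static_assert, should be enough to reproduce the observation on a given toolchain.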

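So this looks like an optimization, but which one? Before C++20, at least, no such optimization was available for an instance stored as a class member.

The first thought that crossed my mind was: "Is the compiler using some hidden, undocumented optimization technique?"

There is nothing more deceptive than an obvious fact.
Sherlock Holmes

Chapter 2: Known Techniques

One should always look for a possible alternative, and provide against it.
Sherlock Holmes

Whenever I notice an unknown or unexplained phenomenon, the first thing I do is look for similar behavior. Fortunately, there is one technique that predates C++20, and another that was added in C++20, which might explain this phenomenon.

Empty Base Optimization

This optimization applies when a derived class has a base class with no non-static data members. In that case the compiler may optimize the space usage and avoid allocating a byte for the base (even though the rules require every object to be at least 1 byte in size). For example (taken from cppreference):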

struct Base {}; // empty class
 
struct Derived1 : Base {
    int i;
};
 
int main()
{
    // the size of any object of empty class type is at least 1
    static_assert(sizeof(Base) >= 1);
 
    // empty base optimization applies
    static_assert(sizeof(Derived1) == sizeof(int));
}

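This optimization has some limitations, and its behavior is sometimes compiler-specific. For example, it does not apply if the type of the first non-static data member is the same as the base class or is derived from it.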

struct Base {}; // empty class
 
struct Derived1 : Base {
    int i;
};
 
struct Derived2 : Base {
    Base c;     // Base, occupies 1 byte, followed by padding for i
    int i;
};
 
struct Derived3 : Base {
    Derived1 c; // derived from Base, occupies sizeof(int) bytes
    int i;
};
 
int main()
{
    // empty base optimization does not apply,
    // base occupies 1 byte, Base member (c) occupies 1 byte
    // followed by 2 bytes of padding to satisfy int alignment requirements
    static_assert(sizeof(Derived2) == 2*sizeof(int));
 
    // empty base optimization does not apply,
    // base takes up at least 1 byte plus the padding
    // to satisfy alignment requirement of the first member (whose
    // alignment is the same as int)
    static_assert(sizeof(Derived3) == 3*sizeof(int));
}

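For more information on this optimization, see cppreference: Empty base optimization.

[[no_unique_address]] // since C++20

This attribute takes the empty base optimization a few steps further: it lets us apply the same optimization (and more) to contained data members, without inheritance. In addition, any tail padding of the marked data member may be reused for other data members. An example taken from cppreference: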

#include <iostream>

struct Empty {}; // empty class
 
struct X {
    int i;
    Empty e;
};
 
struct Y {
    int i;
    [[no_unique_address]] Empty e;
};
 
struct Z {
    char c;
    [[no_unique_address]] Empty e1, e2;
};
 
struct W {
    char c[2];
    [[no_unique_address]] Empty e1, e2;
};
 
int main()
{
    // the size of any object of empty class type is at least 1
    static_assert(sizeof(Empty) >= 1);
 
    // at least one more byte is needed to give e a unique address
    static_assert(sizeof(X) > sizeof(int));
 
    // empty member optimized out
    std::cout << "sizeof(Y) == sizeof(int) is " << std::boolalpha 
              << (sizeof(Y) == sizeof(int)) << '\n';
 
    // e1 and e2 cannot share the same address because they have the
    // same type, even though they are marked with [[no_unique_address]]. 
    // However, either may share address with c.
    static_assert(sizeof(Z) >= 2);
 
    // e1 and e2 cannot have the same address, but one of them can share with
    // c[0] and the other with c[1]
    std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}

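A note about the (sizeof(W) == 2) test: gcc, clang, and msvc currently all return false for it. It is not clear whether the feature is incompletely implemented or whether this is a mistake on cppreference.

One more note (thanks to obsidian_golem): MSVC accepts [[no_unique_address]] but gives it no effect, so on that compiler the attribute [[msvc::no_unique_address]] should be used instead.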
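
One way to cope with the two spellings is to hide them behind a macro. The following is only a sketch: NO_UNIQUE_ADDRESS is a hypothetical macro name, Empty and Holder are illustrative types, and the size check assumes a compiler that actually honors the attribute it is given.

```cpp
// NO_UNIQUE_ADDRESS is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#  define NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

struct Empty {}; // empty class

struct Holder {
    int value;
    NO_UNIQUE_ADDRESS Empty tag; // allowed to overlap with value
};

// Assumed to be true wherever the selected attribute is honored:
// the empty member does not add to the size of Holder.
constexpr bool empty_member_is_free = sizeof(Holder) == sizeof(int);
```

The rest of the code base can then use NO_UNIQUE_ADDRESS without caring which compiler is in use.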

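Chapter 3: Bridging the Possible and the Impossible

With [[no_unique_address]] this optimization is easy to implement. But that feature is only available when compiling as C++20, and since the optimization is also applied under older standard versions, our investigation is not over yet.

That leaves us with the first option, the empty base optimization. It is time to dig into the standard library implementations and see exactly how they do it.

The LLVM Technique

LLVM uses something they call compressed_pair (probably borrowed from boost). By wrapping each element of the pair, the implementation can decide whether to inherit from the element's type or to contain it. The wrapper looks like this: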

template <class _Tp, int _Idx,
          bool _CanBeEmptyBase = is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>
struct __compressed_pair_elem {
public:
    const _Tp& get() { return __value_; }
private:
    _Tp __value_;
};
template <class _Tp, int _Idx>
struct __compressed_pair_elem<_Tp, _Idx, true> : private _Tp {
public:
    const _Tp& get() { return *this; }
};

Let's break it into parts.

template <class _Tp, int _Idx,
          bool _CanBeEmptyBase = is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>

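Here we can see the template parameters of the class. _Tp is the type we want to store (either the deleter or the pointer to the managed object). _Idx serves as a class key (we will see what it is for later). Then there is an interesting non-type template parameter, _CanBeEmptyBase, whose name hints at its role (something to do with the empty base optimization?). Its default value is:

is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value

This might be a little confusing, but don't forget that we are inside the std namespace, so what is really being used here are the std::is_empty and std::__libcpp_is_final type traits.

std::is_empty has the value true if certain conditions hold. The one that matters most for our purposes is that the type (and its whole base class hierarchy) has no non-static data members. For the other conditions and further explanation, see cppreference.

std::__libcpp_is_final is another clue, because a final class cannot be inherited from.

So the combination that makes _CanBeEmptyBase true is: the type is empty and is not final. In other words, we can inherit from it and benefit from the empty base optimization.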
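
The condition can be tried out with the public traits. This is a sketch under assumptions: can_be_empty_base is a hypothetical stand-in for the default argument of _CanBeEmptyBase, it uses std::is_final (C++14) in place of the internal std::__libcpp_is_final, and the three functors are invented for illustration.

```cpp
#include <type_traits>

// Hypothetical deleter types, invented for this illustration.
struct EmptyFunctor {
    void operator()(int* p) const noexcept { delete p; }
};
struct FinalFunctor final {
    void operator()(int* p) const noexcept { delete p; }
};
struct CountingFunctor {
    int calls = 0; // a non-static data member: the type is not empty
    void operator()(int* p) noexcept { ++calls; delete p; }
};

// Stand-in for the default value of _CanBeEmptyBase:
// the type must be empty and must not be final.
template <class T>
constexpr bool can_be_empty_base =
    std::is_empty<T>::value && !std::is_final<T>::value;

constexpr bool empty_functor_qualifies    = can_be_empty_base<EmptyFunctor>;
constexpr bool final_functor_qualifies    = can_be_empty_base<FinalFunctor>;
constexpr bool counting_functor_qualifies = can_be_empty_base<CountingFunctor>;
constexpr bool function_pointer_qualifies = can_be_empty_base<void (*)(int*)>;
```

Only the first constant is true: a final type cannot be used as a base, a type with a data member is not empty, and a function pointer is not a class at all.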

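By default the class uses the first implementation, which holds the type as a private data member (_Tp __value_;). The specialization for _CanBeEmptyBase being true derives from the type instead: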

template <class _Tp, int _Idx>
struct __compressed_pair_elem<_Tp, _Idx, true> : private _Tp
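
To see what the two variants buy us, here is a standalone sketch of the same idea. It is a simplified rewrite, not libc++'s code: PairElem, EmptyDeleter, ViaWrapper and ViaMember are hypothetical names, and the size comparisons assume an implementation that applies the empty base optimization in this simple single-inheritance case.

```cpp
#include <type_traits>

// Simplified rewrite of the wrapper; the names are not libc++'s.
template <class T, int Index,
          bool CanBeEmptyBase =
              std::is_empty<T>::value && !std::is_final<T>::value>
struct PairElem {
    T& get() noexcept { return value_; }
private:
    T value_; // general case: the value is a data member
};

template <class T, int Index>
struct PairElem<T, Index, true> : private T {
    T& get() noexcept { return *this; } // empty case: the value is a base
};

struct EmptyDeleter {
    void operator()(int* p) const noexcept { delete p; }
};

// The deleter travels as an (empty) base class next to the pointer.
struct ViaWrapper : PairElem<EmptyDeleter, 0> {
    int* pointer = nullptr;
};

// The deleter is a plain data member and needs an address of its own.
struct ViaMember {
    EmptyDeleter deleter;
    int* pointer = nullptr;
};

constexpr bool wrapper_adds_nothing = sizeof(ViaWrapper) == sizeof(int*);
constexpr bool member_adds_bytes    = sizeof(ViaMember) > sizeof(int*);
// A non-empty type such as int* selects the member-based variant.
constexpr bool pointer_elem_is_pointer_sized =
    sizeof(PairElem<int*, 0>) == sizeof(int*);
```

The wrapper therefore costs nothing for non-empty types and removes the extra byte (plus padding) for empty ones.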

Now that we have __compressed_pair_elem, we can build a __compressed_pair out of it:

template <class _T1, class _T2>
class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
                          private __compressed_pair_elem<_T2, 1> {
    typedef _LIBCPP_NODEBUG_TYPE __compressed_pair_elem<_T1, 0> _Base1;
    typedef _LIBCPP_NODEBUG_TYPE __compressed_pair_elem<_T2, 1> _Base2;

    _LIBCPP_INLINE_VISIBILITY
    typename _Base1::reference first() _NOEXCEPT {
        return static_cast<_Base1&>(*this).__get();
    }

    _LIBCPP_INLINE_VISIBILITY
    typename _Base1::const_reference first() const _NOEXCEPT {
        return static_cast<_Base1 const&>(*this).__get();
    }

    _LIBCPP_INLINE_VISIBILITY
    typename _Base2::reference second() _NOEXCEPT {
        return static_cast<_Base2&>(*this).__get();
    }

    _LIBCPP_INLINE_VISIBILITY
    typename _Base2::const_reference second() const _NOEXCEPT {
        return static_cast<_Base2 const&>(*this).__get();
    }
};

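I admit it may look a little intimidating at first glance (and the real source also declares constructors and swap functions, which makes it even more so). But let's clean the implementation up a little and see what it is really about.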

template <class _T1, class _T2>
class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
                          private __compressed_pair_elem<_T2, 1> {
public:
    typedef __compressed_pair_elem<_T1, 0> _Base1;
    typedef __compressed_pair_elem<_T2, 1> _Base2;

    // Each accessor casts *this to the matching base and asks it for its value.
    inline typename _Base1::reference first() noexcept {
        return static_cast<_Base1&>(*this).__get();
    }
    inline typename _Base1::const_reference first() const noexcept {
        return static_cast<_Base1 const&>(*this).__get();
    }
    inline typename _Base2::reference second() noexcept {
        return static_cast<_Base2&>(*this).__get();
    }
    inline typename _Base2::const_reference second() const noexcept {
        return static_cast<_Base2 const&>(*this).__get();
    }
};

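Now we have a class that inherits from __compressed_pair_elem twice. This is where _Idx comes in: it is what allows inheriting from __compressed_pair_elem more than once when both elements hold the same inner type. Because we cannot inherit from the same class twice, each base needs an additional unique key, and LLVM chose an integer (it could also be combined with a variadic template and __make_tuple_indices, or a similar approach). Note that this particular implementation of compressed_pair also contains a static_assert that forbids the two types from being the same.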
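
The role of the index can be shown in isolation. The following is a stripped-down sketch that keeps only the index trick and drops the empty base handling: Slot, IndexedPair and sum_of_same_typed_pair are hypothetical names, and unlike libc++ this sketch deliberately allows both element types to be the same.

```cpp
// Hypothetical, stripped-down stand-ins; they keep only the index trick.
template <class T, int Index>
struct Slot {
    T value{};
};

// Without Index, a pair of two ints would have to list Slot<int> twice
// in its base list, which is ill-formed:
//   struct Broken : Slot<int>, Slot<int> {};
// With it, Slot<int, 0> and Slot<int, 1> are two distinct base classes.
template <class T1, class T2>
class IndexedPair : private Slot<T1, 0>, private Slot<T2, 1> {
    using Base1 = Slot<T1, 0>;
    using Base2 = Slot<T2, 1>;
public:
    constexpr T1& first() noexcept  { return static_cast<Base1&>(*this).value; }
    constexpr T2& second() noexcept { return static_cast<Base2&>(*this).value; }
};

// Both elements have the same type, yet each accessor reaches its own base.
constexpr int sum_of_same_typed_pair() {
    IndexedPair<int, int> pair{};
    pair.first() = 40;
    pair.second() = 2;
    return pair.first() + pair.second();
}
```

The static_cast to Base1 or Base2 is unambiguous precisely because the two bases differ in their index.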

// NOTE: This static assert should never fire because __compressed_pair
// is *almost never* used in a scenario where it's possible for T1 == T2.
// (The exception is std::function where it is possible that the function
//  object and the allocator have the same type).
static_assert((!is_same<_T1, _T2>::value),
  "__compressed_pair cannot be instantiated when T1 and T2 are the same type; "
  "The current implementation is NOT ABI-compatible with the previous "
  "implementation for this configuration");

Now all that is left is to use this fine creation in the unique_ptr class, and possibly save a few bytes.

template <class _Tp, class _Dp = default_delete<_Tp> >
class unique_ptr {
public:
    typedef _Tp element_type;
    typedef _Dp deleter_type;
    typedef typename __pointer_type<_Tp, deleter_type>::type pointer;
private:
    __compressed_pair<pointer, deleter_type> __ptr_;
    /* ... */
};

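The GCC Technique

In GCC the technique is slightly different, but it builds on the same kind of optimization. The unique_ptr implementation stores the pointer and the deleter in the following data member:

tuple<pointer, _Dp> _M_t;

Some implementations of tuple use the empty base optimization, so it is time to look at GCC's tuple.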

// Use the Empty Base-class Optimization for empty, non-final types.
template<typename _Tp>
using __empty_not_final
    = __conditional_t<__is_final(_Tp), false_type,
                      __is_empty_non_tuple<_Tp>>;

template<size_t _Idx, typename _Head,
         bool = __empty_not_final<_Head>::value>
struct _Head_base;

#if __has_cpp_attribute(__no_unique_address__)
template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, true>
{
    static constexpr _Head&
    _M_head(_Head_base& __b) noexcept { return __b._M_head_impl; }

    static constexpr const _Head&
    _M_head(const _Head_base& __b) noexcept { return __b._M_head_impl; }

    [[__no_unique_address__]] _Head _M_head_impl;
};
#else
template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, true>
: public _Head
{
    static constexpr _Head&
    _M_head(_Head_base& __b) noexcept { return __b; }

    static constexpr const _Head&
    _M_head(const _Head_base& __b) noexcept { return __b; }
};
#endif

template<size_t _Idx, typename _Head>
struct _Head_base<_Idx, _Head, false>
{
    static constexpr _Head&
    _M_head(_Head_base& __b) noexcept { return __b._M_head_impl; }

    static constexpr const _Head&
    _M_head(const _Head_base& __b) noexcept { return __b._M_head_impl; }

    _Head _M_head_impl;
};

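Here comes a twist: GCC checks whether the no_unique_address attribute is available. If it is, the specialization for empty, non-final types looks almost exactly like the one that does not use the empty base optimization, except that its data member carries this attribute.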
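
The same selection can be reproduced in a few lines. This is a sketch, not GCC's code: SKETCH_USE_ATTRIBUTE, EmptyHead, EmptyDeleter and PointerWithDeleter are hypothetical names, the standard spelling of the attribute is used, and the size check assumes GCC or Clang (or another compiler that honors the attribute).

```cpp
// Pick the member-based variant when the attribute is available,
// and fall back to inheritance otherwise.
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(no_unique_address)
#    define SKETCH_USE_ATTRIBUTE 1 // hypothetical macro for this sketch
#  endif
#endif

#ifdef SKETCH_USE_ATTRIBUTE
template <class Head>
struct EmptyHead {
    Head& get() noexcept { return head; }
    [[no_unique_address]] Head head; // member that may overlap others
};
#else
template <class Head>
struct EmptyHead : Head {
    Head& get() noexcept { return *this; } // classic empty base
};
#endif

struct EmptyDeleter {
    void operator()(int* p) const noexcept { delete p; }
};

// Whichever variant was selected, the deleter should take no room.
struct PointerWithDeleter : EmptyHead<EmptyDeleter> {
    int* pointer = nullptr;
};

constexpr bool deleter_takes_no_space =
    sizeof(PointerWithDeleter) == sizeof(int*);
```

Both branches expose the same get() interface, so the code that uses EmptyHead does not need to know which one was compiled.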

Now the optimization lives inside _Head_base (the counterpart of __compressed_pair_elem in the LLVM implementation), and _Tuple_impl can inherit from it:

template<size_t _Idx, typename... _Elements>
struct _Tuple_impl;
/**
 * Recursive tuple implementation. Here we store the @c Head element
 * and derive from a @c Tuple_impl containing the remaining elements
 * (which contains the @c Tail).
 */
template<size_t _Idx, typename _Head, typename... _Tail>
struct _Tuple_impl<_Idx, _Head, _Tail...>
  : public _Tuple_impl<_Idx + 1, _Tail...>,
    private _Head_base<_Idx, _Head>
{ /* ... */ };

Here the tail is unpacked in the classic recursive way, and each unpacked _Head is wrapped in a _Head_base class that _Tuple_impl inherits from.
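
The whole recursion fits in a short, self-contained sketch. It is a simplified illustration rather than libstdc++'s code: HeadBase, TupleImpl, head_of, EmptyDeleter and store_and_load are hypothetical names, inheritance is public to keep the accessor short, only the inheritance-based variant is shown, and the size check assumes the layout used by GCC and Clang.

```cpp
#include <cstddef>
#include <type_traits>

// General case: the element is kept as a data member.
template <std::size_t Index, class Head,
          bool UseEmptyBase =
              std::is_empty<Head>::value && !std::is_final<Head>::value>
struct HeadBase {
    Head head{};
};

// Empty, non-final element: inherit from it instead.
template <std::size_t Index, class Head>
struct HeadBase<Index, Head, true> : Head {};

template <std::size_t Index, class... Elements>
struct TupleImpl;

// Last element: nothing left to recurse into.
template <std::size_t Index, class Head>
struct TupleImpl<Index, Head> : HeadBase<Index, Head> {};

// Recursive case: derive from the tuple of the tail, then store the head.
template <std::size_t Index, class Head, class... Tail>
struct TupleImpl<Index, Head, Tail...>
    : TupleImpl<Index + 1, Tail...>, HeadBase<Index, Head> {};

// Accessors: the index selects exactly one HeadBase among the bases.
template <std::size_t Index, class Head>
constexpr Head& head_of(HeadBase<Index, Head, false>& base) noexcept {
    return base.head;
}
template <std::size_t Index, class Head>
constexpr Head& head_of(HeadBase<Index, Head, true>& base) noexcept {
    return base;
}

struct EmptyDeleter {
    void operator()(int*) const noexcept {}
};

// A pointer plus an empty deleter should be no larger than the pointer.
constexpr bool tuple_is_pointer_sized =
    sizeof(TupleImpl<0, int*, EmptyDeleter>) == sizeof(int*);

constexpr int store_and_load() {
    TupleImpl<0, int, EmptyDeleter> t{};
    head_of<0>(t) = 7;    // writes the element stored at index 0
    return head_of<0>(t); // reads it back
}
```

Passing the index explicitly lets template argument deduction find the one matching HeadBase among the bases, which is the same job _Idx does in the real implementation.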

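Chapter 4: There Are Always More Rivers to Cross and More Bridges to Build

std::unique_ptr is only one of many standard library types that accept an allocator or a deleter. Each of them can use this technique to save a few bytes. We won't go through how the other containers are implemented here, but the idea is the same.

I hope you enjoyed the read. If you have any thoughts, or if this knowledge helps you in your own development, feel free to share in the comments section.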