使用 StringBuilder 可将 C++ 性能提高 4350%！

Pablo Aliskevicius

4.97/5 (41投票s)

2013年9月3日

CPOL

5分钟阅读

171667

1075

字符串连接产生的内存重新分配会造成性能瓶颈。 .NET 有 System.Text.StringBuilder，JavaScript 有 Array.join，而我们有 string::reserve。

下载源代码 - 5.2 KB

介绍

这一切总是始于一个愤怒的客户打来的电话。

你拿到一个程序，它不是在运行，而是在爬行。你检查了常规的可疑点：文件 I/O，数据库访问，甚至是 Web 服务调用：所有这些都没有效果。

你使用了你喜欢的性能分析工具，它告诉你罪魁祸首是一个微小的小函数，它将一长串字符串写入一个文本文件。

你通过将所有字符串连接成一个长字符串，然后一次性写入，而不是每次写入几个字节执行成百上千次写入来改进这个函数。

仍然不行。

你测量写入文件所需的时间。快如闪电。然后，你测量连接所有字符串所需的时间。

漫长无比。

怎么回事？如何解决？

你知道 .NET 程序员为此目的使用一个名为 StringBuilder 的类。这是一个起点。

背景

如果你搜索 “C++ StringBuilder”，你会找到不少答案。其中一些建议使用 std::accumulate，它 **几乎** 能满足你的需求。

#include <iostream>// for std::cout, std::endl
#include <string>  // for std::string
#include <vector>  // for std::vector
#include <numeric> // for std::accumulate
int main()
{
    using namespace std;
    vector<string> vec = { "hello", " ", "world" };
    string s = accumulate(vec.begin(), vec.end(), s);
    cout << s << endl; // prints 'hello world' to standard output. 
    return 0;
}

到目前为止一切都好：问题开始于当你要连接的字符串数量超过几个时，内存重新分配开始堆积。

std::string 在 reserve() 函数中提供了 **一个** 解决方案的基础设施。它正是为了我们的目的而设计的：一次分配，随心所欲地连接。

字符串连接可能会用一种笨拙的、沉重的工具无情地打击性能。上次我被这个特殊的怪兽咬过留下的旧伤疤，我启动了 Indigo（我想玩玩 C++11 中一些闪亮的新玩具），然后写下了这个 `StringBuilder` 的部分实现。

// Subset of http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
template <typename chr>
class StringBuilder {
    typedef std::basic_string<chr> string_t;
    // Tried also vector and list. Deque wan, albeit by a narrow margin.
    typedef std::deque<string_t> container_t;
    // Reuse the size type in the string.
    typedef typename string_t::size_type size_type;
    container_t m_Data;
    size_type m_totalSize;
    void append(const string_t &src) {
        m_Data.push_back(src);
        m_totalSize += src.size();
    }
    // No copy constructor, no assignement.
    StringBuilder(const StringBuilder &);
    StringBuilder & operator = (const StringBuilder &);
public:
    StringBuilder(const string_t &src) {
        if (!src.empty()) {
            m_Data.push_back(src);
        }
        m_totalSize = src.size();
    }
    StringBuilder() {
        m_totalSize = 0;
    }

    StringBuilder & Append(const string_t &src) {
        append(src);
        return *this; // allow chaining.
    }
        // This one lets you add any STL container to the string builder. 
    template<class inputIterator>
    StringBuilder & Add(const inputIterator &first, const inputIterator &afterLast) {
        // std::for_each and a lambda look like overkill here.
                // <b>Not</b> using std::copy, since we want to update m_totalSize too.
        for (inputIterator f = first; f != afterLast; ++f) {
            append(*f);
        }
        return *this; // allow chaining.
    }
    StringBuilder & AppendLine(const string_t &src) {
        static chr lineFeed[] { 10, 0 }; // C++ 11. Feel the love!
        m_Data.push_back(src + lineFeed);
        m_totalSize += 1 + src.size();
        return *this; // allow chaining.
    }
    StringBuilder & AppendLine() {
        static chr lineFeed[] { 10, 0 }; 
        m_Data.push_back(lineFeed);
        ++m_totalSize;
        return *this; // allow chaining.
    }

    // Like C# StringBuilder.ToString()
    // Note the use of reserve() to avoid reallocations. 
    string_t ToString() const {
        string_t result;
        // The whole point of the exercise!
        // If the container has a lot of strings,
        // reallocation (each time the result grows) will take a serious toll,
        // both in performance and chances of failure.
        // I measured (in code I cannot publish) fractions
        // of a second using 'reserve', and almost two minutes using +=.
        result.reserve(m_totalSize + 1);
        //    result = std::accumulate(m_Data.begin(), m_Data.end(), result);
              // This would lose the advantage of 'reserve'

        for (auto iter = m_Data.begin(); iter != m_Data.end(); ++iter) { 
            result += *iter;
        }
        return result;
    }

    // like javascript Array.join()
    string_t Join(const string_t &delim) const {
        if (delim.empty()) {
            return ToString();
        }
        string_t result;
        if (m_Data.empty()) {
            return result;
        }
        // Hope we don't overflow the size type.
        size_type st = (delim.size() * (m_Data.size() - 1)) + m_totalSize + 1;
        result.reserve(st);
                // If you need reasons to love C++11, here is one.
        struct adder {
            string_t m_Joiner;
            adder(const string_t &s): m_Joiner(s) {
                // This constructor is NOT empty.
            }
            // This functor runs under accumulate() without
            // reallocations, if 'l' has reserved enough memory. 
            string_t operator()(string_t &l, const string_t &r) {
                l += m_Joiner;
                l += r;
                return l;
            }
        } adr(delim);
        auto iter = m_Data.begin(); 
                // Skip the delimiter before the first element in the container.
        result += *iter; 
        return std::accumulate(++iter, m_Data.end(), result, adr);
    }

}; // class StringBuilder

有趣的部分

函数 ToString() 使用 std::string::reserve() 来最小化重新分配。下面你可以看到性能测试的结果。

函数 join() 使用 std::accumulate()，带有一个自定义的函数对象，它利用了其第一个操作数已经预留了内存这一事实。

你可能会问：为什么是 std::deque 而不是 std::vector 作为 StringBuilder::m_Data？通常的做法是使用 std::vector，除非你有充分的理由使用其他容器。
在第一个版本中，我使用了 list，基于以下两点：

字符串总是会附加到容器的末尾。std::list 可以做到这一点而无需重新分配：由于 vector 是使用连续内存块实现的，使用 vector 可能会产生内存重新分配。
std::list 对于顺序访问来说相当好，而对 m_Data 的唯一访问就是顺序访问。

在第二个版本中，在测试了 vector、list 和 deque 之后，我保留了 deque，它的速度略快一些。

测量性能

为了测试性能，我从维基百科上选了一个页面，并将部分内容硬编码到一个字符串向量中。

然后，我编写了两个测试函数。第一个函数使用了标准函数 clock()。

第二个函数使用了更精确的 Posix 函数 clock_gettime()，并且还测试了 StringBuilder::Join()。

最后，有一个 main() 函数调用所有其他函数，在控制台显示结果，然后执行性能测试：

Release version screenshot

上图是标题的来源。

这些测试对我来说足够好了；它们表明 std::accumulate 在处理字符串时可能非常慢，而 std::string::reserve() 可以节省大量时间。但存在明显的缺陷，正如你在下面 Arash Partow 的评论中可以看到的那样，所以我根据他的建议重写了它们。我还从文本文件中加载了一个向量，而不是硬编码它，并添加了一些在评论中提出的替代解决方案。这是结果：

Second version screenshot

第一个结论是：任何人想到的 **所有** 方法都比 std::accumulate 快几个数量级。

第二个结论是 std::basic_ostringstream 最快，所以我们应该尽可能使用它。

（不）使用代码

不要使用这段代码。

说真的。我是认真的。不要使用这段代码。

在使用代码之前，请考虑使用 ostringstream。正如你在下面 mrjeff 的评论和上面的测试结果中看到的，它比本文的代码快得多。

评论中 Oktalist 提出的另一个选项，结合了一行代码的简洁性和非常好的性能：参见上面图片中的 lambda 结果。

你可能会问：如果代码不是要使用的，那么它有什么用处？

我想表达的重点是：不要对字符串使用 std::accumulate。

你可能会觉得有使用代码的冲动，如果：

你正在编写将被有 C# 经验的程序员维护的代码，并且你想提供一个熟悉的接口。在这种情况下，你应该考虑使用一个适配器到 std::basic_ostringstream，以及一个类似 StringBuilder 的接口。
你正在编写将在不久的将来转换为 .NET 的代码，并且你想指示一个可能的路径。同样，考虑封装一个 std::basic_ostringstream。
由于某种原因，你不想（或不能）#include <sstream>。几年前，一些流 I/O 实现相当笨重；如果你的编译器有几年历史了，你可能没有选择。

如果你仍然决定使用这段代码，只需按照 main() 中的做法：创建一个 StringBuilder 实例，使用 Append()、AppendLine() 和 Add() 填充它，然后调用 ToString() 来获取结果。

像这样

int main() {
    ////////////////////////////////////
    // 8-bit characters (ANSI): example of chaining.
    ////////////////////////////////////
    StringBuilder<char> ansi;
    ansi.Append("Hello").Append(" ").AppendLine("World");
    std::cout << ansi.ToString();

    ////////////////////////////////////
    // Wide characters (Unicode)
    ////////////////////////////////////
    // http://en.wikipedia.org/wiki/Cargo_cult
    std::vector<std::wstring> cargoCult
    {
        L"A", L" cargo", L" cult", L" is", L" a", 
          L" kind", L" of", L" Melanesian", 
          L" millenarian", L" movement",
          // many more lines here...
          L" applied", L" retroactively", L" to", L" movements", 
          L" in", L" a", L" much", L" earlier", L" era.\n"
    };
    StringBuilder<wchar_t> wide;
    wide.Add(cargoCult.begin(), cargoCult.end()).AppendLine();
        // use ToString(), just like .net
    std::wcout << wide.ToString() << std::endl;
    // javascript-like join.
    std::wcout << wide.Join(L" _\n") << std::endl;

    // Performance tests...

    return 0;
}

再次强调：在连接超过几个字符串时，要警惕 std::accumulate。

等一下！

你可能会问：你是在向我们推销过早优化吗？

我没有。我同意过早优化是坏的。这次优化不是过早的：它是及时的。它是基于经验的：我过去曾与这个特殊的怪兽搏斗过。

基于经验的优化（不摔倒两次在同一块石头上）不是过早的。

当我们优化性能时，“常规嫌疑犯”包括磁盘 I/O、网络访问（数据库、Web 服务）和最内层循环；对此，我们应该加上内存分配，它是糟糕性能的凯撒·索泽。

谢谢

首先，我要感谢 Guy Rutenberg，感谢他在 Linux 上进行精确性能分析的代码。

我真的想感谢所有评论、指出错误和提出替代方案的人。我特别感动（以积极的方式）有两位会员在这篇文章下发表了他们的第一条评论：感谢你们分享知识。

感谢维基百科，让“触手可及的信息”的梦想成为现实。

感谢您抽出宝贵时间阅读本文。希望您喜欢：无论如何，请分享您的意见。

历史

2013 年 9 月 3 日：首次发布。
2013 年 9 月 12 日：根据评论中的建议进行了改进。