在 CPU 上增强并发排列和组合

Shao Voon Wong

4.25/5 (6投票s)

2017年1月28日

CPOL

4分钟阅读

13576

172

在 CPU 上计算并发排列和组合

下载 concurrent_permcomb-0.1.1.zip - 20.6 KB

主要动机
要求
已测试的编译器
使用 GCC 和 Clang 构建示例
没有 CMakeList？
用于计算返回的总结果的公式
局限性
示例
为什么不传入 begin 和 end 迭代器？
如何在回调中使用 thread_index 参数？
取消
生成多少个线程？
如何跨物理上独立的处理器分配工作？
基准测试结果
4 个线程上的收益递减
历史

主要动机

即将推出的 C++17 并发 STL 算法不包括并行的 next_permutation 和 next_combination。compute_all_perm 和 compute_all_comb 在底层使用 next_permutation 和 next_combination 来查找所有排列和组合。多年前，我编写了一个只处理整数类型的并行库，但它暴露了许多内部工作原理。这个库封装了细节，可以对任何数据类型进行排列/组合。

注意：带有谓词的函数重载可用。

注意：工作仍在进行中。该库尚未提交给 Boost 审查，也不是 Boost 库的一部分。

要求

必需：C++11
可选：Boost Multiprecision
注意：对于 factorial(n) (n > 20)，库需要 Boost Multiprecision。您可以使用任意整数（最安全）或固定宽度大整数来容纳最大的阶乘。

// arbitrary integer
boost::multiprecision::cpp_int;
// 256 bit integer
boost::multiprecision::int256_t;

已测试的编译器

Visual C++ 2015
GCC 5.4 Ubuntu 16.04
Clang 3.8 Ubuntu 16.04

使用 GCC 和 Clang 构建示例

g++     CalcPerm.cpp -std=c++11 -lpthread -O2
g++     CalcComb.cpp -std=c++11 -lpthread -O2

clang++ CalcPerm.cpp -std=c++11 -lpthread -O2
clang++ CalcComb.cpp -std=c++11 -lpthread -O2

没有 CMakeList？

这是一个仅包含头文件的库。

用于计算返回的总结果的公式

在计算器中计算此值，并使用总和来确定要使用的最大整数类型。

Total Permutation: n!
Total Combination: n! / (r! (n - r)!)

使用 compute_factorial 来计算排列总数。
使用 compute_total_comb 来计算组合总数。

局限性

next_permutation 支持重复元素，但 compute_all_perm 和 compute_all_comb 不支持。请确保每个元素都是唯一的。还要确保总结果大于生成的线程数。

示例

compute_all_perm 函数和回调签名如下所示。回调应捕获所有异常并返回 false。如果异常在回调外部传播，则将调用 error_callback 并且该线程的处理将提前停止。

// compute_all_perm function signature
template<typename int_type, typename container_type, typename callback_type, 
    typename error_callback_type, typename predicate_type = no_predicate_type>
bool compute_all_perm(int_type thread_cnt, const container_type& cont, callback_type callback, 
    error_callback_type err_callback, predicate_type pred = predicate_type());

// callback_type signature
template<typename container_type>
struct callback_t
{
    bool operator()(const int thread_index, const container_type& cont)
    {
        return true; // can return false to cancel processing in current thread
    }
};

// error_callback_type example
template<typename container_type>
struct error_callback_t
{
    void operator()(const int thread_index, const container_type& cont, const std::string& error)
    {
        std::cerr << error << std::endl;
    }
};

// predicate_type example
template<typename T>
struct predicate_t
{
    bool operator()(T a, T b)
    {
        return a < b;
    }
};

关于如何使用 compute_all_perm 的示例

#include "../permcomb/concurrent_perm.h"

void main()
{
    std::string results(11, 'A');
    std::iota(results.begin(), results.end(), 'A');
    
    int64_t thread_cnt = 2;

    concurrent_perm::compute_all_perm(thread_cnt, results, 
        [](const int thread_index, const std::string& cont) 
            { return true; } /* evaluation callback */,
        [](const int thread_index, const std::string& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */           
        );
}

关于如何使用带有谓词的 compute_all_perm 的示例

#include "../permcomb/concurrent_perm.h"

void main()
{
    std::string results(11, 'A');
    std::iota(results.begin(), results.end(), 'A');
    
    int64_t thread_cnt = 2;

    concurrent_perm::compute_all_perm(thread_cnt, results, 
        [](const int thread_index, const std::string& cont) 
            { return true; } /* evaluation callback */,
        [](const int thread_index, const std::string& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */,            
        [](char a, char b) 
            { return a < b; } /* predicate */
        );
}

compute_all_comb 函数和回调签名如下所示。回调应捕获所有异常并返回 false。如果异常在回调外部传播，则将调用 error_callback 并且该线程的处理将提前停止。

// compute_all_comb function signature
template<typename int_type, typename container_type, typename callback_type, 
    typename error_callback_type, typename predicate_type = no_predicate_type>
bool compute_all_comb(int_type thread_cnt, uint32_t subset, const container_type& cont, 
    callback_type callback, error_callback_type err_callback, predicate_type pred = predicate_type())

// callback_type signature
template<typename container_type>
struct callback_t
{
    bool operator()(const int thread_index, const size_t fullset_size, const container_type& cont)
    {
        return true; // can return false to cancel processing in current thread
    }
};

// error_callback_type example
template<typename container_type>
struct error_callback_t
{
    void operator()(const int thread_index, const size_t fullset_size, 
        const container_type& cont, const std::string& error)
    {
        std::cerr << error << std::endl;
    }
};

// predicate_type example
template<typename T>
struct predicate_t
{
    bool operator()(T a, T b)
    {
        return a == b;
    }
};

关于如何使用 compute_all_comb 的示例

#include "../permcomb/concurrent_comb.h"

void main()
{
    std::vector<int> fullset_vec(20);
    std::iota(fullset_vec.begin(), fullset_vec.end(), 0);
    uint32_t subset = 10;
    
    int64_t thread_cnt = 2;
    
    concurrent_comb::compute_all_comb(thread_cnt, subset, fullset_vec, 
        [] (const int thread_index, uint32_t fullset, const std::vector<int>& cont) 
            { return true; } /* evaluation callback */,
        [] (const int thread_index, uint32_t fullset, const std::vector<int>& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */,
        );
}

关于如何使用带有谓词的 compute_all_comb 的示例

#include "../permcomb/concurrent_comb.h"

void main()
{
    std::vector<int> fullset_vec(20);
    std::iota(fullset_vec.begin(), fullset_vec.end(), 0);
    uint32_t subset = 10;
    
    int64_t thread_cnt = 2;
    
    concurrent_comb::compute_all_comb(thread_cnt, subset, fullset_vec, 
        [] (const int thread_index, uint32_t fullset, const std::vector<int>& cont) 
            { return true; } /* evaluation callback */,
        [] (const int thread_index, uint32_t fullset, const std::vector<int>& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */,
        [] (int a, int b) 
            { return a == b; } /* predicate */
        );
}

为什么不传入 begin 和 end 迭代器？

库需要知道容器类型才能在工作线程中实例化副本。从迭代器类型，我们无法知道容器。迭代器类型不兼容：例如，string 和 vector 迭代器不可互换；用户传递 string 迭代器但库将 vector 迭代器传递给回调是不正确的。

如何在回调中使用 thread_index 参数？

thread_index 是一个从零开始的连续数字。例如，当 thread_cnt 为 4 时，thread_index 将为 [0..3]。 thread_cnt 的数据类型必须足够大以容纳所需的最大的阶乘。

#include "../permcomb/concurrent_perm.h"

void main()
{
    std::string results(11, 'A');
    std::iota(results.begin(), results.end(), 'A');
    
    int64_t thread_cnt = 4;
    int matched[4] = {0,0,0,0};

    concurrent_perm::compute_all_perm(thread_cnt, results, 
        [&matched](const int thread_index, const std::string& cont) /* evaluation callback */
            {
                if(...) 
                    ++matched[thread_index];
                return true;
            },
        [] (const int thread_index, const std::string& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */, 
            
        );
            
    int total_matched = matched[0] + matched[1] + matched[2] + matched[3];
    // display total_matched
}

我将留给读者来修复上面示例中的错误共享。

#include "../permcomb/concurrent_perm.h"

void main()
{
    std::string results(11, 'A');
    std::iota(results.begin(), results.end(), 'A');
    
    int64_t thread_cnt = 4;
    
    int_type cpu_cnt = 2;
    int_type cpu_index = 0; /* 0 or 1 */
    int cpu_index_n = static_cast<int>(cpu_index);

    concurrent_perm::compute_all_perm_shard(cpu_index, cpu_cnt, thread_cnt, results, 
        [](const int thread_index, const std::string& cont) /* evaluation callback */
            {
                return true;
            },
        [] (const int thread_index, const std::string& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */, 
            
        );
}

#include "../permcomb/concurrent_comb.h"

void main()
{
    std::vector<uint32_t> fullset(fullset_size);
    std::iota(fullset.begin(), fullset.end(), 0);
    
    int64_t thread_cnt = 4;
    
    int_type cpu_cnt = 2;
    int_type cpu_index = 0; /* 0 or 1 */
    int cpu_index_n = static_cast<int>(cpu_index);

    concurrent_comb::compute_all_comb_shard(cpu_index, cpu_cnt, thread_cnt, subset_size, fullset, 
        [](const int thread_index, const size_t fullset_cnt, const std::vector<uint32_t>& cont) 
            { /* evaluation callback */
                return true;
            },
        [] (const int thread_index, const std::vector<uint32_t>& cont, const std::string& error) 
            { std::cerr << error; } /* error callback */, 
            
        );
}

基准测试结果

在 Windows 10 上使用带有 Visual C++ 的 Intel i7 6700 CPU，具有 16 GB RAM

Results for permutation of 11 elements:
Total permutations computed: 39,916,800 - 1
int64_t type used.

next_permutation:  163ms
     1 thread(s):  175ms
     2 thread(s):   95ms
     3 thread(s):   48ms
     4 thread(s):   50ms

Results for combination of 14 out of 28 elements:
Total combinations computed: 40,116,600 - 1
int128_t type used.
 
next_combination:  789ms
     1 thread(s):  808ms
     2 thread(s):  434ms
     3 thread(s):  316ms
     4 thread(s):  242ms

4 个线程上的收益递减

主要的嫌疑是 Intel i7 6700 CPU 是一个 4 核处理器，其他应用程序正在运行。需要一个具有超过 4 个内核的多核 CPU 来查看收益递减问题是否仍然存在！

代码托管在 Github

历史

2017 年 1 月 28 日：首次发布
2017 年 4 月 5 日：添加了更多错误处理

在 CPU 上增强并发排列和组合

目录