Enabling C++11 Multithreading for FreeRTOS with GCC

Piotr Grygorczuk

27 Feb 2019

CPOL

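Introduction

The attached library implements an interface that enables C++ multithreading in FreeRTOS. That includes:

Creating threads: std::thread, std::jthread
Locking: std::mutex, std::condition_variable, etc.
Time: std::chrono, std::this_thread::sleep_for, etc.
Futures: std::async, std::promise, std::future, etc.
std::notify_all_at_thread_exit
C++20 semaphores, latches, barriers, and atomic wait and notify

Through a custom integration with GCC, the library also provides an API for setting custom thread attributes, such as the stack size.

I have not tested all the features. I know that thread_local does not work: it compiles, but it does not create thread-specific storage.

This implementation works only with GCC. It has been tested in the following environment:

GCC 11.3 and 10.2 for 32-bit ARM (cmake generates an Eclipse project)
FreeRTOS 10.4.3
Windows 10
QEMU 6.1.0

Although I have not tried any platform other than ARM and RISC-V, I believe it should work. The library depends only on FreeRTOS, so if FreeRTOS runs on your target, I believe this library will run too.

This library is not intended to be accessed directly from your application. It is an interface between C++ and FreeRTOS. Your application should use the STL directly, and the STL uses the provided library under the hood. Consequently, none of the library's files should be included in your application sources, with two exceptions:

freertos_time.h, for setting the system time
thread_with_attributes.h, for creating threads with custom attributes (for example, the stack size)

Attached is an example cmake project. The target is the NXP K64F Cortex-M4 microcontroller. It can be built from the command line: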

cmake ../FreeRTOS_cpp11 -G "Eclipse CDT4 - Unix Makefiles" -Dk64frdmevk=1
cmake --build .

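Another example targets the ARM Versatile Express Cortex-A9, so that the program can run in QEMU instead of on physical hardware. It can be built from the command line: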

$ cmake ../FreeRTOS_cpp11 -G "Eclipse CDT4 - Unix Makefiles" -Darmca9=1
$ cmake --build .

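Background

The C++11 standard introduced a unified multithreading interface. The standard defines only the interface; how it is implemented is up to the compiler vendor. Multithreading needs a task scheduler running at a low level, which implies an operating system. Both the scheduler and the operating system are outside the scope of the C++ standard. Naturally, compiler vendors' implementations usually cover only the most popular operating systems, such as Windows and Linux.

What about the embedded world, microcontrollers and resource-constrained systems? Well... there are so many embedded operating systems that providing an implementation for all of them is clearly impossible. Should OS vendors provide implementations for different compilers? Maybe. Unfortunately, C++ is not popular in the embedded world, and vendors focus on plain C. The expectation is that the code also compiles with a C++ compiler, and nothing more is required. A multithreading library is different: it needs an additional layer that connects the operating system to C++.

FreeRTOS is a small real-time operating system. The core library is more of a task scheduler with a few tools for synchronizing access to resources. It has some extension libraries, such as a TCP/IP stack and a file system. This OS is very popular in the embedded world of small microcontrollers. It is free and distributed as source code. Its strengths are good performance, a small footprint and a simple API. Although it is implemented in C, many programmers create their own C++ wrappers around its API.

C++ is not popular in the embedded world, and I think that is a mistake. C++ has everything that standard C has, plus many great features that make algorithms easier to express and code safer and faster. Working with FreeRTOS is mostly about managing resources: creating and releasing handles, passing the right types as arguments, and so on. I have found that I often spend my time checking for memory leaks or incorrect data types instead of focusing on the algorithm. Wrapping the code in C++ classes takes development to a different level.

The multithreading interface in C++ is very clean and easy to use. The downside is that it is a bit heavy under the hood. If an embedded application needs to create and destroy tasks frequently, or to control stack size and priority, it may not be the best choice, because the standard C++ interface does not offer those controls. However, if the job is to start a worker thread occasionally, or to implement a dispatch queue that sleeps somewhere in the system waiting for work, this interface is up to it. After all, not every embedded application is a hard real-time application.

So how can FreeRTOS be made to work with the C++ multithreading interface?

Hello World!

Building the project is not much different from the usual way of building a project for ARM. Using the library does not require understanding how it is implemented. The application must not access the library's source code directly; it is called by the GCC implementation itself. In other words, the user application uses only the components in the std namespace.

As usual, the FreeRTOS sources and the processor startup code are required. The following definitions should also be placed in FreeRTOSConfig.h: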

#define configNUM_THREAD_LOCAL_STORAGE_POINTERS 1

#define pdMS_TO_TICKS( xTimeInMs ) \
     ( ( TickType_t ) ( ( ( TickType_t ) ( xTimeInMs ) * \
     ( TickType_t ) configTICK_RATE_HZ ) / ( TickType_t ) 1000 ) )

#ifndef pdTICKS_TO_MS
#define pdTICKS_TO_MS(ticks) \
  ((((long long)(ticks)) * 1000) / (configTICK_RATE_HZ))
#endif
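
The intended arithmetic of the two conversions can be checked on a host with a pair of constexpr functions. This is only a sketch, not part of the library: the names ms_to_ticks and ticks_to_ms are hypothetical, and the tick rate is passed as a parameter instead of being read from configTICK_RATE_HZ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for pdMS_TO_TICKS / pdTICKS_TO_MS with the tick
// rate passed explicitly instead of read from configTICK_RATE_HZ.
constexpr std::uint64_t ms_to_ticks(std::uint64_t ms, std::uint64_t tick_rate_hz)
{
  return ms * tick_rate_hz / 1000; // truncates towards zero
}

constexpr std::uint64_t ticks_to_ms(std::uint64_t ticks, std::uint64_t tick_rate_hz)
{
  return ticks * 1000 / tick_rate_hz; // truncates towards zero
}
```

With a 1000 Hz tick, one tick is one millisecond and both conversions are the identity; at any other rate the two directions differ, which is why the direction of the multiplication and division matters.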

Then the following files need to be included in the project:

condition_variable.h         --> Helper class to implement std::condition_variable
critical_section.h           --> Helper class that wraps a FreeRTOS critical section
                                 (for internal use only)
freertos_time.cpp            --> Setting and reading the system wall-clock time
freertos_time.h              --> Declaration
freertos_thread_attributes.h --> Thread 'attributes' definition
thread_with_attributes.h     --> Helper API to create std::thread and
                                 std::jthread with custom attributes
thread_gthread.h             --> Helper class to integrate FreeRTOS with std::thread
thread.cpp                   --> Definitions required by std::thread class
gthr_key.cpp                 --> Definition required by futures
gthr_key.h                   --> Declarations
gthr_key_type.h              --> Helper class for local thread storage
bits/gthr-default.h          --> FreeRTOS GCC Hook (thread and mutex, see below)

future.cc                    --> Taken as is from GCC code
mutex.cc                     --> Taken as is from GCC code
condition_variable.cc        --> Taken as is from GCC code
libatomic.c                  --> Since GCC11 atomic is not included in GCC build
                                 for certain platforms. Need to provide it.

A simple example application can look like this:

#include <condition_variable>
#include <mutex>
#include <thread>
#include <queue>
#include <chrono>

int main()
{
  std::queue<int> q;
  std::mutex m;
  std::condition_variable cv;

  std::this_thread::sleep_for(std::chrono::seconds(1));

  std::thread processor{[&]() {
    std::unique_lock<std::mutex> lock{m};

    while (1)
    {
      cv.wait(lock, [&q] { return q.size() > 0; });
      int i = q.front();
      q.pop();
      lock.unlock();

      if (i == 0)
        return;

      lock.lock();
    }
  }};

  for (int i = 100; i >= 0; i--)
  {
    m.lock();
    q.push(i);
    m.unlock();
    cv.notify_one();
  }

  processor.join();
}

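GCC Hook

For the library to work, GCC must see the implementation of the threading interface.

The interesting file is gthr.h, located in the GCC installation directory. This is what my ARM distribution contains: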

./include/c++/8.2.1/arm-none-eabi/arm/v5te/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/arm/v5te/softfp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v6-m/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7+fp/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7+fp/softfp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7-m/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7e-m/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7e-m+dp/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7e-m+dp/softfp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7e-m+fp/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v7e-m+fp/softfp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.base/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.main/nofp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.main+dp/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.main+dp/softfp/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.main+fp/hard/bits/gthr.h
./include/c++/8.2.1/arm-none-eabi/thumb/v8-m.main+fp/softfp/bits/gthr.h

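All of these files are identical. The different directories correspond to different ARM cores; for example, a Cortex-M4 should link against v7e-m+xx, where xx depends on the floating-point configuration. The file itself contains a lot of commented-out code. It is documentation for implementers: it describes which functions must be implemented to provide multithreading on a system.

The end of the file looks like this: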

...

#ifndef _GLIBCXX_GTHREAD_USE_WEAK
#define _GLIBCXX_GTHREAD_USE_WEAK 1
#endif
#endif
#include <bits/gthr-default.h>

#ifndef _GLIBCXX_HIDE_EXPORTS
#pragma GCC visibility pop
#endif

...

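The file includes the default implementation from gthr-default.h, which lives in the same directory as gthr.h. What would the default implementation be for a system without an operating system? Right, empty functions. So how can the default implementation be replaced with the one from the library?

The library has its own gthr-default.h containing the required code, stored in the FreeRTOS/cpp11_gcc/bits directory.

This file is included only by gthr.h. So, as long as the compiler knows the path to cpp11_gcc, it also finds the implementation in the bits directory there. The path to cpp11_gcc is given in the attached cmake script.

Library Implementation

Mutex

The mutex implementation is probably the simplest, because the FreeRTOS API has all the functions needed and they translate almost directly to the GCC interface. The full implementation is in gthr-default.h. Here is just a sample: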

typedef xSemaphoreHandle __gthread_mutex_t;

static inline void __GTHREAD_MUTEX_INIT_FUNCTION(__gthread_mutex_t *mutex){
  *mutex = xSemaphoreCreateMutex(); }

static inline int __gthread_mutex_destroy(__gthread_mutex_t *mutex){
  vSemaphoreDelete(*mutex);  return 0; }

static inline int __gthread_mutex_lock(__gthread_mutex_t *mutex){
  return (xSemaphoreTake(*mutex, portMAX_DELAY) == pdTRUE) ? 0 : 1; }

static inline int __gthread_mutex_unlock(__gthread_mutex_t *mutex){
  return (xSemaphoreGive(*mutex) == pdTRUE) ? 0 : 1; }

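Once these functions are defined, all the mutex-related variants in the std namespace can be used (for example, unique_lock, lock_guard, etc.), except timed_mutex. That one needs access to the system time, which is covered later in this article.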
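
To illustrate why a handful of lock and unlock functions is enough, here is a small host-side sketch. It is not the library's code: fake_semaphore, fake_mutex_lock, fake_mutex_unlock and fake_mutex are hypothetical stand-ins for the FreeRTOS handle and the __gthread_mutex_* functions, and they only record state instead of blocking. The point is that std::lock_guard works with anything that offers lock() and unlock().

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a FreeRTOS mutex handle: it only records state.
struct fake_semaphore
{
  bool taken = false;
  int take_count = 0;
};

// Same return convention as the __gthread_mutex_* functions: 0 means success.
inline int fake_mutex_lock(fake_semaphore *s)
{
  if (s->taken)
    return 1; // a real RTOS would block here instead of failing
  s->taken = true;
  ++s->take_count;
  return 0;
}

inline int fake_mutex_unlock(fake_semaphore *s)
{
  if (!s->taken)
    return 1;
  s->taken = false;
  return 0;
}

// Minimal BasicLockable adapter, so std::lock_guard can drive the C-style API.
class fake_mutex
{
public:
  void lock() { fake_mutex_lock(&_sem); }
  void unlock() { fake_mutex_unlock(&_sem); }
  const fake_semaphore &state() const { return _sem; }

private:
  fake_semaphore _sem;
};

// Locks for the duration of the call and reports whether the lock was held.
inline bool held_inside_scope(fake_mutex &m)
{
  std::lock_guard<fake_mutex> guard{m};
  return m.state().taken;
}
```

After held_inside_scope returns, the guard has gone out of scope and the fake semaphore is released again, which is the behaviour the RAII wrappers in the std namespace rely on.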

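Condition Variable

Implementing a condition variable on FreeRTOS to match the std interface is a bit tricky. First, it helps to understand what a condition variable is and how it is (or should be) implemented in a system. A good article on the subject is here.

Without going into detail, the implementation is a collection of threads waiting for some condition that lets them leave the wait state. It is a form of event: threads wait for an event, a module sends a notification, and the threads wake up. The interface provides the ability to notify one thread or all threads.

FreeRTOS has a few different ways of suspending and resuming tasks (threads). **Event Groups** look promising: an event group maintains a list of waiting threads and wakes all of them when the event is signaled. However, that interface does not seem to offer a way to wake a single task. The other option is **Direct To Task Notifications**. This, on the other hand, requires an implementation that manages the list of threads. The approach is less efficient, but at least it makes it possible to satisfy the std::condition_variable interface.

Take a close look at the std::condition_variable class. The implementation is in the condition_variable header. The snippet below is not the complete class; it shows only the interesting parts: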

/// condition_variable
class condition_variable
{
  typedef __gthread_cond_t        __native_type;

  __native_type           _M_cond;

...
public:

  condition_variable() noexcept;
  ~condition_variable() noexcept;

  void
  notify_one() noexcept;

  void
  notify_all() noexcept;

  void
  wait(unique_lock<mutex>& __lock) noexcept;

  template<typename _Predicate>
  void
  wait(unique_lock<mutex>& __lock, _Predicate __p)
  {
      while (!__p())
        wait(__lock);
  }
...
};

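The class has a member variable _M_cond, which is the handle to the native OS interface, in this case the FreeRTOS interface. There are also several member functions that must be implemented by an external library. This is the back door for operating on the native handle. The wait overload with a predicate is already implemented; it is shown here only because it is needed later to explain one detail.

Two things are needed: a queue of waiting tasks and a semaphore to synchronize access to that queue. Both must be stored in a single handle inside the condition_variable class.

In the library's condition_variable.h file, that single handle is implemented as the free_rtos_std::cv_task_list class. It is a wrapper around std::list and a FreeRTOS semaphore (the semaphore class is in the same file).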

class cv_task_list
{
public:
  using __gthread_t = free_rtos_std::gthr_freertos;
  using thrd_type = __gthread_t::native_task_type;
  using queue_type = std::list<thrd_type>;

  cv_task_list() = default;

  void remove(thrd_type thrd) { _que.remove(thrd); }
  void push(thrd_type thrd) { _que.push_back(thrd); }
  void pop() { _que.pop_front(); }
  bool empty() const { return _que.empty(); }

  ~cv_task_list()
  {
    lock();
    _que = queue_type{};
    unlock();
  }

  // no copy and no move
  cv_task_list &operator=(const cv_task_list &r) = delete;
  cv_task_list &operator=(cv_task_list &&r) = delete;
  cv_task_list(cv_task_list &&) = delete;
  cv_task_list(const cv_task_list &) = delete;

  thrd_type &front() { return _que.front(); }
  const thrd_type &front() const { return _que.front(); }
  thrd_type &back() { return _que.back(); }
  const thrd_type &back() const { return _que.back(); }

  void lock() { _sem.lock(); }
  void unlock() { _sem.unlock(); }

private:
  queue_type _que;
  semaphore _sem;
};

Once this class is defined, the native handle must be defined too. That is done in gthr-default.h, alongside the mutex.

typedef free_rtos_std::cv_task_list __gthread_cond_t;

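Now std::condition_variable sees the cv_task_list class as its native handle. Great! Time to deal with the missing functions.

The implementation is in condition_variable.cc. This file is part of the GCC repository and acts as the interface to the native implementation in gthr-default.h.

The functions that need to be implemented are: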

__gthread_cond_wait
__gthread_cond_timedwait
__gthread_cond_signal
__gthread_cond_broadcast
__gthread_cond_destroy

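__gthread_cond_destroy has nothing to do and is empty.

The wait function is where the secret of the condition variable lies (see the snippet below). It saves the current thread's handle in the queue while both mutexes are held! The first mutex is taken outside the wait call and protects the condition (have a look at the implementation of condition_variable::wait with a predicate). This is important: it is a contract that guarantees only one thread checks the condition at a time. The second mutex protects the queue of threads. It makes sure that different threads calling notify_one/all do not modify the queue at the same time.

Once the thread's handle has been pushed to the queue, the thread is ready to suspend. Suspending may block execution, so both mutexes must be unlocked to give other threads a chance to run. ulTaskNotifyTake is a FreeRTOS function that switches the task to the waiting state until xTaskNotifyGive is called. Note that the context may switch as soon as the second unlock returns. At that moment another thread may call notify_one/all. In that case the task, already pushed to the queue, is removed from the queue before it has even started to suspend. This is correct behavior: according to the FreeRTOS documentation, calling ulTaskNotifyTake in this situation does not suspend the task.

Whether or not the task was suspended, when ulTaskNotifyTake returns, xTaskNotifyGive has been called at least once. That means the condition must be tested again, which in turn means the mutex protecting the condition must be taken again. In the meantime, however, some other thread may have gained access to the condition, so taking the lock may block the thread again right away.

The next two functions, broadcast and signal, are almost identical. Both lock access to the queue, remove a task from the queue and wake that task. The difference is that signal wakes only one task, while broadcast loops and wakes all of them.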

static inline int __gthread_cond_wait(__gthread_cond_t *cond, __gthread_mutex_t *mutex)
{
  // Note: 'mutex' is taken before entering this function

  cond->lock();
  cond->push(__gthread_t::native_task_handle());
  cond->unlock();

  __gthread_mutex_unlock(mutex);
  ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
  __gthread_mutex_lock(mutex); // lock and return
  return 0;
}

static inline int __gthread_cond_signal(__gthread_cond_t *cond)
{
  cond->lock();
  if (!cond->empty())
  {
    auto t = cond->front();
    cond->pop();
    xTaskNotifyGive(t);
  }
  cond->unlock();
  return 0;
}

static inline int __gthread_cond_broadcast(__gthread_cond_t *cond)
{
  cond->lock();
  while (!cond->empty())
  {
    auto t = cond->front();
    cond->pop();
    xTaskNotifyGive(t);
  }
  cond->unlock();
  return 0;
}

__gthread_cond_timedwait does the same as the wait version, except that a timeout in milliseconds is passed to ulTaskNotifyTake.
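
The queue discipline behind signal and broadcast can be tried out on a host without any RTOS. The sketch below is a deliberately simplified, single-threaded model, not the library's implementation: cv_model is a hypothetical name, tasks are plain integers, and a "notification" is just a counter where the real code calls xTaskNotifyGive.

```cpp
#include <cassert>
#include <deque>
#include <map>

// Host-side model: tasks are plain integers and a "notification" is a counter.
struct cv_model
{
  std::deque<int> waiters;     // FIFO of task ids, like cv_task_list
  std::map<int, int> notified; // task id -> number of notifications received

  void wait(int task) { waiters.push_back(task); } // enqueue before suspending

  void signal() // wakes at most one task, the oldest waiter
  {
    if (waiters.empty())
      return;
    int t = waiters.front();
    waiters.pop_front();
    ++notified[t];
  }

  void broadcast() // wakes every queued task
  {
    while (!waiters.empty())
      signal();
  }
};
```

In this model, as in the functions above, a task leaves the queue at the moment it is notified, so a later broadcast never notifies the same waiter twice.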

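Thread

The C++11 standard defines the thread interface shown in the snippet below. The important part to notice is that id is defined as part of the thread class.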

namespace std {
  class thread;

  ...

  namespace this_thread {

    thread::id get_id() noexcept;
    void yield() noexcept;
    template <class Clock, class Duration>
        void sleep_until(const chrono::time_point<Clock, Duration>& abs_time);
    template <class Rep, class Period>
        void sleep_for(const chrono::duration<Rep, Period>& rel_time);
  }
}

Source: cppreference.com

Now have a look at the <thread> header in your GCC. The file is long, so the snippet contains only the important parts.

class thread
{
public:
  // Abstract base class for types that wrap arbitrary functors to be
  // invoked in the new thread of execution.
  struct _State
  {
    virtual ~_State();
    virtual void _M_run() = 0;
  };
  using _State_ptr = unique_ptr<_State>;

  typedef __gthread_t			native_handle_type;

  /// thread::id
  class id
  {
    native_handle_type	_M_thread;

    ...
  };

  void
  join();

  void
  detach();

  // Returns a value that hints at the number of hardware thread contexts.
  static unsigned int
  hardware_concurrency() noexcept;

private:
  id				_M_id;
  ...
  void
    _M_start_thread(_State_ptr, void (*)());
  ...
};

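The _State class is used to pass the user's thread function. native_handle_type is the type that holds the underlying thread data. The code in my library must define exactly this type to hook into the GCC implementation. It is easy to see that this is the same approach as in the condition variable interface. id is where the thread handle is stored (_M_thread). Finally, a few functions lack definitions:

thread::_State::~_State()
thread::hardware_concurrency
thread::join
thread::detach
thread::_M_start_thread

The implementation is in thread.cpp. There is nothing special about the first two functions, so: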

namespace std{

  thread::_State::~_State() = default;

  // Returns the number of concurrent threads supported by the implementation.
  // The value should be considered only a hint.
  //
  // Return value
  //    Number of concurrent threads supported. If the value is not well defined
  //    or not computable, returns 0.
  unsigned int thread::hardware_concurrency() noexcept
  {
    return 0; // not computable
  }
}

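Remember the gthr-default.h file, the one that implements the mutex interface? It also has a number of function definitions that support threads. Now they can be used to implement the missing definitions of the std::thread class. The task function appears here for the first time (__execute_native_thread_routine). It is an internal thread function, and the user's thread function is called inside it. Its definition is described later.

Take a close look at _M_start_thread. The interesting part is the state argument. _State_ptr is a unique_ptr<_State> that holds the user's thread function. The raw pointer kept in the unique_ptr is passed to the native thread function. This is important: it means ownership is handed over, and the native thread function is now responsible for releasing it. Therefore, the thread function must execute! join blocks by definition. detach must wait for the thread to start.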

namespace std{

  void thread::_M_start_thread(_State_ptr state, void (*)())
  {
    const int err = __gthread_create(
        &_M_id._M_thread, __execute_native_thread_routine, state.get());

    if (err)
      __throw_system_error(err);

    state.release();
  }

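Both join and detach are simple. One note on thread comparison: in a typical implementation of these two functions, the _M_id values (of type thread::id) are compared directly. However, the overloaded comparison operators take their arguments by copy. That is fine if the thread handle is just a pointer, and not so great if the handle is a class with several members. So, as an optimization, the thread handles are compared directly. There are fewer copies, and the assembly looks better too.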

void thread::join()
{
  id invalid;
  if (_M_id._M_thread != invalid._M_thread)
    __gthread_join(_M_id._M_thread, nullptr);
  else
    __throw_system_error(EINVAL);

  // destroy the handle explicitly - next call to join/detach will throw
  _M_id = std::move(invalid);
}

void thread::detach()
{
  id invalid;
  if (_M_id._M_thread != invalid._M_thread)
    __gthread_detach(_M_id._M_thread);
  else
    __throw_system_error(EINVAL);

  // destroy the handle explicitly - next call to join/detach will throw
  _M_id = std::move(invalid);
}

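So how is FreeRTOS connected to the thread handle? Two RTOS features are needed: the RTOS task handle itself and an event group handle. The join function must block until the thread function has finished executing, and an event group matches that requirement perfectly. As with the condition variable, both handles must be stored in a single generic handle.

Thread Function vs. std::thread Instance

It would all be great if it were not for the detach function. There is a problem that must be solved. The generic handle holds two handles. For clarity, forget the generic handle for a moment. So there are two handles: the thread handle and the event handle.

When a new thread starts, resources are allocated. When should they be released? If the std::thread instance lived for as long as the thread executes, the destructor would be the right place. However, because of detach, the thread may run for longer than the lifetime of the std::thread instance. Should the handles be released in the thread function itself? Then what if the thread function finishes first? The join function must access the event handle, so the handle must exist, although it should be possible to check whether the handle is valid. I tried that approach and ran into a race condition over which comes first: the thread function destroying the handle or join taking it. Because join must block on that handle, synchronization becomes a challenge. I think a simpler solution exists.

The solution is for the two handles to have different lifetimes. The ugly part is that both handles must be stored in the same generic handle. The RTOS task handle is released at the end of the thread function. The event handle is released at the end of the join/detach call, by this line: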

_M_id = std::move(invalid);

Here is the native thread function:

namespace std{

  static void __execute_native_thread_routine(void *__p)
  {
    __gthread_t local{*static_cast<__gthread_t *>(__p)}; //copy

    { // we own the arg now; it must be deleted after run() returns
      thread::_State_ptr __t{static_cast<thread::_State *>(local.arg())};
      local.notify_started(); // copy has been made; tell we are running
      __t->_M_run();
    }

    if (free_rtos_std::s_key)
      free_rtos_std::s_key->CallDestructor(__gthread_t::self().native_task_handle());

    local.notify_joined(); // finished; release joined threads
  }
}

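The handle is passed as a void pointer, so it needs a cast, and a copy is made at the same time. The state is also put back into a unique_ptr, __t. Now it is time to notify that the thread has started executing and then call the user's task. When the user's task function returns, the state is deleted (by the scope), which means the thread has done its job. The thread-local data is then deleted and the joining threads are notified. That is it.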
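
The ownership hand-off between _M_start_thread and __execute_native_thread_routine can be reproduced in isolation. The following is a minimal host-side sketch under stated assumptions: state, counting_state, routine and start are hypothetical names, the create_ok flag simulates whether task creation succeeds, and the "thread" runs synchronously, so the pointer is released just before the routine is called instead of after a successful create as in the real code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for std::thread::_State.
struct state
{
  virtual ~state() = default;
  virtual void run() = 0;
};

// Records what happened to it, so the hand-off can be observed.
struct counting_state : state
{
  counting_state(int &runs, int &destroyed) : _runs{runs}, _destroyed{destroyed} {}
  ~counting_state() override { ++_destroyed; }
  void run() override { ++_runs; }
  int &_runs;
  int &_destroyed;
};

// Plays the role of the native thread routine: it takes ownership back.
inline void routine(void *p)
{
  std::unique_ptr<state> s{static_cast<state *>(p)};
  s->run();
} // s goes out of scope here and deletes the state

// Plays the role of _M_start_thread. 'create_ok' simulates task creation.
inline bool start(std::unique_ptr<state> s, bool create_ok)
{
  if (!create_ok)
    return false;          // s still owns the state and frees it on return
  void *raw = s.release(); // ownership now travels as a raw pointer
  routine(raw);            // a real RTOS would run this in the new task
  return true;
}
```

Either way the state is destroyed exactly once: by the routine when the thread runs, or by the caller's unique_ptr when creation fails. That is why the routine must run once the pointer has been released.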

Native Handle Implementation

The native thread handle is defined as __gthread_t. The definition comes from gthr-default.h:

typedef free_rtos_std::gthr_freertos __gthread_t;

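The gthr_freertos class is the generic handle that internally holds the two handles, that is, the RTOS task handle and the event handle. The class is defined in thread_gthread.h, which is included in gthr-default.h.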

class gthr_freertos
{
  friend std::thread;

  enum
  {
    eEvStoragePos = 0,
    eStartedEv = 1 << 22,
    eJoinEv = 1 << 23
  };

public:
  typedef void (*task_foo)(void *);
  typedef TaskHandle_t native_task_type;

  gthr_freertos(const gthr_freertos &r);
  gthr_freertos(gthr_freertos &&r);
  ~gthr_freertos() = default;

  bool create_thread(task_foo foo, void *arg);

  void join();
  void detach();

  void notify_started();
  void notify_joined();

  static gthr_freertos self();
  static native_task_type native_task_handle();

  bool operator==(const gthr_freertos &r) const;
  bool operator!=(const gthr_freertos &r) const;
  bool operator<(const gthr_freertos &r) const;

  void *arg();
  gthr_freertos &operator=(const gthr_freertos &r) = delete;

private:
  gthr_freertos() = default;
  gthr_freertos(native_task_type thnd, EventGroupHandle_t ehnd);

  gthr_freertos &operator=(gthr_freertos &&r);

  void move(gthr_freertos &&r);

  void wait_for_start();

  native_task_type _taskHandle{nullptr};
  EventGroupHandle_t _evHandle{nullptr};
  void *_arg{nullptr};
  bool _fOwner{false};
};

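There is no need to describe all the functions. Below are descriptions of the ones I consider the most important.

Critical Section

Critical sections are used in the gthr_freertos member functions. This simple implementation effectively disables and enables interrupts. If that is not acceptable in your application, the implementation of this class should be changed.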

namespace free_rtos_std
{
struct critical_section
{
  critical_section() { taskENTER_CRITICAL(); }
  ~critical_section() { taskEXIT_CRITICAL(); }
};
}

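The class is defined in critical_section.h.

Creating a Thread

Creating a FreeRTOS task requires allocating two handles. They are created in the create_thread function rather than in the constructor. If there are not enough resources, the program terminates. Alternatively, the function could return false. By default, 512 words are allocated for the stack, which is 2 KB on ARM. The standard C++ interface does not allow the stack size to be specified, so if the application needs more stack, this code has to be modified. Note that such a change applies to all threads. 2 KB is needed when futures are used. Without futures, I have run a system with 1 KB.

This library allows custom attributes (including the stack size) to be set per thread.

The critical section disables interrupts. As mentioned earlier, when the native thread function finishes, it deletes the thread's handle. So here the critical section makes sure the thread does not start before the event handle has been stored in the thread's local storage.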

bool gthr_freertos::create_thread(task_foo foo, void *arg)
{
  _arg = arg;

  _evHandle = xEventGroupCreate();
  if (!_evHandle)
    std::terminate();

  {
    critical_section critical;

    auto &attr = internal::attributes_lock::_attrib;
    xTaskCreate(foo, attr.taskName, attr.stackWordCount, 
                this, attr.priority, &_taskHandle);
    if (!_taskHandle)
      std::terminate();

    vTaskSetThreadLocalStoragePointer(_taskHandle, eEvStoragePos, _evHandle);
    _fOwner = true;
  }

  return true;
}

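Thread Attributes¹

It is possible to create std::thread and std::jthread instances with custom attributes. The thread_with_attributes.h file provides the API for creating them. There are two function templates: std_jthread creates a std::jthread and std_thread creates a std::thread.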

namespace free_rtos_std
{
  template <typename... Args>
  std::thread std_thread(const free_rtos_std::attributes &attr, Args &&...args)
  {
    free_rtos_std::internal::attributes_lock lock{attr};
    return std::thread(std::forward<Args>(args)...);
  }

  template <typename... Args>
  std::jthread std_jthread(const free_rtos_std::attributes &attr, Args &&...args)
  {
    free_rtos_std::internal::attributes_lock lock{attr};
    return std::jthread(std::forward<Args>(args)...);
  }
}

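The free_rtos_std::attributes structure holds the FreeRTOS task attributes, namely:

Task name
Task stack size
Task priority

The way it works is that there is a global "attributes" instance initialized with default values. When a std::thread is created with the standard C++ API, those default values are used. When a thread with custom attributes is needed, std_thread creates an instance of attributes_lock, which swaps the defaults with the custom values provided.

attributes_lock derives from critical_section. That makes access to the global attributes thread safe. When gthr_freertos::create_thread executes, it creates a critical section. At that point, updating the attributes is disabled (the scheduler is disabled and no context switch happens). Conversely, while an attributes_lock exists, it prevents any other thread from being created. Only this thread uses the custom attributes. When the attributes_lock is destroyed, the default values are restored.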
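
The swap-and-restore idea can be sketched on a host as follows. This is an illustration only, not the library's attributes_lock: the names attributes, global_attributes, scoped_attributes, stack_words_for_new_thread and create_with are hypothetical, the default values are made up, and the critical section that the real class inherits is left out.

```cpp
#include <cassert>
#include <utility>

// Hypothetical attribute set; the field names mirror those used by
// create_thread above, the default values are made up for the example.
struct attributes
{
  const char *taskName;
  unsigned stackWordCount;
  unsigned priority;
};

inline attributes &global_attributes()
{
  static attributes a{"Task", 512, 1};
  return a;
}

// Swaps custom values in on construction and the defaults back on destruction.
// The real attributes_lock also holds a critical section; that is omitted here.
class scoped_attributes
{
public:
  explicit scoped_attributes(const attributes &custom) : _saved{custom}
  {
    std::swap(_saved, global_attributes());
  }
  ~scoped_attributes() { std::swap(_saved, global_attributes()); }

  scoped_attributes(const scoped_attributes &) = delete;
  scoped_attributes &operator=(const scoped_attributes &) = delete;

private:
  attributes _saved; // holds the defaults while the override is active
};

// Stand-in for thread creation: reports the stack size it would use.
inline unsigned stack_words_for_new_thread()
{
  return global_attributes().stackWordCount;
}

inline unsigned create_with(const attributes &custom)
{
  scoped_attributes lock{custom};
  return stack_words_for_new_thread();
}
```

The custom values are visible only while the scoped object is alive; once create_with returns, thread creation sees the defaults again.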

Join

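Join waits for the event notified by the native thread function. The "while" loop ensures it is not a spurious event. There is nothing to synchronize: the thread function does not release the event handle, even when the thread has finished executing.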

void gthr_freertos::join()
{
  while (0 == xEventGroupWaitBits(_evHandle,
                                  eJoinEv | eStartedEv,
                                  pdFALSE,
                                  pdTRUE,
                                  portMAX_DELAY))
    ;
}

Detach

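Detach removes the event handle. This can only be done after the thread has started executing. std::thread::detach or std::thread::~thread destroys the handle, so the native thread function must first make a copy of this instance to preserve the state pointer stored in _arg.

The event handle is stored in the task's local storage. It must now be set to an invalid handle. The critical section makes sure the task is not deleted while the storage is being accessed. However, the task may already be gone, and that case has to be tested for.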

void gthr_freertos::detach()
{ 
  wait_for_start();

  { 
    critical_section critical;

    if (eDeleted != eTaskGetState(_taskHandle))
    {
      vTaskSetThreadLocalStoragePointer(_taskHandle, eEvStoragePos, nullptr);
      vEventGroupDelete(_evHandle);
      _fOwner = false;
    }
  }
}

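Sending Notifications

Both notifications are sent from the native thread function. The first one says that the thread has started and all the necessary copies have been made. The second one says that the user's thread function has finished and the two threads can now be joined.

The start notification has little to do. It just sets one bit in the event group.

Notifying the joining thread is harder. The thread may have been detached, with nobody waiting to join. That means the event handle has been deleted. The event handle in the "this" instance is only a copy and may point to released memory. In that case the valid information is kept in the local storage. If the handle there is invalid, the thread has been detached and it is safe to exit without sending the notification.

Finally, the task can be deleted. FreeRTOS allows nullptr to be passed as the argument to delete "this" task. Because the task is deleted, the function does not return. For that reason the critical section has a scope of its own: it must be destroyed before the task is deleted. From that moment on, the task handle is invalid in every copy. This can be tested with the FreeRTOS API function eTaskGetState.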

void gthr_freertos::notify_started() 
{ 
  xEventGroupSetBits(_evHandle, eStartedEv);
}

void gthr_freertos::notify_joined()
{ 
  {
    critical_section critical;

    auto evHnd = static_cast<EventGroupHandle_t>(
        pvTaskGetThreadLocalStoragePointer(_taskHandle, eEvStoragePos));

    if (evHnd)
      xEventGroupSetBits(evHnd, eJoinEv);
  }

  // vTaskDelete does not return
  vTaskDelete(nullptr);
}

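Transferring Ownership

std::thread passes the handle between functions many times, and copies are made often. Ownership travels along with an ownership flag. The code makes sure the handles are destroyed only if the class is the owner. The gthr_freertos class has a default destructor that does not touch the handles. The handles are destroyed in the move assignment operator, which happens on the last line of the join/detach functions.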

gthr_freertos::gthr_freertos(const gthr_freertos &r)
{
  critical_section critical;

  _taskHandle = r._taskHandle;
  _evHandle = r._evHandle;
  _arg = r._arg;
  _fOwner = false; 
}

gthr_freertos &gthr_freertos::operator=(gthr_freertos &&r)
{
  if (this == &r)
    return *this;

  taskENTER_CRITICAL();

  if (_fOwner)
  { 
    if (eDeleted != eTaskGetState(_taskHandle))
      vTaskDelete(_taskHandle);
    if (_evHandle)
      vEventGroupDelete(_evHandle);
    _fOwner = false;
  }
  else if (r._fOwner)
  {
    taskEXIT_CRITICAL();
    r.wait_for_start();
    taskENTER_CRITICAL();
  }

  move(std::forward<gthr_freertos>(r));
  taskEXIT_CRITICAL();
  return *this;
}
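
The copy and move rules can be reduced to a small host-side model. This is a sketch under assumptions, not the gthr_freertos class: handle is a hypothetical name, the "resource" is an integer counter that is decremented where the real code deletes the task and the event group, and the critical sections and the wait for thread start are left out.

```cpp
#include <cassert>
#include <utility>

// Host-side model of a handle with an ownership flag. 'resource' points at a
// counter that is decremented when the owning handle releases it.
class handle
{
public:
  handle() = default;
  explicit handle(int *resource) : _res{resource}, _owner{true} {}

  // A copy sees the same resource but never owns it.
  handle(const handle &r) : _res{r._res}, _owner{false} {}

  // Move assignment releases what this handle owns, then takes over from r.
  handle &operator=(handle &&r)
  {
    if (this == &r)
      return *this;
    if (_owner && _res)
      --*_res; // "destroy" the resource
    _res = std::exchange(r._res, nullptr);
    _owner = std::exchange(r._owner, false);
    return *this;
  }

  ~handle() = default; // like gthr_freertos, the destructor frees nothing

  bool owner() const { return _owner; }

private:
  int *_res{nullptr};
  bool _owner{false};
};
```

Destroying a copy leaves the resource alone, while move-assigning an empty handle into the owner releases it, which mirrors the `_M_id = std::move(invalid);` line in join and detach.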

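Futures

I must admit I cheated a little to support futures. Simply put, I just included files from the GCC repository, namely mutex.cc and future.cc.

Copying the files is not enough, though. For futures to work, a few extra functions have to be implemented.

Once

The std::call_once function calls the low-level function __gthread_once. The implementation is in gthr-default.h. When the function is called, the external flag must be set to true. Access to the flag is synchronized with a mutex. If the flag is already set, the function is not called.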

static int __gthread_once(__gthread_once_t *once, void (*func)(void))
{
  static __gthread_mutex_t s_m = xSemaphoreCreateMutex();
  if (!s_m)
    return 12; //POSIX error: ENOMEM

  __gthread_once_t flag{true};
  // s_m is a plain (non-recursive) mutex, so use the non-recursive calls
  xSemaphoreTake(s_m, portMAX_DELAY);
  std::swap(*once, flag);
  xSemaphoreGive(s_m);

  if (flag == false)
    func();

  return 0;
}
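
The swap-the-flag technique can be exercised on a host with std::mutex in place of the FreeRTOS semaphore. This is a sketch with hypothetical names (once_flag_model, call_once_model, init, init_count), not the function from gthr-default.h.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical names; the flag plays the role of __gthread_once_t.
using once_flag_model = bool;

inline int call_once_model(once_flag_model *once, void (*func)())
{
  static std::mutex guard; // protects the flag only, not the call itself

  once_flag_model previous{true};
  {
    std::lock_guard<std::mutex> lock{guard};
    std::swap(*once, previous); // set the flag, remember the old value
  }

  if (!previous)
    func(); // only the caller that flipped the flag gets here

  return 0;
}

inline int &init_count()
{
  static int n = 0;
  return n;
}

inline void init() { ++init_count(); }
```

One property of this scheme is worth knowing: a second caller that finds the flag already set returns immediately, even if the first call to func is still running, which is weaker than the waiting behaviour std::call_once specifies.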

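At Thread Exit

I found two functions in the STL that require user code to be executed after a thread has finished. They are std::notify_all_at_thread_exit and the std::promise::set_value_at_thread_exit family of functions. I am not sure whether there are more.

Again, the GCC implementation accesses functions in gthr-default.h, and the calls are redirected to my implementation.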

typedef free_rtos_std::Key *__gthread_key_t;

static int __gthread_key_create(__gthread_key_t *keyp, void (*dtor)(void *))
{  return free_rtos_std::freertos_gthread_key_create(keyp, dtor);}

static int __gthread_key_delete(__gthread_key_t key)
{  return free_rtos_std::freertos_gthread_key_delete(key);}

static void *__gthread_getspecific(__gthread_key_t key)
{  return free_rtos_std::freertos_gthread_getspecific(key);}

static int __gthread_setspecific(__gthread_key_t key, const void *ptr)
{  return free_rtos_std::freertos_gthread_setspecific(key, ptr);}

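These functions provide a way to store thread-specific data.

To be honest, I am not sure my implementation meets the requirements. I read the POSIX description of these functions several times and found it ambiguous.

My understanding is that key_create is called only once in a thread function and creates a single key. Each thread that runs the function can then store its own specific data in that key and load it back. So the key is a container of per-thread data associated with thread handles. In my code it is implemented as an unordered map.

Also note the second argument of __gthread_key_create. According to the POSIX description, it is a destructor that is called when a thread exits and its associated data is not null.

The key is defined in gthr_key_type.h. It has a map for storing the data, a pointer to the destructor function and a mutex that synchronizes access to the map.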

struct Key
{
  using __gthread_t = free_rtos_std::gthr_freertos;
  typedef void (*DestructorFoo)(void *);

  Key() = delete;
  explicit Key(DestructorFoo des) : _desFoo{des} {}

  void CallDestructor(__gthread_t::native_task_type task);

  std::mutex _mtx;
  DestructorFoo _desFoo;
  std::unordered_map<__gthread_t::native_task_type, const void *> _specValue;
};

Creating the key then looks like this:

namespace free_rtos_std
{
Key *s_key;

int freertos_gthread_key_create(Key **keyp, void (*dtor)(void *))
{
  // There is only one key for all threads. If more keys are needed
  // a list must be implemented.
  assert(!s_key);
  s_key = new Key(dtor);

  *keyp = s_key;
  return 0;
}
}

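Storing and loading values are simple map operations. The functions are implemented in gthr_key.cpp.

The last missing piece is hooking this into thread destruction. The Key structure has a special function, CallDestructor. It looks up the thread-specific data associated with the task. If found, the data is removed from the storage and the previously registered destructor is called.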

void CallDestructor(__gthread_t::native_task_type task)
{
  void *val;

  {
    std::lock_guard lg{_mtx};

    auto item{_specValue.find(task)};
    if (item == _specValue.end())
      return;

    val = const_cast<void *>(item->second);
    _specValue.erase(item);
  }

  if (_desFoo && val)
    _desFoo(val);
}
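
The whole life cycle of a key (store, load, clean up at thread exit) fits in a short host-side model. This is a sketch, not the code from gthr_key.cpp: key_model, on_task_exit and mark_destroyed are hypothetical names, task handles are plain integers, and the mutex is omitted because the example is single-threaded.

```cpp
#include <cassert>
#include <unordered_map>

// Host-side model: task handles are plain integers.
struct key_model
{
  using destructor_fn = void (*)(void *);

  explicit key_model(destructor_fn d) : dtor{d} {}

  void set(int task, void *value) { values[task] = value; }

  void *get(int task) const
  {
    auto it = values.find(task);
    return it == values.end() ? nullptr : it->second;
  }

  // Called when 'task' exits: drop its entry, then run the destructor on it.
  void on_task_exit(int task)
  {
    auto it = values.find(task);
    if (it == values.end())
      return;
    void *val = it->second;
    values.erase(it);
    if (dtor && val)
      dtor(val);
  }

  destructor_fn dtor;
  std::unordered_map<int, void *> values;
};

// Example destructor: marks the integer it is given as destroyed.
inline void mark_destroyed(void *p) { *static_cast<int *>(p) = -1; }
```

Each task only ever sees and cleans up its own entry; a task that never stored a value exits without the destructor being called.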

This function is called from std::__execute_native_thread_routine in thread.cpp, right after the user's thread function returns.

namespace free_rtos_std
{
extern Key *s_key;
}

static void __execute_native_thread_routine(void *__p)
{
  ...
  // at this stage __t->_M_run() has finished execution

  if (free_rtos_std::s_key)
    free_rtos_std::s_key->CallDestructor(__gthread_t::self().native_task_handle());
  ...
}

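That is it. From now on, std::promise, std::future and so on work as expected.

thread_local

I could not get it to work. Pity.

GCC for freestanding systems (bare metal, no OS) is compiled with the __gthread_active_p function returning 0. My implementation returns 1, but GCC sees 0. Most likely the function was inlined when GCC was built. Zero means the threading system is not active. In that case a single instance of the variable is created instead of one per thread.

If anything else does not work, please let me know.

System Time

The last part of C++ threading is the sleep_for and sleep_until functions. The first one is simple and requires just one function, defined in thread.cpp. It assumes that one tick in FreeRTOS is one millisecond. The time is converted to ticks, and the FreeRTOS API vTaskDelay does the job.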

void this_thread::__sleep_for(chrono::seconds sec, chrono::nanoseconds nsec)
{
  long ms = nsec.count() / 1'000'000;
  if (sec.count() == 0 && ms == 0 && nsec.count() > 0)
    ms = 1; // round up to 1 ms => if sleep time != 0, sleep at least 1ms

  vTaskDelay(pdMS_TO_TICKS(chrono::milliseconds(sec).count() + ms));
}
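
The rounding rule in __sleep_for is easy to check on a host. The function below repeats the same arithmetic and returns the millisecond count that would be handed to pdMS_TO_TICKS; the name sleep_ms is hypothetical and the function exists only for this illustration.

```cpp
#include <cassert>
#include <chrono>

// Mirrors the arithmetic of __sleep_for above, returning the millisecond
// count that would be handed to pdMS_TO_TICKS.
inline long long sleep_ms(std::chrono::seconds sec, std::chrono::nanoseconds nsec)
{
  long long ms = nsec.count() / 1000000;
  if (sec.count() == 0 && ms == 0 && nsec.count() > 0)
    ms = 1; // a non-zero request sleeps for at least 1 ms
  return std::chrono::milliseconds(sec).count() + ms;
}
```

Only a request shorter than one millisecond is rounded up; any other sub-millisecond remainder is truncated, so a request of 2 s and 999999 ns results in a 2000 ms delay.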

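The second function is in fact already implemented. However, it needs the system time to work. sleep_until calls gettimeofday, which in turn calls _gettimeofday. That function must be implemented with the FreeRTOS API.

To get the time of day, it helps to be able to set it first. So an additional function is provided to set the time. As far as I know, the ctime header offers no standard function for setting the time, hence my own implementation. Both functions are in freertos_time.cpp.

The algorithm is very simple. The system tick counter is the time counter. A global variable is then needed to hold the offset between real time and the tick counter. That variable must be thread safe.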

namespace free_rtos_std
{

class wall_clock
{
public:
  struct time_data
  {
    timeval offset;
    TickType_t ticks;
  };

  static time_data time()
  { //atomic
    critical_section critical;
    return time_data{_timeOffset, xTaskGetTickCount()};
  }

  static void time(const timeval &time)
  { //atomic
    critical_section critical;
    _timeOffset = time;
  }

private:
  static timeval _timeOffset;
};

timeval wall_clock::_timeOffset;

}

Setting the time becomes easy. Just store the difference between the time and the tick count:

using namespace std::chrono;
void SetSystemClockTime(
    const time_point<system_clock, system_clock::duration> &time)
{
  auto delta{time - time_point<system_clock>(
                        milliseconds(pdTICKS_TO_MS(xTaskGetTickCount())))};
  long long sec{duration_cast<seconds>(delta).count()};
  long usec = 
       duration_cast<microseconds>(delta).count() - sec * 1'000'000; //narrowing type

  free_rtos_std::wall_clock::time({sec, usec});
}

Reading the time is the reverse operation: add the offset and the tick count.

timeval operator+(const timeval &l, const timeval &r);

extern "C" int _gettimeofday(timeval *tv, void *tzvp)
{
  (void)tzvp;

  auto t{free_rtos_std::wall_clock::time()};

  long long ms{pdTICKS_TO_MS(t.ticks)};
  long long sec{ms / 1000};
  long usec = (ms - sec * 1000) * 1000; //narrowing type

  *tv = t.offset + timeval{sec, usec};

  return 0; // return non-zero for error
}
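
The operator+ for timeval is declared above but its body is not shown. The sketch below gives one possible definition of the addition, together with the offset arithmetic, so that the round trip "set the clock, then read it" can be checked on a host. It is an assumption-laden illustration: time_value is a hypothetical stand-in for timeval, and add, from_ms and offset_for are names invented for this example.

```cpp
#include <cassert>

// Hypothetical stand-in for timeval, to keep the example portable.
struct time_value
{
  long long sec;
  long usec; // always kept in the range [0, 999999]
};

// Adds two values and carries microsecond overflow into the seconds.
inline time_value add(const time_value &l, const time_value &r)
{
  time_value t{l.sec + r.sec, l.usec + r.usec};
  if (t.usec >= 1000000)
  {
    t.usec -= 1000000;
    ++t.sec;
  }
  return t;
}

// Splits an uptime in milliseconds the same way _gettimeofday does.
inline time_value from_ms(long long ms)
{
  const long long sec = ms / 1000;
  return time_value{sec, static_cast<long>((ms - sec * 1000) * 1000)};
}

// offset = wall time - uptime, as stored when the clock is set.
inline time_value offset_for(const time_value &wall, long long uptime_ms)
{
  const time_value up = from_ms(uptime_ms);
  time_value d{wall.sec - up.sec, wall.usec - up.usec};
  if (d.usec < 0)
  {
    d.usec += 1000000;
    --d.sec;
  }
  return d;
}
```

Setting the clock at an uptime of 1500 ms and reading it back at the same uptime returns the wall time that was set; reading it at 4000 ms returns that time plus 2.5 seconds.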

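Summary

There is some trickery in this library in hiding FreeRTOS behind the generic handles, but overall I believe it is a clean solution. I have some doubts about performance: some copying is involved, and interrupts are disabled in a few places. However, as I mentioned at the beginning, not every embedded application is safety critical or (hard) real time. I may be wrong, but I believe that people who want a real-time application would not use std::thread in the first place.

I believe the main advantage of this library is the same, common C++ interface. I find it convenient to implement and debug an algorithm in Visual Studio and then port it seamlessly to the target board.

The thread_local problem is disappointing. The only idea I have is to fork GCC and rebuild it so that __gthread_active_p returns 1. Would it work? Would it break the compiler? I do not know. If you try it, please let me know.

My goal was to make C++ multithreading available on top of the FreeRTOS API, so I did not bother implementing the POSIX C interface. As a result, I believe the code in gthr-default.h does not compile in a pure C project (I have not even tried).

History

23 Nov 2022: Updated for GCC 11.3 and enabled C++20 features.
20 Jul 2019: Initial version.

[1] - Thanks to Jakub Sosnovec for providing the initial solution for setting a custom stack size and for inspiring me to extend the library with custom attributes.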