BFS、DFS、Dijkstra 和 A-Star 算法的通用实现

optiklab1

5.00/5 (8投票s)

2023年9月15日

Apache

7分钟阅读

7568

125

我通过实际实现来演示，众所周知的算法 BFS、DFS、Dijkstra 和 A-Star 本质上是同一个算法的变种。

下载源代码 - 14 KB

引言

事实证明，像 BFS、DFS、Dijkstra 和 A-Star 这样的著名算法本质上是同一个算法的变种。

换句话说，可以实现一个通用的数据结构，使其能够在不更改核心组件的情况下在这些算法之间切换。虽然有一些局限性需要考虑，但探索这种方法很有趣。

您可以在我的 GitHub 存储库中找到这些算法的所有工作代码。我建议在阅读本文的同时尝试使用代码，因为实践经验比纯理论理解更能增强学习效果。

图表示

我们考虑一个包含 25 个节点、排列成 5x5 网格的图，目标是从左上角的节点 0 找到通往右下角的节点 24 的路径。

( 0  ) - ( 1  ) - ( 2  ) - ( 3  ) - ( 4  )
  |        |        |        |        |
( 5  ) - ( 6  ) - ( 7  ) - ( 8  ) - ( 9  )
  |        |        |        |        |
( 10 ) - ( 11 ) - ( 12 ) - ( 13 ) - ( 14 )
  |        |        |        |        |
( 15 ) - ( 16 ) - ( 17 ) - ( 18 ) - ( 19 )
  |        |        |        |        |
( 20 ) - ( 21 ) - ( 22 ) - ( 23 ) - ( 24 )

上述每种算法都能够实现这一点，但它们都有自己的局限性。

BFS 和 DFS 算法都操作于无权图，忽略边的权重。虽然它们可以找到任何路径，但不能保证找到最优路径。
Dijkstra 和 A-Star 算法都适用于有权图，但不能用于包含负权重的图。A-Star 通常更快，因为它进行了优化，在路径查找时融入了欧几里得坐标。

在本文中，我不涵盖基本概念，希望您已经熟悉它们。如果您对上述术语感到畏惧，那么您可能也应该学习基础知识。然而，玩弄这些代码示例仍然会很有趣。

为了应对这些局限性，我们为每个节点分配虚构的坐标 (X, Y)

(0, 0) - (0, 1) - (0, 2) - (0, 3) - (0, 4)
   |        |        |        |       |
(1, 0) - (1, 1) - (1, 2) - (1, 3) - (1, 4)
   |        |        |        |       |
(2, 0) - (2, 1) - (2, 2) - (2, 3) - (2, 4)
   |        |        |        |       |
(3, 0) - (3, 1) - (3, 2) - (3, 3) - (3, 4)
   |        |        |        |       |
(4, 0) - (4, 1) - (4, 2) - (4, 3) - (4, 4)

最后，我们为图中的每条边分配一些权重。

(0, 0) -1- (0, 1) -1- (0, 2) -1- (0, 3) -2- (0, 4)
   |          |          |          |         |
   2          1          1          2         2
   |          |          |          |         |
(1, 0) -2- (1, 1) -1- (1, 2) -2- (1, 3) -1- (1, 4)
   |          |          |          |         |
   2          1          1          1         1
   |          |          |          |         |
(2, 0) -1- (2, 1) -1- (2, 2) -1- (2, 3) -2- (2, 4)
   |          |          |          |         |
   2          1          1          1         2
   |          |          |          |         |
(3, 0) -2- (3, 1) -2- (3, 2) -1- (3, 3) -2- (3, 4)
   |          |          |          |         |
   2          1          1          2         2
   |          |          |          |         |
(4, 0) -2- (4, 1) -1- (4, 2) -2- (4, 3) -2- (4, 4)

在 C++ 中，此结构可以表示为：

class GraphNode
{
public:
	int X;
	int Y;
};

class Graph
{
public:
	vector<vector<pair<int, int>>> Edges;
	vector<GraphNode> Nodes;
};

Graph 中的边列表由一个数组的数组表示，其中索引对应于图中每条边离开的节点编号。然后，每个元素包含一对值：

图中每条边进入的节点编号
边的权重

利用这种简单的结构，我们可以遍历图中的每个节点并获取有关其连接的所有必要信息。

int toNode = graph.Edges[fromNode][neighbourIndex].first;
int weight = graph.Edges[fromNode][neighbourIndex].second;

现在，让我们在图中创建一些自定义连接，以观察我们的通用算法的工作方式。由于代码不是这里的重点，我将提供相关方法的链接。

或者，也可以懒惰地生成此图中的所有连接和权重，代码更少。但是，这种方法可能无法提供对算法遍历图的实际差异的全面理解。

通用算法

通用路径查找算法的核心是通用数据结构，在本项目中，我们将其称为“Queue”。然而，它不是经典的 FIFO（先进先出）数据结构。相反，它是一个通用结构，允许我们在遍历时实现节点排队，并且能够根据使用的算法更改排队机制。“Queue”的接口很简单：

class pathFindingBase
{
public:
  virtual void insert(int node) = 0;
  virtual int getFirst() = 0;
  virtual bool isEmpty() = 0;
};

在深入研究 Queue 的细节之前，让我们先研究一下遍历算法本身。

本质上，它非常类似于典型的 A-Star 或 Dijkstra 算法。首先，我们需要初始化一组集合，使我们能够：

维护一个尚未处理（白色）、正在处理（灰色）和已处理/访问（黑色）的节点列表。
跟踪从起点到集合中每个节点的最短路径的当前距离。
存储一对对（前一个节点，后一个节点）的列表，以便稍后重建最终路径。

main.cpp#L18

const int INF = 1000000;
const int WHITE = 0;
const int GREY = 1;
const int BLACK = 2;

/// <summary>
/// Universal algorithm to apply Path search using BFS, DFS, Dijkstra, A-Star.
/// </summary>
vector<int> FindPath(Graph& graph, int start, int finish, int finishX, int finishY)
{
  int verticesNumber = graph.Nodes.size();

  vector<int> nodeColor(verticesNumber, WHITE);   // All the nodes are White colored 
                                                  // initially
  vector<int> shortestPath(verticesNumber, INF);  // Current shortest path found 
                                                  // from Start to i is some 
                                                  // large/INFinite number from the beginning.
  vector<int> previousVertex(verticesNumber, -1); // Index of the vertex/node that is 
                                                  // predecessor of i-th vertex 
                                                  // in a shortest path to it.

  // We should use pointers here because we want to pass the pointer to a data-structure
  // so it may receive all the updates automatically on every step.
  shared_ptr<vector<int>> ptrShortestPath = make_shared<vector<int>>(shortestPath);
  shared_ptr<Graph> ptrGraph = make_shared<Graph>(graph);

接下来，我们需要初始化我们的数据结构。使用 GitHub 存储库中提供的代码，您可以简单地取消注释必要的代码行。代码不是为了通过参数选择数据结构而设计的，因为我希望您主动尝试它以获得更好的理解（是的，我很强硬 :D）。

//////////////////////////////////////////////////
// TODO
// UNCOMMENT DATA STRUCTURE YOU WANT TO USE:

//dfsStack customQueue; // UNCOMMENT TO USE DFS
//bfsQueue customQueue; // UNCOMMENT TO USE BFS 
//dijkstraQueue customQueue(ptrShortestPath); // UNCOMMENT TO USE DIJKSTRA
//aStarQueue customQueue(finishX, finishY, ptrGraph, ptrShortestPath); // UNCOMMENT TO USE A-STAR on vector

// END OF TODO
/////////////////////////////////////////////////

最后是算法本身。本质上，它是所有三种算法的组合，并带有一些附加检查。我们初始化一个“customQueue”并执行算法，直到它变为空。在检查图中的每个相邻节点时，我们都会将每个可能需要下一个遍历的节点入队。然后，我们调用 getFirst() 方法，该方法仅提取一个应该在算法中下一个遍历的节点。

main.cpp#L48

customQueue.insert(start);
  nodeColor[start] = BLACK;
  ptrShortestPath->at(start) = 0;

  while (!customQueue.isEmpty()) // Traverse nodes starting from start node.
  {
    int current = customQueue.getFirst();

    if (current == finish)       // If we found finish node, then let's print full path.
    {
      vector<int> path;

      int cur = finish;
      path.push_back(cur);

      while (previousVertex[cur] != -1) // Recover path node by node.
      {
        cur = previousVertex[cur];
        path.push_back(cur);
      }

      reverse(path.begin(), path.end()); // Since we are at the finish node, 
                                         // reverse list to be at start.

      return path;
    }

    for (int neighbourIndex = 0; 
         neighbourIndex < graph.Edges[current].size(); neighbourIndex++)
    {
      int to = graph.Edges[current][neighbourIndex].first;
      int weight = graph.Edges[current][neighbourIndex].second;

      if (nodeColor[to] == WHITE) // If node is not yet visited.
      {
        nodeColor[to] = GREY; // Mark node as "in progress".
        customQueue.insert(to);
        previousVertex[to] = current;
        ptrShortestPath->at(to) = ptrShortestPath->at(current) + 
                         weight; // Calculate cost of moving to this node.
      }
      else // Select the most optimal route.
      {
        if (ptrShortestPath->at(to) > ptrShortestPath->at(current) + weight)
        {
          ptrShortestPath->at(to) = ptrShortestPath->at(current) + weight;
        }
      }
    }

    nodeColor[current] = BLACK;
  }

  return {};
}

到目前为止，实现与您可能在书籍或互联网上找到的其他示例没有显著差异。然而，关键点在于 getFirst() 方法，它起着主要作用，因为它决定了节点遍历的确切顺序。

BFS 队列

让我们仔细看看我们队列数据结构的内部工作。BFS 的队列接口是最简单的。 bfsQueue.h#L11

#include <queue>
#include "pathFindingBase.h"

class bfsQueue : public pathFindingBase
{
private:
  queue<int> _queue;

public:
  virtual void insert(int node)
  {
    _queue.push(node);
  }

  virtual int getFirst()
  {
    int value = _queue.front();
    _queue.pop();
    return value;
  }

  virtual bool isEmpty()
  {
    return _queue.empty();
  }
};

实际上，我们可以简单地用 STL（标准模板库）提供的标准 C++ queue 替换这里的自定义队列接口。但是，目标是通用性。现在，您只需要在 main 方法中取消注释该行并运行此算法。

//bfsQueue customQueue; // UNCOMMENT TO USE BFS

结果，BFS 找到了路径 24<-19<-14<-9<-8<-7<-6<-1<-0。

(0, 0) - (0, 1) - (0, 2) - (0, 3) - (0, 4)
                                       |
                                    (1, 4)
                                       |
                                    (2, 4)
                                       |
                                    (3, 4)
                                       |
                                    (4, 4)

如果我们考虑权重，这条路径的总成本将是 11。但是，请记住，BFS 和 DFS 都不考虑权重。相反，它们遍历图中的所有节点，希望或多或少能找到所需的节点。

DFS 队列

DFS 看起来差别不大。我们只需要将 STD 队列替换为堆栈。 dfsStack.h#L11

#include <stack>
#include "pathFindingBase.h"

class dfsStack : public pathFindingBase
{
private:
  stack<int> _queue;

public:
  virtual void insert(int node)
  {
    _queue.push(node);
  }

  virtual int getFirst()
  {
    int value = _queue.top();
    _queue.pop();
    return value;
  }

  virtual bool isEmpty()
  {
    return _queue.empty();
  }
};

DFS 找到路径 24<-23<-22<-21<-20<-15<-10<-5<-0，成本为 15（它不优先查找最优成本）。有趣的是，它的遍历方向与 BFS 相反。

(0, 0)
   | 
(1, 0) 
   |
(2, 0)
   |
(3, 0)
   | 
(4, 0) - (4, 1) - (4, 2) - (4, 3) - (4, 4)

Dijkstra 队列

现在，Dijkstra 算法是图中最著名的贪婪搜索算法。尽管存在众所周知的局限性（无法处理负路径、循环等），但它仍然流行且足够高效。

需要注意的是，此实现中的 getFirst() 方法使用贪婪方法来选择要遍历的节点。 dijkstraQueue.h#L17

#include <queue>
#include "pathFindingBase.h"

class dijkstraQueue : public pathFindingBase
{
private:
  vector<int> _queue;
  shared_ptr<vector<int>> _shortestPaths;

public:
  dijkstraQueue(shared_ptr<vector<int>> shortestPaths) : _shortestPaths(shortestPaths) { }

  virtual void insert(int node)
  {
    _queue.push_back(node);
  }

  virtual int getFirst()
  {
    int minimum = INF;
    int minimumNode = -1;

    for (int i = 0; i < _queue.size(); i++)
    {
      int to = _queue[i];
      int newDistance = _shortestPaths->at(to);

      if (minimum > newDistance) // Greedy selection: select node with 
                                 // minimum distance on every step
      {
        minimum = newDistance;
        minimumNode = to;
      }
    }
    
    if (minimumNode != -1)
    {
      remove(_queue.begin(), _queue.end(), minimumNode);
    }

    return minimumNode;
  }

  virtual bool isEmpty()
  {
    return _queue.empty();
  }
};

Dijkstra 算法找到最短和最优路径 24<-19<-18<-13<-12<-7<-6<-1<-0，成本为 10。

(0, 0) -1- (0, 1)
             |
             1 
             |
           (1, 1) -1- (1, 2)
                         |
                         1 
                         |
                      (2, 2) -1- (2, 3)
                                    |
                                    1 
                                    |
                                  (3, 3) -1- (3, 4)
                                               |
                                               1 
                                               |
                                             (4, 4)

A星算法

A-Star 算法特别适用于在具有坐标的欧几里得空间中搜索路径的情况，例如地图。这就是为什么它在游戏中被广泛使用的原因。它不仅利用基于最小权重的“盲目”贪婪搜索，还考虑了到目标的欧几里得距离。因此，在实际场景中，它通常比 Dijkstra 算法更有效（有关更多详细信息，请参阅我的其他项目）。 aStarQueue.h#L18

class aStarQueue : public pathFindingBase
{
private:
  vector<int> _queue;
  shared_ptr<vector<int>> _shortestPaths;
  shared_ptr<Graph> _graph;
  int _finishX;
  int _finishY;

  /// <summary>
  /// Euclidian distance from node start to specified node id.
  /// </summary>
  int calcEuristic(int id)
  {
    return sqrt(
      pow(abs(
        _finishX > _graph->Nodes[id].X ?
        _finishX - _graph->Nodes[id].X :
        _graph->Nodes[id].X - _finishX), 2) +
      pow(abs(
        _finishY > _graph->Nodes[id].Y ?
        _finishY - _graph->Nodes[id].Y :
        _graph->Nodes[id].Y - _finishY), 2));
  }

public:
  aStarQueue(int finishX, int finishY, shared_ptr<Graph> graph, 
             shared_ptr<vector<int>> shortestPaths)
    :
    _shortestPaths(shortestPaths),
    _graph(graph)
  {
    _finishX = finishX;
    _finishY = finishY;
  }

  virtual void insert(int node)
  {
    _queue.push_back(node);
  }

  virtual int getFirst()
  {
    int minimum = INF;
    int minimumNode = -1;

    for (int i = 0; i < _queue.size(); i++)
    {
      int to = _queue[i];
      int newDistance = _shortestPaths->at(to);
      int euristic = calcEuristic(to);

      if (minimum > newDistance + euristic)
      {
        minimum = newDistance + euristic;
        minimumNode = to;
      }
    }

    if (minimumNode != -1)
    {
      _queue.erase(remove(_queue.begin(), _queue.end(), minimumNode), _queue.end());
    }

    return minimumNode;
  }

  virtual bool isEmpty()
  {
    return _queue.empty();
  }
};

结果，我们得到了与 Dijkstra 算法相同的结果，因为它提供了最最优的路线。

缺点

但是，我们的 Dijkstra 和 A-Star 算法存在一个问题……

上述实现使用通用数据结构内的 vector<T>（动态数组 []）。在每次调用 getFirst() 时，它需要 O(N) 时间才能在向量中找到所需的节点。因此，假设主算法也需要 O(N*M) 时间（其中 M 是平均邻居数量），则总体复杂度可能接近立方。这将在大型图上导致性能明显下降。

虽然这个示例有助于理解所有四种算法在根本上并非截然不同，但细节决定成败。使用通用数据结构高效地实现所有四种算法具有挑战性。

为了获得最佳性能（这通常是 99% 的情况下的主要关注点），应投入更多精力进行优化。例如，对于 Dijkstra 和 A-Star 算法，使用优先队列而不是数组非常有意义。

谈到 A-Star 算法的优化，提及一些将打开优化世界的链接非常有意义：Lucho Suaya 的 A* 优化和改进和 Steve Rabin 的 JPS+：比 A* 快 100 倍以上。

结语

本文的目的是展示所有遍历算法彼此之间的相关性。但是，本文使用的图示例绝对过于简单，无法展示这些算法之间实际性能差异。因此，请主要使用这些示例来获得概念理解，而不是用于生产目的。

如果您有兴趣探索这些算法的潜力，请阅读我基于我的其他项目的下一篇文章，该项目高效地实现了这些算法，并采用了更直观的方法，并提供了广泛的测试数据。

历史

2023 年 9 月 15 日：初始版本