How to Solve the Word Ladder Problem?

Sandeep Mewara

22 Nov 2020

CPOL

A deep dive into graph-based algorithms

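Some time back, a colleague asked me about the word ladder problem. She was looking for a job change, and I believe she stumbled upon it while preparing for data structures and algorithms.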

Problem Statement

Typically, the puzzle given is a variation of the example below.

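Find the smallest number of transformations needed to change a start word into a target word of the same length. In each transformation, only one character may be changed, and the resulting word must exist in the given dictionary.

Explanation

Assuming all these 4-letter words are in the dictionary provided, it takes a minimum of four transformations to convert SAIL to RUIN, i.e.: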
SAIL -> MAIL -> MAIN -> RAIN -> RUIN
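
The two rules in the problem statement can be checked mechanically. The sketch below is only an illustration: the helper names differs_by_one and is_valid_ladder are hypothetical, and the dictionary is the example word list used later in this article.

```python
def differs_by_one(a, b):
    """Return True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_valid_ladder(ladder, dictionary):
    """Check that each step changes one letter and lands on a dictionary word."""
    words = set(dictionary)
    return all(
        nxt in words and differs_by_one(cur, nxt)
        for cur, nxt in zip(ladder, ladder[1:])
    )


ladder = ["SAIL", "MAIL", "MAIN", "RAIN", "RUIN"]
dictionary = ["SAIL", "RAIN", "REST", "BAIL", "MAIL", "MAIN", "RUIN"]

print(is_valid_ladder(ladder, dictionary))  # True
print(len(ladder) - 1)                      # 4 transformations
```

A ladder of five words contains four transformations, which is the number the puzzle asks for.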

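The aim here is to understand graph algorithms. So, in the context of algorithms, what is a graph, and how do we apply graphs to solve this kind of problem?

Graph Data Structure

A graph is a structure that represents how entities are connected to each other. Visually, it is drawn with nodes (vertices) and edges (connectors).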

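A tree is an undirected graph in which any two nodes are connected by exactly one path. In a tree, every node (except the root) has exactly one parent.

A common way to represent a graph is an adjacency matrix. In this matrix, element A[i][j] is 1 if there is an edge from node i to node j, and 0 otherwise. For example, the adjacency matrix of an undirected graph with edges 1-2, 2-3, 3-4 and 4-1 is: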

  | 1 2 3 4
------------
1 | 0 1 0 1
2 | 1 0 1 0
3 | 0 1 0 1
4 | 1 0 1 0
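
As a small sketch, the matrix above can be held in a nested Python list. The helper names has_edge and is_symmetric are hypothetical; the only assumption is that node labels 1 to 4 map to indices 0 to 3.

```python
# Adjacency matrix from the text; node labels 1..4 map to indices 0..3.
A = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]


def has_edge(matrix, i, j):
    """Return True if there is an edge from node i to node j (1-based labels)."""
    return matrix[i - 1][j - 1] == 1


def is_symmetric(matrix):
    """An undirected graph has a symmetric adjacency matrix."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


print(has_edge(A, 1, 2))  # True
print(has_edge(A, 1, 3))  # False
print(is_symmetric(A))    # True
```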

Another common way is an adjacency list, which stores, for each node, a list of its neighbours instead of a full matrix.
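
An adjacency list maps naturally onto a Python dict. As a sketch of how this applies to the word ladder, the snippet below treats every word as a node and connects two words when they differ by one letter. The function name build_word_graph is hypothetical, and the words are the example list used in the code section of this article.

```python
from itertools import combinations


def build_word_graph(words):
    """Return a dict mapping each word to the set of words one letter away."""
    graph = {word: set() for word in words}
    for a, b in combinations(words, 2):
        if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
            # Undirected edge: record it in both directions.
            graph[a].add(b)
            graph[b].add(a)
    return graph


words = ["SAIL", "RAIN", "REST", "BAIL", "MAIL", "MAIN", "RUIN"]
graph = build_word_graph(words)
for word in words:
    print(word, "->", sorted(graph[word]))
```

REST ends up with no neighbours, so it can never be part of a ladder in this dictionary.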

Code

Based on what the problem needs, I will use a breadth-first graph algorithm.

from collections import deque

class Solution(object):
    # method that will help find the path
    def ladderLength(self, beginWord, 
                        endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: List[str]
        :rtype: int
        """

        # Queue for BFS
        queue = deque()

        # start by adding begin word
        queue.append((beginWord, [beginWord]))

        while queue:
            # let's keep a watch at active queue
            print('Current queue:',queue)

            # get the current node and 
            # path how it came
            node, path = queue.popleft()

            # let's keep track of path length 
            # traversed so far
            print('Current transformation count:',
                                        len(path))

            # find possible next set of 
            # child nodes, 1 diff
            for next_word in self.next_nodes(node,
                            wordList) - set(path):
                # traversing through all child nodes
                # if any of the child matches,
                # we are good
                if next_word == endWord:
                    print('found endword at path:',
                                            path)
                    return len(path)
                else:
                    # keep record of next
                    # possible paths
                    queue.append((next_word,
                                path + [next_word]))
        return 0

    def next_nodes(self, word, word_list):
        # start with empty collection
        possiblenodes = set()

        # all the words are of fixed length
        wl_word_length = len(word)

        # loop through all the words in 
        # the word list
        for wl_word in word_list:
            mismatch_count = 0

            # find all the words that are 
            # only a letter different from 
            # current word those are the 
            # possible next child nodes
            for i in range(wl_word_length):
                if wl_word[i] != word[i]:
                    mismatch_count += 1
            if mismatch_count == 1:
                # exactly one letter differs
                possiblenodes.add(wl_word)
        
        # lets see the set of next possible nodes 
        print('possible next nodes:',possiblenodes)
        return possiblenodes

# Setup
beginWord = "SAIL"
endWord = "RUIN"
wordList = ["SAIL","RAIN","REST","BAIL","MAIL",
                                    "MAIN","RUIN"]

# Call
print('Transformations needed: ',
    Solution().ladderLength(beginWord, 
                            endWord, wordList))

# Transformations expected == 4
# One possible shortest path with 4 transformations:
# SAIL -> MAIL -> MAIN -> RAIN -> RUIN

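Python's deque (double-ended queue) is used.

A deque supports fast appends and pops at both ends: each is O(1). A list is also O(1) for append and pop at the end, but removing from the front with pop(0) is O(n), because every remaining element has to shift.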
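
A minimal sketch of the queue behaviour BFS relies on, using only collections.deque from the standard library. The list version is shown for comparison; it returns the same element, but list.pop(0) has to shift the remaining items.

```python
from collections import deque

queue = deque(["SAIL"])
queue.append("BAIL")       # enqueue at the right end, O(1)
queue.append("MAIL")
first = queue.popleft()    # dequeue from the left end, O(1)
print(first, list(queue))  # SAIL ['BAIL', 'MAIL']

# Same first-in, first-out order with a list, but pop(0) is O(n).
items = ["SAIL", "BAIL", "MAIL"]
first_from_list = items.pop(0)
print(first_from_list, items)  # SAIL ['BAIL', 'MAIL']
```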

A quick look at the code flow below verifies that all nodes at a given distance are traversed before moving on to the next level.

Current queue: deque([('SAIL', ['SAIL'])])

Current transformation count: 1
possible next nodes: {'BAIL', 'MAIL'}
Current queue: deque([('BAIL', ['SAIL', 'BAIL']), 
                      ('MAIL', ['SAIL', 'MAIL'])])

Current transformation count: 2
possible next nodes: {'SAIL', 'MAIL'}
Current queue: deque([('MAIL', ['SAIL', 'MAIL']), 
                      ('MAIL', ['SAIL', 'BAIL', 
                       'MAIL'])])

Current transformation count: 2
possible next nodes: {'BAIL', 'MAIN', 'SAIL'}
Current queue: deque([('MAIL', ['SAIL', 'BAIL', 
                                'MAIL']), 
                      ('BAIL', ['SAIL', 'MAIL', 
                                'BAIL']), 
                      ('MAIN', ['SAIL', 'MAIL', 
                                'MAIN'])])

Current transformation count: 3
possible next nodes: {'BAIL', 'MAIN', 'SAIL'}
Current queue: deque([('BAIL', ['SAIL', 'MAIL', 
                                'BAIL']), 
                      ('MAIN', ['SAIL', 'MAIL', 
                                'MAIN']), 
                      ('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN'])])

Current transformation count: 3
possible next nodes: {'SAIL', 'MAIL'}
Current queue: deque([('MAIN', ['SAIL', 'MAIL', 
                                'MAIN']), 
                      ('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN'])])

Current transformation count: 3
possible next nodes: {'RAIN', 'MAIL'}
Current queue: deque([('MAIN', ['SAIL', 'BAIL', 
                                'MAIL', 'MAIN']), 
                      ('RAIN', ['SAIL', 'MAIL', 
                                'MAIN', 'RAIN'])])

Current transformation count: 4
possible next nodes: {'RAIN', 'MAIL'}
Current queue: deque([('RAIN', ['SAIL', 'MAIL', 
                                'MAIN', 'RAIN']), 
                      ('RAIN', ['SAIL', 'BAIL', 
                        'MAIL', 'MAIN', 'RAIN'])])

Current transformation count: 4
possible next nodes: {'MAIN', 'RUIN'}
found endword at path: ['SAIL', 'MAIL', 'MAIN', 
                                        'RAIN']

Transformations needed:  4
Overall path: ['SAIL', 'MAIL', 'MAIN', 
                               'RAIN', 'RUIN']
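
The level-by-level order in the trace can also be checked directly. The sketch below is a variation rather than the code above: it records the BFS distance of every reachable word from the start word. The names one_letter_apart and bfs_levels are hypothetical.

```python
from collections import deque


def one_letter_apart(a, b):
    """Return True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def bfs_levels(begin_word, word_list):
    """Return a dict mapping each reachable word to its BFS distance from begin_word."""
    distance = {begin_word: 0}
    queue = deque([begin_word])
    while queue:
        word = queue.popleft()
        for candidate in word_list:
            # A word gets its distance the first time BFS reaches it.
            if candidate not in distance and one_letter_apart(word, candidate):
                distance[candidate] = distance[word] + 1
                queue.append(candidate)
    return distance


word_list = ["SAIL", "RAIN", "REST", "BAIL", "MAIL", "MAIN", "RUIN"]
levels = bfs_levels("SAIL", word_list)
for word, dist in sorted(levels.items(), key=lambda item: (item[1], item[0])):
    print(dist, word)
```

RUIN sits at distance 4, which matches the transformation count in the trace. REST is never reached, so it does not appear in the result.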

Complexity

For the above code that I used to find the shortest transformation path:

Time

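In next_nodes, the current word is compared with every word in the word list, character by character. That is M×N comparisons per call, where M is the length of each word and N is the total number of words in the input word list, so a single call costs O(M×N).

In ladderLength, BFS can visit any of the N words, and every visit calls next_nodes once. If each word is expanded at most once, the comparisons add up to O(M×N²). The code above only excludes words that are already on the current path, so a word can be enqueued through several paths (MAIL and MAIN each appear twice in the trace), and the actual work can exceed that figure.

Overall, the comparison cost is O(M×N²) under the expand-once assumption; copying the path on every enqueue adds work proportional to the path length.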

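Space

In next_nodes, the returned set holds at most N words of length M, which is O(M×N) space.

In ladderLength, every queue entry stores a word together with the full path that led to it. If each word is enqueued at most once, that is at most N entries with paths of at most N words each, or O(N²) word references.

Overall, that adds up to O(M×N) + O(N²) under the same assumption.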
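
The queue in ladderLength carries a full path with every entry and only skips words on the current path. A minimal sketch of an alternative, which is a variation and not the code above, keeps one parent pointer per word. The parent dict doubles as a visited set, so each word is enqueued at most once, and the path is rebuilt only when the end word is reached. The function name shortest_ladder is hypothetical.

```python
from collections import deque


def shortest_ladder(begin_word, end_word, word_list):
    """Return one shortest ladder as a list of words, or [] if none exists."""
    parent = {begin_word: None}  # word -> word it was reached from
    queue = deque([begin_word])
    while queue:
        word = queue.popleft()
        if word == end_word:
            # Walk the parent pointers back to the start word.
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for candidate in word_list:
            if candidate in parent or len(candidate) != len(word):
                continue
            if sum(x != y for x, y in zip(word, candidate)) == 1:
                parent[candidate] = word
                queue.append(candidate)
    return []


word_list = ["SAIL", "RAIN", "REST", "BAIL", "MAIL", "MAIN", "RUIN"]
path = shortest_ladder("SAIL", "RUIN", word_list)
print(path)
print("Transformations needed:", len(path) - 1)
```

With this layout, the extra storage is one dict entry per word instead of one path per queue entry.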

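Summary

This can be a bit tricky, so it takes some practice to visualize the graph and to write the code for it.

Great, so now we know how to solve problems like the word ladder. It also touched on other related, common graph algorithms that we can refer to.

I read the following references, which have more details if needed.

Keep problem solving!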
