
Hidden Markov Models in C#


30 Mar 2010

CPOL

9 min read


Hidden Markov Models (HMM) are stochastic methods for modeling temporal and sequence data. They are especially known for their application in temporal pattern recognition, such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial discharges, and bioinformatics.

This code has also been incorporated into the Accord.NET Framework, which includes the latest version of this code plus many other statistics and machine learning tools.

Contents

  1. Introduction
  2. Definition
    1. Notation
    2. Canonical problems
    3. Choosing the structure
  3. Algorithms
    1. Evaluation
    2. Decoding
    3. Learning
  4. Using the code
  5. Remarks
    1. Known issues
  6. Acknowledgements
  7. See also
  8. References

Introduction

Hidden Markov Models were first described in a series of statistical papers by Leonard E. Baum and other authors in the second half of the 1960s. One of the first applications of HMMs was speech recognition, starting in the mid-1970s. Indeed, one of the most comprehensive explanations of the topic was published in "A Tutorial On Hidden Markov Models And Selected Applications in Speech Recognition" by Lawrence R. Rabiner in 1989. In the second half of the 1980s, HMMs began to be applied to the analysis of biological sequences, in particular DNA. Since then, they have become ubiquitous in the field of bioinformatics.

A dynamical system of discrete nature is assumed to be governed by a Markov chain and to emit a sequence of observable outputs. Under the Markov assumption, it is also assumed that the latest output depends only on the current state of the system. Such states are often not known to the observer when only the output values are observable.

[Figure: Example of a hidden Markov model]

Hidden Markov Models attempt to model such systems and allow, among other things, (1) inferring the most likely sequence of states that produced a given output sequence, (2) inferring which will be the most likely next state (and thus predicting the next output), and (3) calculating the probability that a given sequence of outputs originated from the system (allowing the use of hidden Markov models for sequence classification).

The "hidden" in Hidden Markov Models comes from the fact that the observer does not know in which state the system may be, but has only a probabilistic insight into where it should be.

Definition

Hidden Markov Models can be seen as finite state machines where, for each sequence unit observation, there is a state transition and, for each state, there is an output symbol emission.

Notation

Traditionally, HMMs have been defined by the following quintuple:

\lambda = (N, M, A, B, \pi)

where

  • N is the number of states in the model
  • M is the number of distinct observation symbols per state, i.e., the discrete alphabet size
  • A is the NxN state transition probability distribution, given in the form of a matrix A = {aij}
  • B is the NxM observation symbol probability distribution, given in the form of a matrix B = {bj(k)}
  • π is the initial state distribution vector π = {πi}

Note that, if we omit the structure parameters M and N, we obtain the more often used compact notation:

\lambda = (A, B, \pi)
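
To make the notation concrete, here is a small hypothetical example of how such a quintuple could be written down in C#. The numbers are made up purely for illustration; each row of A and B sums to one:

// A hypothetical two-state, two-symbol model (N = 2, M = 2).
// All probability values below are illustrative only.

double[] pi = { 0.6, 0.4 };   // initial state distribution

double[,] A =                 // 2x2 state transition matrix {aij}
{
    { 0.7, 0.3 },             // transitions leaving state 0
    { 0.4, 0.6 },             // transitions leaving state 1
};

double[,] B =                 // 2x2 observation symbol matrix {bj(k)}
{
    { 0.9, 0.1 },             // symbol probabilities while in state 0
    { 0.2, 0.8 },             // symbol probabilities while in state 1
};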

Canonical problems

There are three canonical problems associated with hidden Markov models, which I'll quote below from Wikipedia:

  1. Given the parameters of the model, compute the probability of a particular output sequence. This requires summation over all possible state sequences, but can be done efficiently using the Forward algorithm, which is a form of dynamic programming.
  2. Given the parameters of the model and a particular output sequence, find the state sequence that is most likely to have generated that output sequence. This requires finding a maximum over all possible state sequences, but can likewise be solved efficiently by the Viterbi algorithm.
  3. Given an output sequence or a set of such sequences, find the most likely set of state transition and output probabilities. In other words, derive the maximum likelihood estimate of the parameters of the HMM given a dataset of output sequences. No tractable algorithm is known for solving this problem exactly, but a local maximum likelihood can be derived efficiently using the Baum-Welch algorithm or the Baldi-Chauvin algorithm. The Baum-Welch algorithm is an example of a forward-backward algorithm, and is a special case of the Expectation-maximization algorithm.

The solutions to these problems are exactly what make Hidden Markov Models useful. The ability to learn from data (through the solution to problem 3) and then to make predictions (solution to problem 2) and classify sequences (solution to problem 1) is nothing but applied machine learning. From this perspective, HMMs can simply be seen as supervised sequence classifiers and sequence predictors with some other useful and interesting properties.

Choosing the structure

Choosing the structure for a hidden Markov model is not always obvious. The number of states depends on the application and on how one is willing to interpret the hidden states. Some domain knowledge is required to build a suitable model, and also to choose the initial parameters the HMM should take. There is also some trial and error involved, and there are sometimes complex tradeoffs between model complexity and difficulty of learning, just as with most machine learning techniques.

For more information, see here.

Algorithms

The solutions to the three canonical problems are the algorithms that make HMMs useful. Each of the three problems is described in the subsections below.

Evaluation

The first canonical problem is the evaluation of the probability of a particular output sequence. It can be computed efficiently using either the Viterbi or the Forward algorithm, both of which are forms of dynamic programming.

The Viterbi algorithm primarily computes the most likely sequence of states that generated the observation sequence. In doing so, it can also return the probability of traversing that particular sequence of states. So, to obtain Viterbi probabilities, please refer to the Decoding problem below.

The Forward algorithm, unlike the Viterbi algorithm, does not find one particular sequence of states; rather, it computes the probability that any sequence of states has produced the observation sequence. In both algorithms, a matrix is used to store computations about the possible state sequence paths the model can assume. The forward algorithm also plays a key role in the Learning problem, and is therefore implemented as a separate method.

/// <summary>
///   Calculates the probability that this model has generated the given sequence.
/// </summary>
/// <remarks>
///   Evaluation problem. Given the HMM  M = (A, B, pi) and  the observation
///   sequence O = {o1, o2, ..., oK}, calculate the probability that model
///   M has generated sequence O. This can be computed efficiently using
///   either the Viterbi or the Forward algorithms.
/// </remarks>
/// <param name="observations" />
///   A sequence of observations.
///
/// <param name="logarithm" />
///   True to return the log-likelihood, false to return
///   the likelihood. Default is false.
///
/// <returns>
///   The probability that the given sequence has been generated by this model.
/// </returns>
public double Evaluate(int[] observations, bool logarithm)
{
    if (observations == null)
        throw new ArgumentNullException("observations");

    if (observations.Length == 0)
        return 0.0;


    // Forward algorithm
    double likelihood = 0;
    double[] coefficients;

    // Compute forward probabilities
    forward(observations, out coefficients);

    for (int i = 0; i < coefficients.Length; i++)
        likelihood += Math.Log(coefficients[i]);

    // Return the sequence probability
    return logarithm ? likelihood : Math.Exp(likelihood);
}

/// <summary>
///   Baum-Welch forward pass (with scaling)
/// </summary>
/// <remarks>
///   Reference: http://courses.media.mit.edu/2010fall/mas622j/ProblemSets/ps4/tutorial.pdf
/// </remarks>
private double[,] forward(int[] observations, out double[] c)
{
    int T = observations.Length;
    double[] pi = Probabilities;
    double[,] A = Transitions;

    double[,] fwd = new double[T, States];
    c = new double[T];


    // 1. Initialization
    for (int i = 0; i < States; i++)
        c[0] += fwd[0, i] = pi[i] * B[i, observations[0]];

    if (c[0] != 0) // Scaling
    {
        for (int i = 0; i < States; i++)
            fwd[0, i] = fwd[0, i] / c[0];
    }


    // 2. Induction
    for (int t = 1; t < T; t++)
    {
        for (int i = 0; i < States; i++)
        {
            double p = B[i, observations[t]];

            double sum = 0.0;
            for (int j = 0; j < States; j++)
                sum += fwd[t - 1, j] * A[j, i];
            fwd[t, i] = sum * p;

            c[t] += fwd[t, i]; // scaling coefficient
        }

        if (c[t] != 0) // Scaling
        {
            for (int i = 0; i < States; i++)
                fwd[t, i] = fwd[t, i] / c[t];
        }
    }

    return fwd;
}

Decoding

The second canonical problem is the discovery of the most likely sequence of states that generated a given output sequence. It can be computed efficiently using the Viterbi algorithm. A backtracking step detects the maximum-probability path traveled by the algorithm; the probability of traveling that path is also computed in the process.

/// <summary>
///   Calculates the most likely sequence of hidden states
///   that produced the given observation sequence.
/// </summary>
/// <remarks>
///   Decoding problem. Given the HMM M = (A, B, pi) and  the observation sequence
///   O = {o1,o2, ..., oK}, calculate the most likely sequence of hidden states Si
///   that produced this observation sequence O. This can be computed efficiently
///   using the Viterbi algorithm.
/// </remarks>
/// <param name="observations" />A sequence of observations.
/// <param name="probability" />The state optimized probability.
/// <returns>The sequence of states that most likely produced the sequence.</returns>
public int[] Decode(int[] observations, out double probability)
{
    // Viterbi algorithm.

    int T = observations.Length;
    int states = States;
    int minState;
    double minWeight;
    double weight;

    int[,] s = new int[states, T];
    double[,] a = new double[states, T];

    // Base
    for (int i = 0; i < states; i++)
    {
        a[i, 0] = (-1.0 * System.Math.Log(pi[i])) - System.Math.Log(B[i, observations[0]]);
    }

    // Induction
    for (int t = 1; t < T; t++)
    {
        for (int j = 0; j < states; j++)
        {
            minState = 0;
            minWeight = a[0, t - 1] - System.Math.Log(A[0, j]);

            for (int i = 1; i < states; i++)
            {
                weight = a[i, t - 1] - System.Math.Log(A[i, j]);

                if (weight < minWeight)
                {
                    minState = i;
                    minWeight = weight;
                }
            }

            a[j, t] = minWeight - System.Math.Log(B[j, observations[t]]);
            s[j, t] = minState;
        }
    }

    // Find minimum value for time T-1
    minState = 0;
    minWeight = a[0, T - 1];

    for (int i = 1; i < states; i++)
    {
        if (a[i, T - 1] < minWeight)
        {
            minState = i;
            minWeight = a[i, T - 1];
        }
    }

    // Trackback
    int[] path = new int[T];
    path[T - 1] = minState;

    for (int t = T - 2; t >= 0; t--)
        path[t] = s[path[t + 1], t + 1];


    probability = System.Math.Exp(-minWeight);
    return path;
}
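
As a quick illustration, the method above could be called as follows (a hypothetical snippet, assuming a trained hmm instance such as the ones built in the Using the Code section below):

// Hypothetical usage: recover the most likely hidden state path
//  for an observed sequence, along with its Viterbi probability.
double probability;
int[] path = hmm.Decode(new int[] { 0, 1, 1, 1 }, out probability);
// path now holds one state index per observation.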

Learning

The third and last problem is the problem of learning the most likely parameters that best model a system, given a set of sequences originating from that system. Most implementations I have seen do not consider the problem of learning from a set of sequences, but only learn from a single sequence at a time. The algorithm below, however, is fully suited to learning from a set of sequences, and also uses scaling, which I have not seen in other implementations.

The source code follows the original algorithm by Rabiner (1989). There are, however, some known issues with the algorithms detailed in Rabiner's paper; more information about them can be found in the Remarks section of this article.

/// <summary>
///   Runs the Baum-Welch learning algorithm for hidden Markov models.
/// </summary>
/// <remarks>
///   Learning problem. Given some training observation sequences O = {o1, o2, ..., oK}
///   and general structure of HMM (numbers of hidden and visible states), determine
///   HMM parameters M = (A, B, pi) that best fit training data.
/// </remarks>
/// <param name="iterations" />
///   The maximum number of iterations to be performed by the learning algorithm. If
///   specified as zero, the algorithm will learn until convergence of the model average
///   likelihood respecting the desired limit.
///
/// <param name="observations" />
///   An array of observation sequences to be used to train the model.
///
/// <param name="tolerance" />
///   The likelihood convergence limit L between two iterations of the algorithm. The
///   algorithm will stop when the change in the likelihood for two consecutive iterations
///   has not changed by more than L percent of the likelihood. If left as zero, the
///   algorithm will ignore this parameter and iterates over a number of fixed iterations
///   specified by the previous parameter.
///
/// <returns>
///   The average log-likelihood for the observations after the model has been trained.
/// </returns>
public double Learn(int[][] observations, int iterations, double tolerance)
{
    if (iterations == 0 && tolerance == 0)
        throw new ArgumentException("Iterations and limit cannot be both zero.");

    // Baum-Welch algorithm.

    // The Baum–Welch algorithm is a particular case of a generalized expectation-maximization
    // (GEM) algorithm. It can compute maximum likelihood estimates and posterior mode estimates
    // for the parameters (transition and emission probabilities) of an HMM, when given only
    // emissions as training data.

    // The algorithm has two steps:
    //  - Calculating the forward probability and the backward probability for each HMM state;
    //  - On the basis of this, determining the frequency of the transition-emission pair values
    //    and dividing it by the probability of the entire string. This amounts to calculating
    //    the expected count of the particular transition-emission pair. Each time a particular
    //    transition is found, the value of the quotient of the transition divided by the probability
    //    of the entire string goes up, and this value can then be made the new value of the transition.


    int N = observations.Length;
    int currentIteration = 1;
    bool stop = false;

    double[] pi = Probabilities;
    double[,] A = Transitions;


    // Initialization
    double[][, ,] epsilon = new double[N][, ,]; // also referred as ksi or psi
    double[][,] gamma = new double[N][,];

    for (int i = 0; i < N; i++)
    {
        int T = observations[i].Length;
        epsilon[i] = new double[T, States, States];
        gamma[i] = new double[T, States];
    }


    // Calculate initial model log-likelihood
    double oldLikelihood = Double.MinValue;
    double newLikelihood = 0;


    do // Until convergence or max iterations is reached
    {
        // For each sequence in the observations input
        for (int i = 0; i < N; i++)
        {
            var sequence = observations[i];
            int T = sequence.Length;
            double[] scaling;

            // 1st step - Calculating the forward probability and the
            //            backward probability for each HMM state.
            double[,] fwd = forward(observations[i], out scaling);
            double[,] bwd = backward(observations[i], scaling);


            // 2nd step - Determining the frequency of the transition-emission pair values
            //            and dividing it by the probability of the entire string.


            // Calculate gamma values for next computations
            for (int t = 0; t < T; t++)
            {
                double s = 0;

                for (int k = 0; k < States; k++)
                    s += gamma[i][t, k] = fwd[t, k] * bwd[t, k];

                if (s != 0) // Scaling
                {
                    for (int k = 0; k < States; k++)
                        gamma[i][t, k] /= s;
                }
            }

            // Calculate epsilon values for next computations
            for (int t = 0; t < T - 1; t++)
            {
                double s = 0;

                for (int k = 0; k < States; k++)
                    for (int l = 0; l < States; l++)
                        s += epsilon[i][t, k, l] = fwd[t, k] * A[k, l] * 
                                                   bwd[t + 1, l] * B[l, sequence[t + 1]];

                if (s != 0) // Scaling
                {
                    for (int k = 0; k < States; k++)
                        for (int l = 0; l < States; l++)
                            epsilon[i][t, k, l] /= s;
                }
            }

            // Compute log-likelihood for the given sequence
            for (int t = 0; t < scaling.Length; t++)
                newLikelihood += Math.Log(scaling[t]);
        }

        // Average the likelihood for all sequences
        newLikelihood /= observations.Length;


        // Check if the model has converged or we should stop
        if (checkConvergence(oldLikelihood, newLikelihood,
            currentIteration, iterations, tolerance))
        {
            stop = true;
        }

        else
        {
            // 3. Continue with parameter re-estimation
            currentIteration++;
            oldLikelihood = newLikelihood;
            newLikelihood = 0.0;


            // 3.1 Re-estimation of initial state probabilities
            for (int k = 0; k < States; k++)
            {
                double sum = 0;
                for (int i = 0; i < N; i++)
                    sum += gamma[i][0, k];
                pi[k] = sum / N;
            }

            // 3.2 Re-estimation of transition probabilities
            for (int i = 0; i < States; i++)
            {
                for (int j = 0; j < States; j++)
                {
                    double den = 0, num = 0;

                    for (int k = 0; k < N; k++)
                    {
                        int T = observations[k].Length;

                        for (int l = 0; l < T - 1; l++)
                            num += epsilon[k][l, i, j];

                        for (int l = 0; l < T - 1; l++)
                            den += gamma[k][l, i];
                    }

                    A[i, j] = (den != 0) ? num / den : 0.0;
                }
            }

            // 3.3 Re-estimation of emission probabilities
            for (int i = 0; i < States; i++)
            {
                for (int j = 0; j < Symbols; j++)
                {
                    double den = 0, num = 0;

                    for (int k = 0; k < N; k++)
                    {
                        int T = observations[k].Length;

                        for (int l = 0; l < T; l++)
                        {
                            if (observations[k][l] == j)
                                num += gamma[k][l, i];
                        }

                        for (int l = 0; l < T; l++)
                            den += gamma[k][l, i];
                    }

                    // avoid locking a parameter in zero.
                    B[i, j] = (num == 0) ? 1e-10 : num / den;
                }
            }

        }

    } while (!stop);


    // Returns the model average log-likelihood
    return newLikelihood;
}
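
The Learn method above also calls a backward method that is not reproduced in this article. For completeness, here is a minimal sketch of a scaled backward pass, written under the assumption that it reuses the same scaling coefficients c produced by the forward method shown earlier, as in Rabiner's scaled formulation:

/// <summary>
///   Baum-Welch backward pass (with scaling). A minimal sketch that
///   mirrors the scaled forward implementation shown above.
/// </summary>
private double[,] backward(int[] observations, double[] c)
{
    int T = observations.Length;
    double[,] A = Transitions;

    double[,] bwd = new double[T, States];

    // 1. Initialization, using the same scale factors as forward()
    for (int i = 0; i < States; i++)
        bwd[T - 1, i] = 1.0 / c[T - 1];

    // 2. Induction, moving backwards in time
    for (int t = T - 2; t >= 0; t--)
    {
        for (int i = 0; i < States; i++)
        {
            double sum = 0.0;
            for (int j = 0; j < States; j++)
                sum += A[i, j] * B[j, observations[t + 1]] * bwd[t + 1, j];

            bwd[t, i] = sum / c[t]; // scaling
        }
    }

    return bwd;
}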

Using the Code

Suppose we have gathered some sequences from a system we wish to model. The sequences are represented as integer arrays, such as:

int[][] sequences = new int[][]
{
    new int[] { 0,1,1,1,1,1,1 },
    new int[] { 0,1,1,1 },
    new int[] { 0,1,1,1,1 },
    new int[] { 0,1, },
    new int[] { 0,1,1 },
};

It is obvious to us that the system outputs sequences that always start with a zero and have one or more ones at the end. But let's try to fit a hidden Markov model to predict these sequences.

// Creates a new Hidden Markov Model with 2 states for
//  an output alphabet of two characters (zero and one)
HiddenMarkovModel hmm = new HiddenMarkovModel(2, 2);

// Try to fit the model to the data until the difference in
//  the average likelihood changes only by as little as 0.01
hmm.Learn(sequences, 0.01);

Once the model is trained, let's test it to see if it recognizes some sequences:

// Calculate the probability that the given
//  sequences originated from the model
double l1 = hmm.Evaluate(new int[] { 0, 1 });       // l1 = 0.9999
double l2 = hmm.Evaluate(new int[] { 0, 1, 1, 1 }); // l2 = 0.9999

double l3 = hmm.Evaluate(new int[] { 1, 1 });       // l3 = 0.0000
double l4 = hmm.Evaluate(new int[] { 1, 0, 0, 0 }); // l4 = 0.0000

Of course the model performs well here, as this is a rather simple example. A more useful test case would allow for some errors in the input sequences, in the hope that the model will become more tolerant to measurement errors:

int[][] sequences = new int[][]
{
    new int[] { 0,1,1,1,1,0,1,1,1,1 },
    new int[] { 0,1,1,1,0,1,1,1,1,1 },
    new int[] { 0,1,1,1,1,1,1,1,1,1 },
    new int[] { 0,1,1,1,1,1         },
    new int[] { 0,1,1,1,1,1,1       },
    new int[] { 0,1,1,1,1,1,1,1,1,1 },
    new int[] { 0,1,1,1,1,1,1,1,1,1 },
};

// Creates a new Hidden Markov Model with 3 states for
//  an output alphabet of two characters (zero and one)
HiddenMarkovModel hmm = new HiddenMarkovModel(2, 3);

// Try to fit the model to the data until the difference in
//  the average likelihood changes only by as little as 0.0001
hmm.Learn(sequences, 0.0001);

// Calculate the probability that the given
//  sequences originated from the model
double l1 = hmm.Evaluate(new int[] { 0,1 });      // 0.9999
double l2 = hmm.Evaluate(new int[] { 0,1,1,1 });  // 0.9166

double l3 = hmm.Evaluate(new int[] { 1,1 });      // 0.0000
double l4 = hmm.Evaluate(new int[] { 1,0,0,0 });  // 0.0000

double l5 = hmm.Evaluate(new int[] { 0,1,0,1,1,1,1,1,1 }); // 0.0342
double l6 = hmm.Evaluate(new int[] { 0,1,1,1,1,1,1,0,1 }); // 0.0342

We can see that, although the likelihoods are very small, the sequences containing simulated measurement errors still score a higher likelihood than the sequences that do not follow the sequence structure at all.

In a follow-up article, we will see that these low likelihood values are not a problem, because HMMs are often used in sets to form sequence classifiers. When configured and used in that way, what really matters is which HMM in the set returns the highest probability, as sketched below.
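
As a rough sketch of that idea (this helper is not part of the article's code), a classifier can simply evaluate a sequence under every candidate model and return the index of the most likely one:

// Hypothetical helper: classify a sequence by picking the model
//  that assigns it the highest likelihood.
static int Classify(HiddenMarkovModel[] models, int[] sequence)
{
    int best = 0;
    double bestLogLikelihood = double.NegativeInfinity;

    for (int i = 0; i < models.Length; i++)
    {
        // Compare in log-space to avoid numerical underflow
        double logLikelihood = models[i].Evaluate(sequence, true);

        if (logLikelihood > bestLogLikelihood)
        {
            bestLogLikelihood = logLikelihood;
            best = i;
        }
    }

    return best; // index of the winning model
}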

Remarks

A practical issue in the use of Hidden Markov Models for modeling long sequences is the numerical scaling of conditional probabilities. The probability of observing a long sequence given most models is extremely small, and the use of these extremely small numbers in computations often leads to numerical instability, making the application of HMMs to genome-length sequences quite challenging.

There are two common approaches to dealing with small conditional probabilities. One approach is to rescale the conditional probabilities using carefully designed scaling factors; the other is to work with the logarithms of the conditional probabilities. For more information on using logarithms, please see the work entitled "Numerically Stable Hidden Markov Model Implementation" by Tobias P. Mann.
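
As a small self-contained illustration of the problem (not from the article itself), multiplying a few hundred modest probabilities already underflows a double, while accumulating their logarithms remains perfectly stable:

// Multiplying 500 emission probabilities of 0.1 each:
double product = 1.0;
double logSum = 0.0;

for (int t = 0; t < 500; t++)
{
    product *= 0.1;           // underflows to exactly 0 along the way
    logSum += Math.Log(0.1);  // stays a well-behaved finite number
}

Console.WriteLine(product);   // prints 0
Console.WriteLine(logSum);    // prints approximately -1151.29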

Known issues

The code in this article is based on Rabiner's Tutorial. There are, however, some known issues with the scaling and other algorithms it describes. An errata depicting all those issues is available on the website "An Erratum for 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition'", maintained by Ali Rahimi. I have not yet verified whether the implementation presented here suffers from the same problems. This code has worked well in many situations, but I cannot guarantee it is flawless. Please use it at your own risk.

Acknowledgements

Thanks to Guilherme C. Pedroso for his help with the generalization of Baum-Welch to multiple input sequences. He also co-wrote a very interesting article on gesture recognition using hidden Markov models, entitled "Automatic Recognition of Finger Spelling for LIBRAS based on a Two-Layer Architecture", published at the 25th Symposium On Applied Computing (ACM SAC 2010).

See also

References
