A Simple SGML Parser and an Elegant Visitor Pattern Implementation

Sacha Barber

January 11, 2009

CPOL

A look at the Visitor pattern and a reflective version of it.

Download demo project - 49.6 KB

Introduction

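As software developers, it is our job to get familiar with the tips and tricks that make development easier. Patterns are one area that helps the development process. This article discusses one design pattern, known as the "Visitor" pattern. Some of you may already know the Visitor pattern, but for those who don't, this article may be of some help.

I also want to mention why I chose to write about the Visitor pattern now. The reason is really my next article, which will be about extending LINQ. You see, I didn't want to start on extending LINQ before covering what I consider to be some of the basic building blocks of how parts of LINQ work under the hood.

To that end, this article covers both the standard Visitor pattern and a slightly better, more flexible solution.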

The Standard Visitor Pattern

For those who don't know what the Visitor pattern is, this excerpt from Wikipedia explains its pros and cons nicely.

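"In object-oriented programming and software engineering, the visitor design pattern is a way of separating an algorithm from an object structure upon which it operates. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures. Thus, using the visitor pattern helps conformance with the open/closed principle.

In essence, the visitor allows one to add new virtual functions to a family of classes without modifying the classes themselves; instead, one creates a visitor class that implements all of the appropriate specializations of the virtual function. The visitor takes the instance reference as input, and implements the goal through double dispatch.

While powerful, the visitor pattern does have limitations as compared with conventional virtual functions. It is not possible to create visitors for objects without adding a small callback method inside each class and the callback method in each of the classes is not inheritable to the level of the new subclass."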

-- http://en.wikipedia.org/wiki/Visitor_pattern

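So how does this translate into code? Well, the class diagram of a simple implementation looks like this. Here the items being visited are different cheeses that need examining, but one could imagine a more useful structure to visit, such as the elements of an SGML document, or the expressions within an expression body (perhaps the expressions inside a System.Linq.Expressions Expression... maybe).

The basic idea is that there is one object that implements a Visitor interface. There are also n other objects, all implementing a Visitable interface, which lets them accept a Visitor. When a Visitable accepts a Visitor, it calls back into that Visitor, and the Visit method on the Visitor receives the current Visitable object. This mechanism is typically repeated for every item in a structure, such as a list or tree of Visitable objects.

The nice thing about this pattern is that you can simply visit each item and be sure that the object structure being visited is left untouched: new operations live in new Visitor classes, not in the visited classes. This makes the pattern a great fit for traversing some sort of structured object, such as SGML.

Let's look at some code.

Let's start with some objects that implement a common Visitable interface. As you can see, these classes (very simple ones, for the sake of this article) all have a public void Accept(Visitor v) { v.Visit(this); }, so when Accept() is called, the Visit() overload on the Visitor object that matches the call signature is invoked.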
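To make the double-dispatch step concrete, here is a minimal sketch using hypothetical Circle/Square shapes and an AreaVisitor (none of these names come from the article). The first dispatch is the virtual Accept call on the element; the second happens inside Accept, where this has the concrete type, so the compiler binds v.Visit(this) to the matching overload. The visitor adds an area calculation without modifying either shape class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element types; only the Accept/Visit shape mirrors the article.
interface IShapeVisitor
{
    void Visit(Circle c);
    void Visit(Square s);
}

interface IShape
{
    void Accept(IShapeVisitor v);
}

sealed class Circle : IShape
{
    public double Radius = 1.0;

    // Inside Accept, 'this' is statically a Circle, so the compiler
    // selects Visit(Circle) at compile time (the second dispatch).
    public void Accept(IShapeVisitor v) { v.Visit(this); }
}

sealed class Square : IShape
{
    public double Side = 2.0;
    public void Accept(IShapeVisitor v) { v.Visit(this); }
}

// A new operation (area) added without touching Circle or Square.
sealed class AreaVisitor : IShapeVisitor
{
    public List<double> Areas = new List<double>();
    public void Visit(Circle c) { Areas.Add(Math.PI * c.Radius * c.Radius); }
    public void Visit(Square s) { Areas.Add(s.Side * s.Side); }
}
```

Note that calling visitor.Visit(shape) directly with a variable typed as IShape would not compile, because no Visit(IShape) overload exists; routing through Accept is what lets the runtime type pick the overload.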

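using System;

/// <summary>
/// The Visitor interface: one Visit overload per Visitable type
/// </summary>
interface Visitor
{
    void Visit(Wensleydale w);
    void Visit(Gouda g);
    void Visit(Brie b);
}

/// <summary>
/// The Visitable interface implemented by every cheese
/// </summary>
interface Visitable 
{ 
    void Accept(Visitor v); 
}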

/// <summary>
/// Wensleydale cheese that accepts a Visitor
/// </summary>
sealed class Wensleydale : Visitable 
{
    public String CheeseName { get { return "This is Wensleydale"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

/// <summary>
/// Gouda cheese that accepts a Visitor
/// </summary>
sealed class Gouda : Visitable 
{
    public String CheeseName { get { return "This is Gouda"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

/// <summary>
/// Brie cheese that accepts a Visitor
/// </summary>
sealed class Brie : Visitable 
{
    public String CheeseName { get { return "This is Brie"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

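Here is an implementation of the Visitor type. In this small example I have used a simple class, but more commonly it might be a Form, or some crazy DSL builder, that implements the Visitor interface.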

/// <summary>
/// I have used a simple class here, but you can imagine that
/// this could be a form, or some more meaningful class
/// that does more than print the Name of the Visited item
/// </summary>
class VisitorImplementation : Visitor 
{
    /// <summary>
    /// Visit Wensleydale
    /// </summary>
    public void Visit(Wensleydale w) { Console.WriteLine(w.CheeseName); }

    /// <summary>
    /// Visit Gouda
    /// </summary>
    public void Visit(Gouda g) { Console.WriteLine(g.CheeseName); }

    /// <summary>
    /// Visit Brie
    /// </summary>
    public void Visit(Brie b) { Console.WriteLine(b.CheeseName); }
}

Here is a small test that exercises all of this.

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        //Create some items
        List<Visitable> cheeses = new List<Visitable>()
        {
            new Wensleydale(),
            new Gouda(),
            new Brie()
        };

        //Now Visit them
        Visitor v = new VisitorImplementation();
        cheeses.ForEach(c => c.Accept(v));

        Console.ReadLine();
    }
}

Here is the output.
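For reference, this self-contained sketch re-declares the cheese classes and the visitor from above and captures what the visit loop writes to the console, so the printed lines can be checked. The CheeseDemo helper and its console redirection are additions for illustration only; the article's Program simply writes to the console.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

interface Visitor
{
    void Visit(Wensleydale w);
    void Visit(Gouda g);
    void Visit(Brie b);
}

interface Visitable { void Accept(Visitor v); }

sealed class Wensleydale : Visitable
{
    public String CheeseName { get { return "This is Wensleydale"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

sealed class Gouda : Visitable
{
    public String CheeseName { get { return "This is Gouda"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

sealed class Brie : Visitable
{
    public String CheeseName { get { return "This is Brie"; } }
    public void Accept(Visitor v) { v.Visit(this); }
}

class VisitorImplementation : Visitor
{
    public void Visit(Wensleydale w) { Console.WriteLine(w.CheeseName); }
    public void Visit(Gouda g) { Console.WriteLine(g.CheeseName); }
    public void Visit(Brie b) { Console.WriteLine(b.CheeseName); }
}

// Hypothetical helper: runs the visit loop with Console output redirected
// into a StringWriter, then returns the printed lines.
static class CheeseDemo
{
    public static string[] CaptureOutput()
    {
        TextWriter original = Console.Out;
        var buffer = new StringWriter();
        Console.SetOut(buffer);
        try
        {
            var cheeses = new List<Visitable> { new Wensleydale(), new Gouda(), new Brie() };
            Visitor v = new VisitorImplementation();
            cheeses.ForEach(c => c.Accept(v));
        }
        finally
        {
            Console.SetOut(original); // always restore the real console
        }
        return buffer.ToString().Split(new[] { Environment.NewLine },
                                       StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The three lines come back in list order: "This is Wensleydale", "This is Gouda", "This is Brie".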

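Now this all looks great, so what is the downside? Well, with the standard Visitor pattern you need a Visitable class, each with its own small Accept callback, for every kind of item that needs to be visited, plus a matching Visit method in the Visitor implementation.

That might not sound like much work, but imagine you're dealing with a huge SGML document with thousands of possible tag types. In that case this approach wouldn't be much fun to implement, would it?

So what can we do? That is exactly the question I asked myself a long time ago. In fact, it was back at university, when we had to write an HTML parser; we looked into it and found that reflection had the answer. I decided to revive the concept we used back then to show you another possible implementation, one that doesn't need nearly so many Visitable classes and callbacks. As I said earlier, the reason I'm writing this article is that it ties in with an upcoming article of mine on LINQ.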

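The Reflective Visitor

You've now seen the standard Visitor pattern in all its glory, so how about another take on the Visitor pattern, this time using reflection?

For this example, I chose to create a small parser/reflective visitor that parses and visits a small SGML document, with limited functionality. The supported SGML tags are represented by the classes SGML_Tag/H_Tag/P_Tag/Pre_Tag/Title_Tag/Text_Tag.

SGML_Tag is the base class for all the other supported tags. SGML_Tag looks like this.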

using System;
using System.Collections.Generic;

/// <summary>
/// Base class for all SGML_Tag subclasses
/// Provides a very minimal implementation of
/// a SGML_Tag
/// </summary>
public class SGML_Tag
{
    #region Ctor
    public SGML_Tag(String name)
    {
        this.Name = name;
        Children = new List<SGML_Tag>();
    }
    #endregion

    #region Public Properties
    public String Name { get; private set; }
    public List<SGML_Tag> Children { get; private set; }
    public String CodeType
    {
        get
        {
            return String.Empty;
        }
    }
    #endregion

    #region Public Methods
    /// <summary>
    /// Adds a child to the current SGML_Tag
    /// </summary>
    public void AddChild(SGML_Tag child)
    {
        this.Children.Add(child);
    }
    #endregion

    #region Overrides
    public override string ToString()
    {
        return Name;
    }
    #endregion
}

And a typical subclass might look like this.

/// <summary>
/// A Title Tag
/// </summary>
public class Title_Tag : SGML_Tag
{
    #region Ctor
    public Title_Tag(String name) : base(name)
    {

    }
    #endregion

    #region Public Properties
    public String Text { get; set; }
    #endregion
}
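The parser shown later calls new H_Tag(tag, level), but H_Tag itself isn't listed. A minimal sketch of what it could look like, assuming it simply stores the heading level alongside the element name (the Level property name is an assumption), with a cut-down SGML_Tag so the block compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Cut-down stand-in for the article's SGML_Tag (name + children only).
public class SGML_Tag
{
    public SGML_Tag(String name)
    {
        Name = name;
        Children = new List<SGML_Tag>();
    }

    public String Name { get; private set; }
    public List<SGML_Tag> Children { get; private set; }
    public override string ToString() { return Name; }
}

// Hypothetical H_Tag: constructor shape inferred from the parser's
// "new H_Tag(tag, level)" call; Level is an assumed property name.
public class H_Tag : SGML_Tag
{
    public H_Tag(String name, int level) : base(name)
    {
        Level = level;
    }

    public int Level { get; private set; }
}
```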

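As you can see, SGML_Tag has a list of children. This matches the structure of SGML, since what we essentially have is a tree. The parser creates a single root SGML_Tag, which acts as the source of all the items that need to be visited.

What the Parser Isn't

The parser I wrote for this task is deliberately simple and is by no means a production-ready parser. All it does is recognize the supported tags outlined above and build a tree of them while parsing the SGML document. That's it, because that is enough to illustrate reflection and the Visitor pattern.

The basic idea is that the input SGML document is parsed into a tree of SGML_Tag objects, which are then visited via reflection by another object (the counterpart of the Visitor in the standard Visitor pattern). There is always exactly one root node, which holds the child SGML_Tag objects.

Let's look at the parser.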
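When a tree like this is built recursively from LINQ to XML, one detail is worth checking: XContainer.Elements() yields only the direct child elements, while XContainer.Descendants() yields every nested element. The sketch below (the TreeBuildCheck name is hypothetical) counts how many tags a recursive builder would create with each choice, recursing into any element that has child elements.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class TreeBuildCheck
{
    // Counts tags created by a builder that, for each node, walks either its
    // direct children (Elements) or all nested elements (Descendants), and
    // recurses into every element that has children of its own.
    public static int CountBuilt(XElement parent, bool useDescendants)
    {
        int count = 0;
        var source = useDescendants ? parent.Descendants() : parent.Elements();
        foreach (XElement child in source)
        {
            count++; // one tag created per element reached
            if (child.HasElements)
                count += CountBuilt(child, useDescendants);
        }
        return count;
    }
}
```

For <html><p><b><i>x</i></b></p></html>, three elements sit below html. Walking Elements() creates exactly three tags; walking Descendants() at every level creates seven, because b and i are reached again from each of their ancestors.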

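using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Xml.Linq;

public class SimpleGMLParser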
{
    private SGML_Tag root;

    /// <summary>
    /// Constructor
    /// </summary>
    public SimpleGMLParser() 
    {
        root = null;
    }

    /// <summary>
    /// Gets the parsed tree of SGML_Tag(s) by returning
    /// the root SGML_Tag which holds child SGML_Tag(s)
    /// </summary>
    [MethodImpl(MethodImplOptions.Synchronized)]
    public SGML_Tag GetParsedTree() 
    {
        return root;
    }

    /// <summary>
    /// Parse input document
    /// </summary>
    [MethodImpl(MethodImplOptions.Synchronized)]
    public Boolean Parse() 
    {
        XElement rawHtml = null;
        using (StreamReader fileReader = new StreamReader(Program.CurrentFile))
            rawHtml = XElement.Parse(fileReader.ReadToEnd());

        root = new SGML_Tag(rawHtml.Name.LocalName);
        
        //loop through 1st level elements
        foreach (XElement xmlItem in rawHtml.Elements())
        {
            DealWithSingleNode(xmlItem, root);
        }

        return true;
    }

    /// <summary>
    /// Recursively processes the direct child elements of the current
    /// XElement (Elements(), not Descendants(), so that deeper nodes
    /// are not added to the tree more than once)
    /// </summary>
    private void GetElements(XElement xmlParent, SGML_Tag tagParent)
    {
        foreach (XElement xmlItem in xmlParent.Elements())
        {
            DealWithSingleNode(xmlItem, tagParent);
        }
    }

    /// <summary>
    /// Creates a Tag for a single element, and adds it to the tree of created
    /// SGML_Tag(s)
    /// </summary>
    private void DealWithSingleNode(XElement xmlItem, SGML_Tag tagParent)
    {
        if (xmlItem.HasElements)
        {
            SGML_Tag child = CreateTag(xmlItem.Name.LocalName);
            tagParent.Children.Add(child);
            GetElements(xmlItem, child);
        }
        else
        {
            SGML_Tag child = CreateTag(xmlItem.Name.LocalName);
            tagParent.Children.Add(child);
        }
    }

    /// <summary>
    /// Attempts to create a SGML_Tag or one of its subclasses by examining
    /// the incoming tag value. The tag value is used to do a lookup
    /// against method names in this class using reflection, and if an 
    /// appropriate method is found it is called, 
    /// otherwise a Text_Tag is returned
    /// </summary>
    private SGML_Tag CreateTag(String tag)
    {
    String methodName = tag.ToLower();
        MethodInfo mi;

        // Turn the input into a potential method name.
        methodName = methodName.ReplaceAll(new string[] {"\\D"}, "").ToLower();
        methodName = methodName.Substring(0, 1).ToUpper() 
            + methodName.Substring(1);
        methodName = "Make_" + methodName + "_Tag";

        //Attempt to find the method in this class with methodName name.
        mi = (from MethodInfo m in this.GetType().GetMethods(
                  BindingFlags.Instance | BindingFlags.NonPublic)
              where m.Name.Equals(methodName)
              select m).SingleOrDefault();

        if (mi  == null)
            return Make_Text_Tag(tag);
        else
        {
            //Do a Reflective method invocation call to 
            //correct Make_XXX_Tag method
            return (SGML_Tag)mi.Invoke(this, new Object[] { tag });
        }
    }

    /// <summary>
    /// Creates a new Title_Tag object, which the caller adds to the current parent
    /// </summary>
    private Title_Tag Make_Title_Tag(String tag)
    {
        Title_Tag newTag = new Title_Tag(tag);
        return newTag;
    }

    /// <summary>
    /// Creates a new H_Tag object, which the caller adds to the current parent
    /// </summary>
    private H_Tag Make_H_Tag(String tag)
    {
        int level = Int32.Parse(
            tag.ReplaceAll(new string[] { "<","h", ">" }, ""));
        H_Tag newTag = new H_Tag(tag, level);
        return newTag;
    }

    /// <summary>
    /// Creates a new Pre_Tag object, which the caller adds to the current parent
    /// </summary>
    private Pre_Tag Make_Pre_Tag(String tag)
    {
        Pre_Tag newTag = new Pre_Tag(tag);
        return newTag;
    }

    //Other Make_XXX_Tag methods left out for clarity
    ....
    ....
    ....
}

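As I said, this is a very naive SGML parser. But that's fine, because the parser isn't the point of this article. All you need to know is that once parsing finishes, the root SGML_Tag (returned by GetParsedTree()) holds the tree of parsed SGML_Tag(s), ready to be visited. For those who are interested, notice that the parser also uses reflection to decide which kind of SGML_Tag to create.

To reflectively visit this root SGML_Tag and all of its SGML_Tag children, we need something that can actually process the root SGML_Tag.

I'm using a standard WinForms application, but with a ViewController that handles most of the form interaction. The ViewController is very simple; let's look at its code.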
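The reflective tag creation in CreateTag boils down to two steps: turn the element name into a method name ("h1" becomes "Make_H_Tag"), then look that method up on the current type and invoke it, falling back to a text tag. Here is a minimal sketch of that idea; TagFactorySketch is a hypothetical class that returns strings instead of SGML_Tag objects to stay self-contained, and it uses Type.GetMethod(name, flags) in place of the article's LINQ query over GetMethods.

```csharp
using System;
using System.Linq;
using System.Reflection;

public sealed class TagFactorySketch
{
    // "h1" -> "Make_H_Tag", "TITLE" -> "Make_Title_Tag"; null if no letters remain.
    public static string ToMethodName(string prefix, string tag)
    {
        // Drop digits (so h1..h6 all map to "h") and normalise the case.
        string letters = new string(tag.Where(c => !char.IsDigit(c)).ToArray())
                             .ToLowerInvariant();
        if (letters.Length == 0)
            return null;
        return prefix + char.ToUpperInvariant(letters[0]) + letters.Substring(1) + "_Tag";
    }

    public string Create(string tag)
    {
        string methodName = ToMethodName("Make_", tag);
        MethodInfo mi = methodName == null
            ? null
            : GetType().GetMethod(methodName, BindingFlags.Instance | BindingFlags.NonPublic);

        // Unknown element names fall back to a text tag, as in the article.
        if (mi == null)
            return Make_Text_Tag(tag);
        return (string)mi.Invoke(this, new object[] { tag });
    }

    private string Make_H_Tag(string tag) { return "H_Tag:" + tag; }
    private string Make_Title_Tag(string tag) { return "Title_Tag:" + tag; }
    private string Make_Text_Tag(string tag) { return "Text_Tag:" + tag; }
}
```

Supporting a new tag then only means adding another private Make_XXX_Tag method; nothing else in the factory changes.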

using System;
using System.Linq;
using System.Reflection;
using System.Windows.Forms;

/// <summary>
/// A simple controller for FormParsedItems
/// form
/// </summary>
public class ViewController
{
    private TreeNode root = null;

    /// <summary>
    /// Ctor
    /// </summary>
    public ViewController(IView currentView)
    {
        this.CurrentView = currentView;
        this.TreeOfVisitedNodes = this.CurrentView.GetControlByName(
                                       "tvParsedItems") as TreeView;

    }

    public IView CurrentView { get; private set; }
    public TreeView TreeOfVisitedNodes { get; private set; }

    /// <summary>
    /// Create a Parser and parse input document, then
    /// Reflectively visit the tree of parsed SGML_Tag(s)
    /// </summary>
    public void Run()
    {
        SimpleParser.SimpleGMLParser parser = 
        new SimpleParser.SimpleGMLParser();

        if (parser.Parse())
        {
            SGML_Tag rootTag = parser.GetParsedTree();
            if (rootTag != null)
            {
                root = TreeOfVisitedNodes.Nodes.Add("Root Node", rootTag.Name);
                TreeOfVisitedNodes.ExpandAll();
                this.TraverseVisitableNodes(rootTag, root);
            }
        }
    }

    /// <summary>
    /// Recursively traverses current SGML_Tag children and Visits
    /// each in turn
    /// </summary>
    private void TraverseVisitableNodes(SGML_Tag htmlParent, TreeNode treeParent)
    {
        foreach (SGML_Tag htmlTag in htmlParent.Children)
        {
            if (htmlTag.Children.Count > 0)
            {
                TreeNode childNode = Visit(htmlTag, treeParent);
                TraverseVisitableNodes(htmlTag, childNode);
            }
            else
            {
                Visit(htmlTag, treeParent);
            }
        }
    }

    /// <summary>
    /// Attempts to visit a SGML_Tag or one of its subclasses by examining
    /// the incoming tag value. The tag value is used to do a lookup
    /// against method names in this class using reflection, and if an
    /// appropriate method is found it is called.
    /// </summary>
    private TreeNode Visit(SGML_Tag tag, TreeNode parent)
    {
        String methodName = tag.Name.ToLower();
        MethodInfo mi;

        // Turn the input into a potential method name.
        methodName = methodName.ReplaceAll(
                         new string[] { "\\D" }, "").ToLower();
        methodName = methodName.Substring(0, 1).ToUpper() + methodName.Substring(1);
        methodName = "Visit_" + methodName + "_Tag";

        //Attempt to find the method in this class with methodName name.
        mi = (from MethodInfo m in this.GetType().GetMethods(
            BindingFlags.Instance | BindingFlags.NonPublic)
              where m.Name.Equals(methodName)
              select m).SingleOrDefault();

        if (mi == null)
            return null;
        else
        {
            return (TreeNode)mi.Invoke(this, new Object[] { tag, parent});
        }
    }

    /// <summary>
    /// Visits a H_Tag
    /// </summary>
    private TreeNode Visit_H_Tag(H_Tag ht, TreeNode parent)
    {
        TreeNode newParent = parent.Nodes.Add(ht.Name, ht.Name);
        TreeOfVisitedNodes.ExpandAll();
        Console.WriteLine(String.Format("Visiting H_Tag {0}", ht.ToString()));
        return newParent;
    }

    /// <summary>
    /// Visits a Text_Tag
    /// </summary>
    private TreeNode Visit_Text_Tag(Text_Tag tt, TreeNode parent)
    {
        TreeNode newParent = parent.Nodes.Add(tt.Name, tt.Name);
        TreeOfVisitedNodes.ExpandAll();
        Console.WriteLine(String.Format("Visiting Text_Tag {0}", tt.ToString()));
        return newParent;
    }

    //Other Visit_XXX_Tag methods left out for clarity
    ....
    ....
    ....
}

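As you can see, there are no Visitor/Visitable interfaces; instead, we rely on the tag itself (its element name, so h1 maps to Visit_H_Tag) to tell us the name of the method that should perform the visit. This approach may not suit everyone, but it's worth considering whether it suits you.

Running the application with the following SGML document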
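A variation on the same idea, sketched under its own assumptions: instead of building the method name from the element name as the ViewController does, derive it from the tag's runtime type ("H_Tag" becomes "Visit_H_Tag"), and if no such method exists, try each base class in turn. With that fallback, a subclass without its own Visit_ method is handled by its nearest ancestor's method. TypeNameVisitor and the cut-down tag classes below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Minimal hypothetical tag hierarchy for the sketch.
public class SGML_Tag
{
    public SGML_Tag(string name) { Name = name; }
    public string Name { get; private set; }
}
public class H_Tag : SGML_Tag { public H_Tag(string name) : base(name) { } }
public class Pre_Tag : SGML_Tag { public Pre_Tag(string name) : base(name) { } }

public sealed class TypeNameVisitor
{
    public List<string> Log = new List<string>();

    // Walk from the runtime type up through its base classes and invoke
    // the first Visit_<TypeName> method found on this visitor.
    public void Visit(SGML_Tag tag)
    {
        for (Type t = tag.GetType(); t != null; t = t.BaseType)
        {
            MethodInfo mi = GetType().GetMethod("Visit_" + t.Name,
                BindingFlags.Instance | BindingFlags.NonPublic);
            if (mi != null)
            {
                mi.Invoke(this, new object[] { tag });
                return;
            }
        }
    }

    private void Visit_H_Tag(H_Tag h) { Log.Add("heading " + h.Name); }

    // No Visit_Pre_Tag exists, so Pre_Tag instances end up here.
    private void Visit_SGML_Tag(SGML_Tag t) { Log.Add("generic " + t.Name); }
}
```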

<html>
    <p>
        <h1>Heading1</h1>
        <h2>Heading2</h2>
    </p>
    <h2>Heading2</h2>
</html>

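gives the following result.

Other Nice Extras

I wrote a small extension method that is quite handy for replacing occurrences of certain values in a string; here it is.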
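To see the hierarchy that the tree view is built from, this sketch walks the same sample document with LINQ to XML and produces one indented line per element name. It bypasses the article's parser and ViewController entirely (SampleTree is a hypothetical helper), so it only approximates the shape of the result, not its exact node labels.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class SampleTree
{
    public const string Sample =
        "<html><p><h1>Heading1</h1><h2>Heading2</h2></p><h2>Heading2</h2></html>";

    // Depth-first walk: one line per element, indented two spaces per level.
    public static List<string> Outline(XElement element, int depth = 0)
    {
        var lines = new List<string> { new string(' ', depth * 2) + element.Name.LocalName };
        foreach (XElement child in element.Elements())
            lines.AddRange(Outline(child, depth + 1));
        return lines;
    }
}
```

For the sample document this yields html, then p with h1 and h2 beneath it, then the second h2 directly under html.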

using System;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// String Extension methods, and associated helpers
/// </summary>
public static class StringExtensionMethods
{
    #region String.ReplaceAll(..)
    /// <summary>
    /// Replaces all instances of the strings in the unwanted string 
    /// array within the inputString with the replacement string 
    /// </summary>
    public static String ReplaceAll(this String inputString, 
        String[] unwanted, String replacement)
    {
        for (int i = 0; i < unwanted.Length; i++)
        {
            if (unwanted[i].Equals("\\D"))
            {
                inputString = StringExtensionMethods.StripNumbers(inputString);
            }
            else
            {
                inputString = inputString.Replace(unwanted[i], replacement);
            }
        }
        return inputString;
    }
    #endregion

    #region Private Helper Methods
    /// <summary>
    /// Strips all numbers from an input string, and returns
    /// the stripped string
    /// </summary>
    private static String StripNumbers(string input)
    {
        // Remove every run of decimal digits in a single pass
        return Regex.Replace(input, "[0-9]+", String.Empty);
    }
    #endregion

}
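One thing to keep in mind when reading ReplaceAll: the "\\D" entry is only a sentinel that this helper interprets as "strip the digits". In real .NET regular expressions \D means the opposite, namely any character that is not a digit, while \d matches a digit. A short sketch of both patterns with Regex.Replace (DigitPatterns is a hypothetical helper); the first could stand in for StripNumbers, the second for extracting the heading level in Make_H_Tag.

```csharp
using System;
using System.Text.RegularExpressions;

static class DigitPatterns
{
    // \d matches a decimal digit: removing matches leaves only the letters.
    public static string StripDigits(string s) { return Regex.Replace(s, @"\d", ""); }

    // \D matches any NON-digit: removing matches leaves only the digits.
    public static string KeepDigits(string s) { return Regex.Replace(s, @"\D", ""); }
}
```

For example, StripDigits("h1") gives "h" and KeepDigits("<h3>") gives "3", which int.Parse turns into the level 3.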

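Conclusion

As you can see, reflection lets us simplify the standard Visitor pattern. Next time we'll look at the role the Visitor pattern plays in LINQ; in particular, we'll examine how the System.Linq.Expressions namespace uses the Visitor pattern.

Anyway, that's all I have to say about this pattern for now, but if you enjoyed this article, I hope you'll read more about it in my next article on LINQ.

What Do You Think?

If you liked this article or found it useful, please leave a vote and a comment. Many thanks.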