MS Elmax: MSXML C++ DOM Parser Wrapper

Shao Voon Wong

12 Apr 2016

Ms-PL

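The C++ XML parsing article that should have been written when XML first appeared! This article defines a new Elmax abstraction model on top of the DOM model.

Download ms_elmax-0.9.1.zip - 171.1 KB

Simple Code Example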

<Books>
  <Book>
    <Price>12.990000</Price>
  </Book>
</Books>

To create the XML above, see the C++ code below:

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"] = 12.99f;

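The third line of code detects that the three elements do not exist; the float assignment then creates them, converts 12.99f to a string, and assigns it to the Price element. To read the Price element, we simply assign it to a float variable, as shown below: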

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = 
  root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice;

It is good practice to call Exists() to check that the Price element exists before reading it.

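Introduction

Over the years as a C++ software developer, I have occasionally had to maintain XML file formats for application project files. I found the DOM difficult to navigate and use. I have come across many articles and XML libraries that claim to be easy to use, but none is as easy as the in-house XML library co-developed by my former coworkers, Srikumar Karaikudi Subramanian and Ali Akber Saifee. Srikumar wrote the first version, which could only read from an XML file; Ali later added node creation, which allowed content to be saved to an XML file. However, that library is proprietary. After I left the company, I lost the use of a really easy-to-use XML library. Unlike many talented programmers out there, I am an idiot; I need an idiot-proof XML library. Too bad LINQ-to-XML (Xinq) is not available in C++/CLI! I decided to reconstruct Srikumar's and Ali's XML library and make it open source! I dedicate this article to Srikumar Karaikudi Subramanian and Ali Akber Saifee!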

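XML versus Binary Serialization

In this section, before we discuss Elmax, let us look at the advantages of XML over binary serialization. I will not discuss XML serialization because I am not familiar with it. Below is a simplified (version 1) file format for an online bookstore: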

Version=1
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
Authors
  Author*
    Name
    AuthorID

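Child elements are indented under their parent. Elements that can occur more than once are marked with an asterisk (*). The diagram below shows what the (version 1) binary serialization file format typically looks like.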

Suppose that in version 2 we add a Description under Book and a Biography under Author.

Version=2
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
    Description(new)
Authors
  Author*
    Name
    AuthorID
    Biography(new)

The diagram below shows versions 1 and 2 of the binary serialization file format. The parts added in version 2 are shown in a lighter color.

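Notice that version 1 and version 2 are binary-incompatible. Here is how a plain binary (**note**: not binary serialization) file format might lay out version 2 instead: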

Version=2
Books
  Book*
    ISBN
    Title
    Price
    AuthorID
Authors
  Author*
    Name
    AuthorID
Description(new)*
Biography(new)*

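This way, version 1 of the application can still read the version 2 binary file and ignore the new parts at the end of the file. With XML, and without any extra work, version 1 of the application can still read a version 2 XML file (**forward compatible**) and ignore the new elements, provided that the **data types of the original elements remain unchanged and the elements are not removed**. The version 2 application can read a version 1 XML file (**backward compatible**) by using the old parsing code. The downside of XML is that parsing is slower than for a binary file format and the files take up more space, but XML files are self-describing.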
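
The append-at-the-end idea for the binary format can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions, not code from Elmax or from any real file format: BookV1, Put, Get, WriteV2, and ReadV1 are hypothetical names, the ISBN is reduced to a 32-bit number, and the byte order is simply whatever the machine uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record layout, for this sketch only.
struct BookV1 { std::uint32_t isbn; float price; };

typedef std::vector<unsigned char> Buffer;

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void Put(Buffer& buf, const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a value at 'pos' and advance 'pos'; returns false on truncated input.
template <typename T>
bool Get(const Buffer& buf, std::size_t& pos, T& value) {
    if (pos + sizeof(T) > buf.size()) return false;
    std::memcpy(&value, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return true;
}

// Version 2 writer: the version 1 layout comes first, new data goes at the end.
Buffer WriteV2(const std::vector<BookV1>& books,
               const std::vector<std::string>& descriptions) {
    Buffer buf;
    Put(buf, std::uint32_t(2));                    // Version
    Put(buf, std::uint32_t(books.size()));         // number of Book records
    for (const BookV1& b : books) { Put(buf, b.isbn); Put(buf, b.price); }
    for (const std::string& d : descriptions) {    // Description(new)*
        Put(buf, std::uint32_t(d.size()));
        buf.insert(buf.end(), d.begin(), d.end());
    }
    return buf;
}

// Version 1 reader: knows nothing about descriptions and stops after the books.
bool ReadV1(const Buffer& buf, std::uint32_t& version, std::vector<BookV1>& books) {
    std::size_t pos = 0;
    std::uint32_t count = 0;
    if (!Get(buf, pos, version) || !Get(buf, pos, count)) return false;
    books.clear();
    for (std::uint32_t i = 0; i < count; ++i) {
        BookV1 b = BookV1();
        if (!Get(buf, pos, b.isbn) || !Get(buf, pos, b.price)) return false;
        books.push_back(b);
    }
    return true;  // bytes after 'pos' belong to newer versions and are ignored
}
```

The version 1 reader never looks past the last Book record, which is why appending new sections keeps old readers working, while inserting a field in the middle of a record would not.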

Below is an example of how I would implement the file format in XML, followed by a code example that creates the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<All>
  <Version>2</Version>
  <Books>
    <Book ISBN="1111-1111-1111">
      <Title>How not to program!</Title>
      <Price>12.990000</Price>
      <Desc>Learn how not to program from the industry`s 
        worst programmers! Contains lots of code examples 
        which programmers should avoid! Treat it as reverse 
        education.</Desc>
      <AuthorID>111</AuthorID>
    </Book>
    <Book ISBN="2222-2222-2222">
      <Title>Caught with my pants down</Title>
      <Price>10.000000</Price>
      <Desc>Novel about extra-martial affairs</Desc>
      <AuthorID>111</AuthorID>
    </Book>
  </Books>
  <Authors>
    <Author Name="Wong Shao Voon" AuthorID="111">
      <Bio>World`s most funny author!</Bio>
    </Author>
  </Authors>
</All>

Code

#import <msxml6.dll>
using namespace MSXML2; 

HRESULT CTryoutDlg::CreateAndInitDom(
    MSXML2::IXMLDOMDocumentPtr& pDoc)
{
    HRESULT hr = pDoc.CreateInstance(__uuidof(MSXML2::DOMDocument30));
    if (SUCCEEDED(hr))
    {
        // these methods should not fail so don't inspect result
        pDoc->async = VARIANT_FALSE;
        pDoc->validateOnParse = VARIANT_FALSE;
        pDoc->resolveExternals = VARIANT_FALSE;
        MSXML2::IXMLDOMProcessingInstructionPtr pi = 
            pDoc->createProcessingInstruction
                (L"xml", L" version='1.0' encoding='UTF-8'");
        pDoc->appendChild(pi);
    }
    return hr;
}

bool CTryoutDlg::SaveXml(
    MSXML2::IXMLDOMDocumentPtr& pDoc, 
    const std::wstring& strFilename)
{
    TCHAR szPath[MAX_PATH];

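    // Bail out if the local application data folder cannot be resolved;
    // otherwise szPath would be used uninitialized below.
    if(FAILED(SHGetFolderPath(NULL, 
        CSIDL_LOCAL_APPDATA|CSIDL_FLAG_CREATE, 
        NULL, 
        0, 
        szPath))) 
    {
        return false;
    }
    PathAppend(szPath, strFilename.c_str());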

    variant_t varFile(szPath);
    return SUCCEEDED(pDoc->save(varFile));
}

void CTryoutDlg::TestWrite()
{
    MSXML2::IXMLDOMDocumentPtr pDoc;
    HRESULT hr = CreateAndInitDom(pDoc);
    if (SUCCEEDED(hr))
    {
        using namespace Elmax;
        using namespace std;
        Element root;
        root.SetConverter(NORMAL_CONV);
        root.SetDomDoc(pDoc);

        Element all = root[L"All"];
        all[L"Version"] = 2;
        Element books = all[L"Books"].CreateNew();
        Element book1 = books[L"Book"].CreateNew();
        book1.Attribute(L"ISBN") = L"1111-1111-1111";
        book1[L"Title"] = L"How not to program!";
        book1[L"Price"] = 12.99f;
        book1[L"Desc"] = L"Learn how not to program from the industry`s "
            L"worst programmers! Contains lots of code examples which "
            L"programmers should avoid! Treat it as reverse education.";
        book1[L"AuthorID"] = 111;

        Element book2 = books[L"Book"].CreateNew();
        book2.Attribute(L"ISBN") = L"2222-2222-2222";
        book2[L"Title"] = L"Caught with my pants down";
        book2[L"Price"] = 10.00f;
        book2[L"Desc"] = L"Novel about extra-martial affairs";
        book2[L"AuthorID"] = 111;

        Element authors = all[L"Authors"].CreateNew();
        Element author = authors[L"Author"].CreateNew();
        author.Attribute(L"Name") = L"Wong Shao Voon";
        author.Attribute(L"AuthorID") = 111;
        author[L"Bio"] = L"World`s most funny author!";

        std::wstring strFilename = L"Books.xml";
        SaveXml(pDoc, strFilename);
    }
}

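Here is the code to read the XML saved by the previous code snippet. Some helper classes (DebugPrint) and methods (CreateAndLoadXml and DeleteFile) are omitted to focus on the relevant code. The helper classes and methods can be found in the Tryout project in the source code download.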

void CTryoutDlg::TestRead()
{
    DebugPrint dp;
    MSXML2::IXMLDOMDocumentPtr pDoc;
    std::wstring strFilename = L"Books.xml";
    HRESULT hr = CreateAndLoadXml(pDoc, strFilename);
    if (SUCCEEDED(hr))
    {
        using namespace Elmax;
        using namespace std;
        Element root;
        root.SetConverter(NORMAL_CONV);
        root.SetDomDoc(pDoc);

        Element all = root[L"All"];
        if(all.Exists()==false)
        {
            dp.Print(L"Error: root does not exist!");
            return;
        }
        dp.Print(L"Version : {0}\n\n", 
                 all[L"Version"].GetInt32(0));

        dp.Print(L"Books\n");
        dp.Print(L"=====\n");
        Element books = all[L"Books"];
        if(books.Exists())
        {
            Element::collection_t vecBooks = 
                books.GetCollection(L"Book");
            for(size_t i=0; i<vecBooks.size(); ++i)
            {
                dp.Print(L"ISBN: {0}\n", 
                    vecBooks[i].Attribute(L"ISBN").GetString(L"Error"));
                dp.Print(L"Title: {0}\n", 
                    vecBooks[i][L"Title"].GetString(L"Error"));
                dp.Print(L"Price: {0}\n", 
                    vecBooks[i][L"Price"].GetFloat(0.0f));
                dp.Print(L"Desc: {0}\n", 
                    vecBooks[i][L"Desc"].GetString(L"Error"));
                dp.Print(L"AuthorID: {0}\n\n", 
                    vecBooks[i][L"AuthorID"].GetInt32(-1));
            }
        }

        dp.Print(L"Authors\n");
        dp.Print(L"=======\n");
        Element authors = all[L"Authors"];
        if(authors.Exists())
        {
            Element::collection_t vecAuthors = 
                authors.GetCollection(L"Author");
            for(size_t i=0; i<vecAuthors.size(); ++i)
            {
                dp.Print(L"Name: {0}\n", 
                    vecAuthors[i].Attribute(L"Name").GetString(L"Error"));
                dp.Print(L"AuthorID: {0}\n", 
                    vecAuthors[i].Attribute(L"AuthorID").GetInt32(-1));
                dp.Print(L"Bio: {0}\n\n", 
                    vecAuthors[i][L"Bio"].GetString(L"Error"));
            }
        }
    }
    DeleteFile(strFilename);
}

This is the output after reading the XML:

Version : 2

Books
=====
ISBN: 1111-1111-1111
Title: How not to program!
Price: 12.990000
Desc: Learn how not to program from the industry`s 
    worst programmers! Contains lots of code examples 
    which programmers should avoid! Treat it as reverse education.
AuthorID: 111

ISBN: 2222-2222-2222
Title: Caught with my pants down
Price: 10.000000
Desc: Novel about extra-martial affairs
AuthorID: 111

Authors
=======
Name: Wong Shao Voon
AuthorID: 111
Bio: World`s most funny author!

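Library Usage

In this section, we look at how to use the Elmax library to perform create, read, update, and delete (CRUD) operations on elements, attributes, CData sections, and comments. As you can see from the previous code samples, Elmax uses the Microsoft XML DOM library. That is because I do not want to re-create all the XML functionality, such as XPath. Since Elmax depends on Microsoft XML, which in turn depends on COM, we must call CoInitialize(NULL); at application start-up to initialize the COM runtime and CoUninitialize(); before the application ends to uninitialize it. Elmax is an abstraction over the DOM; however, it does not try to replicate all of the DOM's functionality. For example, a programmer cannot use Elmax to read an element's siblings. In the Elmax model, the element is a first-class citizen. Attributes, CData sections, and comments are children of an element! This differs from the DOM, where they are nodes in their own right. I designed CData sections and comments as children of an element because they cannot be identified by a name or ID.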

Element Creation

Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();

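Typically, we use CreateNew to create elements. There is also a Create method. The difference is that Create does not create the element if it already exists. Notice that I did not use Create or CreateNew to create the All and Version elements. That is because they are created automatically when I assign a value to the last element in the chain. Note that when you call CreateNew repeatedly on the same chain, only the last element in the chain is created anew each time. Let me explain with a code example.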

root[L"aa"][L"bb"][L"cc"].CreateNew();
root[L"aa"][L"bb"][L"cc"].CreateNew();
root[L"aa"][L"bb"][L"cc"].CreateNew();

In the first CreateNew call, elements aa, bb, and cc are created. In each subsequent call, only element cc is created. This is the XML created (indented for easy reading):

<aa>
  <bb>
    <cc/>
    <cc/>
    <cc/>
  </bb>
</aa>

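Create and CreateNew take an optional std::wstring parameter that specifies the namespace URI. If your element belongs to a namespace, you must create it explicitly with Create or CreateNew; you cannot rely on value assignment to create it automatically. More on this later. Note: when you call an Element instance method (other than Create, CreateNew, the setters, and the accessors) on an element that does not exist, Elmax raises an exception! When should you use Create instead of CreateNew? One possible scenario is an application that loads an XML file, edits it, and saves it. During editing, it does not check whether an element exists in the original XML file before assigning a value or adding nodes: it calls Create, which creates the element if it does not exist and otherwise does nothing.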
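
The difference between Create and CreateNew can be sketched with a toy tree in standard C++. This is not how Elmax is implemented (Elmax works on MSXML nodes); the Node type and its Find and Count members are hypothetical, and only the create-if-missing versus always-append behaviour described above is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an XML element; not the Elmax implementation.
struct Node {
    std::wstring name;
    std::vector<std::unique_ptr<Node>> children;

    // Return the first child with this name, or nullptr if there is none.
    Node* Find(const std::wstring& childName) {
        for (auto& child : children)
            if (child->name == childName) return child.get();
        return nullptr;
    }

    // CreateNew-like behaviour: always append a new child.
    Node* CreateNew(const std::wstring& childName) {
        std::unique_ptr<Node> node(new Node());
        node->name = childName;
        children.push_back(std::move(node));
        return children.back().get();
    }

    // Create-like behaviour: reuse an existing child, create only when missing.
    Node* Create(const std::wstring& childName) {
        Node* existing = Find(childName);
        return existing ? existing : CreateNew(childName);
    }

    // Number of direct children with this name.
    std::size_t Count(const std::wstring& childName) const {
        std::size_t n = 0;
        for (const auto& child : children)
            if (child->name == childName) ++n;
        return n;
    }
};
```

Calling CreateNew(L"cc") three times on the same parent leaves three cc children, as in the XML above, whereas calling Create(L"cc") afterwards returns the first existing cc and adds nothing.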

Element Removal

using namespace Elmax;
Element elem;
Element elemChild = elem[L"Child"];
// do processing
elem.RemoveNode(elemChild); // Remove its child node.
elem.RemoveNode(); // Remove itself from DOM.

Note: in the current version, the AddNode method can only add a node that was previously removed.

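Element Value Assignment

At the beginning of the article, I showed how to create elements and assign a value to the last element at the same time. I repeat that code snippet below: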

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"] = 12.99f;

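It turns out that this example is dangerous because it relies on an overloaded assignment operator chosen by the compiler. What if you meant to assign a float but accidentally assigned an integer because you forgot the .0 and the f suffix on the value? In this case, I guess there is little harm. Still, it is best to use the setter methods to assign values explicitly in all cases.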

Elmax::Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"].SetFloat(12.99f);

Here is the list of available setter methods:

bool SetBool(bool val);
bool SetChar(char val);
bool SetShort(short val);
bool SetInt32(int val);
bool SetInt64(__int64 val);
bool SetUChar(unsigned char val);
bool SetUShort(unsigned short val);
bool SetUInt32(unsigned int val);
bool SetUInt64(unsigned __int64 val);
bool SetFloat(float val);
bool SetDouble(double val);
bool SetString(const std::wstring& val);
bool SetString(const std::string& val);
bool SetGUID(const GUID& val);
bool SetDate(const Elmax::Date& val);
bool SetDateTime(const Elmax::DateAndTime& val);

Element Value Reading

At the beginning of the article, I showed how to read a value from an element. I repeat that code snippet below:

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice;

This is the more correct version, which uses the GetFloat accessor to specify a default value.

Elmax::Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Elmax::Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice.GetFloat(10.0f);

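If the value does not exist or is invalid, price gets the default value of 10.0f, whereas the earlier example gets 0.0f because no default value is specified. By default, however, Elmax does not know that a string value is an invalid textual float unless you use regular expressions to validate it. Set REGEX_CONV instead of NORMAL_CONV on the root element to use the regular-expression type converter. Alternatively, you can validate your XML against a schema or DTD before parsing it with Elmax. To learn about schema or DTD validation, please consult MSDN.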

Elmax::Element root;
root.SetConverter(REGEX_CONV);

This is the declaration of the SetConverter method:

//! Set the type converter pointer
void SetConverter(CONVERTER conv, IConverter* pConv=NULL);

To use your own custom type converter, set the optional pConv pointer.

Elmax::Element root;
root.SetConverter(CUSTOM_CONV, pCustomTypeConv);

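If you allocate pCustomTypeConv on the heap, you are responsible for deleting it. Elmax has locale type converters, but they are currently untested because I do not know how to test them: in Asia, numbers are written the same way in every country, unlike in Europe. **Note:** in version 0.9.0, data conversion was changed to use Boost lexical_cast, and the option to use the normal or regex conversion was removed. A tip for readers who may modify Elmax: remember to run all 429 unit tests afterwards to make sure you have not broken anything. The unit tests are only available for Visual Studio 2010. Below is the list of available value accessors: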

bool GetBool(bool defaultVal) const;
char GetChar(char defaultVal) const;
short GetShort(short defaultVal) const;
int GetInt32(int defaultVal) const;
__int64 GetInt64(__int64 defaultVal) const;
unsigned char GetUChar(unsigned char defaultVal) const;
unsigned short GetUShort(unsigned short defaultVal) const;
unsigned int GetUInt32(unsigned int defaultVal) const;
unsigned __int64 GetUInt64(unsigned __int64 defaultVal) const;
float GetFloat(float defaultVal) const;
double GetDouble(double defaultVal) const;
std::wstring GetString(const std::wstring& defaultVal) const;
std::string GetString(const std::string& defaultVal) const;
GUID GetGUID(const GUID& defaultVal) const;
Elmax::Date GetDate(const Elmax::Date& defaultVal) const;
Elmax::DateAndTime GetDateTime(const Elmax::DateAndTime& defaultVal) const;

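For GetBool and the interpretation of Boolean values, true, yes, ok, and 1 evaluate to true, while false, no, cancel, and 0 evaluate to false. The comparison is not case-sensitive.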
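
The Boolean interpretation rule can be sketched as a small standalone function. ParseBoolOr is a hypothetical helper, not part of Elmax, and the fallback to the default value for unrecognized text is my assumption based on how the accessors take a defaultVal argument.

```cpp
#include <cassert>
#include <cwctype>
#include <string>

// Hypothetical helper (not an Elmax function): maps text to bool using the
// word list from the article, ignoring case. Unrecognized text yields
// defaultVal, which is an assumption of this sketch.
bool ParseBoolOr(const std::wstring& text, bool defaultVal) {
    std::wstring lower;
    for (wchar_t ch : text)
        lower += static_cast<wchar_t>(std::towlower(ch));

    if (lower == L"true" || lower == L"yes" || lower == L"ok" || lower == L"1")
        return true;
    if (lower == L"false" || lower == L"no" || lower == L"cancel" || lower == L"0")
        return false;
    return defaultVal;
}
```

Lower-casing the input once keeps the comparison case-insensitive without listing every spelling of each word.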

Namespace

To create an element under the namespace URI "http://www.yahoo.com", see below:

using namespace Elmax;
Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();
Element book1 = books[L"Book"].CreateNew(L"http://www.yahoo.com");

The XML output is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<All>
  <Version>1</Version>
  <Books>
    <Book xmlns="http://www.yahoo.com"/>
  </Books>
</All>

To create a group of elements and attributes under one namespace URI, see below:

using namespace Elmax;
Element all = root[L"All"];
all[L"Version"] = 1;
Element books = all[L"Books"].CreateNew();
Element book1 = books[L"Yahoo:Book"].CreateNew(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN").Create(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN") = L"1111-1111-1111";
book1[L"Yahoo:Title"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Title"] = L"How not to program!";
book1[L"Yahoo:Price"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Price"] = 12.99f;
book1[L"Yahoo:Desc"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:Desc"] = L"Learn how not to program from the industry`s "
    L"worst programmers! Contains lots of code examples which programmers "
    L"should avoid! Treat it as reverse education.";
book1[L"Yahoo:AuthorID"].Create(L"http://www.yahoo.com");
book1[L"Yahoo:AuthorID"] = 111;

The XML output is as follows:

<All>
  <Version>1</Version>
  <Books>
    <Yahoo:Book xmlns:Yahoo="http://www.yahoo.com" 
      Yahoo:ISBN="1111-1111-1111">
      <Yahoo:Title>How not to program!</Yahoo:Title>
      <Yahoo:Price>12.990000</Yahoo:Price>
      <Yahoo:Desc>Learn how not to program from the 
        industry`s worst programmers! Contains lots of code 
        examples which programmers should avoid! Treat it 
        as reverse education.</Yahoo:Desc>
      <Yahoo:AuthorID>111</Yahoo:AuthorID>
    </Yahoo:Book>
  </Books>
</All>

Enumerating Elements with the Same Name

You can use the AsCollection method to return sibling elements with the same name in a vector.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);

Element::collection_t vec = root[L"aa"][L"bb"][L"cc"].AsCollection();

for(size_t i=0;i<vec.size(); ++i)
{
    int n = vec.at(i).GetInt32(10);
}

The following overload of AsCollection lets you specify a predicate function object that determines which elements are selected.

typedef std::vector< Element > collection_t;
template<typename Predicate>
collection_t AsCollection(Predicate pred);

If you are using C++0x, you can supply a lambda (an anonymous function) as the predicate, as shown below:

Element::collection_t vec = root[L"aa"][L"bb"][L"cc"].AsCollection(
    [](Elmax::Element elem)->bool 
    { 
        if(elem.Attribute(L"Price").GetDouble(0.0) > 10.0 )
        {
            return true;
        }
        return false;
    }
);

Enumerating Child Elements with the Same Name

You can use the GetCollection method to get child elements with the same name in a vector.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);

Element::collection_t vec = root[L"aa"][L"bb"].GetCollection(L"cc");

for(size_t i=0;i<vec.size(); ++i)
{
    int n = vec.at(i).GetInt32(10);
}

The following overload of GetCollection lets you specify a predicate function object that determines which elements are selected.

typedef std::vector< Element > collection_t;
template<typename Predicate>
collection_t GetCollection(const std::wstring& name, Predicate pred);

If you are using C++0x, you can supply a lambda (an anonymous function) as the predicate, as shown below:

Element::collection_t vec = root[L"aa"][L"bb"].GetCollection(
    L"cc",
    [](Elmax::Element elem)->bool 
    { 
        if(elem.Attribute(L"Price").GetDouble(0.0) > 10.0 )
        {
            return true;
        }
        return false;
    }
);

Querying the Number of Child Elements

To query the number of child elements for each name, use the QueryChildrenNum method.

using namespace Elmax;
Element root;
root.SetConverter(NORMAL_CONV);
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|qq"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);
Element elem4 = root[L"aa|bb|qq"].CreateNew();
elem4.SetInt32(44);
Element elem5 = root[L"aa|bb|cc"].CreateNew();
elem5.SetInt32(55);

Element::available_child_t acmap = 
    root[L"aa"][L"bb"].QueryChildrenNum();

assert(acmap[L"cc"] == (unsigned int)(3));
assert(acmap[L"qq"] == (unsigned int)(2));

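There is also an overload of QueryChildrenNum (below) that does not create a temporary container before returning; it fills the map you pass in. Note: QueryChildrenNum can only query elements, not attributes, CData sections, or comments.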

typedef std::map< std::wstring, size_t > available_child_t;
bool QueryChildrenNum(available_child_t& children);
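
The kind of tally that QueryChildrenNum returns can be sketched with a std::map in standard C++. CountByName and ChildCounts are hypothetical names used for illustration; the sketch counts plain name strings rather than real XML child nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::wstring, std::size_t> ChildCounts;

// Tally how many times each child name occurs in document order.
ChildCounts CountByName(const std::vector<std::wstring>& childNames) {
    ChildCounts counts;
    for (const std::wstring& name : childNames)
        ++counts[name];  // operator[] value-initializes a missing entry to 0
    return counts;
}
```

For the names qq, cc, cc, qq, cc this yields 3 for cc and 2 for qq, matching the assertions in the example above.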

Shortcuts to Avoid Temporary Element Creation

In the earlier enumeration examples, I used

Elmax::Element elem1 = root[L"aa|bb|cc"].CreateNew();

instead of

Elmax::Element elem1 = root[L"aa"][L"bb"][L"cc"].CreateNew();

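because the second form creates temporary elements aa and bb on the stack that are never used. The first form saves some tedious typing and returns only one element from the overloaded [] operator, not to mention that it is also faster. \\ and / can also be used as separators. The code below makes heavy use of temporaries: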

if(root[L"aa"][L"bb"][L"cc"][L"dd"].Exists())
{
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"Title"] = L"Beer jokes";
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"Author"] = L"The joker";
    root[L"aa"][L"bb"][L"cc"][L"dd"][L"Price"] = 10.0f;
}

To speed it up, assign the element to an Element variable and use that variable instead.

Elmax::Element elem1 = root[L"aa|bb|cc|dd"];
if(elem1.Exists())
{
    elem1[L"Title"] = L"Beer jokes";
    elem1[L"Author"] = L"The joker";
    elem1[L"Price"] = 10.0f;
}
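
How a path such as L"aa|bb|cc" might be broken into element names can be sketched as follows. SplitPath is a hypothetical function, not the Elmax parser; skipping empty segments is an assumption of this sketch, and the three separators are the ones named above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split an element path on '|', '\\' or '/'. Empty segments (for example
// from a doubled separator) are skipped, which is an assumption here.
std::vector<std::wstring> SplitPath(const std::wstring& path) {
    std::vector<std::wstring> names;
    std::wstring current;
    for (wchar_t ch : path) {
        if (ch == L'|' || ch == L'\\' || ch == L'/') {
            if (!current.empty()) names.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!current.empty()) names.push_back(current);
    return names;
}
```

With a single pass over the string, one lookup can walk the whole chain of names without materializing an intermediate element object per level.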

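Root Element

The root element is created when you call SetDomDoc on an element. By now you know that the [] operator accesses a child element. For the root element, however, the [] operator checks the root itself to see whether its name matches the name passed to the [] operator.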

Element root;
root.SetDomDoc(pDoc);

Element elem1 = root[L"aa|bb|cc"];

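The aa element in the example above actually refers to the root, not to a child of the root. If the element was not created by a SetDomDoc call, aa refers to its child. When using the [] operator, remember to prefix the (wide) string literal with L, for example elem[L"Hello"]; otherwise, you will get a strange, unhelpful error. Element objects are created directly or indirectly from the root. For example, the root creates the aa element, and the aa element in turn is able to create other elements. If you instantiate an element that does not originate from the root, that element cannot create anything. This is a limitation of the Microsoft XML DOM: only the DOM document can create nodes. Elements created directly or indirectly from the root receive its DOM document and therefore the ability to create elements.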

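RootElement is a helper class contributed by PJ Arends that eliminates the need to call SetConverter and SetDomDoc. It inherits from the Element class. To build this class successfully, you need to download and build the C++ Boost FileSystem library. If you decide not to use RootElement, you can exclude it from the Elmax project and leave out the FileSystem library.

The RootElement constructor takes a file path and loads the XML file into the DOM if the file exists; otherwise, the document is an empty DOM. When calling SaveFile, you can either specify a new path or use the path given to the constructor.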

using namespace Elmax;
std::wstring path = L"D:\\temp.xml";
RootElement root(path); // load the file if it exists

// Use RootElement like other elements
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem[L"dd"].SetBool(true);

root.SaveFile();

If you use a plain Element as the root, the equivalent Win32 code might be:

MSXML2::IXMLDOMDocumentPtr pDoc;

HRESULT hr = pDoc.CreateInstance(__uuidof(MSXML2::DOMDocument30));
if (SUCCEEDED(hr))
{
    pDoc->async = VARIANT_FALSE;
    pDoc->validateOnParse = VARIANT_FALSE;
    pDoc->resolveExternals = VARIANT_FALSE;
}

std::wstring path = L"D:\\temp.xml";

// Load the file only if it already exists; otherwise keep the empty DOM.
DWORD fileAttr = GetFileAttributes(path.c_str());
if (INVALID_FILE_ATTRIBUTES != fileAttr) // file exists
    pDoc->load(variant_t(path.c_str()));

using namespace Elmax;

Element root;
root.SetConverter(REGEX_CONV);
root.SetDomDoc(pDoc);

// Use the plain root Element like other elements
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem[L"dd"].SetBool(true);

variant_t varFile(path.c_str());
pDoc->save(varFile);

HyperElement Joins

Some XML elements are related to other XML elements by some criterion. You can use the HyperElement class to join them. HyperElement contains only static methods.

static std::vector< std::pair<Elmax::Element, Elmax::Element> >
	JoinOneToOne(
		std::vector<Elmax::Element>& vecElem1,
		const std::wstring& attrName1,
		std::vector<Elmax::Element>& vecElem2,
		const std::wstring& attrName2,
		bool bCaseSensitive);
		
static std::vector< std::pair<Elmax::Element, std::vector<Elmax::Element> > >
    JoinOneToMany(
        std::vector<Elmax::Element>& vecElem1,
        const std::wstring& attrName1,
        std::vector<Elmax::Element>& vecElem2,
        const std::wstring& attrName2,
        bool bCaseSensitive);

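The first method, JoinOneToOne, takes two vectors of elements (as returned by AsCollection and GetCollection) and performs a text comparison on the values of the specified attribute names. If an attribute name is empty, the element's value is used instead. JoinOneToMany is similar, so I will not go into it here; its parameters are the same. These functions are useful when a plain text equality comparison is all you need, but if you need to compare floating-point values or make a more complex comparison, supplying a predicate is probably best. Fortunately, there are overloads that accept a predicate function. They are declared below, followed by an example that uses JoinOneToMany to find the books written by science fiction authors in the provided XML file, Books.xml.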

template<typename DoubleElementPredicate>
static std::vector< std::pair<Elmax::Element, Elmax::Element> >
    JoinOneToOne(
    std::vector<Elmax::Element>& vecElem1,
    std::vector<Elmax::Element>& vecElem2,
    DoubleElementPredicate pred);
	
template<typename DoubleElementPredicate>
static std::vector< std::pair<Elmax::Element, std::vector<Elmax::Element> > >
    JoinOneToMany(
    std::vector<Elmax::Element>& vecElem1,
    std::vector<Elmax::Element>& vecElem2,
    DoubleElementPredicate pred);

<All>
  <Version>1</Version>
  <Books>
    <Book ISBN="1111-1111-1111">
      <Title>2001: A Space Odyssey</Title>
      <Price>12.990000</Price>
      <AuthorID>111</AuthorID>
    </Book>
    <Book ISBN="2222-2222-2222">
      <Title>Rendezvous with Rama</Title>
      <Price>15.000000</Price>
      <AuthorID>111</AuthorID>
    </Book>
    <Book ISBN="3333-3333-3333">
      <Title>Foundation</Title>
      <Price>10.000000</Price>
      <AuthorID>222</AuthorID>
    </Book>
    <Book ISBN="4444-4444-4444">
      <Title>Currents of Space</Title>
      <Price>11.900000</Price>
      <AuthorID>222</AuthorID>
    </Book>
    <Book ISBN="5555-5555-5555">
      <Title>Pebbles in the Sky</Title>
      <Price>14.000000</Price>
      <AuthorID>222</AuthorID>
    </Book>
  </Books>
  <Authors>
    <Author Name="Arthur C. Clark" AuthorID="111">
      <Bio>Sci-Fic author!</Bio>
    </Author>
    <Author Name="Isaac Asimov" AuthorID="222">
      <Bio>Sci-Fic author!</Bio>
    </Author>
  </Authors>
</All>

DebugPrint dp;
MSXML2::IXMLDOMDocumentPtr pDoc;
std::wstring strFilename = L"Books.xml";
HRESULT hr = CreateAndLoadXml(pDoc, strFilename);
if (SUCCEEDED(hr))
{
    using namespace Elmax;
    using namespace std;
    Element root;
    root.SetConverter(NORMAL_CONV);
    root.SetDomDoc(pDoc);

    Element all = root[L"All"];
    if(all.Exists()==false)
    {
        dp.Print(L"Error: root does not exist!");
        return;
    }
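    Element books = all[L"Books"];
    Element authors = all[L"Authors"];
    // JoinOneToMany takes its vectors by non-const reference,
    // so keep the collections in named variables.
    Element::collection_t vecAuthors = authors.GetCollection(L"Author");
    Element::collection_t vecBooks = books.GetCollection(L"Book");
    auto vec = HyperElement::JoinOneToMany(vecAuthors, vecBooks,
        [](Elmax::Element x, Elmax::Element y)->bool 
        { 
            // Pair each author with the books that carry the same AuthorID.
            return x.Attribute(L"AuthorID").GetString(L"a") == y[L"AuthorID"].GetString(L"b");
        });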

    for(size_t i=0; i< vec.size(); ++i)
    {
        dp.Print(L"List of books by {0}\n", 
            vec[i].first.Attribute(L"Name").GetString(L""));
        dp.Print(L"=============================================\n");
        for(size_t j=0; j< vec[i].second.size(); ++j)
        {
            dp.Print(L"{0}\n", vec[i].second[j][L"Title"].GetString(L"None"));
        }
        dp.Print(L"\n");
    }
}

This is the output of the code above:

List of books by Arthur C. Clark
=============================================
2001: A Space Odyssey
Rendezvous with Rama

List of books by Isaac Asimov
=============================================
Foundation
Currents of Space
Pebbles in the Sky

These HyperElement methods are very simple functions that compare the elements of the two vectors in nested loops.
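
A nested-loop join of this kind can be sketched generically in standard C++. This is not the HyperElement source: the template below, the Author and Book structs, and SameAuthor are hypothetical, and dropping left-hand items that have no match is an assumption I make for the sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Pair every left item with all right items for which pred(left, right) holds.
// Left items without any match are omitted (an assumption of this sketch).
template <typename Left, typename Right, typename Pred>
std::vector<std::pair<Left, std::vector<Right>>>
JoinOneToMany(const std::vector<Left>& lefts,
              const std::vector<Right>& rights,
              Pred pred) {
    std::vector<std::pair<Left, std::vector<Right>>> result;
    for (const Left& left : lefts) {
        std::vector<Right> matches;
        for (const Right& right : rights)
            if (pred(left, right)) matches.push_back(right);
        if (!matches.empty())
            result.push_back(std::make_pair(left, matches));
    }
    return result;
}

// Plain structs standing in for the Author and Book elements.
struct Author { int id; std::wstring name; };
struct Book { int authorId; std::wstring title; };

// Join criterion: the book refers to the author's ID.
bool SameAuthor(const Author& author, const Book& book) {
    return author.id == book.authorId;
}
```

The cost is proportional to the product of the two vector sizes, which is fine for small documents; for large ones, indexing the right-hand side by key first would avoid the inner loop.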

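Shared State in Multithreading

You may use different Elmax Element objects in different threads without sharing them across threads. However, Element has a static type converter object that is shared by all Element objects. To work around this, allocate a new type converter and use it in the root. Remember to delete the converter after use.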

using namespace Elmax;
Element root;
root.SetDomDoc(pDoc);
RegexConverter* pRegex = new RegexConverter();
root.SetConverter(CUSTOM_CONV, pRegex);

By the way, remember to call CoInitialize/CoUninitialize in your worker threads!

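Saving File Contents in XML

You can call SetFileContents to save a file's binary contents in an Element in Base64 format. If you intend to save the contents back to a file with the same name, you can choose to save the file name and file length in attributes. The original file length needs to be saved as well because GetFileContents sometimes reports a longer length after the Base64 conversion!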

bool SetFileContents(const std::wstring& filepath, 
                     bool bSaveFilename, bool bSaveFileLength);

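We use GetFileContents to get the file contents back from the Base64 conversion. filename is written only if you chose to save the file name in SetFileContents. length is the length of the returned character array, not the saved file-length attribute.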

char* GetFileContents(std::wstring& filename, size_t& length);

Attribute

To create an attribute (if it does not exist) and assign a string to it, see the example below:

book1.Attribute(L"ISBN") = L"1111-1111-1111";

To create an attribute with a namespace URI and assign a string to it, you must create it explicitly.

book1.Attribute(L"Yahoo:ISBN").Create(L"http://www.yahoo.com");
book1.Attribute(L"Yahoo:ISBN") = L"1111-1111-1111";

To delete an attribute, use the Delete method.

book1.Attribute(L"ISBN").Delete();

To find out whether an attribute exists, use the Exists method.

bool bExists = book1.Attribute(L"ISBN").Exists();

The list of Attribute setters and accessors is the same as for Element, and they use the same type converter.

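Comment

For your information, an XML comment takes the form <!-- comment text -->. Below are some of the operations you can use with comments: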

using namespace Elmax;
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem.AddComment(L"Can you see me?"); // add a new comment!

Comment comment = elem.GetComment(0); // get comment at 0 index

comment.Update(L"Can you hear me?"); // update the comment

comment.Delete(); // Delete this comment node!

You can use the GetCommentCollection method to get a vector of the Comment objects that are children of an element.

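CData Section

For your information, an XML CData section takes the form <![CDATA[<IgnoredInCDataSection/>]]>. A CData section typically contains data that the parser does not parse, so it can contain < and > and other characters that are otherwise invalid in text. Some programmers prefer to store such data in Base64 format instead (see the next section). Below are some of the operations you can use with CData sections: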

using namespace Elmax;
Element elem = root[L"aa"][L"bb"][L"cc"].CreateNew();
elem.AddCData(L"<<>>"); // add a new CData section!

CData cdata = elem.GetCData(0); // get CData section at 0 index

cdata.Update(L">><<"); // update the CData section

cdata.Delete(); // Delete this CData section node!

You can use the GetCDataCollection method to get a vector of the CData objects that are children of an element.

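Base64

Some programmers prefer to store binary data in Base64 format under an element rather than in a CData section, so that it is easy to identify and find. The downside is that Base64 takes up more space and the data conversion takes time. This code sample shows how to convert to Base64 before assignment and how to convert from Base64 back to binary data after reading: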

Elmax::Element elem1;
string strNormal = "@#$^*_+-|\\~<>"; // the backslash must be escaped in a C++ literal
// Assigning base64 data
elem1 = Element::ConvToBase64(strNormal.c_str(), strNormal.length());

// Reading base64 data
wstring strBase64 = elem1.GetString(L"ABC");

size_t len = 0;
// Get the length required
Element::ConvFromBase64(strBase64, NULL, len);

char* p = new char[len+1];
memset(p, 0, len+1);

Element::ConvFromBase64(strBase64, p, len);
// process p here (not shown), then release it with delete [] p.
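
To make the space overhead concrete, here is a minimal Base64 encoder sketch in standard C++. It is not the implementation Elmax uses; ToBase64 and Base64Length are hypothetical names, the alphabet and = padding follow RFC 4648, and no line breaks are inserted, which a real encoder may do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encoded size without line breaks: 4 output characters per 3 input bytes,
// rounded up to a whole group.
std::size_t Base64Length(std::size_t inputBytes) {
    return (inputBytes + 2) / 3 * 4;
}

// Minimal RFC 4648 Base64 encoder (standard alphabet, '=' padding).
std::string ToBase64(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(Base64Length(bytes.size()));
    for (std::size_t i = 0; i < bytes.size(); i += 3) {
        const std::size_t remaining = bytes.size() - i;  // at least 1
        // Pack up to three bytes into a 24-bit group; missing bytes are zero.
        unsigned group = static_cast<unsigned char>(bytes[i]) << 16;
        if (remaining > 1) group |= static_cast<unsigned char>(bytes[i + 1]) << 8;
        if (remaining > 2) group |= static_cast<unsigned char>(bytes[i + 2]);

        // Emit four 6-bit digits, replacing unused ones with padding.
        out += kAlphabet[(group >> 18) & 0x3F];
        out += kAlphabet[(group >> 12) & 0x3F];
        out += remaining > 1 ? kAlphabet[(group >> 6) & 0x3F] : '=';
        out += remaining > 2 ? kAlphabet[group & 0x3F] : '=';
    }
    return out;
}
```

Every 3 bytes of binary data become 4 characters of text, so the encoded form is roughly one third larger than the original before any XML overhead is counted.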

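Structured Data

The StrUtil class in the StringUtils folder provides convenient methods for formatting strings (the Format method, similar to .NET's String.Format) and for splitting strings (the Split method) back into primitive types.

String Formatting

Formatting is done mainly with the Format method. Below is an example that formats the data of a rectangle.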

using namespace Elmax;
Element elem;

StrUtil util;
Rectangle rect(10, 20, 100, 200);
elem.SetString( util.Format(L"{0}, {1}, {2}, {3}", rect.X(), rect.Y(), rect.Width(), rect.Height()) );
// elem now contains "10, 20, 100, 200"

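String Splitting

Before splitting a string, you need to choose a strategy. Besides the C string function strtok, Boost splitter and regular expressions are available as alternative splitting strategies.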

using namespace Elmax;
Element elem;

StrUtil util;
StrtokStrategy strTok(L", ");
util.SetSplitStrategy(&strTok); // set a splitting strategy

int x = 0;
int y = 0;
int width = 0;
int height = 0;
util.Split( elem.GetString(L""), x, y, width, height ); 

Rectangle rect(x, y, width, height); // instantiate a rectangle using the split data.
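
What a strtok-style split back into integers involves can be sketched in standard C++ without StrUtil. SplitInts is a hypothetical function, not part of Elmax; it treats every character in delims as a separator and relies on std::stoi, which throws if a token is not a number.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split 'text' on any character in 'delims' and convert each token to int.
// std::stoi throws std::invalid_argument for a non-numeric token.
std::vector<int> SplitInts(const std::wstring& text, const std::wstring& delims) {
    std::vector<int> values;
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::wstring::npos) {
        const std::size_t end = text.find_first_of(delims, start);
        values.push_back(std::stoi(text.substr(start, end - start)));
        start = text.find_first_not_of(delims, end);
    }
    return values;
}
```

Because runs of delimiter characters are skipped as a whole, the ", " separator written by the formatting example splits cleanly without producing empty tokens.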

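Aggregation

The following SQL-style aggregate functions are provided: Count, Minimum, Maximum, Sum, and Average. In terms of performance, it is still best to read the values into your own data structure and aggregate them yourself. These functions exist for future LINQ-like functionality. They are not intended for end users, but they are there if you want to use them for aggregation. What is missing is GroupBy. OrderBy is handled by the Sort method.

Count, Minimum, Maximum, Sum, and Average

The Minimum, Maximum, Sum, and Average functions take two string parameters. The first is the name of the child element and the second is the attribute name. If you do not want to aggregate an attribute value, leave the second parameter empty. Min, Max, Sum, and Avg return 64-bit integer results; MinF, MaxF, SumF, and AvgF return 32-bit floating-point values; and MinD, MaxD, SumD, and AvgD return 64-bit floating-point values.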

using namespace Elmax;
Element elem = root[L"aa"][L"bb"]; // bb contains a group of cc elements

// cc elements are the children of bb; dd is a child element of cc whose value is used for the maximum search.
// The attribute name is left empty because the value comes from the dd element itself.
__int64 nMax = elem.Max(L"cc|dd", L"");

This is the XML from which the maximum value is obtained.

<aa>
  <bb>
    <cc><dd>55</dd></cc>
    <cc><dd>33</dd></cc>
    <cc><dd>22</dd></cc>
  </bb>
</aa>

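The Count function takes a string parameter and a predicate parameter and returns an unsigned integer. The predicate decides whether an element is included in the count.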

using namespace Elmax;
Element root;

Element elem1 = root[L"aa|bb|cc"].CreateNew();
elem1.SetInt32(11);
Element elem2 = root[L"aa|bb|cc"].CreateNew();
elem2.SetInt32(22);
Element elem3 = root[L"aa|bb|cc"].CreateNew();
elem3.SetInt32(33);

Pred pred;
unsigned int cnt = root[L"aa"][L"bb"].Count(L"cc",  pred);

// Predicate definition (it must appear before the code above that uses it)

struct Pred : public std::unary_function<Elmax::Element, bool>
{
    bool operator() (Elmax::Element& ele) 
    {
        if(ele.GetInt32(0)<33)
            return true;

        return false;
    }
};

Sort

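Sorting is done with the Sort method. The first parameter is the name of the child element and the second is the comparison predicate, which can be a lambda. Because of a limitation of MSXML, this is not an in-place sort. An in-place sort can only be implemented in the cross-platform Elmax, where the XML nodes are really managed by Elmax.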

using namespace Elmax;
Element elem;

Element::collection_t vec = 
elem.Sort(L"cc", [](Element elem1, Element elem2) -> bool
{
    return elem1[L"dd"].GetInt32(0) < elem2[L"dd"].GetInt32(0);
});

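C++0x Move Constructor

The Elmax library defines some C++0x move constructors and move assignment operators. To build the library in Visual Studio versions older than 2010, you must hide them by defining _HAS_CPP0X as 0 in stdafx.h.

Building in Visual C++ 8.0 (Visual Studio 2005)

Elmax builds in Visual Studio 2005/2008/2010. However, building the Boost Regular Expression library in Visual Studio 2005 takes extra effort. Until my computer broke down two weeks ago, I did not realize how hard it was for users to build Elmax in Visual C++ 8.0: the old computer had a prebuilt Boost regex library and the new one did not. To get the regex library on your computer, download Boost, add the Boost regex include folder path to the Visual Studio VC++ directories, and build the Boost regex library according to the Boost documentation. For Visual Studio 2008/2010 users, Elmax uses the C++0x TR1 regex library.

C# Library

Elmax has been ported to C# 2.0, and 87 unit tests were added to test the C# library, bringing the total number of unit tests to 342. The C# library builds in Visual Studio 2005/2008/2010. There is a plan to port Elmax to C# 4.0 to take advantage of optional parameters. Interestingly, the C++ version has about 7365 lines of code, while the C# version has only 2622! The C++ version has 10 classes while the C# version has only 4, because the helper classes are already implemented in the .NET BCL, so I did not have to write them. With fewer classes, fewer unit tests are needed for the C# version (87 versus 255). Comparing Element.cpp and Element.cs alone, Element.cpp has 2200 lines of code while Element.cs has 1755! Below is the C++ code for writing an element, followed by the C# version. As you can see, they are nearly identical. However, the C# version of Elmax has no implicit assignment or reading of value types. Element values must be assigned and read through setters and accessors such as the SetFloat and GetFloat methods.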

// C++ version of writing element
using namespace Elmax;
...
Element root;
root.SetDomDoc(pDoc); // An empty DOM doc is initialized beforehand.
root[L"Books"][L"Book"][L"Price"].SetFloat(12.99f);

// C# version of writing element
using Elmax;
...
Element root = new Element();
root.SetDomDoc(doc); // An empty DOM doc is initialized beforehand.
root["Books"]["Book"]["Price"].SetFloat(12.99f);

Below is the C++ code for reading an element, followed by the C# version:

// C++ version of reading element
using namespace Elmax;
...
Element root;
root.SetDomDoc(pDoc); // An XML file is read into the DOM doc beforehand.
Element elemPrice = root[L"Books"][L"Book"][L"Price"];
if(elemPrice.Exists())
    float price = elemPrice.GetFloat(10.0f);

// C# version of reading element
using Elmax;
...
Element root = new Element();
root.SetDomDoc(doc); // An XML file is read into the DOM doc beforehand.
Element elemPrice = root["Books"]["Book"]["Price"];
if(elemPrice.Exists)
    float price = elemPrice.GetFloat(10.0f);

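Future Development

Version 1 is feature-complete. Work on the next version has begun: the cross-platform Elmax will be rewritten from scratch without using any Microsoft-specific technology. Version 1 and version 2 will be developed and maintained in parallel. Their code bases will be merged at some point, which means version 1 will eventually drop MSXML and use the cross-platform parser from version 2, while continuing to provide convenient accessors and mutators for some Microsoft data types, such as GUID and MFC CString.

Bug Reports

For bug reports and feature requests, please file them here. When you file a bug report, please include the sample code and XML file (if any) needed to reproduce the bug. The current Elmax version is 0.65 beta. The Elmax CodePlex site is located at http://elmax.codeplex.com/.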

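History

28/06/2022
- Removed Boost lexical_cast.
- Changed from MSXML2::Document30 to MSXML2::Document60.
26/11/2013
- Updated the source code to use Boost lexical_cast.
23/08/2012
- Added these sections: Structured Data | Aggregation | Future Development
19/04/2012
- Updated the source code to include the C# version of the RootElement class. Updated the Root Element section.
10/04/2012
- Updated the source code with PJ Arends' RootElement class and the Elmax.h header file.
09/04/2012
- Updated the source code to the latest version.
21/05/2011
- Added a HyperElement section
11/02/2011
- Ported the library to C#
- Added Visual Studio 2008 solution and projects
- Added an article section on how to build Elmax in Visual Studio 2005
10/01/2011
- Updated the source code (version 0.65 beta) to use the ATL Base64 implementation, and changed the getter and setter methods so that they no longer raise an exception on a null type converter pointer, by using a valid static type converter object.
26/12/2010
- Added a TestRead code snippet that demonstrates how to read an XML file
- Updated the source code
24/12/2010
- Updated the VC2005 code to use Boost regex instead of the TR1 regex included in newer VC versions.
- Added the ability to access Date, GUID, and file contents in XML
23/12/2010
- First release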