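Help System Automation
Word Document Automation
Introduction
This simple program shows how to generate a help system from an existing Word document. The program produces HTML files and XML files that are then added to a Web project.
In this article I present only the main ideas. For more details, see the source code and the sample Word document.
Who This Article Is For
This article is for developers who want to get started with Microsoft Word automation.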
Microsoft Word 2000
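The program relies on the styles and formats used in a Word document so that the document can later be converted to XML files.
I used the Word DLL of the Word application; add a reference to it in your project.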
(Microsoft Office 11.0 Object Library)
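Project Details
Word 2000 provides formats and styles such as (TOC, TOCEntry, Heading, ...). My program uses these styles to drive the automation.
So anyone who wants to use the program must apply these styles in the document.
The program generates two XML files:
- Table of contents
- Document
The document XML file is converted to HTML files, and the table-of-contents XML file is used as the DataSource of a tree or any other navigation control.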
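To illustrate how the table-of-contents file can drive a navigation control, the sketch below walks the Topic elements and produces an indented outline, one line per topic. It assumes the level and name attributes used in the sample table-of-contents file shown later in this article; the TocOutline class is a hypothetical helper and is not part of the article's source.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not taken from the article's source code.
public static class TocOutline
{
    // Returns one line per Topic element, indented two spaces per nesting depth.
    public static List<string> FromXml(string tocXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(tocXml);
        List<string> lines = new List<string>();
        Walk(doc.DocumentElement, 0, lines);
        return lines;
    }

    private static void Walk(XmlElement parent, int depth, List<string> lines)
    {
        foreach (XmlNode node in parent.ChildNodes)
        {
            XmlElement topic = node as XmlElement;
            if (topic == null || topic.Name != "Topic") continue; // skip anything that is not a Topic

            lines.Add(new string(' ', depth * 2)
                + topic.GetAttribute("level") + " " + topic.GetAttribute("name"));
            Walk(topic, depth + 1, lines); // child topics are indented one step further
        }
    }
}
```

A tree control would create one node per Topic in the same recursive way instead of adding a line of text.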
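Using the Word Library, Part 1
Add a reference to the Microsoft Word 11.0 Object Library so that the project can work with Word documents.
See WordApp.cs.
Add the namespace alias
using Word = Microsoft.Office.Interop.Word;
I used...
Word.ApplicationClass wordApplication;
... to access the Word document's properties, text, and so on.
String WordFilePath; // this is the path of your document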
Opening the Word Document
//------ var
private Word.Document doc;
private Word.Paragraphs DocParagraphs;
public String WordFilePath;
private Word.InlineShapes Inshapes;
///-------
This opens the Word document and stores it in the doc object.
wordApplication = new Word.ApplicationClass();
object o_nullobject = System.Reflection.Missing.Value;
object o_filePath = WordFilePath;
object o_visible = false;  // Visible argument: do not show the document window
object o_readOnly = true;  // ReadOnly argument: open the document read-only
wordApplication.Visible = false; // make Microsoft Word work in the background
doc = wordApplication.Documents.Open(ref o_filePath,
    ref o_nullobject, ref o_readOnly, ref o_nullobject, ref o_nullobject, ref o_nullobject,
    ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
    ref o_visible, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);
Getting the Inline Shapes
public Word.InlineShapes getInlineDocumentShape()
{
    // Convert every floating shape to an inline shape so that
    // all images can be reached through doc.InlineShapes
    foreach (Word.Shape W in doc.Shapes)
    {
        W.ConvertToInlineShape();
    }
    Inshapes = doc.InlineShapes;
    return Inshapes;
}
Getting the Word Paragraphs
public Word.Paragraphs getDocumentParagraphs()
{
    DocParagraphs = doc.Paragraphs;
    return DocParagraphs;
}
Now Convert to XML
For more details, see DocumentParser.cs.
{
TableOfContent.xml: for the table of contents
Document.xml: for the Word document's paragraphs and images
public void ParsToXml() {...}
XmlTextWriter tocWriter;//table of content writer
XmlTextWriter parWriter;//paragraph writer
//---
Word.Paragraphs pars = getDocumentParagraphs(); // get the Word paragraphs
Word.InlineShapes inShapes = getInlineDocumentShape(); // get the Word images
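Now loop over each paragraph to get its style and text.
Paragraph Styles
- Heading: every topic in the document starts with Heading(N)
  - N = 1: main topic
  - N > 1: subtopic
- TOC: every topic in the table of contents starts with TOC(N)
  - N = 1: main topic
  - N > 1: subtopic
- ImageStyle: every image in the document has this style.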
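The style names above carry both the kind of paragraph and its level. A minimal sketch of splitting a name such as "Heading 2" or "TOC 1" into those two parts follows. The StyleKind, StyleInfo and StyleNames types are hypothetical helpers that do not appear in the article's source, and the sketch assumes English style names with a space before the number.

```csharp
using System;

// Hypothetical helper types for illustration; not taken from DocumentParser.cs.
public enum StyleKind { Other, Heading, Toc, Image }

public struct StyleInfo
{
    public StyleKind Kind;
    public int Level; // 0 when the style carries no level

    public StyleInfo(StyleKind kind, int level)
    {
        Kind = kind;
        Level = level;
    }
}

public static class StyleNames
{
    // Assumes names like "Heading 2", "TOC 1" and "ImageStyle".
    public static StyleInfo Parse(string styleName)
    {
        if (styleName == null) return new StyleInfo(StyleKind.Other, 0);
        string name = styleName.Trim();

        if (name == "ImageStyle") return new StyleInfo(StyleKind.Image, 0);
        if (name.StartsWith("Heading ", StringComparison.Ordinal))
            return WithLevel(StyleKind.Heading, name.Substring("Heading ".Length));
        if (name.StartsWith("TOC ", StringComparison.Ordinal))
            return WithLevel(StyleKind.Toc, name.Substring("TOC ".Length));
        return new StyleInfo(StyleKind.Other, 0);
    }

    private static StyleInfo WithLevel(StyleKind kind, string levelText)
    {
        int level;
        // A name without a valid positive number is treated as an ordinary paragraph.
        if (int.TryParse(levelText, out level) && level > 0)
            return new StyleInfo(kind, level);
        return new StyleInfo(StyleKind.Other, 0);
    }
}
```

A level of 1 then marks a main topic and any higher level a subtopic, matching the list above.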
for (index = 1; index <= pars.Count; index++) // Word collections are 1-based, so pars.Count is the last paragraph
{
    style = ((Word.Style)pars[index].get_Style()).NameLocal;
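Format and Style of the Table of Contents
if (style.StartsWith("TOC ")) // this is a table-of-contents style
Every topic starts with the style [TOC ].
Example
[TOC1] 1. Introduction
[TOC2] 1.1 Author
[TOC2] 1.2 About
[TOC3] 1.2.1 About Books
{ .. get the difference between the current level and the next level }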
A sample table-of-contents XML file looks like this:
<TableOfContent>...
<Topic level="4" name="INTERFACE REQUIREMENTS" page="6">
<Topic level="4.1" name="User Interfaces" page="6">
<Topic level="4.1.1" name="Accessibility" page="6" />
<Topic level="4.1.2" name="System messages" page="6" />
<Topic level="4.1.3" name="Paging" page="7" />
<Topic level="4.1.4" name="Data lists and Data grids" page="7" />
</Topic>
<Topic level="4.2" name="Hardware Interfaces" page="8" />
<Topic level="4.3" name="Software Interfaces" page="8">
<Topic level="4.3.1" name="Operating Platform" page="8" />
<Topic level="4.3.2" name="Storage engine" page="8" />
<Topic level="4.3.3" name="External data sources" page="8" />
</Topic>
...</TableOfContent>
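One way to turn the level difference into nesting is to track how many Topic elements are currently open and to close enough of them before opening the next one. The sketch below works on a flat list of entries. The TocEntry type, its Depth field and the TocXml.Build method are hypothetical names, and the article's own ParsToXml may well work differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical flat entry; Depth is the N of the TOC(N) style.
public class TocEntry
{
    public int Depth;
    public string Level; // numbering text such as "4.1.2"
    public string Name;
    public string Page;

    public TocEntry(int depth, string level, string name, string page)
    {
        Depth = depth;
        Level = level;
        Name = name;
        Page = page;
    }
}

public static class TocXml
{
    public static string Build(IList<TocEntry> entries)
    {
        StringWriter text = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("TableOfContent");
            int openTopics = 0; // number of Topic elements currently open

            foreach (TocEntry entry in entries)
            {
                // Close topics at the same or a deeper level than the new entry.
                while (openTopics > 0 && openTopics >= entry.Depth)
                {
                    writer.WriteEndElement();
                    openTopics--;
                }
                writer.WriteStartElement("Topic");
                writer.WriteAttributeString("level", entry.Level);
                writer.WriteAttributeString("name", entry.Name);
                writer.WriteAttributeString("page", entry.Page);
                openTopics++;
            }

            // Close the topics that are still open, then the root element.
            while (openTopics > 0)
            {
                writer.WriteEndElement();
                openTopics--;
            }
            writer.WriteEndElement();
        }
        return text.ToString();
    }
}
```

Entries with depths 1, 2, 3, 2 produce one top-level Topic containing two children, the first of which has one child of its own.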
If the style is ImageStyle
Word.InlineShapes inShapes = getInlineDocumentShape(); // get the inline shapes from the document
// mindex: index of the inline shape in the document
inShapes[mindex].Select(); // select the shape so that it can be copied to the clipboard
wordApplication.Selection.CopyAsPicture();
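The sample document XML later in this article stores each picture as Base64 text inside an Image element. Assuming the picture has already been saved into a byte array (how the article's code reads it back from the clipboard is not shown here), that encoding step could be sketched as follows. ImageXml, WriteImage and ReadImage are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not taken from DocumentParser.cs.
public static class ImageXml
{
    // Writes <Image src="...">BASE64</Image>; src is whatever key the caller uses for the image.
    public static string WriteImage(string src, byte[] imageBytes)
    {
        StringWriter text = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            writer.WriteStartElement("Image");
            writer.WriteAttributeString("src", src);
            writer.WriteString(Convert.ToBase64String(imageBytes));
            writer.WriteEndElement();
        }
        return text.ToString();
    }

    // Reverses WriteImage: decodes the element text back into the original bytes.
    public static byte[] ReadImage(string imageElementXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(imageElementXml);
        return Convert.FromBase64String(doc.DocumentElement.InnerText);
    }
}
```

Keeping the image inside the XML means the document file is self-contained, at the cost of roughly one third more size than the raw bytes.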
Getting the Words of a Paragraph
If the style is a heading
Word.Words words;
words = pars[index].Range.Words; // take the words of the paragraph
Check whether each word belongs to a list
for (windex = 1; windex <= words.Count; windex++)
{
    // check whether the word is outside any list
    if (words[windex].FormattedText.ListFormat.ListType.ToString() == "wdListNoNumbering")
    {
        // ... check the format of each word and write it to an XML Text node
        // using FormatingFunction(,)
    }
    else
    {
        // ... write it in a list node ...
    }
}
} //---------
FormatingFunction returns the formats of one word as a comma-separated string:
public String FormatingFunction(Word.Words obj, int index)
{
    if (index > obj.Count)
    {
        return "";
    }
    String fr = "";
    if (obj[index].Bold.ToString() == "-1") // -1 means the word is bold
    {
        fr = "Bold";
    }
    if (obj[index].Italic.ToString() == "-1")
    {
        if (fr != "")
        {
            fr += "," + "Italic";
        }
        else
        {
            fr = "Italic";
        }
    }
    // underline values include wdUnderlineSingle and wdUnderlineNone
    if (obj[index].Underline.ToString() == "wdUnderlineSingle")
    {
        if (fr != "")
        {
            fr += "," + "UnderLine";
        }
        else
        {
            fr = "UnderLine";
        }
    }
    return fr;
}
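The comma-separated format string built by FormatingFunction can also be assembled without the nested if/else branches, by collecting the labels in a list and joining them. The sketch below takes plain booleans instead of a Word.Words object so that it runs without Word installed; FormatFlags.Describe is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; uses the same labels as FormatingFunction.
public static class FormatFlags
{
    // Returns "", "Bold", "Bold,Italic", "Italic,UnderLine", and so on.
    public static string Describe(bool bold, bool italic, bool underline)
    {
        List<string> parts = new List<string>();
        if (bold) parts.Add("Bold");
        if (italic) parts.Add("Italic");
        if (underline) parts.Add("UnderLine");
        return string.Join(",", parts.ToArray());
    }
}
```

With the interop types, the three arguments would come from the same comparisons FormatingFunction makes on Bold, Italic and Underline.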
A fragment of the resulting document XML file looks like this:
<Text Format="">is </Text>
<Text Format="Italic">Performance Management System </Text>
<Text Format="">that helps you collect different measures and make faster and smarter
decisions through a set of user friendly customizable dashboards and scorecards
targeted for each and every member of your organization.</Text>
<Text Format="Italic" />
</Paragraph>
</Topic>
<Topic Name="Product Features" Level="2.2">
<Paragraph>
<Text Format="" />
<Image src="2.21">j7B/wBND+VFFAB/Z4/56fpR9g/6aH8qKKAD7AP+eh/KroGABRRQB//Z</Image>
<Paragraph>
<Text Format="">The figure above provides a high level vision of </Text>
<Text Format="Bold">Cub </Text>
<Text Format="">solution. The vision includes the idea of hiding the complexity of
creating ETL (Extract, Transform &amp; Load) processes, a data warehouse and an
OLAP database for analysis from the end user.
</Text>
</Paragraph>
<Paragraph>
<Text Format="" />
<Paragraph>
<Text Format="">It will provide end users with a sub-set of the features offered by
the underlying systems, taking into account the ability to extend this set
in future releases. As well as linking with existing DW and OLAP database
provided as part of an implementation service.
</Text>
</Paragraph>
</Paragraph>
</Paragraph>
</Topic>
Converting XML to HTML
I built an HTML converter that turns the XML nodes into HTML. See HtmlConvertor.cs.
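As an illustration of the idea, and not the code in HtmlConvertor.cs, the sketch below maps one Paragraph element and its Text children to HTML. It assumes the Format attribute values seen in the sample (Bold, Italic, UnderLine, comma-separated). The HtmlSketch class and the choice of the b, i and u tags are assumptions.

```csharp
using System;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical converter for illustration; the real HtmlConvertor.cs may differ.
public static class HtmlSketch
{
    public static string ParagraphToHtml(XmlElement paragraph)
    {
        StringBuilder html = new StringBuilder("<p>");
        foreach (XmlNode node in paragraph.ChildNodes)
        {
            XmlElement text = node as XmlElement;
            if (text == null || text.Name != "Text") continue; // images and nested paragraphs are skipped here

            // Escape the text first, then wrap it in one tag per format label.
            string content = WebUtility.HtmlEncode(text.InnerText);
            string[] formats = text.GetAttribute("Format")
                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string format in formats)
            {
                if (format == "Bold") content = "<b>" + content + "</b>";
                else if (format == "Italic") content = "<i>" + content + "</i>";
                else if (format == "UnderLine") content = "<u>" + content + "</u>";
            }
            html.Append(content);
        }
        return html.Append("</p>").ToString();
    }
}
```

A full converter would also need to handle Image elements, list nodes and nested paragraphs, which this sketch leaves out.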
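Future Plans
I will add more explanation to this article.
Look out for future articles:
- Dynamic online flexible GridView
- SpyWare
History
- 26 September 2007: Initial release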