AnyDataFileToXmlConverter Class/Utility

Chris Hambleton

1 Dec 2006

A class/utility that converts data files in a variety of formats into .NET DataSet-compatible XML.

Download source files - 237.8 KB

Introduction

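The AnyDataFileToXmlConverter utility is a Windows application that reads data files in a variety of formats, such as Access databases, Excel spreadsheets, CSV files, and tab-delimited files, and converts them into the standard DataSet-compatible XML format used in .NET.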
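To make "DataSet-compatible XML" concrete, here is a minimal sketch of how a DataSet serializes itself through DataSet.GetXml(). The class name, column names, and sample values are hypothetical and are not taken from the utility; the "NewDataSet" and "Table" names match the ones used in the article's listing.

```csharp
using System;
using System.Data;

public static class DataSetXmlDemo
{
    // Builds a one-table DataSet and returns the XML produced by GetXml().
    public static string BuildSampleXml()
    {
        DataSet ds = new DataSet("NewDataSet");
        DataTable table = ds.Tables.Add("Table");

        // Hypothetical columns; untyped columns default to string.
        table.Columns.Add("Name");
        table.Columns.Add("City");

        table.Rows.Add("Alice", "Denver");
        table.Rows.Add("Bob", "Austin");

        // One <Table> element per row, one child element per column.
        return ds.GetXml();
    }
}
```

Each row becomes an element named after its table, and each non-null column value becomes a child element named after its column. That layout is what the utility targets for every input format.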

Background

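A few months ago, I had a project that required collecting, validating, and processing data files in a variety of formats. Because the way the data files were validated/processed would change frequently over time, and a standardized processing format was needed, I decided to use the .NET DataSet-compatible XML format and then use XSLT to process/reformat the data. Many file formats can be converted to this format automatically (such as Excel spreadsheets and Microsoft Access databases), but many others (such as tab-delimited, comma-delimited, and pipe-delimited text files) have to be reprocessed into the .NET DataSet-compatible XML format. The main class and this utility are the result of that work.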

Using the Application

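The AnyDataFileToXmlConverter utility is quick and easy to use - simply open/load a file in the utility and it is automatically processed into the .NET XML format. For Excel spreadsheets, the worksheets in the workbook are enumerated, and one of them must be selected before it can be processed into XML. For Microsoft Access databases, a query selecting the specific data to convert to XML must be specified before processing.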

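Several sample data files are provided with the utility; they demonstrate the common data file formats that can be reprocessed into XML. Other file formats (such as pipe-delimited files) can also be processed, and the class can easily be modified to handle additional formats. XML files can be loaded into the utility, but since they are already in the final format, no processing is performed on them.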

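In addition, an optional data-cleanup feature automatically removes the "junk" XML nodes that are typically created at the end of the output when processing Excel spreadsheets that contain empty or cleared rows.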
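The article does not show the cleanup code, so the following is only a sketch of one way such a cleanup could work. It assumes a "junk" node is a row element directly under the root that contains no text at all. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmlCleanupSketch
{
    // Removes row elements (direct children of the root) that carry no text.
    // Returns the number of elements removed.
    public static int RemoveEmptyRows(XmlDocument doc)
    {
        if (doc.DocumentElement == null)
        {
            return 0;
        }

        // Collect first: removing nodes while enumerating ChildNodes is unsafe.
        List<XmlNode> emptyRows = new List<XmlNode>();
        foreach (XmlNode row in doc.DocumentElement.ChildNodes)
        {
            if (row.NodeType == XmlNodeType.Element &&
                row.InnerText.Trim().Length == 0)
            {
                emptyRows.Add(row);
            }
        }

        foreach (XmlNode row in emptyRows)
        {
            doc.DocumentElement.RemoveChild(row);
        }

        return emptyRows.Count;
    }
}
```

Because DataSet.GetXml() omits null column values, a spreadsheet row whose cells are all empty tends to serialize as an empty row element, which is the case this sketch targets.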

You can also switch the output/results display from XML to a grid for easier viewing and sorting.
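One benefit of the DataSet-compatible format is that the XML can be read straight back into a DataSet for grid display. The sketch below shows the idea with DataSet.ReadXml and a sorted DataView. The class and method names are hypothetical, and this is not necessarily how the utility populates its grid.

```csharp
using System;
using System.Data;
using System.IO;

public static class GridViewSketch
{
    // Reads DataSet-compatible XML and returns a view of the first table,
    // sorted ascending by the given column.
    public static DataView LoadSorted(string xml, string sortColumn)
    {
        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(xml))
        {
            // The schema (tables and columns) is inferred from the XML.
            ds.ReadXml(reader);
        }

        DataView view = new DataView(ds.Tables[0]);
        // Assumes a simple column name; names with spaces need [brackets].
        view.Sort = sortColumn + " ASC";
        return view;
    }
}
```

The returned DataView can be assigned to the DataSource property of a grid control.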

Data File Processing Examples

Comma-Delimited File

[Screenshot: csv]

Tab-Delimited File

[Screenshot: tabbed]

Excel Spreadsheet

[Screenshot: excel]

Microsoft Access Database

[Screenshot: access]

The AnyDataFileToXmlConverter Engine - How It Works

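The main engine/processor of the AnyDataFileToXmlConverter utility is the RawFileConverter class, which uses the extension of the loaded file to determine which processing function to use to reprocess it. For most file formats (text files excepted), the Microsoft.Jet.OLEDB provider is used to load the file into a DataSet automatically, and the XML is then retrieved from the DataSet.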
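The dispatch code itself is not shown in the article, so here is a minimal sketch of what extension-based routing might look like. The enum, the class and method names, and the list of extensions are all assumptions made for illustration. The connection string follows the commonly used Jet 4.0 form.

```csharp
using System;
using System.IO;

public enum SourceKind { Excel, Access, Text, Xml, Unknown }

public static class ExtensionDispatchSketch
{
    // Maps a file extension to the kind of processing it needs.
    // The extension list is an assumption, not the utility's actual list.
    public static SourceKind Classify(string filePath)
    {
        string ext = Path.GetExtension(filePath ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".xls": return SourceKind.Excel;
            case ".mdb": return SourceKind.Access;
            case ".csv":
            case ".txt":
            case ".tab": return SourceKind.Text;
            case ".xml": return SourceKind.Xml;
            default: return SourceKind.Unknown;
        }
    }

    // Builds a Jet 4.0 connection string for the OLE DB-backed formats.
    public static string BuildJetConnectionString(string filePath, SourceKind kind)
    {
        string cs = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath + ";";
        if (kind == SourceKind.Excel)
        {
            // HDR=Yes treats the first worksheet row as column names.
            cs += "Extended Properties=\"Excel 8.0;HDR=Yes\";";
        }
        return cs;
    }
}
```

With a connection string like this, an OleDbDataAdapter can fill a DataSet from a query such as `SELECT * FROM [Sheet1$]` for a worksheet, or from the user-supplied query for an Access database. Text files would be routed to a function such as ConvertTextFile instead.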

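For text files, the file contents are evaluated to find the likely delimiter character between the data columns, and each line of the file is then split into columns on that delimiter. A DataTable is created and all of the data in the file is loaded into it: each line becomes a DataRow, and each field is loaded into a column of that DataRow. Once the entire file has been processed into the DataTable, the XML is retrieved from the DataTable's DataSet object.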

/// <summary>
/// Converts the specified TEXT file to its equivalent XmlDocument
/// </summary>
/// <param name="sFilePath">Path of the text file to convert</param>
/// <returns>The file's contents as a DataSet-compatible XmlDocument</returns>
private static XmlDocument ConvertTextFile(string sFilePath)
{
    XmlDocument xmlRaw = null;
    StreamReader oSR = null;             

    try
    {               
        DataSet dsTextFile = new DataSet();
        DataTable dtTextFile = new DataTable();
        DataRow drRows = null;

        // check and pre-process the text file if it's a non-standard text file
        sFilePath = PreprocessNonStandardFiles(sFilePath);

        // find the correct delimiter for the file 
        // (some files have multiple delimiting chars, but only one is correct)
        char chrDelimiter = GetDelimiterCharacter(sFilePath);

        //Open the file and go to the top of the file                
        oSR = new StreamReader(sFilePath);                
        oSR.BaseStream.Seek(0, SeekOrigin.Begin);

        // read the first line 
        string sFirstLine = oSR.ReadLine();
        bool bHeaderIsDataRow = false;

        // init the columns if the file has a valid, parseable header
        // (ReadLine returns null for an empty file, so guard against it)
        string[] sColumns = (sFirstLine ?? string.Empty).Split(chrDelimiter);
        if(sColumns.Length > MINIMUM_NUMBER_CSV_COLUMNS)
        {                    
            bHeaderIsDataRow = InitializeTableColumns(sColumns, 
                               ref dtTextFile, true);
            if (bHeaderIsDataRow)
            {
                // the first line is data, not a header: reopen the file so
                // the read loop below starts again from the first line
                oSR.Close();
                oSR = new StreamReader(sFilePath);
            }
        }              

        // add in the Rows for the datatable/file
        dsTextFile.DataSetName = "NewDataSet";
        dsTextFile.Tables.Clear();
        dtTextFile.TableName = "Table";
        dsTextFile.Tables.Add(dtTextFile);

        // iterate through the file and process each line
        while (oSR.Peek() > -1)
        {                    
            int iFieldIndex = 0;
            string sLine = oSR.ReadLine();
            string sLineTrimmed = sLine.Trim();
            string[] sLineFields = sLine.Split(chrDelimiter);

            if ((sLineFields.Length <= 0) || 
                (sLineTrimmed.Length < MINIMUM_NUMBER_CSV_COLUMNS))
            {
                continue;
            }

            // if the number of fields is not above the minimum, skip the line
            if (sLineFields.Length <= MINIMUM_NUMBER_CSV_COLUMNS)
            {
                continue;
            }

            // if we suddenly have more fields than columns, 
            // we're in a header or something, so re-init the columns
            if ((sLineFields.Length > dtTextFile.Columns.Count) &&
                (sLineFields.Length > MINIMUM_NUMBER_CSV_COLUMNS))
            {
                //note: bad data?! - header/inconsistent delimiting problems?  
                if (dtTextFile.Rows.Count <= 0)
                {
                    InitializeTableColumns(sLineFields, ref dtTextFile, false);
                }
            }

            drRows = dtTextFile.NewRow();
            foreach (string strField in sLineFields)
            {
                string sField = strField.Trim();
                sField = sField.Replace("\"", "");
                sField = sField.Replace("'", "");
                sField = sField.Replace("$", "");
                sField = sField.Replace("%", "");
                sField = sField.Replace("-0-", "0");
                sField = sField.Replace("&", "and");

                // header/inconsistent file delimiting problems?  
                if (dtTextFile.Columns.Count <= iFieldIndex)
                    break;

                drRows[iFieldIndex] = sField;
                iFieldIndex = iFieldIndex + 1;
            }
            
            dtTextFile.Rows.Add(drRows);
        }        

        // load the dataset to an xmldocument                
        xmlRaw = CleanupRawXml(dsTextFile.GetXml());                
    }
    catch (Exception ex)
    {
        throw new Exception("Error: ConvertTextFile", ex);
    }
    finally
    {
        // oSR is still null if an exception was thrown before the file was opened
        if (oSR != null)
        {
            oSR.Close();
        }
    }

    return xmlRaw;
}
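The listing calls GetDelimiterCharacter, whose body is not shown in the article. The sketch below is one plausible heuristic, not the author's implementation: it picks the candidate character that occurs the same non-zero number of times on every sampled line, and prefers the higher count when several qualify. The class name, the candidate list, and the comma fallback are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class DelimiterSketch
{
    // Hypothetical candidate list; extend it as needed.
    private static readonly char[] Candidates = { ',', '\t', '|', ';' };

    // Returns the candidate that appears the same number of times (> 0) on
    // every non-blank sample line. Falls back to ',' if none qualifies.
    public static char Detect(IEnumerable<string> sampleLines)
    {
        char best = ',';
        int bestCount = 0;

        foreach (char candidate in Candidates)
        {
            int expected = -1;
            bool consistent = true;

            foreach (string line in sampleLines)
            {
                if (line.Trim().Length == 0)
                {
                    continue; // ignore blank lines
                }

                int count = line.Split(candidate).Length - 1;
                if (expected < 0)
                {
                    expected = count;
                }
                else if (count != expected)
                {
                    consistent = false;
                    break;
                }
            }

            if (consistent && expected > bestCount)
            {
                best = candidate;
                bestCount = expected;
            }
        }

        return best;
    }
}
```

This handles the case mentioned in the listing's comments, where a file contains several possible delimiter characters but only one is correct: a pipe-delimited file with commas inside some fields still resolves to the pipe, because the comma count varies from line to line. The sketch does not handle quoted fields that contain the delimiter itself.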

Conclusion

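I hope you find this article and utility useful - it has come in handy many times, and the core class has recently been merged into a generic FileDataProvider class that serves as another .NET data provider, like the SqlDataProvider and OracleDataProvider providers. But that's a topic for another article. Enjoy!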

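License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.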