Batch Converting Linux ASCII Files to Windows

December 8, 2004

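This article explains how to use drag-and-drop events to filter the objects dragged in from the system down to files. It also converts line endings from Linux (or other operating systems) to the Windows-style \r (carriage return) \n (line feed) line endings.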

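Introduction

This article is a simple exercise in drag and drop, text file I/O, and batch processing.

Background

My business partner and I were migrating the BIND DNS information from our Linux server to a new Windows 2003 box. In the Windows DNS documentation, we found that you can simply move the zone files that BIND created for our DNS zones into the Windows DNS directory, rename them, and that's it: you have a copy of your DNS records.

As with all things that sound simple, it usually isn't. Windows does not like the format of the Linux BIND files, because they use only \n (line feed) as the line terminator, and Windows needs more: it uses \r\n (carriage return + line feed) to mark the end of a line. OK, I know there are probably tools out there that convert Linux (and other) ASCII files to Windows-style ASCII files; but I figured, what the heck, I'll write my own.

So I set out to write this little tool. It took about 20 minutes, and I decided to make it my first post on Code Project.

The Code

The code is very simple, but it shows how to handle drag-and-drop operations and filter for files. It also includes some simple regular expression matching, used to check that characters fall within the 8-bit range (\x00-\xFF). I commented the code heavily, so there shouldn't be many questions.

The first thing we need is a drag-and-drop interface, because we don't want to open files by hand or use a file dialog. So we create a list view with three columns: File (name), Size (bytes), and Status.

Then we set up the DragEnter and DragDrop event handlers for the list view.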

private void DropSpot_DragEnter(object sender, System.Windows.Forms.DragEventArgs e)
{
    // We only want to accept files, so we only set our DragDropEffects 
    // if that's what's being dragged in
    if (e.Data.GetDataPresent(DataFormats.FileDrop, false)==true)
    {
        e.Effect = DragDropEffects.All;
    }
}
ArrayList Files = new ArrayList();
private void DropSpot_DragDrop(object sender, 
                System.Windows.Forms.DragEventArgs e)
{
    // Get a list of all objects in the Drop Data, that are files
    string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
    // Iterate through the dropped files
    for (int i=0;i<files.Length;i++)
    {
        // Add the path to our ArrayList
        Files.Add(files[i]);
        // Create our new List View item
        ListViewItem item = new ListViewItem();
        // Get a file info object
        // we use this for getting file size, etc.
        System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
        item.Text = System.IO.Path.GetFileName(fInfo.Name);
        item.SubItems.Add(fInfo.Length.ToString());
        item.SubItems.Add("Pending");
        FileListView.Items.Add(item);
        // Keep the full path on the item itself, not on the whole list view
        item.Tag = files[i];
    }
    // Refresh the file list - for good measure
    this.Refresh();
    // If we added files, clear the instruction label
    if (FileListView.Items.Count>0) label1.Visible = false;

}

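In both handlers, you can see that we filter using the DataFormats class. We only want to accept data in the DataFormats.FileDrop format. This gives us objects dragged from a folder, or equivalent shell objects.

The first handler, DragEnter, just sets DragDropEffects, which changes the mouse cursor accordingly to show that files can be dropped here.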
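
One thing the FileDrop format does not do is tell files and folders apart: a dropped folder arrives as a path too, and FileInfo.Length throws for a directory. A minimal sketch of an extra filter, assuming you want to skip anything that is not an existing file, might look like this. The names DropFilter and KeepExistingFiles are hypothetical and not part of the original project.

```csharp
using System.Collections.Generic;
using System.IO;

public static class DropFilter
{
    // Hypothetical helper: keeps only the paths that point at existing
    // regular files. File.Exists returns false for directories and for
    // paths that do not exist, so dropped folders are skipped.
    public static List<string> KeepExistingFiles(string[] droppedPaths)
    {
        var result = new List<string>();
        if (droppedPaths == null) return result;
        foreach (string path in droppedPaths)
        {
            if (File.Exists(path)) result.Add(path);
        }
        return result;
    }
}
```

In the DragDrop handler, the array returned by e.Data.GetData(DataFormats.FileDrop) could be passed through this helper before the loop that builds the list view items.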

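As an aside, to guide the user, I added a label on top of the list view that tells the user to "Drag and Drop Files here to Repair". This label has the same two event handlers as above. The label "disappears" when files are added and reappears when the file list is cleared.

The next thing we need is a component that can determine (as accurately as possible) whether a file is plain ASCII text or binary.

Note: by definition, all files stored on a computer are stored in binary form. What we are trying to determine is whether a file contains "text" or "ASCII", as opposed to binary data such as images or other non-text files. After extensive research in other projects, I have not found a method that is 100% accurate, but the method described below is fairly accurate. It requires that every character in the file fall within the range \x00-\xFF.

On to the code. This method, VerifyAscii(string Buffer), takes the input buffer and, using C#'s regular expression matching, searches the file in blocks to determine whether all characters in those blocks fall within the ASCII table. Note that the Regex upper bound is set to \xFF; it can be changed to \x7F to cover only the 7-bit ASCII set.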

private bool VerifyAscii(string Buffer)
{
    // Create Regex for matching only the Ascii Table
    System.Text.RegularExpressions.Regex R = 
            new System.Text.RegularExpressions.Regex("[\x00-\xFF]");
    // The Size of the block that we want to analyze
    // Done this way for performance
    // Much overhead (depending on size of file) to Regex the whole thing
    int BlockSize = 10;
    // Our Iteration variables
    int Start;
    int Len;
    string Block;
    System.Text.RegularExpressions.MatchCollection matchColl;
    // Iterate through our buffer one block at a time,
    // including the final partial block
    for (int i=0;i*BlockSize<Buffer.Length;i++)
    {
        // Starting Point for this iteration
        Start = (i*BlockSize);
        // Ternary operator used to assign length of this block
        // we don't want to overshoot the end of the string buffer
        Len = (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
        // Get our block from the buffer
        Block  = Buffer.Substring(Start,Len);
        // Run our Regex, and get our match collection
        matchColl = R.Matches(Block);
        // If our match count is less than the length of the string,
        // we know that we have characters outside of the ascii table
        if (matchColl.Count<Len)
        {
            // Return false, this buffer could not be
            // evaluated as Ascii Only
            return false;
        }
    }
    // No bad characters were found, 
    // so all characters are within the ascii table
    // Return true
    return true;
}

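What's happening: for performance, the buffer is broken into small blocks. Running a regular expression over a string the size of a typical system log file can take a long time and carries a lot of overhead. Working in small blocks keeps each regular expression match cheap and lets the method return as soon as it finds a block with an out-of-range character, at the cost of creating a new string on each iteration.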
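
The same range check can also be written without a regular expression at all. The following is a minimal sketch, not the method used in this project, that walks the buffer once and stops at the first character above a chosen limit. TextCheck and AllCharsAtOrBelow are hypothetical names.

```csharp
public static class TextCheck
{
    // Returns true when every character in the buffer is at or below
    // maxChar. Pass '\u00FF' for the 8-bit range or '\u007F' for
    // 7-bit ASCII. An empty buffer counts as text.
    public static bool AllCharsAtOrBelow(string buffer, char maxChar)
    {
        foreach (char c in buffer)
        {
            // Stop at the first out-of-range character
            if (c > maxChar) return false;
        }
        return true;
    }
}
```

Because the loop never allocates substrings or match collections, there is no block size to tune.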

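Note: I chose not to use StringBuilder here, because this was just a quick utility to convert about 200 DNS zone files; but it would be the better choice as a replacement for the string buffer. In C#, strings are a special type. They behave much like value types, but behind the scenes they are immutable reference-type objects. When you perform almost any operation on a string, such as concatenation or truncation, the original object is not modified; instead, a new object is created, the data is copied over, and the old object is left for the garbage collector. This creates a lot of overhead in intensive applications.

Lesson: when doing a lot of string manipulation in a real application, use System.Text.StringBuilder, because that is what it was designed for.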
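
To illustrate the point, here is a minimal sketch of an in-memory line-ending conversion built on StringBuilder. It is not part of the original tool, and the names LineEndings and ToCrLf are hypothetical. It assumes that a lone \r should also be treated as a line terminator.

```csharp
using System.Text;

public static class LineEndings
{
    // Hypothetical helper: rewrites LF, CR, and CRLF terminators as CRLF,
    // appending into one StringBuilder instead of creating a new string
    // for every line.
    public static string ToCrLf(string text)
    {
        var sb = new StringBuilder(text.Length + 16);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r')
            {
                // Treat CR LF as a single terminator
                if (i + 1 < text.Length && text[i + 1] == '\n') i++;
                sb.Append("\r\n");
            }
            else if (c == '\n')
            {
                sb.Append("\r\n");
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Text that already uses \r\n passes through unchanged, so running the conversion twice is harmless.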

Next, we have our file repair method, aptly named RepairFile(string Path).

private bool RepairFile(string Path)
{
    // Create a file info object
    System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
    // If the file exists, proceed
    if (Fi.Exists)
    {
        // NOTE: Error trapping omitted for 
        // readability
        // You would want to trap the file operations in
        // a try / catch / finally block 
        // -----------------------------------------------
        // Create a StreamReader object 
        // We use a StreamReader because we are assuming 
        // that we are dealing with a text file
        System.IO.StreamReader S = Fi.OpenText();
        // Read the entire file -
        // NOTE: This would be better done using buffering
        // for performance, but for this example, I omitted it
        string FileBuffer = S.ReadToEnd();
        // Close our reader
        S.Close();
        // Call to our VerifyAscii method to ensure that
        // this is NOT a binary file
        if (VerifyAscii(FileBuffer))
        {
            // Split our buffer into lines
            string[] Lines = FileBuffer.Split('\n');
            // Create our StreamWriter
            // Again, using a streamWriter, since we are
            // dealing with Text. Passing false for "append" truncates
            // the file, so no stale bytes remain after the new content
            System.IO.StreamWriter W = 
                   new System.IO.StreamWriter(Path, false);
            // Loop through our "Lines" and use the StreamWriter's WriteLine
            // Method to terminate the lines with the operating system
            // specific carriage return / line feed combination
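            for (int i=0;i<Lines.Length;i++)
            {
                // Skip the empty entry Split produces after a final '\n'
                if (i==Lines.Length-1 && Lines[i].Length==0) break;
                // Strip only a stray '\r', so already-converted lines are
                // not doubled up and leading indentation is preserved
                W.WriteLine(Lines[i].TrimEnd('\r'));
            }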
            // Close our writer
            W.Close();
            return true;
        }
        else
        {
            // Error Message for "non-ascii" files
            MessageBox.Show(Path+" \nDoes not Appear to be plain text. " + 
                       " No repair will be performed","File Format Error");
            return false;
        }
    }
    return false;
}

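What's happening: the code is largely self-explanatory, and where it is not, it is heavily commented. In short, this method opens the file specified by Path using a StreamReader object. We use a StreamReader because we assume the file being read is text, and it is better suited to this purpose than a plain Stream object.

Note: for simplicity, the file is read in its entirety using the StreamReader's ReadToEnd method. In a real application, it would be better to read the file in chunks for the sake of performance.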
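
A minimal sketch of that chunked approach, assuming a line-at-a-time copy is acceptable, could look like the following. It is not the code used in this project, and StreamingRepair and ConvertLines are hypothetical names. It relies on ReadLine treating \n, \r, and \r\n as line terminators and returning the line without them.

```csharp
using System.IO;

public static class StreamingRepair
{
    // Copies text from reader to writer one line at a time, so only a
    // single line is held in memory. Every line, including the last,
    // is written with a "\r\n" terminator.
    public static void ConvertLines(TextReader reader, TextWriter writer)
    {
        // Set the terminator explicitly rather than relying on the
        // platform default
        writer.NewLine = "\r\n";
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(line);
        }
    }
}
```

For files, the reader and writer would be a StreamReader on the source and a StreamWriter on a different output path, since the source is still being read while the output is written.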

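Once the file is loaded, it is first sent to the VerifyAscii method described above. If VerifyAscii returns true, the buffer is split into lines using String.Split on the newline character (\n). A StreamWriter object is then created by opening the original file for writing. For preservation purposes, it would probably be better to use a different file for output, but I wasn't worried about possible corruption, since these were already copies.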
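
If the originals do matter, one hedged way to preserve them is to write the converted text to a temporary file, keep a backup of the original, and only then swap the new file into place. The sketch below assumes ".tmp" and ".bak" suffixes next to the original are acceptable; those suffixes, and the names SafeRewrite and ReplaceContents, are hypothetical.

```csharp
using System.IO;

public static class SafeRewrite
{
    // Hypothetical helper: writes newContents beside the original first,
    // keeps a copy of the original as "<path>.bak", then moves the new
    // file into place.
    public static void ReplaceContents(string path, string newContents)
    {
        string temp = path + ".tmp";
        string backup = path + ".bak";
        // If this write fails, the original file is still untouched
        File.WriteAllText(temp, newContents);
        // Overwrite any backup left by an earlier run
        File.Copy(path, backup, true);
        File.Delete(path);
        File.Move(temp, path);
    }
}
```

This is not atomic: a crash between the delete and the move leaves only the .bak and .tmp files, but nothing is lost.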

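We then loop through each line in the Lines[] array and write it back to the original file. By using StreamWriter.WriteLine(), we terminate each line with the operating-system-specific line terminator. On Windows, lines are terminated with \r\n (ASCII 13 + ASCII 10).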
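
To check whether a buffer still needs this treatment, or to confirm the result afterwards, a small counting helper can be handy. This is a minimal sketch and not part of the original tool; LineEndingAudit and CountBareLineFeeds are hypothetical names.

```csharp
public static class LineEndingAudit
{
    // Counts '\n' characters (ASCII 10) that are not immediately
    // preceded by '\r' (ASCII 13), i.e. Unix-style line endings.
    public static int CountBareLineFeeds(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n' && (i == 0 || text[i - 1] != '\r'))
            {
                count++;
            }
        }
        return count;
    }
}
```

A result of zero means every line feed in the buffer already belongs to a \r\n pair.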

Putting It Together

After files are dropped, the DragDrop event handler adds them to the ListView. To start the "repair", click the "Go" button and processing begins.

private void btnGo_Click(object sender, System.EventArgs e)
{
    // Iterate through our files list
    for (int i=0;i<Files.Count;i++)
    {                
        // If repair was successful, 
        // Mark the status column for this file complete,
        // otherwise mark it failed
        if (!RepairFile(Files[i].ToString()))
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Failed";
            }
        }
        else
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Complete";
            }
        }
    }
}

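What's happening: earlier, in our DragDrop event handler, the files were added to both our list view and our ArrayList. When the "Go" button is clicked, we iterate through the objects stored in the ArrayList (in this case, file paths). Each item is passed in turn to the RepairFile method. If RepairFile returns true, the corresponding list view item is marked "Complete"; otherwise, it is marked "Failed", and for files that do not look like plain text an error message is shown.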
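
Since RepairFile leaves out error trapping, one unreadable file would stop the whole batch with an exception. A minimal sketch of a batch loop that records a status per file instead, assuming the repair step is passed in as a delegate, might look like this. BatchRunner and Run are hypothetical names; the status strings match the ones used in the list view.

```csharp
using System;
using System.Collections.Generic;

public static class BatchRunner
{
    // Runs the repair delegate on each path and returns one status per
    // path, in the same order. An exception from one file marks that
    // file "Failed" and the batch carries on with the next one.
    public static List<string> Run(IList<string> paths, Func<string, bool> repair)
    {
        var statuses = new List<string>(paths.Count);
        foreach (string path in paths)
        {
            string status;
            try
            {
                status = repair(path) ? "Complete" : "Failed";
            }
            catch (Exception)
            {
                status = "Failed";
            }
            statuses.Add(status);
        }
        return statuses;
    }
}
```

In the form, the returned statuses could be copied into the third column of the matching list view items.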

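Summary

Demonstrated here were using drag-and-drop event handlers to filter objects dropped from the system down to files, using Regex to check that characters fall within the 8-bit range, and batch processing of files. There is certainly room for improvement in the code, but I have left many of those areas for you to discover, and kept the code simple.

Using the Code

Here is the complete code. It was written in Visual Studio 2003, so you should be able to simply open and build it.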

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;

namespace LinWinRepair
{
    /// <summary>
    /// Summary description for Form1.
    /// </summary>
    public class mainUI : System.Windows.Forms.Form
    {
        private System.Windows.Forms.Panel panel1;
        private System.Windows.Forms.Button btnClear;
        private System.Windows.Forms.Button btnGo;
        private System.Windows.Forms.ListView FileListView;
        private System.Windows.Forms.Label label1;
        private System.Windows.Forms.ColumnHeader columnHeader4;
        private System.Windows.Forms.ColumnHeader columnHeader5;
        private System.Windows.Forms.ColumnHeader columnHeader6;
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.Container components = null;

        public mainUI()
        {
            //
            // Required for Windows Form Designer support
            //
            InitializeComponent();

            //
            // TODO: Add any constructor code after InitializeComponent call
            //
        }

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose( bool disposing )
        {
            if( disposing )
            {
                if (components != null) 
                {
                    components.Dispose();
                }
            }
            base.Dispose( disposing );
        }

        #region Windows Form Designer generated code
        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.panel1 = new System.Windows.Forms.Panel();
            this.btnClear = new System.Windows.Forms.Button();
            this.btnGo = new System.Windows.Forms.Button();
            this.FileListView = new System.Windows.Forms.ListView();
            this.label1 = new System.Windows.Forms.Label();
            this.columnHeader4 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader5 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader6 = new System.Windows.Forms.ColumnHeader();
            this.panel1.SuspendLayout();
            this.SuspendLayout();
            // 
            // panel1
            // 
            this.panel1.Controls.Add(this.btnClear);
            this.panel1.Controls.Add(this.btnGo);
            this.panel1.Dock = System.Windows.Forms.DockStyle.Bottom;
            this.panel1.Location = new System.Drawing.Point(15, 303);
            this.panel1.Name = "panel1";
            this.panel1.Size = new System.Drawing.Size(370, 40);
            this.panel1.TabIndex = 1;
            // 
            // btnClear
            // 
            this.btnClear.Anchor = ((System.Windows.Forms.AnchorStyles)
                 ((System.Windows.Forms.AnchorStyles.Bottom | 
                 System.Windows.Forms.AnchorStyles.Right)));
            this.btnClear.Location = new System.Drawing.Point(208, 5);
            this.btnClear.Name = "btnClear";
            this.btnClear.TabIndex = 6;
            this.btnClear.Text = "Clear";
            this.btnClear.Click += new System.EventHandler(this.btnClear_Click);
            // 
            // btnGo
            // 
            this.btnGo.Anchor = ((System.Windows.Forms.AnchorStyles)
                 ((System.Windows.Forms.AnchorStyles.Bottom | 
                 System.Windows.Forms.AnchorStyles.Right)));
            this.btnGo.Location = new System.Drawing.Point(288, 5);
            this.btnGo.Name = "btnGo";
            this.btnGo.TabIndex = 5;
            this.btnGo.Text = "Go";
            this.btnGo.Click += new System.EventHandler(this.btnGo_Click);
            // 
            // FileListView
            // 
            this.FileListView.AllowDrop = true;
            this.FileListView.Columns.AddRange(new 
                 System.Windows.Forms.ColumnHeader[] {
                        this.columnHeader4,
                        this.columnHeader5,
                        this.columnHeader6});
            this.FileListView.Dock = System.Windows.Forms.DockStyle.Fill;
            this.FileListView.Location = new System.Drawing.Point(15, 15);
            this.FileListView.Name = "FileListView";
            this.FileListView.Size = new System.Drawing.Size(370, 288);
            this.FileListView.TabIndex = 0;
            this.FileListView.View = System.Windows.Forms.View.Details;
            this.FileListView.DragDrop += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            this.FileListView.DragEnter += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            // 
            // label1
            // 
            this.label1.AllowDrop = true;
            this.label1.BackColor = System.Drawing.SystemColors.Window;
            this.label1.Location = new System.Drawing.Point(24, 136);
            this.label1.Name = "label1";
            this.label1.Size = new System.Drawing.Size(360, 23);
            this.label1.TabIndex = 2;
            this.label1.Text = "Drag and Drop Files here to Repair";
            this.label1.TextAlign = System.Drawing.ContentAlignment.MiddleCenter;
            this.label1.DragEnter += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            this.label1.DragDrop += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            // 
            // columnHeader4
            // 
            this.columnHeader4.Text = "File";
            this.columnHeader4.Width = 218;
            // 
            // columnHeader5
            // 
            this.columnHeader5.Text = "Size (bytes)";
            this.columnHeader5.Width = 87;
            // 
            // columnHeader6
            // 
            this.columnHeader6.Text = "Status";
            // 
            // mainUI
            // 
            this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
            this.ClientSize = new System.Drawing.Size(400, 358);
            this.Controls.Add(this.label1);
            this.Controls.Add(this.FileListView);
            this.Controls.Add(this.panel1);
            this.DockPadding.All = 15;
            this.Name = "mainUI";
            this.Text = "LinWin File Repair Tool";
            this.panel1.ResumeLayout(false);
            this.ResumeLayout(false);

        }
        #endregion

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main() 
        {
            Application.Run(new mainUI());
        }

        private void DropSpot_DragEnter(object sender, 
                System.Windows.Forms.DragEventArgs e)
        {
            // We only want to accept files, so we only set our DragDropEffects 
            // if that's what's being dragged in
            if (e.Data.GetDataPresent(DataFormats.FileDrop, false)==true)
            {
                e.Effect = DragDropEffects.All;
            }
        }
        ArrayList Files = new ArrayList();
        private void DropSpot_DragDrop(object sender, 
                System.Windows.Forms.DragEventArgs e)
        {
            // Get a list of all objects in the Drop Data, that are files
            string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
            // Iterate through the dropped files
            for (int i=0;i<files.Length;i++)
            {
                // Add the path to our ArrayList
                Files.Add(files[i]);
                // Create our new List View item
                ListViewItem item = new ListViewItem();
                // Get a file info object
                // we use this for getting file size, etc.
                System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
                item.Text = System.IO.Path.GetFileName(fInfo.Name);
                item.SubItems.Add(fInfo.Length.ToString());
                item.SubItems.Add("Pending");
                FileListView.Items.Add(item);
                // Keep the full path on the item itself, not on the whole list view
                item.Tag = files[i];
            }
            // Refresh the file list - for good measure
            this.Refresh();
            // If we added files, clear the instruction label
            if (FileListView.Items.Count>0) label1.Visible = false;

        }

        private void btnClear_Click(object sender, System.EventArgs e)
        {

            // Clear our ArrayList
            Files.Clear();
            // Clear our File ListView
            FileListView.Clear();
            // Bring the old instruction label back
            label1.Visible=true;
        }

        private void btnGo_Click(object sender, System.EventArgs e)
        {    
            // Iterate through our files list
            for (int i=0;i<Files.Count;i++)
            {
                // If repair was successful, 
                // Mark the status column for this file complete,
                // otherwise mark it failed
                if (!RepairFile(Files[i].ToString()))
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Failed";
                    }
                }
                else
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Complete";
                    }
                }
            }
        }

        private bool RepairFile(string Path)
        {
            // Create a file info object
            System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
            // If the file exists, proceed
            if (Fi.Exists)
            {
                // NOTE: Error trapping omitted for 
                // readability
                // You would want to trap the file operations in
                // a try / catch / finally block 
                // -----------------------------------------------
                // Create a StreamReader object 
                // We use a StreamReader because we are assuming 
                // that we are dealing with a text file
                System.IO.StreamReader S = Fi.OpenText();
                // Read the entire file -
                // NOTE: This would be better done using buffering
                // for performance, but for this example, I omitted it
                string FileBuffer = S.ReadToEnd();
                // Close our reader
                S.Close();
                // Call to our VerifyAscii method to ensure that
                // this is NOT a binary file
                if (VerifyAscii(FileBuffer))
                {
                    // Split our buffer into lines
                    string[] Lines = FileBuffer.Split('\n');
                    // Create our StreamWriter
                    // Again, using a streamWriter, since we are
                    // dealing with Text. Passing false for "append" truncates
                    // the file, so no stale bytes remain after the new content
                    System.IO.StreamWriter W = 
                        new System.IO.StreamWriter(Path, false);
                    // Loop through our "Lines"
                    // and use the StreamWriter's WriteLine
                    // Method to terminate
                    // the lines with the operating system
                    // specific carriage return / line feed combination
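                    for (int i=0;i<Lines.Length;i++)
                    {
                        // Skip the empty entry Split produces after a final '\n'
                        if (i==Lines.Length-1 && Lines[i].Length==0) break;
                        // Strip only a stray '\r', so already-converted lines are
                        // not doubled up and leading indentation is preserved
                        W.WriteLine(Lines[i].TrimEnd('\r'));
                    }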
                    // Close our writer
                    W.Close();
                    return true;
                }
                else
                {
                    // Error Message for "non-ascii" files
                    MessageBox.Show(Path+" \nDoes not Appear to be plain text. " + 
                               " No repair will be performed","File Format Error");
                    return false;
                }
            }
            return false;
        }

        private bool VerifyAscii(string Buffer)
        {
            // Create Regex for matching only the Ascii Table
            System.Text.RegularExpressions.Regex R = 
                new System.Text.RegularExpressions.Regex("[\x00-\xFF]");
            // The Size of the block that we want to analyze
            // Done this way for performance
            // Much overhead (depending on size of file) to Regex the whole thing
            int BlockSize = 10;
            // Our Iteration variables
            int Start;
            int Len;
            string Block;
            System.Text.RegularExpressions.MatchCollection matchColl;
            // Iterate through our buffer one block at a time,
            // including the final partial block
            for (int i=0;i*BlockSize<Buffer.Length;i++)
            {
                // Starting Point for this iteration
                Start = (i*BlockSize);
                // Ternary operator used to assign length of this block
                // we don't want to overshoot the end of the string buffer
                Len = 
                  (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
                // Get our block from the buffer
                Block  = Buffer.Substring(Start,Len);
                // Run our Regex, and get our match collection
                matchColl = R.Matches(Block);
                // If our match count is less than the length of the string,
                // we know that we have characters outside of the ascii table
                if (matchColl.Count<Len)
                {
                    // Return false, this buffer could not be
                    // evaluated as Ascii Only
                    return false;
                }
            }
            // No bad characters were found, 
            // so all characters are within the ascii table
            // Return true
            return true;
        }
    
    }
}