Batch Converting Linux ASCII Files to Windows

Seth Webster

Dec 8, 2004

This article explains how to use the Drag and Drop events to accept only files from the system. It also converts line endings from Linux (or other operating systems) to Windows-style \r\n (carriage return followed by line feed) line endings.

Download source - 15.5 Kb

Introduction

This article is a simple exercise in Drag & Drop, text file I/O, and batch processing.

Background

So my business partner and I were migrating our DNS information from BIND on one of our Linux boxes to a new Windows 2003 machine. In the documentation for Windows DNS, we found that you could simply move the zone files created by BIND for our DNS zones to the Windows DNS directory, rename them, and voilà, you have a copy of your DNS records.

As with most things that sound simple, it wasn't. Windows didn't like the formatting of the Linux BIND files because their lines ended with only \n (line feed), and Windows wanted more: Windows uses \r\n (carriage return followed by line feed) for new lines. I knew there were probably utilities out there for converting Linux (and other) ASCII files to Windows-style ASCII files; however, I figured, what the heck, I'll write my own.

So I set upon writing this little utility. It took about 20 minutes, and I decided to make this my first post for Code Project.
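At its core, the conversion only has to turn each bare \n into \r\n without doubling a \r\n that is already there. The sketch below shows that idea on an in-memory string; the class LineEndings and method ToWindows are hypothetical names for illustration, not part of the original tool.

```csharp
// Hypothetical helper for illustration; not part of the original LinWin tool.
public static class LineEndings
{
    // Converts any mix of LF and CR LF line endings to CR LF.
    public static string ToWindows(string text)
    {
        // Collapse existing CR LF pairs to LF first so they are not doubled,
        // then expand every LF to CR LF ('\r' is char 13, '\n' is char 10).
        return text.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }
}
```

For example, LineEndings.ToWindows("a\nb\r\nc") returns "a\r\nb\r\nc". A lone \r (classic Mac OS line ending) is left untouched by this sketch.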

The Code

The code is quite simple, but it does demonstrate drag & drop operations and filtering for files. Also included is some simple RegEx matching that checks whether each character falls in the range \x00 to \xFF. I have commented the code profusely, so there shouldn't be too many questions.

The first thing we needed was a drag and drop interface, because we didn't want to open the files manually or by using a file dialog. So we created a list view with three columns: File (name), Size (bytes), and Status.

We then set up the list view with event handlers on DragEnter and DragDrop.

private void DropSpot_DragEnter(object sender, System.Windows.Forms.DragEventArgs e)
{
    // We only want to accept files, so we only set our DragDropEffects 
    // if that's what's being dragged in. Copy (rather than All, which
    // includes Move) tells the drag source that its files stay in place
    if (e.Data.GetDataPresent(DataFormats.FileDrop, false))
    {
        e.Effect = DragDropEffects.Copy;
    }
    else
    {
        e.Effect = DragDropEffects.None;
    }
}
ArrayList Files = new ArrayList();
private void DropSpot_DragDrop(object sender, 
                System.Windows.Forms.DragEventArgs e)
{
    // Get the paths of all files (and folders) in the Drop Data
    string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
    // Iterate through the dropped paths
    for (int i=0;i<files.Length;i++)
    {
        // Skip folders: FileInfo.Length would throw for them
        if (!System.IO.File.Exists(files[i])) continue;
        // Add the file to our ArrayList
        Files.Add(files[i]);
        // Create our new List View item
        ListViewItem item = new ListViewItem();
        // Get a file info object
        // we use this for getting file size, etc.
        System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
        // FileInfo.Name is already just the file name
        item.Text = fInfo.Name;
        item.SubItems.Add(fInfo.Length.ToString());
        item.SubItems.Add("Pending");
        // Keep the full path with its own list item
        item.Tag = files[i];
        FileListView.Items.Add(item);
    }
    // Refresh the file list - for good measure
    this.Refresh();
    // If we added files, hide the instruction label
    if (FileListView.Items.Count>0) label1.Visible = false;
}

In both handlers, you can see that we filter using the DataFormats class. We only want to accept data in the DataFormats.FileDrop format, which gives us the paths of the files (and folders) being dragged from Explorer or an equivalent source.

The first handler, DragEnter, simply sets the DragDropEffects, which changes the mouse cursor to indicate that files can be dropped here.

Incidentally, for purposes of user direction, I added a label on top of the list view, which instructs users to "Drag and Drop files here to repair". This label is also given the same two event handlers described above. The label will "disappear" when files are added, and reappear when the file list is cleared.
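Because DataFormats.FileDrop also delivers folders, the dropped paths need a second filter before anything tries to read a file size. The sketch below isolates that step; DropFilter and FilesOnly are hypothetical names, and List<string> assumes .NET 2.0 or later rather than the .NET 1.1 ArrayList used in the article.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration: keeps only paths that refer to
// existing regular files, so folders never reach FileInfo.Length.
public static class DropFilter
{
    public static List<string> FilesOnly(string[] droppedPaths)
    {
        var result = new List<string>();
        foreach (string path in droppedPaths)
        {
            // File.Exists returns false for directories and missing paths
            if (File.Exists(path))
            {
                result.Add(path);
            }
        }
        return result;
    }
}
```

In a DragDrop handler this would be fed the array returned by e.Data.GetData(DataFormats.FileDrop).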

The next piece of code needed was something to determine (as accurately as possible) whether or not the file is plain ASCII text or binary.

Note: By definition, all files stored on a computer are stored in binary format. What we are seeking to determine is whether the file contains "text" (ASCII), and not binary data such as an image or other non-textual content. After much research on other projects, I have yet to find a method that is 100% accurate, but the method described below is quite accurate. It requires that every character in the file, once decoded, falls within the range \x00 to \xFF.
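A different heuristic, which is not the method this article uses, looks at the raw bytes before any decoding and treats a NUL (0x00) byte as a sign of binary data. The sketch below shows that idea; BinarySniffer, LooksBinary and the 8000-byte sampleSize are hypothetical choices. Note that UTF-16 text files contain NUL bytes and would be flagged as binary by this check.

```csharp
using System.IO;

// Hypothetical alternative check for illustration; not the article's method.
public static class BinarySniffer
{
    // Examines the first chunk read from the file (up to sampleSize bytes,
    // an assumed limit) and reports true if any byte is NUL.
    public static bool LooksBinary(string path, int sampleSize = 8000)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            byte[] buffer = new byte[sampleSize];
            int read = fs.Read(buffer, 0, buffer.Length);
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == 0)
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```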

On to the code. This method, VerifyAscii(string Buffer), takes the input buffer and uses .NET regular expression matching to scan it in blocks, checking whether every character within each block is in the allowed range. Note that the upper bound of the RegEx is \xFF, which can be changed to \x7F to restrict the check to the 7-bit ASCII set.

private bool VerifyAscii(string Buffer)
{
    // Create a Regex that matches any single character from \x00 to \xFF.
    // The verbatim string (@"...") passes the \x escapes through to the
    // regex engine instead of having the C# compiler translate them.
    // Change \xFF to \x7F to accept only the 7-bit ASCII set.
    System.Text.RegularExpressions.Regex R = 
            new System.Text.RegularExpressions.Regex(@"[\x00-\xFF]");
    // The size of the block that we want to analyze.
    // Matching block by block keeps each MatchCollection small;
    // matching the whole buffer would create one Match per character
    int BlockSize = 10;
    // Our iteration variables
    int Len;
    string Block;
    System.Text.RegularExpressions.MatchCollection matchColl;
    // Step through our buffer one block at a time,
    // including the final, partial block
    for (int Start=0;Start<Buffer.Length;Start+=BlockSize)
    {
        // Ternary operator used to assign the length of this block;
        // we don't want to overshoot the end of the string buffer
        Len = (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
        // Get our block from the buffer
        Block  = Buffer.Substring(Start,Len);
        // Run our Regex, and get our match collection
        matchColl = R.Matches(Block);
        // If our match count is less than the length of the block,
        // we know that we have characters outside the allowed range
        if (matchColl.Count<Len)
        {
            // Return false, this buffer could not be
            // evaluated as text only
            return false;
        }
    }
    // No bad characters were found, 
    // so all characters are within the allowed range
    return true;
}

What's Going On: For performance, the buffer is split into chunks. Because this pattern matches every in-range character, calling Matches on a string the size of a typical system log file would build one Match object per character, which is a lot of overhead. Matching in small chunks keeps each MatchCollection small, although a new substring is created on every iteration.
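A single-pass variant avoids both the chunking and the per-character Match objects by inverting the question: search for the first character *outside* the range. This is a swapped-in technique, not the article's code; TextCheck and IsInByteRange are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical single-pass alternative to the block-by-block VerifyAscii.
public static class TextCheck
{
    // Negated class: matches any character above U+00FF.
    // For 7-bit ASCII only, the pattern would be [^\x00-\x7F].
    static readonly Regex OutOfRange = new Regex(@"[^\x00-\xFF]");

    public static bool IsInByteRange(string buffer)
    {
        // IsMatch stops at the first offending character and builds
        // no MatchCollection, so the buffer does not need to be split.
        return !OutOfRange.IsMatch(buffer);
    }
}
```

For example, "caf\u00e9" passes, while a string containing the euro sign (U+20AC) does not.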

Note: I opted not to use StringBuilder here, as this started out as a quick utility for converting about 200 DNS zone files; in a heavier application, StringBuilder would be the way to go wherever strings are built up piece by piece. In C#, strings are a special type: they are reference types (objects), but they are immutable and compare by value, so they often behave like value types. When you perform virtually any operation on a string, such as concatenation or truncation, the original object is not modified; a new string is created, the data is copied, and the old string is left for the garbage collector. This can create a lot of overhead in an intensive application.

Lesson: When performing large amounts of string operations in a real application, use System.Text.StringBuilder as it is designed for these purposes.
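As a small illustration of that lesson, the sketch below rebuilds a whole buffer with CR LF endings using one StringBuilder instead of many intermediate strings. CrLfBuilder and ToCrLf are hypothetical names, and the sketch assumes the text is already in memory, as in the article.

```csharp
using System.Text;

// Hypothetical illustration of the StringBuilder approach; not the article's code.
public static class CrLfBuilder
{
    public static string ToCrLf(string text)
    {
        // Reserve a little extra room for the CR characters we may insert
        var sb = new StringBuilder(text.Length + text.Length / 16);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\n' && (i == 0 || text[i - 1] != '\r'))
            {
                // Bare LF: emit the missing CR first
                sb.Append('\r');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, CrLfBuilder.ToCrLf("a\nb\r\nc\n") returns "a\r\nb\r\nc\r\n"; existing CR LF pairs are copied unchanged.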

Next up, we have our file repair method, aptly named RepairFile(string Path).

private bool RepairFile(string Path)
{
    // Create a file info object
    System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
    // If the file exists, proceed
    if (Fi.Exists)
    {
        // NOTE: Error trapping omitted for readability.
        // You would want to trap the file operations in
        // a try / catch / finally block.
        // -----------------------------------------------
        string FileBuffer;
        // Create a StreamReader object.
        // We use a StreamReader because we are assuming
        // that we are dealing with a text file.
        // The using block closes the reader when we are done.
        using (System.IO.StreamReader S = Fi.OpenText())
        {
            // Read the entire file
            // NOTE: This would be better done using buffering
            // for performance, but for this example, I omitted it
            FileBuffer = S.ReadToEnd();
        }
        // Call to our VerifyAscii method to ensure that
        // this is NOT a binary file
        if (VerifyAscii(FileBuffer))
        {
            // Split our buffer into lines
            string[] Lines = FileBuffer.Split('\n');
            // A buffer that ends with '\n' leaves an empty last element;
            // skip it so we don't append an extra blank line
            int Count = Lines.Length;
            if (Count>0 && Lines[Count-1].Length==0) Count--;
            // Create our StreamWriter. Passing false for "append"
            // overwrites the file; Fi.OpenWrite() would not truncate it,
            // leaving stale bytes behind if the new text were shorter
            using (System.IO.StreamWriter W = 
                   new System.IO.StreamWriter(Path, false))
            {
                // WriteLine terminates each line with W.NewLine, which
                // defaults to the OS line terminator ("\r\n" on Windows).
                // TrimEnd('\r') drops a CR left over from lines that already
                // ended in "\r\n"; unlike Trim() it keeps leading whitespace,
                // which is significant in BIND zone files
                for (int i=0;i<Count;i++)
                {
                    W.WriteLine(Lines[i].TrimEnd('\r'));
                }
            }
            return true;
        }
        else
        {
            // Error message for "non-ASCII" files
            MessageBox.Show(Path+" \nDoes not appear to be plain text. " + 
                       "No repair will be performed.","File Format Error");
            return false;
        }
    }
    return false;
}

What's Going on: This code is fairly self-explanatory, and where it is not, the code contains a plethora of comments. In summary, this method opens the file specified by Path using a StreamReader object. We use a StreamReader because we are assuming the files we're reading contain text, and it is better suited for that purpose than a plain Stream object.

Note: For simplicity, the file is read in its entirety using the ReadToEnd method of the StreamReader object. In a real-world application, especially with large files, it would be better to read the file in blocks.
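One way to avoid holding the whole file in memory, and to leave the original untouched, is to stream it line by line into a separate output file. The sketch below is an assumption-based alternative, not the article's code; StreamingRepair, ConvertFile and the outputPath parameter are hypothetical. It re-encodes the output as UTF-8 without a byte order mark.

```csharp
using System.IO;

// Hypothetical streaming alternative; not the article's RepairFile.
public static class StreamingRepair
{
    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath, false))
        {
            // Force CR LF regardless of the OS the tool runs on
            writer.NewLine = "\r\n";
            string line;
            // ReadLine accepts "\n", "\r" or "\r\n" as a terminator and
            // returns the line without it; null means end of file
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
            }
        }
    }
}
```

A file containing "a\nb\r\nc" becomes "a\r\nb\r\nc\r\n"; the last line always gains a terminator.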

Once the file is loaded, it is first sent to our VerifyAscii method described above. If VerifyAscii returns true, the buffer is split into lines using the String.Split method on the newline character (\n). A StreamWriter object is then created by opening our original file for writing. For preservation purposes, it may be better to write the output to a different file, but I wasn't worried about possible damage, as these were copies anyway.

We then iterate through each row in the Lines[] array and write it back to the original file. By using StreamWriter.WriteLine(), we terminate each line with the writer's NewLine string, which defaults to the operating system's line terminator (Environment.NewLine). In Windows, lines are terminated by \r\n (ASCII 13 followed by ASCII 10).
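Because WriteLine uses the writer's NewLine property, the output only gets \r\n when the tool runs on Windows (on Linux, Environment.NewLine is "\n"). The sketch below, using hypothetical names NewLineDemo and WriteLines, shows that setting NewLine explicitly pins the terminator regardless of platform.

```csharp
using System.IO;

// Hypothetical illustration of TextWriter.NewLine; not part of the original tool.
public static class NewLineDemo
{
    public static string WriteLines(string[] lines)
    {
        using (var writer = new StringWriter())
        {
            // WriteLine appends this string after each line
            writer.NewLine = "\r\n";
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
            return writer.ToString();
        }
    }
}
```

For example, NewLineDemo.WriteLines(new[] { "a", "b" }) returns "a\r\nb\r\n" on any operating system.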

Bringing it all together

The files, once dragged, are added to the ListView in the DragDrop event handler. To start the "repair", the "Go" button is clicked, and processing begins.

private void btnGo_Click(object sender, System.EventArgs e)
{
    // Iterate through our files list
    for (int i=0;i<Files.Count;i++)
    {                
        // If repair was successful, 
        // Mark the status column for this file complete,
        // otherwise mark it failed
        if (!RepairFile(Files[i].ToString()))
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Failed";
            }
        }
        else
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Complete";
            }
        }
    }
}

What's Going on: Earlier, in our DragDrop event handler, files were added to our list view and to an ArrayList. When our "Go" button is clicked, we iterate through the objects (file paths, in this case) stored in the ArrayList. Each item is passed in turn to the RepairFile method; if RepairFile returns true, the corresponding item in the list view is marked "Complete", otherwise it is marked "Failed". For files that do not appear to be plain text, RepairFile also displays an error message.
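The same batch loop can run without a UI, and it is a natural place for the try / catch that the RepairFile comments recommend. The sketch below is hypothetical: BatchRepair, RepairAll and the file-name pattern parameter are assumptions, and for brevity it converts with string Replace and skips the text check that VerifyAscii performs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical console-style batch runner mirroring btnGo_Click.
public static class BatchRepair
{
    // Converts every file in directory matching pattern (e.g. "*.dns")
    // and records "Complete" or "Failed" per path, like the Status column.
    public static Dictionary<string, string> RepairAll(string directory, string pattern)
    {
        var status = new Dictionary<string, string>();
        foreach (string path in Directory.GetFiles(directory, pattern))
        {
            try
            {
                string text = File.ReadAllText(path);
                // Normalize to LF first, then expand every LF to CR LF
                string fixedText = text.Replace("\r\n", "\n").Replace("\n", "\r\n");
                // WriteAllText overwrites (truncates) the existing file
                File.WriteAllText(path, fixedText);
                status[path] = "Complete";
            }
            catch (IOException)
            {
                status[path] = "Failed";
            }
            catch (UnauthorizedAccessException)
            {
                status[path] = "Failed";
            }
        }
        return status;
    }
}
```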

Summary

Demonstrated here was the use of Drag and Drop event handlers to accept only files from the system, the use of Regex to check that characters fall within the range \x00 to \xFF, and batch processing of files. There are certainly areas in which this code could be improved; I left many of them for you to discover, partly to keep the code simple.

Using the code

Here is the complete code. It was written in Visual Studio .NET 2003, so you should be able to open and build it there.

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;

namespace LinWinRepair
{
    /// <summary>
    /// Main window of the LinWin file repair tool.
    /// </summary>
    public class mainUI : System.Windows.Forms.Form
    {
        private System.Windows.Forms.Panel panel1;
        private System.Windows.Forms.Button btnClear;
        private System.Windows.Forms.Button btnGo;
        private System.Windows.Forms.ListView FileListView;
        private System.Windows.Forms.Label label1;
        private System.Windows.Forms.ColumnHeader columnHeader4;
        private System.Windows.Forms.ColumnHeader columnHeader5;
        private System.Windows.Forms.ColumnHeader columnHeader6;
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.Container components = null;

        public mainUI()
        {
            //
            // Required for Windows Form Designer support
            //
            InitializeComponent();

            //
            // TODO: Add any constructor code after InitializeComponent call
            //
        }

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose( bool disposing )
        {
            if( disposing )
            {
                if (components != null) 
                {
                    components.Dispose();
                }
            }
            base.Dispose( disposing );
        }

        #region Windows Form Designer generated code
        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.panel1 = new System.Windows.Forms.Panel();
            this.btnClear = new System.Windows.Forms.Button();
            this.btnGo = new System.Windows.Forms.Button();
            this.FileListView = new System.Windows.Forms.ListView();
            this.label1 = new System.Windows.Forms.Label();
            this.columnHeader4 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader5 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader6 = new System.Windows.Forms.ColumnHeader();
            this.panel1.SuspendLayout();
            this.SuspendLayout();
            // 
            // panel1
            // 
            this.panel1.Controls.Add(this.btnClear);
            this.panel1.Controls.Add(this.btnGo);
            this.panel1.Dock = System.Windows.Forms.DockStyle.Bottom;
            this.panel1.Location = new System.Drawing.Point(15, 303);
            this.panel1.Name = "panel1";
            this.panel1.Size = new System.Drawing.Size(370, 40);
            this.panel1.TabIndex = 1;
            // 
            // btnClear
            // 
            this.btnClear.Anchor = ((System.Windows.Forms.AnchorStyles)
                 ((System.Windows.Forms.AnchorStyles.Bottom | 
                 System.Windows.Forms.AnchorStyles.Right)));
            this.btnClear.Location = new System.Drawing.Point(208, 5);
            this.btnClear.Name = "btnClear";
            this.btnClear.TabIndex = 6;
            this.btnClear.Text = "Clear";
            this.btnClear.Click += new System.EventHandler(this.btnClear_Click);
            // 
            // btnGo
            // 
            this.btnGo.Anchor = ((System.Windows.Forms.AnchorStyles)
                 ((System.Windows.Forms.AnchorStyles.Bottom | 
                 System.Windows.Forms.AnchorStyles.Right)));
            this.btnGo.Location = new System.Drawing.Point(288, 5);
            this.btnGo.Name = "btnGo";
            this.btnGo.TabIndex = 5;
            this.btnGo.Text = "Go";
            this.btnGo.Click += new System.EventHandler(this.btnGo_Click);
            // 
            // FileListView
            // 
            this.FileListView.AllowDrop = true;
            this.FileListView.Columns.AddRange(new 
                 System.Windows.Forms.ColumnHeader[] {
                        this.columnHeader4,
                        this.columnHeader5,
                        this.columnHeader6});
            this.FileListView.Dock = System.Windows.Forms.DockStyle.Fill;
            this.FileListView.Location = new System.Drawing.Point(15, 15);
            this.FileListView.Name = "FileListView";
            this.FileListView.Size = new System.Drawing.Size(370, 288);
            this.FileListView.TabIndex = 0;
            this.FileListView.View = System.Windows.Forms.View.Details;
            this.FileListView.DragDrop += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            this.FileListView.DragEnter += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            // 
            // label1
            // 
            this.label1.AllowDrop = true;
            this.label1.BackColor = System.Drawing.SystemColors.Window;
            this.label1.Location = new System.Drawing.Point(24, 136);
            this.label1.Name = "label1";
            this.label1.Size = new System.Drawing.Size(360, 23);
            this.label1.TabIndex = 2;
            this.label1.Text = "Drag and Drop Files here to Repair";
            this.label1.TextAlign = System.Drawing.ContentAlignment.MiddleCenter;
            this.label1.DragEnter += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            this.label1.DragDrop += new 
                 System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            // 
            // columnHeader4
            // 
            this.columnHeader4.Text = "File";
            this.columnHeader4.Width = 218;
            // 
            // columnHeader5
            // 
            this.columnHeader5.Text = "Size (bytes)";
            this.columnHeader5.Width = 87;
            // 
            // columnHeader6
            // 
            this.columnHeader6.Text = "Status";
            // 
            // mainUI
            // 
            this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
            this.ClientSize = new System.Drawing.Size(400, 358);
            this.Controls.Add(this.label1);
            this.Controls.Add(this.FileListView);
            this.Controls.Add(this.panel1);
            this.DockPadding.All = 15;
            this.Name = "mainUI";
            this.Text = "LinWin File Repair Tool";
            this.panel1.ResumeLayout(false);
            this.ResumeLayout(false);

        }
        #endregion

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main() 
        {
            Application.Run(new mainUI());
        }

        private void DropSpot_DragEnter(object sender, 
                System.Windows.Forms.DragEventArgs e)
        {
            // We only want to accept files, so we only set our DragDropEffects 
            // if that's what's being dragged in. Copy (rather than All, which
            // includes Move) tells the drag source that its files stay in place
            if (e.Data.GetDataPresent(DataFormats.FileDrop, false))
            {
                e.Effect = DragDropEffects.Copy;
            }
            else
            {
                e.Effect = DragDropEffects.None;
            }
        }
        ArrayList Files = new ArrayList();
        private void DropSpot_DragDrop(object sender, 
                System.Windows.Forms.DragEventArgs e)
        {
            // Get the paths of all files (and folders) in the Drop Data
            string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
            // Iterate through the dropped paths
            for (int i=0;i<files.Length;i++)
            {
                // Skip folders: FileInfo.Length would throw for them
                if (!System.IO.File.Exists(files[i])) continue;
                // Add the file to our ArrayList
                Files.Add(files[i]);
                // Create our new List View item
                ListViewItem item = new ListViewItem();
                // Get a file info object
                // we use this for getting file size, etc.
                System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
                // FileInfo.Name is already just the file name
                item.Text = fInfo.Name;
                item.SubItems.Add(fInfo.Length.ToString());
                item.SubItems.Add("Pending");
                // Keep the full path with its own list item
                item.Tag = files[i];
                FileListView.Items.Add(item);
            }
            // Refresh the file list - for good measure
            this.Refresh();
            // If we added files, hide the instruction label
            if (FileListView.Items.Count>0) label1.Visible = false;
        }

        private void btnClear_Click(object sender, System.EventArgs e)
        {
            // Clear our ArrayList
            Files.Clear();
            // Clear the rows of our File ListView. Items.Clear() keeps the
            // column headers; ListView.Clear() would remove them as well
            FileListView.Items.Clear();
            // Bring the old instruction label back
            label1.Visible=true;
        }

        private void btnGo_Click(object sender, System.EventArgs e)
        {    
            // Iterate through our files list
            for (int i=0;i<Files.Count;i++)
            {
                // If repair was successful, 
                // Mark the status column for this file complete,
                // otherwise mark it failed
                if (!RepairFile(Files[i].ToString()))
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Failed";
                    }
                }
                else
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Complete";
                    }
                }
            }
        }

        private bool RepairFile(string Path)
        {
            // Create a file info object
            System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
            // If the file exists, proceed
            if (Fi.Exists)
            {
                // NOTE: Error trapping omitted for readability.
                // You would want to trap the file operations in
                // a try / catch / finally block.
                // -----------------------------------------------
                string FileBuffer;
                // Create a StreamReader object.
                // We use a StreamReader because we are assuming
                // that we are dealing with a text file.
                // The using block closes the reader when we are done.
                using (System.IO.StreamReader S = Fi.OpenText())
                {
                    // Read the entire file
                    // NOTE: This would be better done using buffering
                    // for performance, but for this example, I omitted it
                    FileBuffer = S.ReadToEnd();
                }
                // Call to our VerifyAscii method to ensure that
                // this is NOT a binary file
                if (VerifyAscii(FileBuffer))
                {
                    // Split our buffer into lines
                    string[] Lines = FileBuffer.Split('\n');
                    // A buffer that ends with '\n' leaves an empty last element;
                    // skip it so we don't append an extra blank line
                    int Count = Lines.Length;
                    if (Count>0 && Lines[Count-1].Length==0) Count--;
                    // Create our StreamWriter. Passing false for "append"
                    // overwrites the file; Fi.OpenWrite() would not truncate it,
                    // leaving stale bytes behind if the new text were shorter
                    using (System.IO.StreamWriter W = 
                        new System.IO.StreamWriter(Path, false))
                    {
                        // WriteLine terminates each line with W.NewLine, which
                        // defaults to the OS line terminator ("\r\n" on Windows).
                        // TrimEnd('\r') drops a CR left over from lines that
                        // already ended in "\r\n"; unlike Trim() it keeps
                        // leading whitespace, which is significant in BIND
                        // zone files
                        for (int i=0;i<Count;i++)
                        {
                            W.WriteLine(Lines[i].TrimEnd('\r'));
                        }
                    }
                    return true;
                }
                else
                {
                    // Error message for "non-ASCII" files
                    MessageBox.Show(Path+" \nDoes not appear to be plain text. " + 
                               "No repair will be performed.","File Format Error");
                    return false;
                }
            }
            return false;
        }

        private bool VerifyAscii(string Buffer)
        {
            // Create a Regex that matches any single character from \x00 to \xFF.
            // The verbatim string (@"...") passes the \x escapes through to the
            // regex engine instead of having the C# compiler translate them.
            // Change \xFF to \x7F to accept only the 7-bit ASCII set.
            System.Text.RegularExpressions.Regex R = 
                new System.Text.RegularExpressions.Regex(@"[\x00-\xFF]");
            // The size of the block that we want to analyze.
            // Matching block by block keeps each MatchCollection small;
            // matching the whole buffer would create one Match per character
            int BlockSize = 10;
            // Our iteration variables
            int Len;
            string Block;
            System.Text.RegularExpressions.MatchCollection matchColl;
            // Step through our buffer one block at a time,
            // including the final, partial block
            for (int Start=0;Start<Buffer.Length;Start+=BlockSize)
            {
                // Ternary operator used to assign the length of this block;
                // we don't want to overshoot the end of the string buffer
                Len = 
                  (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
                // Get our block from the buffer
                Block  = Buffer.Substring(Start,Len);
                // Run our Regex, and get our match collection
                matchColl = R.Matches(Block);
                // If our match count is less than the length of the block,
                // we know that we have characters outside the allowed range
                if (matchColl.Count<Len)
                {
                    // Return false, this buffer could not be
                    // evaluated as text only
                    return false;
                }
            }
            // No bad characters were found, 
            // so all characters are within the allowed range
            return true;
        }
    
    }
}