Batch Converting Linux ASCII Files to Windows






Dec 8, 2004
This article explains how to use the DragEnter and DragDrop events to accept only files dragged in from the system. It also converts line ends from Linux (or other operating systems) to Windows-style \r\n (carriage return followed by line feed) line ends.
Introduction
This article is a simple exercise in Drag & Drop, TextFile IO, and Batching.
Background
So my business partner and I were migrating our DNS information from BIND on one of our Linux boxes to a new Windows 2003 machine. In the documentation for Windows DNS, we found that you could simply move the Zone files created by BIND for our DNS Zones to the Windows DNS directory, rename them, and voila, you have a copy of DNS records.
As with all things that sound simple, it wasn't. Windows didn't like the formatting of the Linux BIND files because their lines ended with only \n (line feed), and Windows wanted more: it ends lines with \r\n (carriage return followed by line feed). Well, I knew there were probably utilities out there for converting Linux (and other) ASCII files to Windows-style ASCII files; however, I figured, what the heck, I'll write my own.
So I set upon writing this little utility. It took about 20 minutes, and I decided to make this my first post for Code Project.
The Code
The code is quite simple, but it does explain how to do drag & drop operations and filter for files. Also included is some simple RegEx matching for characters in the extended ASCII range (code points 0 through 255). I have commented the code profusely, so there shouldn't be too many questions.
The first thing we needed was a drag and drop interface, because we didn't want to open the files manually or by using a file dialog. So we created a list view with three columns: File (name), Size (bytes), and Status.
We then set up the list view with event handlers on DragEnter and DragDrop.
private void DropSpot_DragEnter(object sender, System.Windows.Forms.DragEventArgs e)
{
    // We only want to accept files, so we only set our DragDropEffects
    // if that's what's being dragged in
    if (e.Data.GetDataPresent(DataFormats.FileDrop, false)==true)
    {
        e.Effect = DragDropEffects.All;
    }
}
ArrayList Files = new ArrayList();
private void DropSpot_DragDrop(object sender,
    System.Windows.Forms.DragEventArgs e)
{
    // Get a list of all objects in the Drop Data, that are files
    string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
    // Iterate through the dropped files
    for (int i=0;i<files.Length;i++)
    {
        // Add the path to our ArrayList
        Files.Add(files[i]);
        // Create our new List View item
        ListViewItem item = new ListViewItem();
        // Get a file info object
        // we use this for getting file size, etc.
        System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
        item.Text = System.IO.Path.GetFileName(fInfo.Name);
        item.SubItems.Add(fInfo.Length.ToString());
        item.SubItems.Add("Pending");
        FileListView.Items.Add(item);
        // Keep the full path on the item itself; setting the Tag of the
        // whole list view would only remember the last file dropped
        item.Tag = Files[Files.Count-1];
    }
    // Refresh the file list - for good measure
    this.Refresh();
    // If we added files, clear the instruction label
    if (FileListView.Items.Count>0) label1.Visible = false;
}
In both handlers, you can see that we filter using the DataFormats class. We only want to accept objects of type DataFormats.FileDrop. This gives us the objects that are files being dragged from a folder, or equivalent system objects.
The first handler, DragEnter, simply sets the DragDropEffects, which changes the mouse cursor appropriately, indicating that files can be dropped here.
Incidentally, for purposes of user direction, I added a label on top of the list view, which instructs users to "Drag and Drop files here to repair". This label is also given the same two event handlers described above. The label will "disappear" when files are added, and reappear when the file list is cleared.
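The FileDrop format identifies paths dragged from the shell, but a dropped path can just as well be a folder, and FileInfo.Length throws for one. Below is a minimal sketch of an extra check that could run before items are added to the list. The DropFilter class and KeepExistingFiles helper are hypothetical names, not part of the original utility, and the sketch assumes a C# 2.0 or later compiler for the generic List.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DropFilter
{
    // Hypothetical helper (not in the original utility): returns only the
    // dropped paths that refer to existing files, skipping folders and
    // paths that no longer exist.
    public static List<string> KeepExistingFiles(string[] droppedPaths)
    {
        List<string> result = new List<string>();
        if (droppedPaths == null) return result;
        foreach (string path in droppedPaths)
        {
            // File.Exists returns false for directories and missing paths
            if (File.Exists(path)) result.Add(path);
        }
        return result;
    }

    public static void Main()
    {
        string file = Path.GetTempFileName();   // creates an empty temp file
        string folder = Path.GetTempPath();     // an existing directory
        List<string> kept = KeepExistingFiles(new string[] { file, folder });
        Console.WriteLine(kept.Count);          // only the file is kept
        File.Delete(file);
    }
}
```

In the DragDrop handler, the string array returned by GetData could be passed through such a helper before the loop that builds the list view items.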
The next piece of code needed was something to determine (as accurately as possible) whether or not the file is plain ASCII text or binary.
Note: By definition, all files stored on a computer are stored in binary format. What we are seeking to determine is whether or not the file stored is one containing "text" or "ASCII", and not binary data such as an image or otherwise non-textual files. After much research on other projects, I have yet to find a method that is 100% accurate, but the method described below is quite accurate. It requires that every character in the file fall within the range the regular expression accepts (code points 0 through 255, as written below).
On to the code. This method, VerifyAscii(string Buffer), takes the input buffer and, using .NET's regular expression matching, searches it in blocks to find whether or not all of the characters within these blocks are ASCII compliant. Note that the upper bound of the RegEx range is set to \xFF, which can be changed to \x7F for the 7-bit ASCII set.
private bool VerifyAscii(string Buffer)
{
    // Create Regex for matching only the Ascii Table
    System.Text.RegularExpressions.Regex R =
        new System.Text.RegularExpressions.Regex("[\x00-\xFF]");
    // The Size of the block that we want to analyze
    // Done this way for performance
    // Much overhead (depending on size of file) to Regex the whole thing
    int BlockSize = 10;
    // Our Iteration variables
    int Start;
    int Len;
    string Block;
    System.Text.RegularExpressions.MatchCollection matchColl;
    // Iterate through our buffer one block at a time; stepping by
    // BlockSize means blocks never overlap and the final partial
    // block is checked too
    for (Start=0;Start<Buffer.Length;Start+=BlockSize)
    {
        // Ternary operator used to assign length of this block
        // we don't want to overshoot the end of the string buffer
        Len = (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
        // Get our block from the buffer
        Block = Buffer.Substring(Start,Len);
        // Run our Regex, and get our match collection
        matchColl = R.Matches(Block);
        // If our match count is less than the length of the string,
        // we know that we have characters outside of the ascii table
        if (matchColl.Count<Len)
        {
            // Return false, this buffer could not be
            // evaluated as Ascii Only
            return false;
        }
    }
    // No bad characters were found,
    // so all characters are within the ascii table
    // Return true
    return true;
}
What's Going On: For performance, the buffer is split into chunks. Running a RegEx over a string the size of a typical system log file, for example, would take a long time and carry a lot of overhead. Working in small chunks keeps each regular expression match fast and lets the method return as soon as one chunk fails, at the cost of creating a new substring on each iteration.
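Since the check only asks whether every character sits at or below a given code point, the same test can also be written as a plain loop, with no regular expression and no substrings. The following is a sketch of that alternative, not the method the utility uses; the RangeCheck class and AllCharsAtOrBelow name are illustrative.

```csharp
using System;

public static class RangeCheck
{
    // Returns true when every character in text has a code point <= max.
    // Pass '\u00FF' to mirror the [\x00-\xFF] pattern, or '\u007F' to
    // restrict the check to 7-bit ASCII.
    public static bool AllCharsAtOrBelow(string text, char max)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // Stop at the first character outside the allowed range
            if (text[i] > max) return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllCharsAtOrBelow("example.com. IN A 10.0.0.1\n", '\u007F'));
        Console.WriteLine(AllCharsAtOrBelow("caf\u00E9", '\u007F'));
        Console.WriteLine(AllCharsAtOrBelow("caf\u00E9", '\u00FF'));
    }
}
```

The three calls print True, False and True: the zone record is pure ASCII, while the accented character (code point 233) passes only the wider 0 through 255 range.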
Note: I opted not to use StringBuilder here, as this just started out as a quick utility for converting about 200 DNS zone files; but that would be the way to go in place of the string buffer. In C#, when working with strings, you are dealing with a special type. Strings are reference types (objects), but they are immutable and compare by value, so they often feel like value types. When you perform virtually any operation on a string, such as concatenation or truncation, the original object is not modified; a new object is created, the data is copied, and the old object is left for the garbage collector. This can create a lot of overhead in an intensive application.
Lesson: When performing large amounts of string operations in a real application, use System.Text.StringBuilder, as it is designed for these purposes.
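To make the lesson concrete, here is a small sketch of building one large string with StringBuilder rather than repeated concatenation. The BuilderDemo class and JoinWithCrLf method are made-up names for illustration and do not appear in the utility.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Joins lines with Windows line ends using one growing buffer,
    // instead of allocating a new string on every concatenation.
    public static string JoinWithCrLf(string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line);
            sb.Append("\r\n");
        }
        // A single string is created here, at the very end
        return sb.ToString();
    }

    public static void Main()
    {
        string joined = JoinWithCrLf(new string[] { "$TTL 86400", "@ IN NS ns1" });
        Console.Write(joined);
    }
}
```

With a couple of lines the difference is negligible; it starts to matter when thousands of pieces are appended in a loop.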
Next up, we have our file repair method, aptly named RepairFile(string Path).
private bool RepairFile(string Path)
{
    // Create a file info object
    System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
    // If the file exists, proceed
    if (Fi.Exists)
    {
        // NOTE: Error trapping omitted for
        // readability
        // You would want to trap the file operations in
        // a try / catch / finally block
        // -----------------------------------------------
        // Create a StreamReader object
        // We use a StreamReader because we are assuming
        // that we are dealing with a text file
        System.IO.StreamReader S = Fi.OpenText();
        // Read the entire file -
        // NOTE: This would be better done using buffering
        // for performance, but for this example, I omitted it
        string FileBuffer = S.ReadToEnd();
        // Close our reader
        S.Close();
        // Call to our VerifyAscii method to ensure that
        // this is NOT a binary file
        if (VerifyAscii(FileBuffer))
        {
            // Split our buffer into lines
            string[] Lines = FileBuffer.Split('\n');
            // A file that ends with a new line yields an empty last
            // element; skip it so no extra blank line is appended
            int Count = Lines.Length;
            if (Count>0 && Lines[Count-1].Length==0) Count--;
            // Create our StreamWriter
            // Again, using a StreamWriter, since we are
            // dealing with Text. Passing false (no append) truncates
            // the file first; OpenWrite() does not truncate, so stale
            // bytes could remain if the new content were ever shorter
            System.IO.StreamWriter W =
                new System.IO.StreamWriter(Path, false);
            // Loop through our "Lines" and use the StreamWriter's WriteLine
            // Method to terminate the lines with the operating system
            // specific carriage return / line feed combination
            for (int i=0;i<Count;i++)
            {
                // TrimEnd('\r') drops only an existing carriage return,
                // so lines already in Windows format are not doubled up
                // and leading whitespace is preserved
                W.WriteLine(Lines[i].TrimEnd('\r'));
            }
            // Close our writer
            W.Close();
            return true;
        }
        else
        {
            // Error Message for "non-ascii" files
            MessageBox.Show(Path+" \nDoes not Appear to be plain text. " +
                " No repair will be performed","File Format Error");
            return false;
        }
    }
    return false;
}
What's Going On: This code is fairly self-explanatory and, in places where it is not, the code contains a plethora of comments. In summary, this method opens the file specified by Path using a StreamReader object. We use a StreamReader object because we are assuming text in the files we're reading, and it is better suited for this purpose than the plain Stream object.
Note: For simplicity, the file is read in its entirety using the ReadToEnd method of the StreamReader object. In a real-world application, it would be better for performance to read the file in blocks.
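One way to avoid holding the whole file in memory is to stream it a line at a time. The sketch below works on the abstract TextReader and TextWriter types so it can be tried with in-memory strings. StreamingConvert and ConvertLineEnds are hypothetical names, and this is an alternative approach rather than what the utility does.

```csharp
using System;
using System.IO;

public static class StreamingConvert
{
    // Copies text from reader to writer one line at a time, ending every
    // line with "\r\n". ReadLine treats "\n", "\r" and "\r\n" as line
    // terminators, so lines already in Windows format pass through unchanged.
    public static void ConvertLineEnds(TextReader reader, TextWriter writer)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.Write(line);
            writer.Write("\r\n");
        }
    }

    public static void Main()
    {
        StringWriter output = new StringWriter();
        ConvertLineEnds(new StringReader("one\ntwo\r\nthree"), output);
        // Show the control characters as visible escapes
        Console.WriteLine(output.ToString().Replace("\r", "\\r").Replace("\n", "\\n"));
    }
}
```

The demo prints one\r\ntwo\r\nthree\r\n, which shows that the last line gains a terminator even though the input lacked one. For real files, a StreamReader over the source and a StreamWriter over a separate destination path could be passed in instead of the string-based reader and writer.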
Once the file is loaded, it is first sent to our VerifyAscii method described above. If VerifyAscii returns true, the buffer is split into lines using the String.Split method on the new-line character (\n). A StreamWriter object is then created by opening our original file for writing. For preservation purposes, it may be better to use a different file for the output, but I wasn't worried about the possible damage as these were copies anyway.
We then iterate through each row in the Lines[] array and write it back to the original file. By using StreamWriter.WriteLine(), we are terminating the lines with the operating system specific line termination characters. In Windows, lines are terminated by \r\n (ASCII 13 followed by ASCII 10).
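WriteLine takes its terminator from the writer's NewLine property, which defaults to the platform's line end. If the output must be carriage return plus line feed no matter where the code runs, the property can be set explicitly. This is a minimal sketch with illustrative names (NewLineDemo, WriteWithCrLf) that are not from the article's code.

```csharp
using System;
using System.IO;

public static class NewLineDemo
{
    // Writes each line with WriteLine after pinning the terminator to
    // CR LF, so the result does not depend on the platform default.
    public static string WriteWithCrLf(string[] lines)
    {
        StringWriter writer = new StringWriter();
        writer.NewLine = "\r\n";
        foreach (string line in lines)
        {
            writer.WriteLine(line);
        }
        return writer.ToString();
    }

    public static void Main()
    {
        string text = WriteWithCrLf(new string[] { "A" });
        // Print the numeric code of every character written
        foreach (char c in text) Console.Write((int)c + " ");
        Console.WriteLine();
    }
}
```

For the single line "A", this prints the codes 65, 13 and 10: the letter, then the carriage return, then the line feed. StreamWriter exposes the same NewLine property, since it also derives from TextWriter.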
Bringing it all together
The files, once dragged, are added to the ListView in the DragDrop event handler. To start the "repair", the "Go" button is clicked, and processing begins.
private void btnGo_Click(object sender, System.EventArgs e)
{
    // Iterate through our files list
    for (int i=0;i<Files.Count;i++)
    {
        // If repair was successful,
        // Mark the status column for this file complete,
        // otherwise mark it failed
        if (!RepairFile(Files[i].ToString()))
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Failed";
            }
        }
        else
        {
            if (i<FileListView.Items.Count)
            {
                FileListView.Items[i].SubItems[2].Text = "Complete";
            }
        }
    }
}
What's Going On: Earlier, in our DragDrop event handler, files were added to our list view and to an ArrayList. When our "Go" button is clicked, we iterate through the objects (file paths, in this case) stored in the ArrayList. Each item is successively passed to the RepairFile method and, if RepairFile returns true, the corresponding item in the list view is marked "Complete"; otherwise, the item is marked "Failed" (with an error message shown if the file does not appear to be plain text).
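The comments in RepairFile note that error trapping was left out. In a batch, it is usually worth catching file errors per item so that one locked or unreadable file does not stop the rest. The following is a sketch of that idea, assuming a C# 3.0 or later compiler for the Func delegate and lambda; BatchDemo and RunBatch are hypothetical names, and the repair step is passed in as a delegate so the sketch stays independent of the form.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchDemo
{
    // Runs the supplied repair step for each path and records a status
    // per file. File-related exceptions mark that one item as failed
    // instead of aborting the whole batch.
    public static List<string> RunBatch(IList<string> paths, Func<string, bool> repair)
    {
        List<string> statuses = new List<string>();
        foreach (string path in paths)
        {
            try
            {
                statuses.Add(repair(path) ? "Complete" : "Failed");
            }
            catch (IOException)
            {
                statuses.Add("Failed");     // e.g. file in use or disk error
            }
            catch (UnauthorizedAccessException)
            {
                statuses.Add("Failed");     // e.g. read-only or no permission
            }
        }
        return statuses;
    }

    public static void Main()
    {
        List<string> paths = new List<string>(
            new string[] { "good.zone", "locked.zone", "binary.dat" });
        // Stand-in repair step: one file throws, one succeeds, one is rejected
        List<string> statuses = RunBatch(paths, p =>
        {
            if (p == "locked.zone") throw new IOException("file in use");
            return p == "good.zone";
        });
        foreach (string s in statuses) Console.WriteLine(s);
    }
}
```

The demo prints Complete, Failed and Failed. In the form, the returned statuses could be copied into the third column of the matching list view items.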
Summary
Demonstrated here was the use of Drag and Drop event handlers to filter Drag Objects as files from the system, the use of Regex to match the extended ASCII range (code points 0 through 255), and batch processing of files. There are certainly areas in which this code could be improved, but to keep the code simple, I have left many of them for you to discover.
Using the code
Here is the complete code. This was coded in Visual Studio 2003, so you should be able to open and build it in that version.
using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
namespace LinWinRepair
{
    /// <summary>
    /// Summary description for Form1.
    /// </summary>
    public class mainUI : System.Windows.Forms.Form
    {
        private System.Windows.Forms.Panel panel1;
        private System.Windows.Forms.Button btnClear;
        private System.Windows.Forms.Button btnGo;
        private System.Windows.Forms.ListView FileListView;
        private System.Windows.Forms.Label label1;
        private System.Windows.Forms.ColumnHeader columnHeader4;
        private System.Windows.Forms.ColumnHeader columnHeader5;
        private System.Windows.Forms.ColumnHeader columnHeader6;
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.Container components = null;
        public mainUI()
        {
            //
            // Required for Windows Form Designer support
            //
            InitializeComponent();
            //
            // TODO: Add any constructor code after InitializeComponent call
            //
        }
        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose( bool disposing )
        {
            if( disposing )
            {
                if (components != null)
                {
                    components.Dispose();
                }
            }
            base.Dispose( disposing );
        }
        #region Windows Form Designer generated code
        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.panel1 = new System.Windows.Forms.Panel();
            this.btnClear = new System.Windows.Forms.Button();
            this.btnGo = new System.Windows.Forms.Button();
            this.FileListView = new System.Windows.Forms.ListView();
            this.label1 = new System.Windows.Forms.Label();
            this.columnHeader4 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader5 = new System.Windows.Forms.ColumnHeader();
            this.columnHeader6 = new System.Windows.Forms.ColumnHeader();
            this.panel1.SuspendLayout();
            this.SuspendLayout();
            //
            // panel1
            //
            this.panel1.Controls.Add(this.btnClear);
            this.panel1.Controls.Add(this.btnGo);
            this.panel1.Dock = System.Windows.Forms.DockStyle.Bottom;
            this.panel1.Location = new System.Drawing.Point(15, 303);
            this.panel1.Name = "panel1";
            this.panel1.Size = new System.Drawing.Size(370, 40);
            this.panel1.TabIndex = 1;
            //
            // btnClear
            //
            this.btnClear.Anchor = ((System.Windows.Forms.AnchorStyles)
                ((System.Windows.Forms.AnchorStyles.Bottom |
                System.Windows.Forms.AnchorStyles.Right)));
            this.btnClear.Location = new System.Drawing.Point(208, 5);
            this.btnClear.Name = "btnClear";
            this.btnClear.TabIndex = 6;
            this.btnClear.Text = "Clear";
            this.btnClear.Click += new System.EventHandler(this.btnClear_Click);
            //
            // btnGo
            //
            this.btnGo.Anchor = ((System.Windows.Forms.AnchorStyles)
                ((System.Windows.Forms.AnchorStyles.Bottom |
                System.Windows.Forms.AnchorStyles.Right)));
            this.btnGo.Location = new System.Drawing.Point(288, 5);
            this.btnGo.Name = "btnGo";
            this.btnGo.TabIndex = 5;
            this.btnGo.Text = "Go";
            this.btnGo.Click += new System.EventHandler(this.btnGo_Click);
            //
            // FileListView
            //
            this.FileListView.AllowDrop = true;
            this.FileListView.Columns.AddRange(new
                System.Windows.Forms.ColumnHeader[] {
                this.columnHeader4,
                this.columnHeader5,
                this.columnHeader6});
            this.FileListView.Dock = System.Windows.Forms.DockStyle.Fill;
            this.FileListView.Location = new System.Drawing.Point(15, 15);
            this.FileListView.Name = "FileListView";
            this.FileListView.Size = new System.Drawing.Size(370, 288);
            this.FileListView.TabIndex = 0;
            this.FileListView.View = System.Windows.Forms.View.Details;
            this.FileListView.DragDrop += new
                System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            this.FileListView.DragEnter += new
                System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            //
            // label1
            //
            this.label1.AllowDrop = true;
            this.label1.BackColor = System.Drawing.SystemColors.Window;
            this.label1.Location = new System.Drawing.Point(24, 136);
            this.label1.Name = "label1";
            this.label1.Size = new System.Drawing.Size(360, 23);
            this.label1.TabIndex = 2;
            this.label1.Text = "Drag and Drop Files here to Repair";
            this.label1.TextAlign = System.Drawing.ContentAlignment.MiddleCenter;
            this.label1.DragEnter += new
                System.Windows.Forms.DragEventHandler(this.DropSpot_DragEnter);
            this.label1.DragDrop += new
                System.Windows.Forms.DragEventHandler(this.DropSpot_DragDrop);
            //
            // columnHeader4
            //
            this.columnHeader4.Text = "File";
            this.columnHeader4.Width = 218;
            //
            // columnHeader5
            //
            this.columnHeader5.Text = "Size (bytes)";
            this.columnHeader5.Width = 87;
            //
            // columnHeader6
            //
            this.columnHeader6.Text = "Status";
            //
            // mainUI
            //
            this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
            this.ClientSize = new System.Drawing.Size(400, 358);
            this.Controls.Add(this.label1);
            this.Controls.Add(this.FileListView);
            this.Controls.Add(this.panel1);
            this.DockPadding.All = 15;
            this.Name = "mainUI";
            this.Text = "LinWin File Repair Tool";
            this.panel1.ResumeLayout(false);
            this.ResumeLayout(false);
        }
        #endregion
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.Run(new mainUI());
        }
        private void DropSpot_DragEnter(object sender,
            System.Windows.Forms.DragEventArgs e)
        {
            // We only want to accept files, so we only set our DragDropEffects
            // if that's what's being dragged in
            if (e.Data.GetDataPresent(DataFormats.FileDrop, false)==true)
            {
                e.Effect = DragDropEffects.All;
            }
        }
        ArrayList Files = new ArrayList();
        private void DropSpot_DragDrop(object sender,
            System.Windows.Forms.DragEventArgs e)
        {
            // Get a list of all objects in the Drop Data, that are files
            string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
            // Iterate through the dropped files
            for (int i=0;i<files.Length;i++)
            {
                // Add the path to our ArrayList
                Files.Add(files[i]);
                // Create our new List View item
                ListViewItem item = new ListViewItem();
                // Get a file info object
                // we use this for getting file size, etc.
                System.IO.FileInfo fInfo = new System.IO.FileInfo(files[i]);
                item.Text = System.IO.Path.GetFileName(fInfo.Name);
                item.SubItems.Add(fInfo.Length.ToString());
                item.SubItems.Add("Pending");
                FileListView.Items.Add(item);
                // Keep the full path on the item itself; setting the Tag of the
                // whole list view would only remember the last file dropped
                item.Tag = Files[Files.Count-1];
            }
            // Refresh the file list - for good measure
            this.Refresh();
            // If we added files, clear the instruction label
            if (FileListView.Items.Count>0) label1.Visible = false;
        }
        private void btnClear_Click(object sender, System.EventArgs e)
        {
            // Clear our ArrayList
            Files.Clear();
            // Clear the items from our File ListView
            // (ListView.Clear() would remove the column headers as well)
            FileListView.Items.Clear();
            // Bring the old instruction label back
            label1.Visible=true;
        }
        private void btnGo_Click(object sender, System.EventArgs e)
        {
            // Iterate through our files list
            for (int i=0;i<Files.Count;i++)
            {
                // If repair was successful,
                // Mark the status column for this file complete,
                // otherwise mark it failed
                if (!RepairFile(Files[i].ToString()))
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Failed";
                    }
                }
                else
                {
                    if (i<FileListView.Items.Count)
                    {
                        FileListView.Items[i].SubItems[2].Text = "Complete";
                    }
                }
            }
        }
        private bool RepairFile(string Path)
        {
            // Create a file info object
            System.IO.FileInfo Fi = new System.IO.FileInfo(Path);
            // If the file exists, proceed
            if (Fi.Exists)
            {
                // NOTE: Error trapping omitted for
                // readability
                // You would want to trap the file operations in
                // a try / catch / finally block
                // -----------------------------------------------
                // Create a StreamReader object
                // We use a StreamReader because we are assuming
                // that we are dealing with a text file
                System.IO.StreamReader S = Fi.OpenText();
                // Read the entire file -
                // NOTE: This would be better done using buffering
                // for performance, but for this example, I omitted it
                string FileBuffer = S.ReadToEnd();
                // Close our reader
                S.Close();
                // Call to our VerifyAscii method to ensure that
                // this is NOT a binary file
                if (VerifyAscii(FileBuffer))
                {
                    // Split our buffer into lines
                    string[] Lines = FileBuffer.Split('\n');
                    // A file that ends with a new line yields an empty last
                    // element; skip it so no extra blank line is appended
                    int Count = Lines.Length;
                    if (Count>0 && Lines[Count-1].Length==0) Count--;
                    // Create our StreamWriter
                    // Again, using a StreamWriter, since we are
                    // dealing with Text. Passing false (no append) truncates
                    // the file first; OpenWrite() does not truncate, so stale
                    // bytes could remain if the new content were ever shorter
                    System.IO.StreamWriter W =
                        new System.IO.StreamWriter(Path, false);
                    // Loop through our "Lines"
                    // and use the StreamWriter's WriteLine
                    // Method to terminate
                    // the lines with the operating system
                    // specific carriage return / line feed combination
                    for (int i=0;i<Count;i++)
                    {
                        // TrimEnd('\r') drops only an existing carriage return,
                        // so lines already in Windows format are not doubled up
                        // and leading whitespace is preserved
                        W.WriteLine(Lines[i].TrimEnd('\r'));
                    }
                    // Close our writer
                    W.Close();
                    return true;
                }
                else
                {
                    // Error Message for "non-ascii" files
                    MessageBox.Show(Path+" \nDoes not Appear to be plain text. " +
                        " No repair will be performed","File Format Error");
                    return false;
                }
            }
            return false;
        }
        private bool VerifyAscii(string Buffer)
        {
            // Create Regex for matching only the Ascii Table
            System.Text.RegularExpressions.Regex R =
                new System.Text.RegularExpressions.Regex("[\x00-\xFF]");
            // The Size of the block that we want to analyze
            // Done this way for performance
            // Much overhead (depending on size of file) to Regex the whole thing
            int BlockSize = 10;
            // Our Iteration variables
            int Start;
            int Len;
            string Block;
            System.Text.RegularExpressions.MatchCollection matchColl;
            // Iterate through our buffer one block at a time; stepping by
            // BlockSize means blocks never overlap and the final partial
            // block is checked too
            for (Start=0;Start<Buffer.Length;Start+=BlockSize)
            {
                // Ternary operator used to assign length of this block
                // we don't want to overshoot the end of the string buffer
                Len =
                    (Start+BlockSize>Buffer.Length) ? (Buffer.Length-Start) : BlockSize;
                // Get our block from the buffer
                Block = Buffer.Substring(Start,Len);
                // Run our Regex, and get our match collection
                matchColl = R.Matches(Block);
                // If our match count is less than the length of the string,
                // we know that we have characters outside of the ascii table
                if (matchColl.Count<Len)
                {
                    // Return false, this buffer could not be
                    // evaluated as Ascii Only
                    return false;
                }
            }
            // No bad characters were found,
            // so all characters are within the ascii table
            // Return true
            return true;
        }
    }
}