
Quick XML Reader

Eduard Keilholz, 23 Feb 2011
A quick XML interpreter for large XML files.

Introduction

In my job at Trezorix, we often have to open and read large (100 MB+) XML files. Usually we open XML files in Notepad, Internet Explorer (IE), or some other text editor. With a large XML file, however, these programs can take hours to open the file, if they do not crash along the way. Since we work with huge XML files and want to view their content reasonably quickly, we searched the web for existing software. We could not find a tool that covered our needs, so we decided to develop one ourselves.

Approach

The main goal of the tool is to read large XML files and quickly present them on screen. Most tools that read XML (except for Notepad) first read the entire file and then use an interpreter to put the XML document's structure together. We found this to be their weakness: nothing can be displayed until the whole file has been read. We wanted to run through the document and display data as quickly as possible, and so developed an on-the-fly interpreter. This interpreter may not be as polished as what you're used to, but in my opinion the gain in performance far outweighs that.

Presentation

Although tools like IE are not really capable of opening large XML files, they do have one big advantage: presentation. Because the XML file is fully interpreted, opening and closing tags are matched, and IE lets you expand and collapse elements, which makes the XML data easier and more pleasant to read. A second advantage is highlighting of the XML content, so the user can quickly identify elements, attributes, and values. For performance reasons, we decided to drop the ability to expand and collapse elements. For highlighting the XML, we decided to use RTF.

The code

The code is basic, simple, and to the point. We developed two classes: one for reading and interpreting the XML, and one for searching through the XML data that has been read. A third class brings these two classes together. Both reading and searching are implemented asynchronously. For reading the XML, a simple while loop does the trick.

// m_sFilename holds the path of the XML file to open.
using (FileStream streamSource = new FileStream(m_sFilename,
    FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (XmlReader xmlReader = XmlReader.Create(streamSource))
    {
        StringBuilder sbMarkedUp = new StringBuilder();
        xmlReader.MoveToElement();
        // Read() advances one node at a time, so nothing has to be loaded up front.
        while (xmlReader.Read())
        {
            // Write content as RTF depending on the xmlReader.NodeType
        }
    }
}

The interpreter

The interpreter is very simple. It checks the NodeType of the current node and handles the XML accordingly. When an element is found, for example, it writes the element tag to a StringBuilder object. Each line of XML is stored in a generic list of strings, and the interpreter decides when a line is complete and should be added to that list. After the line has been added, the StringBuilder is cleared and the process repeats until the while loop finishes.
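To make the idea concrete, here is a minimal sketch of such an interpreter. It is not the code from the download: the class name RtfInterpreterSketch, the helper EscapeRtf, the one-node-per-line rule, and the colour indices (\cf1 to \cf3) are all assumptions made for illustration. It also ignores non-ASCII characters, which real RTF output would have to escape.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sketch; names and colour indices are assumptions, not the article's code.
public static class RtfInterpreterSketch
{
    // Escape the characters that are special in RTF: backslash and braces.
    public static string EscapeRtf(string text)
    {
        return text.Replace("\\", "\\\\").Replace("{", "\\{").Replace("}", "\\}");
    }

    // Walk the XML node by node and emit one RTF-marked-up line per node.
    public static List<string> Interpret(TextReader source)
    {
        List<string> lstLines = new List<string>();
        StringBuilder sbMarkedUp = new StringBuilder();

        using (XmlReader xmlReader = XmlReader.Create(source))
        {
            while (xmlReader.Read())
            {
                string indent = new string(' ', xmlReader.Depth * 2);
                switch (xmlReader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Must be read before moving onto the attributes.
                        bool isEmpty = xmlReader.IsEmptyElement;
                        sbMarkedUp.Append(indent).Append("\\cf1 <").Append(EscapeRtf(xmlReader.Name));
                        while (xmlReader.MoveToNextAttribute())
                        {
                            sbMarkedUp.Append(" \\cf2 ").Append(EscapeRtf(xmlReader.Name))
                                      .Append("=\\cf3 \"").Append(EscapeRtf(xmlReader.Value)).Append("\"");
                        }
                        xmlReader.MoveToElement(); // back from the last attribute to its element
                        sbMarkedUp.Append(isEmpty ? "\\cf1 />" : "\\cf1 >");
                        break;

                    case XmlNodeType.EndElement:
                        sbMarkedUp.Append(indent).Append("\\cf1 </").Append(EscapeRtf(xmlReader.Name)).Append(">");
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        sbMarkedUp.Append(indent).Append("\\cf0 ").Append(EscapeRtf(xmlReader.Value));
                        break;

                    default:
                        continue; // comments, whitespace, declarations: skipped in this sketch
                }

                // The line is complete: store it and start over with an empty builder.
                lstLines.Add(sbMarkedUp.ToString());
                sbMarkedUp.Clear();
            }
        }
        return lstLines;
    }
}
```

Calling RtfInterpreterSketch.Interpret(new StringReader(xml)) returns the marked-up lines in document order. Because XmlReader is forward-only, the first lines are available long before the end of a large file is reached.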

Reading portions of XML

The reading class exposes a function called ReadFragment. This function accepts a parameter (Offset) that lets the caller decide at which line to start reading. ReadFragment first adds a header line with the RTF definitions and then appends lines of XML from the generic string list. The VisibleLines property defines the number of lines returned by ReadFragment.
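The following sketch shows one way ReadFragment and VisibleLines could fit together. Only those two names come from the article; the class name FragmentReaderSketch, the contents of the RTF header (font and colour table), the default of 50 lines, and the clamping of out-of-range offsets are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; only ReadFragment and VisibleLines are names from the article.
public class FragmentReaderSketch
{
    // Assumed RTF header: one fixed-width font and three highlight colours.
    private const string RtfHeader =
        @"{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}" +
        @"{\colortbl;\red0\green0\blue255;\red255\green0\blue0;\red128\green0\blue128;}\f0\fs20 ";

    private readonly List<string> m_lstLines;

    // Number of lines returned by ReadFragment.
    public int VisibleLines { get; set; }

    public FragmentReaderSketch(List<string> lines)
    {
        m_lstLines = lines;
        VisibleLines = 50; // arbitrary default for this sketch
    }

    // Return an RTF document containing at most VisibleLines lines, starting at offset.
    public string ReadFragment(int offset)
    {
        // Clamp the window so an out-of-range offset yields an empty fragment.
        int start = Math.Max(0, Math.Min(offset, m_lstLines.Count));
        int count = Math.Max(0, Math.Min(VisibleLines, m_lstLines.Count - start));

        StringBuilder sbFragment = new StringBuilder(RtfHeader);
        for (int i = start; i < start + count; i++)
        {
            sbFragment.Append(m_lstLines[i]).Append(@"\par ");
        }
        return sbFragment.Append('}').ToString();
    }
}
```

Because each fragment is a complete RTF document, it can be assigned directly to something like a RichTextBox's Rtf property, and scrolling becomes a matter of calling ReadFragment again with a different offset.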

Events

The reader class exposes four events that can be used in the GUI: StartParsing, EndParsing, ErrorOccured, and ReadyForPresentation. StartParsing and EndParsing indicate that the process reading the XML file has started or ended. The ErrorOccured event is raised when reading a file fails for whatever reason. The ReadyForPresentation event is raised once a certain number of lines has been added to the generic list. Handling this event allows you to display interpreted XML to the user immediately, while the rest of the file is still being read.
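As an illustration of how these four events might be declared and raised, consider the sketch below. The event names are the article's; everything else is assumed: the class name ReaderEventsSketch, the PresentationThreshold property, the use of System.IO.ErrorEventArgs to carry the exception, raising ReadyForPresentation only once, and raising EndParsing even after an error.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; only the four event names are taken from the article.
public class ReaderEventsSketch
{
    public event EventHandler StartParsing;
    public event EventHandler EndParsing;
    public event EventHandler ReadyForPresentation;
    public event EventHandler<ErrorEventArgs> ErrorOccured;

    private readonly List<string> m_lstLines = new List<string>();

    // Assumed setting: how many lines must exist before the GUI is notified.
    public int PresentationThreshold { get; set; }

    // Stand-in for the parse loop: consumes already-interpreted lines.
    public void Parse(IEnumerable<string> interpretedLines)
    {
        Raise(StartParsing);
        try
        {
            bool announced = false;
            foreach (string line in interpretedLines)
            {
                m_lstLines.Add(line);
                if (!announced && m_lstLines.Count >= PresentationThreshold)
                {
                    announced = true;
                    Raise(ReadyForPresentation); // enough lines to show a first screen
                }
            }
        }
        catch (Exception ex)
        {
            EventHandler<ErrorEventArgs> handler = ErrorOccured;
            if (handler != null) handler(this, new ErrorEventArgs(ex));
        }
        finally
        {
            Raise(EndParsing); // assumption: also raised after a failure
        }
    }

    private void Raise(EventHandler handler)
    {
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

A GUI would subscribe before starting the parse, for example with reader.ReadyForPresentation += (s, e) => ShowFirstPage(); where ShowFirstPage is a hypothetical method. If parsing runs on a background thread, the handlers must marshal back to the UI thread before touching controls.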

Searching

The search function finds phrases within the XML document. It loops through the generic list of strings and looks for the given phrase in each line. When a match is found, a FoundItem event is raised, with the matching word and line number passed in the event arguments. The search class also maintains a list of found items, which likewise contains the matching words and line numbers. When the search process completes, a SearchComplete event is raised.

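// Assumption: iLines is the number of lines collected by the reader.
int iLines = m_lstLinesToSearch.Count;
for (int iCount = 0; iCount < iLines; iCount++)
{
    string stringToSearch = m_lstLinesToSearch[iCount];
    int foundIndex = stringToSearch.IndexOf(m_sSearchString,
        StringComparison.OrdinalIgnoreCase);
    if (foundIndex >= 0)
    {
        // Assumption: the phrase is taken as it appears in the line, keeping its casing.
        string foundPhrase = stringToSearch.Substring(foundIndex, m_sSearchString.Length);
        // Line numbers are reported 1-based.
        AddSearchResult(foundPhrase, iCount + 1);
    }
}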
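A self-contained version of the search, including the two events and the result list, might look like the sketch below. FoundItem and SearchComplete are the article's event names; the class names LineSearcherSketch and FoundItemEventArgs, the Results property, and the handling of an empty search string are assumptions. The sketch runs synchronously and reports only the first match on each line.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event arguments carrying the matched phrase and its 1-based line number.
public class FoundItemEventArgs : EventArgs
{
    public string Phrase { get; private set; }
    public int LineNumber { get; private set; }

    public FoundItemEventArgs(string phrase, int lineNumber)
    {
        Phrase = phrase;
        LineNumber = lineNumber;
    }
}

// Hypothetical sketch; only FoundItem and SearchComplete are names from the article.
public class LineSearcherSketch
{
    public event EventHandler<FoundItemEventArgs> FoundItem;
    public event EventHandler SearchComplete;

    private readonly List<FoundItemEventArgs> m_lstResults = new List<FoundItemEventArgs>();

    // All matches found by the most recent search.
    public IList<FoundItemEventArgs> Results { get { return m_lstResults; } }

    public void Search(IList<string> lines, string searchString)
    {
        m_lstResults.Clear();

        // An empty phrase would match every line at index 0, so skip the loop entirely.
        if (!string.IsNullOrEmpty(searchString))
        {
            for (int iCount = 0; iCount < lines.Count; iCount++)
            {
                int foundIndex = lines[iCount].IndexOf(searchString, StringComparison.OrdinalIgnoreCase);
                if (foundIndex < 0) continue;

                // Keep the casing used in the document rather than in the query.
                string foundPhrase = lines[iCount].Substring(foundIndex, searchString.Length);
                FoundItemEventArgs args = new FoundItemEventArgs(foundPhrase, iCount + 1);
                m_lstResults.Add(args);

                EventHandler<FoundItemEventArgs> found = FoundItem;
                if (found != null) found(this, args);
            }
        }

        EventHandler done = SearchComplete;
        if (done != null) done(this, EventArgs.Empty);
    }
}
```

Raising FoundItem per match lets a results window fill in while the search is still running, and the Results list lets the GUI jump between matches afterwards without searching again.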

Future plans

We plan to develop the software further so that it supports Find & Replace and lets the user save changes made to the XML files. We also plan to add the ability to collapse and expand elements.

Resources

The demo project uses the DockPanel suite (http://sourceforge.net/projects/dockpanelsuite/) to be able to dock windows.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Eduard Keilholz
Software Developer (Senior) http://www.today-it.nl
Netherlands
In 1998 I started as a web designer, programming websites in Perl and later PHP. After two years I was writing most websites in ASP, and from then on I lost touch with the Linux/Unix platform.

Since 2001 I have been interested in Windows applications, and I have now been writing software, mostly in C#, for about 7 years.

Last Updated 24 Feb 2011
Article Copyright 2011 by Eduard Keilholz