Quick XML Reader

By Eduard Keilholz, 23 Feb 2011
 

Introduction

At Trezorix, my job often requires opening and reading large (100 MB+) XML files. We usually open XML files in Notepad, Internet Explorer (IE), or some other text editor. With a large XML file, however, these programs can take hours to open it, if they do not crash first. Because we work with huge XML files and want to view their content reasonably quickly, we searched the web for existing software. We could not find a tool that covered our needs, so we decided to develop one ourselves.

Approach

The main goal of the tool is to read large XML files and present them on screen quickly. Most tools that read XML (except for Notepad) first read the entire file and then use an interpreter to put the XML document's structure together. We found this to be their weakness: they must read the entire XML file before they can display anything. We wanted to run through the document and display data as quickly as possible, so we developed an on-the-fly interpreter. This interpreter may not be as seamless as what you're used to, but in my opinion the performance gain far outweighs that.

Presentation

Although tools like IE are not really capable of opening large XML files, they do have one big advantage: the presentation. Because the XML files are fully interpreted, the opening and closing tags are matched and IE lets you expand and collapse elements, which makes the XML data easier and more pleasant to read. A second advantage is syntax highlighting, which lets the user quickly identify elements, attributes, and values. For performance reasons, we decided to drop the ability to expand and collapse elements. For highlighting the XML, we decided to use RTF.

The code

The code is basic, simple, and to the point. We developed two classes: one for reading and interpreting the XML, and one for searching through the XML data that has been read. A third class brings these two classes together. Both the reading and the searching methods are implemented asynchronously. For reading the XML, a simple while loop does the trick.

using (FileStream streamSource = new FileStream(m_sFilename,
    FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (XmlReader xmlReader = XmlReader.Create(streamSource))
    {
        StringBuilder sbMarkedUp = new StringBuilder();
        xmlReader.MoveToElement();
        while (xmlReader.Read())
        {
            // Write content as RTF depending on the xmlReader.NodeType
        }
    }
}

The interpreter

The interpreter is very simple. It checks the NodeType and handles the XML accordingly. When it finds an element, it writes the element tag to a StringBuilder object. Each line of XML is written to a generic list of strings, and the interpreter decides when a line is complete. After the line is added to the list, the StringBuilder is cleared and the process repeats until the while loop finishes.
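
To make the idea concrete, here is a minimal sketch of such an interpreter. It is not the code from the download: the class name InterpreterSketch, the color codes, the EscapeRtf helper, and the rule for when a line is flushed are all assumptions made for illustration, and attributes are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class InterpreterSketch
{
    // Hypothetical RTF color switches; they must match the color table in the RTF header.
    const string TagColor = @"\cf1 ";
    const string TextColor = @"\cf2 ";

    // Escapes the three characters that have a special meaning in RTF.
    static string EscapeRtf(string text)
    {
        return text.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
    }

    // Moves the pending StringBuilder content to the list and clears the builder.
    static void Flush(StringBuilder sb, List<string> lines)
    {
        if (sb.Length > 0)
        {
            lines.Add(sb.ToString());
            sb.Clear();
        }
    }

    public static List<string> Interpret(TextReader source)
    {
        var lines = new List<string>();
        var sb = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        Flush(sb, lines); // assumed rule: a new tag starts a new line
                        sb.Append(TagColor).Append('<').Append(reader.Name);
                        sb.Append(reader.IsEmptyElement ? " />" : ">");
                        if (reader.IsEmptyElement) Flush(sb, lines);
                        break;
                    case XmlNodeType.Text:
                        sb.Append(TextColor).Append(EscapeRtf(reader.Value));
                        break;
                    case XmlNodeType.EndElement:
                        sb.Append(TagColor).Append("</").Append(reader.Name).Append('>');
                        Flush(sb, lines); // assumed rule: a closing tag ends the line
                        break;
                    // Other node types (whitespace, comments, ...) are skipped in this sketch.
                }
            }
        }
        Flush(sb, lines);
        return lines;
    }
}
```

Because each line is stored as soon as it is complete, the GUI can start showing the first lines while the loop is still working through the rest of the file.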

Reading portions of XML

The reading class exposes a function called ReadFragment. This function accepts a parameter (Offset) that lets the caller decide where to start reading. ReadFragment first adds a header line with RTF definitions, then adds the lines of XML from the generic string list. The VisibleLines property defines the number of lines that ReadFragment returns.
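
A rough sketch of what such a function could look like is shown here. Only the names ReadFragment and VisibleLines come from the article; the class name, the RTF header, the default of 50 lines, and the clamping of the offset are assumptions. Thread safety is ignored, although the real list is filled asynchronously.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public class FragmentReaderSketch
{
    // Assumed minimal RTF header with a two-entry color table (blue tags, black text).
    const string RtfHeader =
        @"{\rtf1\ansi{\colortbl;\red0\green0\blue255;\red0\green0\blue0;}";

    private readonly List<string> m_lstLines;

    // Number of lines returned by ReadFragment; 50 is an arbitrary default.
    public int VisibleLines { get; set; }

    public FragmentReaderSketch(List<string> lines)
    {
        m_lstLines = lines;
        VisibleLines = 50;
    }

    public string ReadFragment(int offset)
    {
        // Clamp the window so that a bad offset or line count cannot run past the list.
        int start = Math.Max(0, Math.Min(offset, m_lstLines.Count));
        int count = Math.Max(0, Math.Min(VisibleLines, m_lstLines.Count - start));

        var sb = new StringBuilder(RtfHeader);
        for (int i = start; i < start + count; i++)
        {
            sb.Append(m_lstLines[i]).Append(@"\par "); // \par ends a paragraph in RTF
        }
        return sb.Append('}').ToString();
    }
}
```

The returned string could then be assigned to the Rtf property of a RichTextBox, so that only the visible window of the document is ever rendered.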

Events

The reader class exposes four events that can be used in the GUI: StartParsing, EndParsing, ErrorOccured, and ReadyForPresentation. StartParsing and EndParsing indicate that the process reading the XML file has started or ended. The ErrorOccured event is raised when reading a file fails for whatever reason. The ReadyForPresentation event is raised when a certain number of lines has been added to the generic list. Handling this event allows you to display interpreted XML to the user immediately.
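
As an illustration, the four events could be wired up roughly like this. The event names are the ones mentioned above; everything else (the class name, the PresentationThreshold property, raising ReadyForPresentation only once, raising EndParsing even after an error, and using System.IO.ErrorEventArgs for the error) is an assumption of this sketch and may differ from the actual source.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class ReaderEventsSketch
{
    public event EventHandler StartParsing;
    public event EventHandler EndParsing;
    public event EventHandler ReadyForPresentation;
    public event EventHandler<ErrorEventArgs> ErrorOccured;

    private readonly List<string> m_lstLines = new List<string>();
    private bool m_bPresented;

    // Hypothetical setting: how many lines must be available before the GUI is notified.
    public int PresentationThreshold { get; set; }

    public ReaderEventsSketch()
    {
        PresentationThreshold = 100;
    }

    // Stands in for the real parsing loop; the lines are assumed to be interpreted already.
    public void Parse(IEnumerable<string> interpretedLines)
    {
        Raise(StartParsing);
        try
        {
            foreach (string line in interpretedLines)
            {
                m_lstLines.Add(line);
                if (!m_bPresented && m_lstLines.Count >= PresentationThreshold)
                {
                    m_bPresented = true;
                    Raise(ReadyForPresentation); // the GUI can show the first lines now
                }
            }
            // A short file never reaches the threshold, so notify at the end instead.
            if (!m_bPresented) Raise(ReadyForPresentation);
        }
        catch (Exception ex)
        {
            EventHandler<ErrorEventArgs> handler = ErrorOccured;
            if (handler != null) handler(this, new ErrorEventArgs(ex));
        }
        finally
        {
            Raise(EndParsing);
        }
    }

    private void Raise(EventHandler handler)
    {
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

In a Windows Forms GUI, the handlers would typically marshal back to the UI thread (for example with Control.BeginInvoke) before touching any controls, since the parsing runs asynchronously.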

Searching

The search function finds phrases within the XML document. It loops through each line in the generic list of strings and looks for the given phrase in each line. When a match is found, a FoundItem event is raised, with the matching word and line number in the event arguments. The search class also maintains a list of found items, containing the same matching words and line numbers. When the search process completes, a SearchComplete event is raised.

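for (int iCount = 0; iCount < iLines; iCount++)
{
    string stringToSearch = m_lstLinesToSearch[iCount];
    // Case-insensitive search for the phrase within the current line
    int foundIndex = stringToSearch.IndexOf(m_sSearchString,
        StringComparison.OrdinalIgnoreCase);
    if (foundIndex >= 0)
    {
        // Keep the phrase as it is written in the file; line numbers are 1-based
        string foundPhrase = stringToSearch.Substring(foundIndex,
            m_sSearchString.Length);
        AddSearchResult(foundPhrase, iCount + 1);
    }
}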
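
For readers who want to see how the loop, the result list, and the two events could fit together, here is a self-contained sketch. FoundItem and SearchComplete are the event names from the article; the class names, the FoundItemEventArgs type, and the Results property are made up for this example, and the search runs synchronously here to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event arguments carrying the matching phrase and its 1-based line number.
public class FoundItemEventArgs : EventArgs
{
    public string Phrase { get; private set; }
    public int LineNumber { get; private set; }

    public FoundItemEventArgs(string phrase, int lineNumber)
    {
        Phrase = phrase;
        LineNumber = lineNumber;
    }
}

public class SearchSketch
{
    public event EventHandler<FoundItemEventArgs> FoundItem;
    public event EventHandler SearchComplete;

    // The list of found items that the search class maintains.
    public List<FoundItemEventArgs> Results { get; private set; }

    public SearchSketch()
    {
        Results = new List<FoundItemEventArgs>();
    }

    public void Search(IList<string> lines, string phrase)
    {
        if (string.IsNullOrEmpty(phrase))
            throw new ArgumentException("The search phrase must not be empty.", "phrase");

        Results.Clear();
        for (int i = 0; i < lines.Count; i++)
        {
            int index = lines[i].IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
            if (index < 0) continue;

            // Report the phrase as it appears in the line, not as the user typed it.
            var item = new FoundItemEventArgs(lines[i].Substring(index, phrase.Length), i + 1);
            Results.Add(item);

            EventHandler<FoundItemEventArgs> found = FoundItem;
            if (found != null) found(this, item);
        }

        EventHandler complete = SearchComplete;
        if (complete != null) complete(this, EventArgs.Empty);
    }
}
```

Like the original loop, this reports at most one match per line; a line that contains the phrase twice still produces a single result.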

Future plans

We plan to develop the software further so that it supports Find & Replace and lets the user save changes made to the XML files. We also plan to add the ability to collapse and expand elements.

Resources

The demo project uses the DockPanel suite (http://sourceforge.net/projects/dockpanelsuite/) to be able to dock windows.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Eduard Keilholz
Software Developer (Senior) http://www.upbound.com
Netherlands
In 1998 I started as a web designer, programming websites in Perl and later PHP. After two years I wrote most of the websites in ASP and from then on lost touch with the Linux/Unix platform.

Since 2001 I have been interested in Windows applications, and I have now been writing software mostly in C# for about 7 years.

Comments and Discussions

 
[General] My vote of 2 - Eliezer Gensburger (member), 30-Jan-12 9:07
too
[General] Re: My vote of 2 - Eduard Keilholz (member), 9-Oct-12 21:25
What's wrong with it?
.: I love it when a plan comes together :.
http://www.zonderpunt.nl

[Question] Problem with very large xml - Member 4040395 (member), 7-Nov-11 12:22
QXR was fast with files up to 500 MB or so, but it hung on a 1 GB file (comments.xml, which is part of the stackoverflow.com data dump that you can download). Task Manager showed all memory used up. After six minutes I ended the task. I also noticed that it would not load files that are not well-formed XML and did not give any error report for them. One minor issue is that I was unable to scroll to the end of the file with the scroll bar, but it got very close and I was able to use the mouse wheel to get to the end. You might be interested in checking out my XMLMax editor at xponentsoftware-dot-com. It loads the comments.xml file in about 20 seconds into a treeview with collapse/expand and is fully editable. It does not use the so-called "load on demand" technique some XML readers use; that technique fails on some XML structures where there is a very large number of child nodes within a single fragment.
 
Bill
[Question] Project won't open due to source controls - member, Member 28774026-Aug-11 7:09
Project will not open due to source controls in place. Can the source control be removed before storing your project? Thanks.
[General] My vote of 5 - BryanWilkins (member), 1-Mar-11 2:41
Great Job!
[General] Re: My vote of 5 - Eduard Keilholz (member), 1-Mar-11 12:14
Thanks man!

[General] My vote of 5 - Monjurul Habib (member), 28-Feb-11 20:15
good work !
[General] Re: My vote of 5 - Eduard Keilholz (member), 1-Mar-11 12:14
Thanks man!

[General] like it - have 5 - Pranay Rana (member), 24-Feb-11 1:17
nice one

[General] Re: like it - have 5 - Eduard Keilholz (member), 25-Feb-11 4:52
Thnx!

[General] Assuming your motivation is only that described in the Introduction paragraph.... - damnedyankee (member), 24-Feb-11 0:37
Why not use a better editor designed to handle large files? Or, why use an editor at all if you are merely searching for text? Tools exist for finding text without having to display the contents of the file.
 
Idiotic editors like Notepad, or MS Word, or other text display applications which insist upon reading the entire file before displaying a single line of text are brain dead, and should be deleted from your list of useful tools.
 
There are many freely available, and very high quality tools you could have chosen to perform this type of function.
 
In short, it seems you have wasted time doing this.
[General] Re: Assuming your motivation is only that described in the Introduction paragraph.... - bilo81 (member), 24-Feb-11 5:58
Sorry,
I disagree, to be honest.
They have created their own ad hoc viewer for exactly what they need and nothing more than that.
With other free tools you might get something that does not work the way you actually want.
Besides, when you wrote the code yourself you can fix a problem or a bug very quickly, add a functionality that you really need, all that stuff.
And of course, they had fun and gained experience with this kind of problem, I guess.
[Question] If I am not wrong u can use xml Linq for search method instead of for loop? - R&D_Man (member), 22-Feb-11 7:36
If I am not wrong u can use xml Linq for search method instead of for loop?
Thanks for the article. 5 vote.
Keep it UP.
[Answer] Re: If I am not wrong u can use xml Linq for search method instead of for loop? - Eduard Keilholz (member), 22-Feb-11 20:57
Let's find out!
 
I'm sure there's indeed a way to use Linq and/or Lambda to tweak the search routine. Thanks for your vote!

[General] Looks cool - Sacha Barber (mvp), 22-Feb-11 6:06
Good job...5 from me
Sacha Barber
  • Microsoft Visual C# MVP 2008-2011
  • Codeproject MVP 2008-2011
Your best friend is you.
I'm my best friend too. We share the same views, and hardly ever argue
 
My Blog : sachabarber.net

[General] Re: Looks cool - Eduard Keilholz (member), 22-Feb-11 20:56
Wow, compliments from Sacha Barber! I'm pretty happy with that!
 
Thanks a lot!

[General] Re: Looks cool - Sacha Barber (mvp), 22-Feb-11 21:29
It's good work man.

[General] Re: Looks cool - Eduard Keilholz (member), 17-Feb-12 4:12
BTW, I just read your blog at http://sachabarber.net/ about SignalR (which is great), but noticed the link to the code project on your Publications page is dead. Couldn't find another way to contact you ;)

[General] Re: Looks cool - Sacha Barber (mvp), 17-Feb-12 5:03
Yeah SignalR is cool.

[General] Seems Good - DaveKerr (member), 22-Feb-11 5:09
Hi Eduard,
 
Seems like a good solution - have you got any metrics you can publish for how this application performs against IE and notepad?
[General] Re: Seems Good - Eduard Keilholz (member), 22-Feb-11 5:32
I will post some benchmarks later on, but for now:
20 MB XML files take about 3 seconds to load; the system starts presenting XML (displaying it in the GUI) within 0.1 seconds.
For 100 MB XML files, the numbers are about 10 seconds to load and 0.1 seconds to respond to the user.

[General] Re: Seems Good - John Simmons / outlaw programmer (mvp), 23-Feb-11 4:55
I don't understand why a larger file would need more time to start displaying data. Assuming the large file and the small file contains the same data set (just different amounts of it), once you get to a certain point - regardless of the file size, it seems that it should start displaying data at pretty close to the same time.
".45 ACP - because shooting twice is just silly" - JSOP, 2010
-----
You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass." - Dale Earnhardt, 1997

[General] Re: Seems Good - Eduard Keilholz (member), 23-Feb-11 20:26
That's what I tried to say. In the comment I tried to explain that loading the entire document takes a certain amount of time depending on the file size; however, the time taken to present the XML is less than 0.1 seconds regardless of the file size.
 
Cheers,
Eduard

Last Updated 24 Feb 2011
Article Copyright 2011 by Eduard Keilholz