Quick XML Reader

23 Feb 2011
A quick XML interpreter for large XML files.

Introduction

At Trezorix, my job often requires opening and reading large (100 MB+) XML files. We usually open XML files in Notepad, Internet Explorer (IE), or some other text editor. With a large file, however, these programs can take hours to open it, if they do not crash first. Because we work with huge XML files and want to view their content reasonably quickly, we searched the web for existing software. We could not find a tool that met our needs, so we decided to develop one ourselves.

Approach

The main goal of the tool is to read large XML files and present them on screen quickly. Most tools that read XML (Notepad excepted) first read the entire file and then use an interpreter to assemble the document's structure. We see that as their weakness: nothing can be displayed until the whole file has been read. We wanted to run through the document and display data as soon as possible, so we developed an on-the-fly interpreter. The result may not be as polished as what you're used to, but in my opinion the performance gain far outweighs that.

Presentation

Although tools like IE cannot really handle large XML files, they do have one big advantage: presentation. Because the XML is fully interpreted, opening and closing tags are matched, and IE lets you expand and collapse elements, which makes the XML easier and more pleasant to read. IE also highlights the XML content, so the user can quickly identify elements, attributes, and values. For performance reasons, we decided to drop the ability to expand and collapse elements. For highlighting, we decided to use RTF.

The code

The code is simple and to the point. We developed two classes: one that reads and interprets the XML, and one that searches through the XML data that has been read. A third class brings these two together. Both reading and searching are implemented asynchronously. For reading the XML, a simple while loop does the trick.

C#
// FileShare.ReadWrite lets other processes keep the file open while we read it.
using (FileStream streamSource = new FileStream(m_sFilename,
    FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (XmlReader xmlReader = XmlReader.Create(streamSource))
    {
        StringBuilder sbMarkedUp = new StringBuilder();
        xmlReader.MoveToElement();
        // Read() advances one node at a time, so the file is never loaded as a whole.
        while (xmlReader.Read())
        {
            // Write content as RTF depending on the xmlReader.NodeType
        }
    }
}

The interpreter

The interpreter is very simple. It checks the NodeType of each node and handles the XML accordingly. When it finds an element, for example, it writes the element tag to a StringBuilder object. Each line of XML is stored in a generic list of strings, and the interpreter decides when a line is complete. Once the line has been added to the list, the StringBuilder is cleared, and the process repeats until the while loop finishes.
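
To make the idea concrete, here is a minimal sketch of such an interpreter. It is not the code from the download: the class name XmlLineInterpreter, the helper names, the two-space indentation, and the choice of \cf color indexes are all assumptions made for illustration. It handles only elements, attributes, text, and end tags, and it escapes only the three characters that are special in RTF.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sketch; class and method names are not taken from the article's source.
public static class XmlLineInterpreter
{
    // Escapes the three characters that have a special meaning in RTF.
    // The backslash must be handled first so later escapes are not doubled.
    public static string EscapeRtf(string text)
    {
        return text.Replace("\\", "\\\\").Replace("{", "\\{").Replace("}", "\\}");
    }

    // Walks the XML node by node and returns one RTF-marked-up string per display line.
    public static List<string> Interpret(TextReader source)
    {
        var lines = new List<string>();
        var sb = new StringBuilder();

        using (XmlReader xmlReader = XmlReader.Create(source))
        {
            while (xmlReader.Read())
            {
                switch (xmlReader.NodeType)
                {
                    case XmlNodeType.Element:
                        Flush(lines, sb); // an opening tag always starts a new line
                        // Read these before moving to the attributes; they change afterwards.
                        bool isEmpty = xmlReader.IsEmptyElement;
                        sb.Append(' ', xmlReader.Depth * 2);
                        sb.Append(@"\cf1 <").Append(EscapeRtf(xmlReader.Name));
                        while (xmlReader.MoveToNextAttribute())
                        {
                            sb.Append(@" \cf2 ").Append(EscapeRtf(xmlReader.Name));
                            sb.Append(@"=\cf3 """).Append(EscapeRtf(xmlReader.Value)).Append('"');
                        }
                        sb.Append(isEmpty ? @"\cf1 />" : @"\cf1 >");
                        if (isEmpty) Flush(lines, sb);
                        break;

                    case XmlNodeType.Text:
                        // Text stays on the same line as its opening tag.
                        sb.Append(@"\cf0 ").Append(EscapeRtf(xmlReader.Value));
                        break;

                    case XmlNodeType.EndElement:
                        if (sb.Length == 0) sb.Append(' ', xmlReader.Depth * 2);
                        sb.Append(@"\cf1 </").Append(EscapeRtf(xmlReader.Name)).Append('>');
                        Flush(lines, sb); // a closing tag always ends the line
                        break;
                }
            }
        }

        Flush(lines, sb);
        return lines;
    }

    // Moves the current line into the list and clears the builder for the next one.
    private static void Flush(List<string> lines, StringBuilder sb)
    {
        if (sb.Length == 0) return;
        lines.Add(sb.ToString());
        sb.Clear();
    }
}
```

Calling XmlLineInterpreter.Interpret(new StringReader(xml)) returns the list of marked-up lines. Note that XmlReader hands back attribute and text values with entities already resolved, and that non-ASCII characters would need additional RTF escaping; both are left out of this sketch.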

Reading portions of XML

The reading class exposes a function called ReadFragment. It accepts a parameter (Offset) that lets the caller decide where reading starts. ReadFragment first adds a header line with the RTF definitions and then appends lines of XML from the generic string list. The VisibleLines property defines the number of lines ReadFragment returns.
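
A function along these lines could look roughly as follows. This is a sketch under assumptions, not the code from the download: only the names ReadFragment, Offset, and VisibleLines come from the article, while the class name FragmentReader, the default of 50 visible lines, and the exact RTF header (font and color table) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; only ReadFragment, Offset and VisibleLines are names from the article.
public class FragmentReader
{
    // Assumed RTF header: one monospaced font and a color table (blue, red, dark green).
    private const string RtfHeader =
        @"{\rtf1\ansi\deff0{\fonttbl{\f0 Courier New;}}" +
        @"{\colortbl;\red0\green0\blue255;\red255\green0\blue0;\red0\green128\blue0;}";

    private readonly IList<string> m_lstLines;

    public FragmentReader(IList<string> lines)
    {
        m_lstLines = lines;
        VisibleLines = 50; // assumed default
    }

    // Number of lines returned by one ReadFragment call.
    public int VisibleLines { get; set; }

    // Returns a complete RTF document holding at most VisibleLines lines,
    // starting at the given zero-based offset into the line list.
    public string ReadFragment(int offset)
    {
        // Clamp both ends so out-of-range offsets yield an empty fragment, not an exception.
        int start = Math.Max(0, Math.Min(offset, m_lstLines.Count));
        int end = Math.Min(m_lstLines.Count, start + Math.Max(0, VisibleLines));

        var sb = new StringBuilder(RtfHeader);
        for (int i = start; i < end; i++)
        {
            sb.Append(m_lstLines[i]).Append(@"\par ");
        }
        sb.Append('}'); // closes the {\rtf1 group opened in the header
        return sb.ToString();
    }
}
```

The returned string can be assigned to the Rtf property of a RichTextBox. Because only a window of lines is rendered per call, scrolling can be implemented by calling ReadFragment again with a different offset. If the list is still being filled by the reading thread, access to it would need to be synchronized, which this sketch does not do.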

Events

The reader class exposes four events that can be used in the GUI: StartParsing, EndParsing, ErrorOccured, and ReadyForPresentation. StartParsing and EndParsing indicate that reading of the XML file has started or ended. ErrorOccured is raised when reading a file fails for whatever reason. ReadyForPresentation is raised once a certain number of lines has been added to the generic list; handling it lets you display interpreted XML to the user immediately.
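
The event flow can be sketched as below. The four event names are the ones from the article; everything else is an assumption for illustration: the class name XmlLineProducer, the ReadyThreshold property and its default of 100, the use of System.IO.ErrorEventArgs to carry the exception, raising EndParsing even after an error, and raising ReadyForPresentation at the end for files shorter than the threshold.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; only the four event names come from the article.
public class XmlLineProducer
{
    private readonly List<string> m_lstLines = new List<string>();
    private bool m_bReadySignalled;

    public event EventHandler StartParsing;
    public event EventHandler EndParsing;
    public event EventHandler<ErrorEventArgs> ErrorOccured;
    public event EventHandler ReadyForPresentation;

    public XmlLineProducer()
    {
        ReadyThreshold = 100; // assumed default
    }

    // Assumed knob: how many lines must be available before the GUI is told to draw.
    public int ReadyThreshold { get; set; }

    public IList<string> Lines { get { return m_lstLines; } }

    // Consumes already-interpreted lines and raises the events along the way.
    public void Parse(IEnumerable<string> interpretedLines)
    {
        Raise(StartParsing);
        try
        {
            foreach (string line in interpretedLines)
            {
                m_lstLines.Add(line);
                if (!m_bReadySignalled && m_lstLines.Count >= ReadyThreshold)
                {
                    m_bReadySignalled = true;
                    Raise(ReadyForPresentation); // enough lines to fill the first screen
                }
            }

            // Assumption: short files never reach the threshold, so signal once at the end.
            if (!m_bReadySignalled)
            {
                m_bReadySignalled = true;
                Raise(ReadyForPresentation);
            }
        }
        catch (Exception ex)
        {
            EventHandler<ErrorEventArgs> handler = ErrorOccured;
            if (handler != null) handler(this, new ErrorEventArgs(ex));
        }
        finally
        {
            Raise(EndParsing); // assumption: raised on success and on failure
        }
    }

    private void Raise(EventHandler handler)
    {
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

In a Windows Forms GUI, the handlers would typically marshal back to the UI thread (for example with Control.BeginInvoke) before touching any controls, since the parsing runs asynchronously.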

Searching

The search function finds phrases within the XML document. It loops through the generic list of strings and looks for the given phrase in each line. When a match is found, a FoundItem event is raised, with the matching word and line number in the event arguments. The search class also maintains a list of found items, each holding the matching word and line number. When the search completes, a SearchComplete event is raised.

C#
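for (int iCount = 0; iCount < iLines; iCount++)
{
    string stringToSearch = m_lstLinesToSearch[iCount];
    int foundIndex = stringToSearch.IndexOf(m_sSearchString,
        StringComparison.OrdinalIgnoreCase);
    if (foundIndex >= 0)
    {
        // Take the phrase as it appears in the line; line numbers are 1-based.
        string foundPhrase = stringToSearch.Substring(foundIndex, m_sSearchString.Length);
        AddSearchResult(foundPhrase, iCount + 1);
    }
}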
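
For context, the loop could sit inside a search class along the following lines. This is a sketch under assumptions rather than the code from the download: the event names FoundItem and SearchComplete come from the article, while the class names LineSearcher and FoundItemEventArgs, their members, and the decision to skip empty search strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event arguments carrying the matching text and its 1-based line number.
public class FoundItemEventArgs : EventArgs
{
    public FoundItemEventArgs(string phrase, int lineNumber)
    {
        Phrase = phrase;
        LineNumber = lineNumber;
    }

    public string Phrase { get; private set; }
    public int LineNumber { get; private set; }
}

// Hypothetical sketch; only FoundItem and SearchComplete are names from the article.
public class LineSearcher
{
    private readonly List<FoundItemEventArgs> m_lstResults = new List<FoundItemEventArgs>();

    public event EventHandler<FoundItemEventArgs> FoundItem;
    public event EventHandler SearchComplete;

    // The maintained list of found items.
    public IList<FoundItemEventArgs> Results { get { return m_lstResults; } }

    public void Search(IList<string> lines, string searchString)
    {
        m_lstResults.Clear();

        // An empty phrase would match every line at index 0, so it is skipped here.
        if (!string.IsNullOrEmpty(searchString))
        {
            for (int i = 0; i < lines.Count; i++)
            {
                int foundIndex = lines[i].IndexOf(searchString,
                    StringComparison.OrdinalIgnoreCase);
                if (foundIndex < 0) continue;

                // Report the phrase with the casing it has in the document.
                string foundPhrase = lines[i].Substring(foundIndex, searchString.Length);
                var item = new FoundItemEventArgs(foundPhrase, i + 1);
                m_lstResults.Add(item);

                EventHandler<FoundItemEventArgs> found = FoundItem;
                if (found != null) found(this, item);
            }
        }

        EventHandler complete = SearchComplete;
        if (complete != null) complete(this, EventArgs.Empty);
    }
}
```

Like the loop above, this reports only the first match on each line. A GUI can subscribe to FoundItem to fill a results list while the search is still running and use SearchComplete to re-enable the search button.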

Future plans

We plan to develop the software further so that it supports Find & Replace and lets users save their changes to the XML files. We also plan to add the ability to collapse and expand elements.

Resources

The demo project uses the DockPanel suite (http://sourceforge.net/projects/dockpanelsuite/) to be able to dock windows.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Architect http://4dotnet.nl
Netherlands
I have been developing software for over two decades. I work as a cloud solution architect and team lead at 4DotNet in the Netherlands.

In my current position, I love helping customers on their journey to the cloud. I like to create highly performant software and to help team members reach a higher level of software development.

My focus is on the Microsoft development stack, mainly C# and the Microsoft Azure Cloud. I also have a strong affinity with Angular. As a result of my contributions to the community, I received the Microsoft MVP Award for Microsoft Azure.
