Introduction
TechFestXmlSolution is a tool that helps developers parse large XML documents. The application is specific to one particular XML file: the list of books that Project Gutenberg (http://www.gutenberg.org/) maintains in RDF format. It supports searching the XML file for a particular book ID, getting books by index, and searching for books by text.
Performance and scalability issue: processing a large XML document with a DOM object causes high CPU, memory and bandwidth utilization.
Using the Code
In this application we use an XmlReader to move through the file node by node and an XmlDocument object to load only the part of the XML we need. The XmlReader.MoveToContent() method checks whether the current node is a content node (non-white space text, CDATA, Element, EndElement, EntityReference, or EndEntity). If the node is not a content node, the reader skips ahead to the next content node or the end of the file. It skips over nodes of the following types: ProcessingInstruction, DocumentType, Comment, Whitespace, or SignificantWhitespace. The XmlReader.Skip() method skips the children of the current node, which we use for nodes we do not need to search. See the snapshot of the code:
while (!_xReader.EOF)
{
    // Check the node type and node name, and match the first attribute (the id
    // being searched for). rdf:RDF is the root element.
    if ((_xReader.MoveToContent() == XmlNodeType.Element && _xReader.Name == "pgterms:etext"
         && _xReader.GetAttribute(0) == name) || _xReader.Name == "rdf:RDF")
    {
        // If the node name is rdf:RDF, it is the root element: don't skip it, continue reading.
        if (_xReader.Name == "rdf:RDF")
        {
            Console.WriteLine(" Before finding the book CPU usage ->" + sampleCounter.CpuUsage);
            _xReader.Read();
        }
        else
        {
            // The node containing the id has been found. Create the document object and
            // load only this node into it, not the whole file, so the node's data is
            // available faster.
            Console.WriteLine(" After finding the book CPU usage ->" + sampleCounter.CpuUsage);
            doc = new XmlDocument();
            // Read the node into memory.
            XmlNode xnode = doc.ReadNode(_xReader);
            // Check whether the element has any attributes.
            if (xnode.Attributes.Count > 0)
            {
                // Call the Initialise method of class Book, which initialises all its variables.
                book1.Initialise(xnode);
            }
            // Print the whole book description after initialising
            break;
        }
    }
    else
    {
        // Skip the whole node, as it is not of use.
        _xReader.Skip();
        _xReader.MoveToContent();
    }
}
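The same read-and-skip idea can be shown as a small self-contained sketch. It is not the application's actual code: the class name EtextLookup and the method FindById are hypothetical, and the sketch assumes the input has no DOCTYPE (if the file carries one, XmlReaderSettings.DtdProcessing would need to be set accordingly). As in the snippet above, the id is compared against the first attribute of each pgterms:etext element.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class EtextLookup
{
    // Returns the <pgterms:etext> element whose first attribute equals id,
    // or null when no such element exists. Only that one element is loaded
    // into an XmlDocument; every other subtree is skipped by the reader.
    public static XmlNode FindById(TextReader input, string id)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();              // position on the root element
            if (reader.IsEmptyElement) return null;
            reader.Read();                       // step inside the root

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Name == "pgterms:etext"
                    && reader.AttributeCount > 0
                    && reader.GetAttribute(0) == id)
                {
                    // Materialise just this element as a DOM node.
                    return new XmlDocument().ReadNode(reader);
                }

                if (reader.NodeType == XmlNodeType.Element)
                    reader.Skip();               // jump over the whole subtree
                else
                    reader.Read();               // end tags, text, comments
            }
        }
        return null;
    }
}
```

Calling EtextLookup.FindById(new StreamReader(path), id) would scan the file once from the top; memory use stays bounded by the size of the single matching element rather than the size of the file.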
Similarly, the application contains two more functions: one gets books by index, and the other searches for books by title, subject, publisher, etc. While searching, these functions also write their output to a text file in the background; the path of the text file is given in App.Config.
/// Function getBooks takes a start and end index as arguments to find that number of books
/// and returns the books in a list of type Book
public override List
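Since the bodies of these two functions are not shown, the following is only a rough sketch of how index-based and text-based lookups could be built on the same streaming approach. The names EtextQueries, Etexts, GetBooks and Search are hypothetical, positions are assumed to be zero-based with an inclusive end, and the sketch returns XmlNode objects rather than the application's Book type, which is not shown here. Writing results to the file named in App.Config is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helpers, for illustration only.
public static class EtextQueries
{
    // Lazily yields each top-level <pgterms:etext> element, one at a time.
    // The first `skip` matching elements are passed over with Skip(), so
    // they are never materialised as DOM nodes.
    public static IEnumerable<XmlNode> Etexts(TextReader input, int skip = 0)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();              // position on the root element
            if (reader.IsEmptyElement) yield break;
            reader.Read();                       // step inside the root

            int seen = 0;
            while (!reader.EOF)
            {
                bool isEtext = reader.NodeType == XmlNodeType.Element
                               && reader.Name == "pgterms:etext";
                if (isEtext && seen++ >= skip)
                    yield return new XmlDocument().ReadNode(reader);
                else if (reader.NodeType == XmlNodeType.Element)
                    reader.Skip();               // jump over the whole subtree
                else
                    reader.Read();               // end tags, text, comments
            }
        }
    }

    // Books at zero-based positions start..end (inclusive).
    public static List<XmlNode> GetBooks(TextReader input, int start, int end)
    {
        var books = new List<XmlNode>();
        if (start < 0 || end < start) return books;
        foreach (XmlNode node in Etexts(input, start))
        {
            books.Add(node);
            if (books.Count == end - start + 1) break;   // stop reading early
        }
        return books;
    }

    // Books whose text content contains `text`, ignoring case.
    public static List<XmlNode> Search(TextReader input, string text)
    {
        var hits = new List<XmlNode>();
        foreach (XmlNode node in Etexts(input))
        {
            if (node.InnerText.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(node);
        }
        return hits;
    }
}
```

In this sketch each element is held in memory only while it is being examined, and GetBooks stops reading as soon as the requested range is filled, so neither function needs the whole file loaded into a DOM.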
Points to Remember
Please see the schema of the XML file; the application can be used only with files of that type.