Introduction

Have you ever needed to find specific text inside an XML file? Or to identify, in a set of XML files, those containing a specific text? If so, you probably used some kind of "multiple file search" utility, or simply the operating system's "Search" feature, which can locate files containing a given text string. This is a suitable solution in many cases. But what about looking for text that appears in specific locations of the XML structure? For example, if you need to identify XML files containing a string inside a particular attribute value and *not* anywhere else, a generic text search tool is useless.

The simple utility presented here, named "XML Search", addresses this problem by implementing a multiple-file XML search mechanism that takes the XML structure into account.

How the utility works

The utility I wrote is very simple: given an "input" folder, it looks for XML files in that folder (and optionally in its subfolders), loads them one by one into an XmlDocument object, and looks for the elements that match the search criteria you specified. The result of the search is a list of the matching files, with the matching XML fragment highlighted.
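The per-file step can be sketched in Python. This is a minimal illustration, not the utility's actual code, and it rests on a few assumptions:

- The standard library's `xml.etree.ElementTree` stands in for .NET's XmlDocument.
- ElementTree understands only a subset of XPath (child and descendant steps, `*`, and simple predicates such as `[@v='x']`).
- Its paths are evaluated relative to the root element.
- The helper names `inner_text`, `search_element` and `search_file` are hypothetical.

```python
import xml.etree.ElementTree as ET

def inner_text(elem):
    # Text of the element and all its descendants, concatenated;
    # roughly what .NET exposes as XmlNode.InnerText.
    return "".join(elem.itertext())

def search_element(root, node_path, find_what):
    """Return the inner text of every node selected by node_path that
    contains find_what, compared case-insensitively (the utility's default)."""
    needle = find_what.casefold()
    return [inner_text(node) for node in root.findall(node_path)
            if needle in inner_text(node).casefold()]

def search_file(source, node_path, find_what):
    # Load one file (a path or an open file object) and search it.
    # An empty list means the file does not match.
    return search_element(ET.parse(source).getroot(), node_path, find_what)
```

In this sketch, a file counts as a match when `search_file` returns a non-empty list. The returned strings correspond to the fragments the utility highlights.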

[Figure: XML Search user interface]

Instead of an "input" folder, you can also specify a single file: in this case, the search is limited to that file's content.

The search criteria are defined mainly by an XPath expression, which you type in the "Node selection condition" textbox. In each XML file, this expression selects the nodes in which the tool will look for the text string you typed in the "Find what" textbox.

Some option buttons let you specify whether the search is done on the InnerText, the InnerXml or the OuterXml of the selected nodes.

You can also use XML Search just to locate XML files containing specific nodes or attributes, regardless of their value: to do so, select the "Just check for node existence" option.
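A node-existence check reduces to asking whether the path selects anything at all. The sketch below again assumes Python's ElementTree, and `has_node` is a hypothetical name. ElementTree cannot return attribute nodes, so attribute existence is expressed as a predicate on the owning element, for example `.//USR_ID[@v]`.

```python
import xml.etree.ElementTree as ET

def has_node(root, node_path):
    # find() returns the first selected element, or None if there is none.
    # Compare with "is not None": an element without children can
    # evaluate as false in ElementTree, even though it exists.
    return root.find(node_path) is not None
```

The value of the selected node is never inspected here, which matches the intent of the option: the file matches as soon as the node is present.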

In the XPath search condition you can omit the root element of the XML structure; in that case, check the "Start from DocumentElement" checkbox.
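The effect of this checkbox can be emulated in Python. In .NET terms, it is presumably the difference between evaluating the expression on the XmlDocument itself and on its DocumentElement; that reading is an assumption, not taken from the utility's source.

ElementTree always evaluates paths relative to an element, which corresponds to the checkbox being checked. To emulate the unchecked case, this sketch evaluates the path against a throwaway wrapper element that plays the role of the document node. The function name and the wrapper's tag are hypothetical.

```python
import xml.etree.ElementTree as ET

def select_nodes(root, node_path, start_from_document_element):
    if start_from_document_element:
        # Path is relative to the root element: "HDR/USR_ID".
        return root.findall(node_path)
    # Path must start with the root element's name: "TRANSACTION/HDR/USR_ID".
    # ElementTree elements have no parent pointer, so appending root to a
    # temporary wrapper leaves the original tree unchanged.
    wrapper = ET.Element("document")  # hypothetical stand-in for the document node
    wrapper.append(root)
    return wrapper.findall(node_path)
```

The test below shows that each form of the path only works with the matching setting.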

For a case-sensitive search, check the "Match case" checkbox. By default the "Find what" value is matched case-insensitively; be aware, however, that the XPath expression is always case-sensitive.

To look for an exact value, check the "Match whole text" checkbox; otherwise, the "Find what" value is searched for anywhere inside the XML fragment selected by your search criteria.
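The "Match case" and "Match whole text" rules just described can be sketched as a single comparison function. The Python function `text_matches` is hypothetical, and it assumes that "whole text" means the selected text must equal the "Find what" value exactly, with no whitespace trimming. It only affects how the value is compared; the XPath used to select the nodes stays case-sensitive either way.

```python
def text_matches(text, find_what, match_case=False, match_whole_text=False):
    """Compare a selected node's text with the "Find what" value."""
    if not match_case:
        # casefold() is a stronger lower() intended for caseless comparison.
        text, find_what = text.casefold(), find_what.casefold()
    if match_whole_text:
        return text == find_what   # exact value
    return find_what in text       # value anywhere inside the fragment
```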

When the search ends, the list of matching files is displayed.

Points of interest

The XML Search utility is very simple: it essentially combines a recursive folder scan with XML DOM processing based on the SelectNodes method. The .NET classes do the rest.
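The folder-scanning half can be sketched in Python with `os.walk`. The .NET counterpart is likely `Directory.GetFiles` with `SearchOption.AllDirectories`, but the function below, `collect_xml_files`, is hypothetical, and so is filtering on the `.xml` extension case-insensitively. It also covers the single-file case described earlier.

```python
import os

def collect_xml_files(path, recursive=True):
    """Return the XML files to search, sorted for a stable result order."""
    if os.path.isfile(path):
        return [path]  # a single file: search only that file
    found = []
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            if name.lower().endswith(".xml"):
                found.append(os.path.join(dirpath, name))
        if not recursive:
            break  # os.walk yields the top folder first; skip subfolders
    return sorted(found)
```

Each returned path would then be loaded and queried with a SelectNodes-style lookup, one file at a time.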

To display the XML fragments in the bottom panel, I used the excellent "XML TreeView Control" found on CodeProject, and I have to thank Thomas Siepe for it. I invite you to read his article about the control.

Multithreading?
HoyaSaxa93
15:05 17 Aug '09  
Have not looked at your code yet... but what are your thoughts on multithreading this app to process more documents in parallel?

I need to search 40,000+ XML documents and would like to apply a similar concept, using asynchronous delegates or another threading approach to process several documents simultaneously.

By the way... thanks for posting this article!
Re: Multithreading?
Alberto Venditti
23:42 30 Aug '09  
Hi.
Multithreading could certainly improve the search performance of this app (which was not written to process so many files).
Of course, in a multithreaded approach you have to clearly define how to distribute the workload among threads (the current linear, sequential approach is based on GetFiles, as you can see), so that no file is accessed twice by different threads.
Apart from this obvious point, I think the only possible issues are the memory footprint (XmlDocuments are quite "hungry") and disk I/O (or network I/O, when searching on network shares). Both could be bottlenecks that prevent a high degree of parallelism, so some experiments would be needed...

Regards, AV
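The workload distribution discussed above can be sketched in Python:

- The file list is built once, as GetFiles does.
- Each path is handed to exactly one worker of a bounded thread pool, so no file is opened twice.
- `max_workers` caps how many parsed documents are held in memory at the same time.

Assumptions: the function names are hypothetical, matching is a case-insensitive substring test on each selected node's text, and unreadable or malformed files count as non-matching.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

def file_matches(path, node_path, find_what):
    # Parse one file; True if any node selected by node_path contains
    # find_what (case-insensitive). Unreadable or malformed files don't match.
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        return False
    needle = find_what.casefold()
    return any(needle in "".join(node.itertext()).casefold()
               for node in root.findall(node_path))

def parallel_search(paths, node_path, find_what, max_workers=4):
    """Return the matching paths, in input order."""
    paths = list(paths)  # build the work list once, up front
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() gives each path to exactly one worker and yields results in order.
        hits = pool.map(lambda p: file_matches(p, node_path, find_what), paths)
        return [p for p, hit in zip(paths, hits) if hit]
```

In CPython, XML parsing is largely CPU-bound and likely holds the global interpreter lock, so threads mainly help by overlapping disk or network reads. Measuring with different `max_workers` values, or trying `ProcessPoolExecutor` with a top-level function instead of the lambda, would be the natural experiments.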
Re: Multithreading?
HoyaSaxa93
5:41 31 Aug '09  
Thanks for your feedback and thanks again for the article.
Nice job. Just what I was looking for.
gary caden
9:15 16 Apr '09  
I had a little trouble understanding what to put in the 'node selection condition' field. My XML files had BEML listed in the 'Tree View' of XML Notepad from MS. Once I entered BEML in that field, your search utility worked great. Fast too!
Re: Nice job. Just what I was looking for.
JMWTech
5:51 11 Dec '09  
I also didn't quite understand what to put into the node selection field, so I just put a * in and was able to use it to find the string. Great work, helps a ton!
Re: Nice job. Just what I was looking for.
Alberto Venditti
4:25 10 Jan '10  
The node selection field expects an XPath expression that narrows your search to a specific node or node subtree.
AV
Help with a problem
Catherine Bell
4:56 26 Mar '09  
I have a list of order_ids in Excel. I have many many xml files containing the original orders. I want to pull the original orders out of the many xml files that match the list of order_ids in the Excel file to create a new XML file.

Will your app do that?
Re: help with a problem
Alberto Venditti
3:25 27 Mar '09  
Hi.
As far as I understand, you need to:
1) execute multiple searches on a set of XML files,
2) find some files that match your criteria
3) and merge the matching files in a unique target XML file.

This app lets you run a single, manual search over multiple XML files, so it cannot fully address your problem.
But it could give you some ideas on how to achieve step 2.

AV
Very good, clean, effective and easy to follow
androoo
12:00 26 Sep '06  
thanks very much !
Inner/Outer XML and text searches
VBProEd
12:41 23 Mar '06  
Can you explain the use of the Inner/Outer text search? I have a simple XML file and not many options work for me. I have attached a sample of an XML file that I need to search. Some of the searches I am trying are on USR_ID, TRANS_ID and MODIFIED_DATA.

Update: I pasted my XML here but it is not showing all information.


-- modified at 17:42 Thursday 23rd March, 2006
Re: Inner/Outer XML and text searches
Alberto Venditti
21:46 23 Mar '06  
Well, for a generic XML node x, you can always inspect its properties:

x.InnerText (defined as: "the concatenated values of the node and all its child nodes")
x.InnerXml (defined as: "the markup representing only the child nodes of this node")
x.OuterXml (defined as: "the markup representing this node and all its child nodes")

Please, note:
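1) OuterText doesn't exist;
2) the "value" of an XML element is, informally, the text it contains (strictly speaking, XmlElement.Value is Nothing: the text is held by the element's child text node)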

The XML slice you supplied doesn't illustrate the differences, because your nodes contain attributes but no text.
So, consider a sample XML slice like the following:

<TRANSACTION>
   <TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS>
   <TRANS_ID v="SE0503F">TransactionID.</TRANS_ID>
</TRANSACTION>

By running a statement like the following (where d is the XmlDocument containing the XML above):

Dim x As XmlNode = d.SelectSingleNode("/TRANSACTION")

you'll get:

x.InnerText:
Transaction succeeded.TransactionID.

x.InnerXml:
<TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS><TRANS_ID v="SE0503F">TransactionID.</TRANS_ID>

x.OuterXml:
<TRANSACTION><TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS><TRANS_ID v="SE0503F">TransactionID.</TRANS_ID></TRANSACTION>



By running a statement like:

Dim x As XmlNode = d.SelectSingleNode("/TRANSACTION/TRANS_STATUS")

you'll get:

x.InnerText:
Transaction succeeded.

x.InnerXml:
Transaction succeeded.

x.OuterXml:
<TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS>

Hope this helps clarify how the InnerText/InnerXml/OuterXml search types work in the search application.

AV
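The three properties can be reproduced in Python's ElementTree, which has no direct equivalents. The helpers `inner_text`, `inner_xml` and `outer_xml` below are hypothetical approximations of the .NET properties, not the same API.

One caveat affects the comparison: XmlDocument discards whitespace-only text nodes by default (PreserveWhitespace = False), whereas ElementTree keeps them. The values listed above are therefore reproduced when the sample is parsed without indentation.

```python
import copy
import xml.etree.ElementTree as ET

def inner_text(elem):
    # ~ XmlNode.InnerText: text of the node and all its descendants.
    return "".join(elem.itertext())

def outer_xml(elem):
    # ~ XmlNode.OuterXml: the node's own markup plus its children.
    # ElementTree stores the text *after* a closing tag as elem.tail, and
    # tostring() serializes it, so drop the tail on a shallow copy.
    clone = copy.copy(elem)
    clone.tail = None
    return ET.tostring(clone, encoding="unicode")

def inner_xml(elem):
    # ~ XmlNode.InnerXml: leading text plus each child's markup and trailing text.
    return (elem.text or "") + "".join(
        outer_xml(child) + (child.tail or "") for child in elem)
```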
Re: Inner/Outer XML and text searches
VBProEd
10:44 25 Mar '06  
Hi... and thanks for the reply. I am new to XML and I am still looking for a book that explains all the gory details. I had a problem pasting a partial XML audit log file earlier, and now I see there is an "Ignore HTML tags in this message" check box. So, let me re-post the XML information.

<TRANSACTION>
<HDR>
<ONLINE_PROG_ID v="AR0010N"></ONLINE_PROG_ID>
<FETCH_DTM v="2003-02-24 11:15:44"></FETCH_DTM>
<TRANS_STATUS v="SUCCESS"></TRANS_STATUS>
<HDR_ACTION_CD v="I"></HDR_ACTION_CD>
<USR_ID v="qzqxtf"></USR_ID>
<WK_STATION_ID v="00065BAF2DC7"></WK_STATION_ID>
<TRANS_ID v="AR1010U"></TRANS_ID>
<REVIEW_STATUS v=" "></REVIEW_STATUS>
<KEYNAME v="2"></KEYNAME>
<KEYVALUE v="-1"></KEYVALUE>
<MODIFIED_DATA>
<T00677 v=""></T00677>
<T00240 v=""></T00240>
<T00245 v="1931-02-02"></T00245>
<T00246 v="2003-02-24"></T00246>
<T00248 v="WH"></T00248>
<T00249 v="Arnold"></T00249>
<T00252 v="-1"></T00252>
<T01325 v="2003-02-24"></T01325>
<T00250 v="EN"></T00250>
<T00253 v="N"></T00253>
<T00255 v="Patchouli"></T00255>
<T00256 v=""></T00256>
<T00257 v=""></T00257>
<T02398 v="N"></T02398>
<T00264 v="EN"></T00264>
<T00266 v="M"></T00266>
<T00267 v=""></T00267>
<T00269 v="WI"></T00269>
<T00270 v=""></T00270>
<T00268 v="U"></T00268>
<T00261 v=""></T00261>
<T04108 v="RG"></T04108>
<T00241 v=""></T00241>
<T00647 v=""></T00647>
<T00678 v=""></T00678>
<T00835 v=""></T00835>
<T00834 v=""></T00834>
<T00834 v=""></T00834>
<T00835 v=""></T00835>
<T00244 v=""></T00244>
<T00247 v=""></T00247>
<T01767 v="0"></T01767>
<T01776 v=""></T01776>
<T01794 v="N"></T01794>
<T00263 v="Y"></T00263>
<T01800 v="Martinez"></T01800>
<T01799 v=""></T01799>
<T00251 v=""></T00251>
<T01801 v=""></T01801>
<T01802 v=""></T01802>
<T01803 v="CA"></T01803>
<T01805 v="Muir"></T01805>
<T01806 v="30"></T01806>
<T01807 v=""></T01807>
<T01808 v=""></T01808>
<T01809 v="RD"></T01809>
<T01811 v=""></T01811>
<T01812 v=""></T01812>
<T01813 v="94553"></T01813>
<T01814 v=""></T01814>
<T02143 v=""></T02143>
<T02288 v=""></T02288>
<T02277 v="Martinez"></T02277>
<T02276 v=""></T02276>
<T02282 v=""></T02282>
<T02283 v=""></T02283>
<T02285 v="CA"></T02285>
<T02286 v="Muir"></T02286>
<T02287 v="30"></T02287>
<T02289 v=""></T02289>
<T02290 v="RD"></T02290>
<T02291 v=""></T02291>
<T02292 v=""></T02292>
<T02293 v="94553"></T02293>
<T02294 v=""></T02294>
<T00258 v=""></T00258>
<T02582 v=""></T02582>
<T02587 v="-1"></T02587>
<T02583 v=""></T02583>
<T00265 v="Y"></T00265>
<T00845 v=""></T00845>
<T04167 v=""></T04167>
<T04170 v=""></T04170>
</MODIFIED_DATA>
</HDR>
</TRANSACTION>

This partial XML audit log is what I want to search. Auditors want to know, for a given USR_ID, what that person touched on a given day or date range. Everything between the MODIFIED_DATA tags consists of database column tags with the associated data values modified in the database. I need to be able to search an XML file and dump the search results into a VB.NET DataGrid (well, maybe) or build an HTML document for analysis.
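This kind of audit query goes beyond a single text search, because it combines a user filter, a date filter and a listing of the modified columns. A Python sketch is shown below, assuming the layout posted above:

- `HDR/USR_ID/@v` holds the user.
- `HDR/FETCH_DTM/@v` holds a timestamp formatted as `YYYY-MM-DD hh:mm:ss`.
- Every child of `HDR/MODIFIED_DATA` is a column whose `v` attribute is the new value.

The function name, the inclusive date range and the `skip_empty` parameter are hypothetical choices, not part of the XML Search utility.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

def modified_by_user(root, usr_id, start, end, skip_empty=True):
    """List (timestamp, column, value) for every change made by usr_id
    between the dates start and end (both inclusive)."""
    rows = []
    for hdr in root.iter("HDR"):  # works for one TRANSACTION or many
        user, stamp = hdr.find("USR_ID"), hdr.find("FETCH_DTM")
        if user is None or stamp is None or user.get("v") != usr_id:
            continue
        when = datetime.strptime(stamp.get("v"), "%Y-%m-%d %H:%M:%S")
        if not (start <= when.date() <= end):
            continue
        for col in hdr.iterfind("MODIFIED_DATA/*"):
            value = col.get("v", "")
            if skip_empty and not value.strip():
                continue  # assumption: blank values are not interesting
            rows.append((when, col.tag, value))
    return rows
```

The resulting tuples could then be bound to a grid or written out as an HTML table.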
Questions about Microsoft Exams you passed
egc101
10:31 28 Oct '05  
Hello,

I found your article while reading CodeProject newsletter.
After reading it, I saw that you passed many Microsoft Exams.

I would like to pass some exams as well (MCP, MCAD and MCSD .NET).
I would like to use self-paced training.

So I've got a question : do you think it's possible to get Certified using self-paced training?

Quick Profile:
I've been programming since I was 9.
But I didn't follow any 'High School' path...

Note:
I know this forum is not really the 'right' place to post this question, but your email address is not listed.

Thank you.
Bye.


Greg
egc@free.fr

Re: Questions about Microsoft Exams you passed
Alberto Venditti
22:52 28 Oct '05  
Yes, Greg, this is not the right place.
I'll answer you by e-mail.
How about a code download?
seanwright
6:49 29 Sep '05  
Um, I don't see a download of any kind...

Re: How about a code download?
Alberto Venditti
9:20 29 Sep '05  
Try the download link at the top of the article (and maybe get stronger glasses).
Re: How about a code download?
Hypnotron
8:34 30 Jul '06  
lol


Last Updated 28 Sep 2005 | Copyright © CodeProject, 1999-2010