Introduction
Have you ever needed to find specific text inside an XML file? Or to identify, in a set of XML files, those containing a specific text? If so, you probably used some kind of "multiple file search" utility, or simply the operating system's "Search" feature, which can locate files containing a given text string. This is a suitable solution in many cases. But what about looking for text that appears in specific locations of the XML structure? For example, if you need to identify XML files containing a string inside a particular attribute value and *not* anywhere else, a generic text search tool is of no use.
The simple utility presented here, named "XML Search", addresses this problem by implementing a multiple-file XML search that takes the XML file structure into account.
How the utility works
The utility I wrote is very simple: given an "input" folder, it looks for XML files in that folder (and, optionally, in its subfolders), loads them one by one into an XmlDocument object, and looks for elements that match the search criteria you specified. The search outcome is a list of the matching files, with the matching XML fragment highlighted.

Instead of an "input" folder, you can also specify a single file: in this case, the search task is limited to the given file content.
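The file-selection step can be sketched in Python. The original tool is a .NET application, so this is an illustrative analogue rather than its code; the function name `iter_xml_files` and the `recursive` flag are hypothetical, and the sketch assumes XML files are recognised by a lowercase `.xml` extension:

```python
from pathlib import Path
from typing import Iterator


def iter_xml_files(target: str, recursive: bool = True) -> Iterator[Path]:
    """Yield the XML files to search (hypothetical helper).

    A single file is yielded as-is; for a folder, every *.xml file in it
    (and, when `recursive` is True, in all of its subfolders) is yielded
    in sorted order.
    """
    path = Path(target)
    if path.is_file():
        # Single-file mode: the search is limited to this file.
        yield path
        return
    # "**/*.xml" also matches files directly inside `target`.
    pattern = "**/*.xml" if recursive else "*.xml"
    yield from sorted(p for p in path.glob(pattern) if p.is_file())
```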
The search criteria are defined mainly by an XPath expression you enter in the "Node selection condition" textbox: in each XML file, this expression selects the XML nodes in which the tool looks for the text string you typed in the "Find what" textbox.
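As a rough Python analogue of the node selection step, the sketch below evaluates the expression with `xml.etree.ElementTree`. Note that ElementTree supports only a limited XPath subset, whereas the .NET `SelectNodes` method accepts full XPath 1.0; the helper name `select_nodes` and the sample `orders`/`order` elements are illustrative:

```python
import xml.etree.ElementTree as ET
from typing import List


def select_nodes(xml_text: str, xpath: str) -> List[ET.Element]:
    """Return the elements selected by `xpath`, evaluated from the root element.

    ElementTree understands only an XPath subset: child steps, '*', '.',
    '//', '..' and simple predicates such as [@attr], [@attr='value'],
    [tag], [tag='text'] and [n]. Expressions outside it are not supported.
    """
    root = ET.fromstring(xml_text)
    return root.findall(xpath)
```

For example, `select_nodes(doc, "order[@id='2']")` returns only the `order` element whose `id` attribute is `2`; as in .NET, element names are matched case-sensitively, so `"Order"` selects nothing.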
Option buttons let you choose whether the search is performed on:
- the InnerText property of the selected nodes
- the InnerXml property of the selected nodes
- the OuterXml property of the selected nodes
- a specific attribute value of the selected nodes
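Each option corresponds to a different string view of a selected node. A minimal Python sketch of those views, assuming `xml.etree.ElementTree` in place of the .NET DOM (the helper names are hypothetical, and the serialized markup can differ cosmetically from .NET; for instance, empty elements are written as `<tag />`):

```python
import copy
import xml.etree.ElementTree as ET
from typing import Optional
from xml.sax.saxutils import escape


def inner_text(elem: ET.Element) -> str:
    """Concatenated text of the element and all its descendants (like InnerText)."""
    return "".join(elem.itertext())


def outer_xml(elem: ET.Element) -> str:
    """Markup for the element itself and its children (like OuterXml)."""
    shallow = copy.copy(elem)  # children are shared, not duplicated
    shallow.tail = None        # ET.tostring would otherwise append the text after the element
    return ET.tostring(shallow, encoding="unicode")


def inner_xml(elem: ET.Element) -> str:
    """Markup for the element's content only, without its own tags (like InnerXml)."""
    parts = [escape(elem.text or "")]
    # ET.tostring includes each child's tail, i.e. the text that follows it.
    parts += [ET.tostring(child, encoding="unicode") for child in elem]
    return "".join(parts)


def attribute_value(elem: ET.Element, name: str) -> Optional[str]:
    """Value of the named attribute, or None when the node lacks it."""
    return elem.get(name)
```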
You can also use the XML Search tool just to locate XML files containing specific nodes or attributes (when their value doesn't matter): to achieve this, you'll use the "Just check for node existence" option.
If you want to omit the root element of the XML structure from the XPath search condition, check the "Start from DocumentElement" checkbox.
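Presumably, the checkbox decides whether the expression is evaluated against the document node or against its DocumentElement. ElementTree always evaluates paths relative to an element, so this Python sketch emulates the document-level context by wrapping the root in a placeholder element; the function `select` and the wrapper name `document` are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import List


def select(xml_text: str, xpath: str, start_from_document_element: bool) -> List[ET.Element]:
    """Evaluate `xpath` from the root element or from a stand-in document node."""
    root = ET.fromstring(xml_text)
    if start_from_document_element:
        # The path starts below the root element, e.g. "HDR/USR_ID".
        context = root
    else:
        # Hypothetical wrapper playing the role of the document node, so the
        # path must name the root element first, e.g. "TRANSACTION/HDR/USR_ID".
        context = ET.Element("document")
        context.append(root)
    return context.findall(xpath)
```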
For a case-sensitive search, check the "Match case" checkbox (by default, the search is case-insensitive on the "Find what" value, but be aware that the XPath expression is always case-sensitive).
To look for an exact value, check the "Match whole text" checkbox; otherwise, the "Find what" value is searched for anywhere inside the XML fragment selected by your search criteria.
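The two checkboxes combine into a simple comparison rule. A Python sketch, assuming the case-insensitive comparison is done with `str.casefold` (the original .NET implementation may use a different, culture-aware comparison); the function name `text_matches` is hypothetical:

```python
def text_matches(value: str, find_what: str, match_case: bool = False,
                 match_whole_text: bool = False) -> bool:
    """Compare a node's text with the "Find what" value under the two options."""
    if not match_case:
        # Default behaviour: fold both strings so case differences are ignored.
        value, find_what = value.casefold(), find_what.casefold()
    if match_whole_text:
        # "Match whole text": the whole value must equal the search string.
        return value == find_what
    # Otherwise, a substring anywhere in the value is enough.
    return find_what in value
```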
When the search ends, a list of matching files is displayed:
- clicking an item in the list shows, in the bottom panel, the matching XML fragment found in that file (normally, the fragment is taken from the OuterXml property of a matching node);
- right-clicking an item opens a context menu that lets you copy to the Windows Clipboard the entire content of the selected file, its filename, or its full path (the last two options are also available when multiple items are selected in the file list).
Points of interest
The XML Search utility is very simple: it essentially combines a recursive folder scan with XML DOM processing based on the SelectNodes method. The .NET classes do the rest.
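The combination of a recursive folder scan and node selection can be sketched in Python as a generator that receives the search criteria as a callable. This is an illustrative analogue using `pathlib` and `xml.etree.ElementTree`, not the tool's .NET code; the names `scan_folder` and `predicate`, and the choice to skip malformed files rather than stop, are assumptions. Paths are evaluated relative to each root element, as with "Start from DocumentElement":

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def scan_folder(folder, xpath: str,
                predicate: Callable[[ET.Element], bool]
                ) -> Iterator[Tuple[Path, List[ET.Element]]]:
    """Yield (file, matching nodes) for every *.xml file under `folder`.

    `xpath` is evaluated relative to each document's root element, and
    `predicate` decides whether a selected node satisfies the search.
    """
    for path in sorted(Path(folder).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # assumption: skip malformed files instead of aborting
        matches = [node for node in root.findall(xpath) if predicate(node)]
        if matches:
            yield path, matches
```

Keeping the criteria in `predicate` lets one loop serve every option: a call such as `scan_folder(folder, "HDR/USR_ID", lambda e: True)` behaves like "Just check for node existence", while a predicate that inspects `e.get("v")` mimics an attribute-value search.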
To display XML fragments in the bottom panel, I used the excellent "XML TreeView Control" found on CodeProject: my thanks to Thomas Siepe for it. I invite you to read his article about the control.
Multithreading? (HoyaSaxa93, 15:05 17 Aug '09)
I haven't looked at your code yet, but what are your thoughts on multithreading this app to process more documents in parallel?
I need to search 40,000+ XML documents and would like to apply a similar concept, using asynchronous delegates or another threading approach to process more documents simultaneously.
By the way... thanks for posting this article!
Hi. Multithreading could certainly improve the search performance of this app (which was not written to process so many files). Of course, with a multithreaded approach you have to define clearly how the workload is distributed among threads (the current linear, sequential approach is based on GetFiles, as you can see), so that different threads never access the same file. Apart from this obvious issue, I think the only possible problems are the memory footprint (XmlDocument objects are quite memory-hungry) and the disk I/O (or network I/O, when searching network shares); both could become bottlenecks that prevent a high degree of parallelism. Some experiments would be needed.
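A minimal Python sketch of the approach described in this reply, using `concurrent.futures` (all names are hypothetical). The file list is built once, up front, and each path is handed to exactly one worker, so no file is read by two threads; `max_workers` bounds how many parsed documents are held in memory at the same time. Matching is a case-insensitive InnerText substring test, chosen only for illustration:

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path
from typing import List


def file_matches(path: Path, xpath: str, needle: str) -> bool:
    """True if any node selected by `xpath` has text containing `needle`."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False  # treat malformed files as non-matching
    needle = needle.casefold()
    return any(needle in "".join(e.itertext()).casefold() for e in root.findall(xpath))


def parallel_search(folder, xpath: str, needle: str, workers: int = 8) -> List[Path]:
    # Build the work list once (like the GetFiles call); each path goes to one worker.
    paths = sorted(Path(folder).rglob("*.xml"))
    check = partial(file_matches, xpath=xpath, needle=needle)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(check, paths))  # map preserves input order
    return [p for p, hit in zip(paths, flags) if hit]
```

Threads mainly help when the work is I/O-bound (for example, network shares). For CPU-bound parsing, `ProcessPoolExecutor` can be swapped in, since a `functools.partial` of a module-level function can be pickled; on Windows and macOS the calling code must then be guarded by `if __name__ == "__main__":`.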
Regards, AV
Thanks for your feedback and thanks again for the article.
Nice job. Just what I was looking for. (gary caden, 9:15 16 Apr '09)
I had a little trouble understanding what to put in the 'Node selection condition' field. My XML files showed BEML in the 'Tree View' of Microsoft's XML Notepad. Once I entered BEML in that field, your search utility worked great. Fast too!
I also didn't quite understand what to put into the node selection field, so I just put a * in and was able to use it to find the string. Great work, helps a ton!
The node selection field expects an XPath expression that narrows your search to specific nodes or node subtrees. AV
help with a problem (Catherine Bell, 4:56 26 Mar '09)
I have a list of order_ids in Excel and many, many XML files containing the original orders. I want to pull out of those XML files the original orders whose IDs match the list in the Excel file, to create a new XML file.
Will your app do that?
Hi. As far as I understand, you need to: 1) run multiple searches on a set of XML files; 2) find the files that match your criteria; 3) merge the matching files into a single target XML file.
This app lets you run a single, manual, generic search over multiple XML files, so it cannot fully address your problem. However, it could offer some inspiration for achieving the goal in point 2.
AV
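For points 2 and 3 taken together, a minimal Python sketch of the extract-and-merge step, under hypothetical assumptions: each order is an `order` element with an `order_id` child, the wanted IDs have already been exported from Excel into a Python set, and the documents are passed in as strings (reading the files and the Excel sheet is left out). The function and element names are illustrative, not part of the XML Search tool:

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable, Set


def merge_orders(xml_docs: Iterable[str], wanted_ids: Set[str]) -> ET.Element:
    """Collect every <order> whose <order_id> is in `wanted_ids` under a new root."""
    merged = ET.Element("orders")  # hypothetical root of the target document
    for text in xml_docs:
        root = ET.fromstring(text)
        for order in root.iter("order"):
            if (order.findtext("order_id") or "").strip() in wanted_ids:
                clone = copy.deepcopy(order)  # leave the source tree untouched
                clone.tail = None             # drop whitespace that followed it in the source
                merged.append(clone)
    return merged
```

The result can be written out with `ET.ElementTree(merged).write(path, encoding="utf-8", xml_declaration=True)`.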
very good, clean and effective and easy to follow (androoo, 12:00 26 Sep '06)
Inner/Outer XML and text searches (VBProEd, 12:41 23 Mar '06)
Can you explain the use of the Inner/Outer text search? I have a simple XML file and not many options work for me. I have attached a sample of an XML file that I need to search. Some of the fields I want to search on are USR_ID, TRANS_ID and MODIFIED_DATA.
Update: I pasted my XML here but it is not showing all information.
-- modified at 17:42 Thursday 23rd March, 2006
Well, for a generic XML node x, you can always inspect its properties:
x.InnerText, defined as: "the concatenated values of the node and all its child nodes"
x.InnerXml, defined as: "the markup representing only the child nodes of this node"
x.OuterXml, defined as: "the markup representing this node and all its child nodes"
Please note: 1) OuterText doesn't exist; 2) the "value" of an XML node is the text it contains.
The XML slice you supplied doesn't help to understand the differences, because your nodes contain attributes but no text. So, think about a sample XML slice like the following:
<TRANSACTION>
  <TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS>
  <TRANS_ID v="SE0503F">TransactionID.</TRANS_ID>
</TRANSACTION>
By running a statement like the following (where d is an XmlDocument into which the sample above has been loaded):
Dim x As XmlNode = d.SelectSingleNode("/TRANSACTION")
you'll get:
x.InnerText: Transaction succeeded.TransactionID.
x.InnerXml: <TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS><TRANS_ID v="SE0503F">TransactionID.</TRANS_ID>
x.OuterXml: <TRANSACTION><TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS><TRANS_ID v="SE0503F">TransactionID.</TRANS_ID></TRANSACTION>
By running a statement like:
Dim x As XmlNode = d.SelectSingleNode("/TRANSACTION/TRANS_STATUS")
you'll get:
x.InnerText: Transaction succeeded.
x.InnerXml: Transaction succeeded.
x.OuterXml: <TRANS_STATUS v="SUCCESS">Transaction succeeded.</TRANS_STATUS>
Hope this helps to clarify how InnerText/InnerXml/OuterXml search types work in the search application.
AV
Hi, and thanks for the reply. I am new to XML and I am still looking for a book that explains all the gory details. I had a problem pasting a partial XML audit log file earlier, and now I see there is an "Ignore HTML tags in this message" checkbox. So, let me re-post the XML information.
<TRANSACTION>
  <HDR>
    <ONLINE_PROG_ID v="AR0010N"></ONLINE_PROG_ID>
    <FETCH_DTM v="2003-02-24 11:15:44"></FETCH_DTM>
    <TRANS_STATUS v="SUCCESS"></TRANS_STATUS>
    <HDR_ACTION_CD v="I"></HDR_ACTION_CD>
    <USR_ID v="qzqxtf"></USR_ID>
    <WK_STATION_ID v="00065BAF2DC7"></WK_STATION_ID>
    <TRANS_ID v="AR1010U"></TRANS_ID>
    <REVIEW_STATUS v=" "></REVIEW_STATUS>
    <KEYNAME v="2"></KEYNAME>
    <KEYVALUE v="-1"></KEYVALUE>
    <MODIFIED_DATA>
      <T00677 v=""></T00677>
      <T00240 v=""></T00240>
      <T00245 v="1931-02-02"></T00245>
      <T00246 v="2003-02-24"></T00246>
      <T00248 v="WH"></T00248>
      <T00249 v="Arnold"></T00249>
      <T00252 v="-1"></T00252>
      <T01325 v="2003-02-24"></T01325>
      <T00250 v="EN"></T00250>
      <T00253 v="N"></T00253>
      <T00255 v="Patchouli"></T00255>
      <T00256 v=""></T00256>
      <T00257 v=""></T00257>
      <T02398 v="N"></T02398>
      <T00264 v="EN"></T00264>
      <T00266 v="M"></T00266>
      <T00267 v=""></T00267>
      <T00269 v="WI"></T00269>
      <T00270 v=""></T00270>
      <T00268 v="U"></T00268>
      <T00261 v=""></T00261>
      <T04108 v="RG"></T04108>
      <T00241 v=""></T00241>
      <T00647 v=""></T00647>
      <T00678 v=""></T00678>
      <T00835 v=""></T00835>
      <T00834 v=""></T00834>
      <T00834 v=""></T00834>
      <T00835 v=""></T00835>
      <T00244 v=""></T00244>
      <T00247 v=""></T00247>
      <T01767 v="0"></T01767>
      <T01776 v=""></T01776>
      <T01794 v="N"></T01794>
      <T00263 v="Y"></T00263>
      <T01800 v="Martinez"></T01800>
      <T01799 v=""></T01799>
      <T00251 v=""></T00251>
      <T01801 v=""></T01801>
      <T01802 v=""></T01802>
      <T01803 v="CA"></T01803>
      <T01805 v="Muir"></T01805>
      <T01806 v="30"></T01806>
      <T01807 v=""></T01807>
      <T01808 v=""></T01808>
      <T01809 v="RD"></T01809>
      <T01811 v=""></T01811>
      <T01812 v=""></T01812>
      <T01813 v="94553"></T01813>
      <T01814 v=""></T01814>
      <T02143 v=""></T02143>
      <T02288 v=""></T02288>
      <T02277 v="Martinez"></T02277>
      <T02276 v=""></T02276>
      <T02282 v=""></T02282>
      <T02283 v=""></T02283>
      <T02285 v="CA"></T02285>
      <T02286 v="Muir"></T02286>
      <T02287 v="30"></T02287>
      <T02289 v=""></T02289>
      <T02290 v="RD"></T02290>
      <T02291 v=""></T02291>
      <T02292 v=""></T02292>
      <T02293 v="94553"></T02293>
      <T02294 v=""></T02294>
      <T00258 v=""></T00258>
      <T02582 v=""></T02582>
      <T02587 v="-1"></T02587>
      <T02583 v=""></T02583>
      <T00265 v="Y"></T00265>
      <T00845 v=""></T00845>
      <T04167 v=""></T04167>
      <T04170 v=""></T04170>
    </MODIFIED_DATA>
  </HDR>
</TRANSACTION>
This partial XML audit log is what I want to search. Auditors want to know, for a given USR_ID, what that person touched on a given day or within a date range. Everything sandwiched between the MODIFIED_DATA tags consists of database column tags and the associated data values modified in a database. I need to be able to search an XML file and dump the search results into a VB.NET DataGrid (well, maybe) or build an HTML document for analysis.
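The XML Search tool reports matching files rather than extracting records, but the auditors' query can be sketched directly against the layout above. A minimal Python sketch, assuming each HDR carries USR_ID and FETCH_DTM (format `YYYY-MM-DD HH:MM:SS`) and that MODIFIED_DATA holds one element per column with its value in `v`; treating an empty `v` as an untouched column is also an assumption, and the function name `modified_columns` is hypothetical. Feeding the rows into a DataGrid or an HTML report is left out:

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime
from typing import List, Tuple


def modified_columns(xml_text: str, usr_id: str, start: date,
                     end: date) -> List[Tuple[datetime, str, str]]:
    """Return (fetch time, column tag, value) rows for one user's transactions."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() also finds HDR elements when several TRANSACTIONs share one file.
    for hdr in root.iter("HDR"):
        user = hdr.find("USR_ID")
        stamp = hdr.find("FETCH_DTM")
        if user is None or stamp is None or user.get("v") != usr_id:
            continue
        when = datetime.strptime(stamp.get("v"), "%Y-%m-%d %H:%M:%S")
        if not (start <= when.date() <= end):
            continue  # outside the requested date range
        for column in hdr.iterfind("MODIFIED_DATA/*"):
            value = column.get("v", "")
            if value.strip():  # assumption: empty values mean "not touched"
                rows.append((when, column.tag, value))
    return rows
```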
Questions about Microsoft Exams you passed (egc101, 10:31 28 Oct '05)
Hello,
I found your article while reading the CodeProject newsletter. After reading it, I saw that you have passed many Microsoft exams.
I would like to pass some exams as well (MCP, MCAD and MCSD .NET), and I would like to use self-paced training.
So my question is: do you think it's possible to get certified using self-paced training?
Quick profile: I've been programming since I was 9, but I didn't follow any 'High School' path...
Note: I know this forum is not really the right place to post this question, but your e-mail address is not listed.
Thank you. Bye.
Greg egc@free.fr
Yes, Greg, this is not the right place; I'll answer you by e-mail.
How about a code download? (seanwright, 6:49 29 Sep '05)
Um, I don't see a download of any kind...
Try the download link at the top of the article (and, if needed, strengthen your glasses).
Last Updated 28 Sep 2005
Copyright © CodeProject, 1999-2010