
XSearch - a class that implements a search engine-style advanced search

By Hans Dietrich, 16 May 2007
 

Introduction

CXSearch is a class that implements a search engine-style advanced search - for example, the advanced search offered by the Google search engine:

or the Yahoo search engine:

In the above examples, there are four input fields, which I will refer to in this article as:
  1. the ALL field - all words in this field must be present for a successful search
  2. the EXACT PHRASE field - this field contains a single phrase* (one or more words); the phrase must be present for a successful search
  3. the AT LEAST ONE field - at least one of the words in this field must be present for a successful search
  4. the WITHOUT field - if any of the words in this field are present, the search fails
* In the current implementation of CXSearch, double quotes (") do not have a special meaning in any of the fields.

CXSearch is based on code from Scot Brennecke's article A Multiple Substring Search Class.

CXSearch In Action

The demo program shows how CXSearch can be used to mimic a search engine-style advanced search:

In the screenshot, you can see the different word types highlighted with different colors as they are found in the text, and the match counts next to each field. Note that this search failed because the WITHOUT word WATERY was found (highlighted in light red). If you checked the Match case checkbox and ran the search again, it would succeed, because WATERY (upper case) would not be found.

Demo Program Options

The File to search field is the file that you want to search. This can be any text file; the demo program has been set up to search for certain words found in moby.txt.

The four search input fields have been discussed above. For a successful search, the words and phrase of the ALL and the EXACT PHRASE fields must be present, and at least one of the words in the AT LEAST ONE field must be present. Also, none of the words in the WITHOUT field can be present.
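The success rule described above can be sketched in standard C++ as follows. This is only an illustration and is not taken from the CXSearch source: the FieldCounts struct and SearchSucceeded() function are hypothetical names, and the sketch assumes that an empty field imposes no constraint on the search.

```cpp
#include <vector>

// Hypothetical tally of match counts per input field (not part of CXSearch).
struct FieldCounts
{
    std::vector<int> all;          // one match count per ALL word
    int              exactPhrase;  // match count for the phrase, -1 if the field is empty
    std::vector<int> atLeastOne;   // one match count per AT LEAST ONE word
    std::vector<int> without;      // one match count per WITHOUT word
};

// Returns true if the counts satisfy all four field rules.
// Assumption: a field with no words places no constraint on the result.
bool SearchSucceeded(const FieldCounts& c)
{
    // Every ALL word must have been found at least once.
    for (int n : c.all)
        if (n == 0)
            return false;

    // The phrase, if one was entered, must have been found.
    if (c.exactPhrase == 0)
        return false;

    // If AT LEAST ONE words were entered, at least one must have matched.
    if (!c.atLeastOne.empty())
    {
        bool anyFound = false;
        for (int n : c.atLeastOne)
            if (n > 0)
                anyFound = true;
        if (!anyFound)
            return false;
    }

    // A single WITHOUT match fails the whole search.
    for (int n : c.without)
        if (n > 0)
            return false;

    return true;
}
```

With these rules, a search whose ALL, EXACT PHRASE and AT LEAST ONE fields are all satisfied still fails as soon as one WITHOUT word has a non-zero count, which is the situation shown in the screenshot.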

In the current implementation, there is no special handling of the double quote (") character, which most search engines use to group words into a single phrase. For the most part, this is a convenience feature: it saves the user from having to re-enter (or cut and paste) words from the ALL field into the EXACT PHRASE field. There might also be some marginal benefit when quotes are used in the AT LEAST ONE or WITHOUT fields.

After the four input fields, there are three options not normally provided by search engines:

  1. Match case - when selected, the case of the words in the four input fields must match the case of the search text.
  2. Whole words only - when selected, only whole words in the search text will be matched - i.e., only words preceded by and followed by non-word characters. A non-word character is anything other than letters (a-z), numerals (0-9), and the underscore character (_).
  3. First match in file - when selected, the search terminates on the first match, regardless of the type of match. With multiple ALL words, this may cause the search to fail, because the search stops before the remaining words can be found. Typically, for simple searches, this option is used to improve performance, since the entire file (or other search buffer) does not have to be searched. When this option is selected for the first time, the following warning is displayed:

After selecting any of the above options, press Search to see the effect of the new options.
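The Match case and Whole words only options can be illustrated with a small standard C++ sketch. It is not the CXSearch implementation (which builds on Brennecke's multiple substring search); the names IsWordChar(), IsWholeWordMatch() and CountMatches() are hypothetical, and the sketch assumes that the start and end of the text count as word boundaries.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// A word character is a letter, a numeral, or the underscore.
bool IsWordChar(char ch)
{
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
}

// True if text[pos, pos+len) is preceded and followed by non-word characters.
// Assumption: the start and end of the text also count as boundaries.
bool IsWholeWordMatch(const std::string& text, std::size_t pos, std::size_t len)
{
    bool startOk = (pos == 0) || !IsWordChar(text[pos - 1]);
    bool endOk   = (pos + len >= text.size()) || !IsWordChar(text[pos + len]);
    return startOk && endOk;
}

// Counts occurrences of word in text, honoring the two options.
int CountMatches(std::string text, std::string word,
                 bool matchCase, bool wholeWords)
{
    if (word.empty())
        return 0;

    if (!matchCase)
    {
        // Case-insensitive search: fold both strings to lower case first.
        for (char& ch : text)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        for (char& ch : word)
            ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }

    int count = 0;
    for (std::size_t pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + 1))
    {
        if (!wholeWords || IsWholeWordMatch(text, pos, word.size()))
            ++count;
    }
    return count;
}
```

For instance, searching for WATERY in text that contains only the lower-case word finds a match when matchCase is false and none when it is true, which is why checking Match case turns the failed search in the screenshot into a successful one.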

Returning the Search Results

The demo program allows you to select the way the results are returned:

  • SendMessage - a message is sent each time a match is found
  • CPtrArray - matches are added to a CPtrArray that is passed by the caller.
From the user's perspective, there is no difference in the way the demo program operates.
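The two delivery styles can be sketched without MFC as a callback versus a caller-supplied container. This is an analogy only: the Match struct, the MatchCallback type and ReportMatches() are hypothetical, with std::function standing in for SendMessage and std::vector standing in for CPtrArray. The precedence rule follows the DoSearch() notes later in this article: either target may be null but not both, and if both are given, only the array is used.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical match record (CXSearch itself uses XSEARCH_WORD).
struct Match
{
    std::string word;     // word or phrase that matched
    std::size_t charPos;  // starting character position (0 - N)
};

using MatchCallback = std::function<void(const Match&)>;

// Reports every occurrence of word in text.
// If results is non-null, matches are appended to it and onMatch is ignored;
// otherwise onMatch is called once per match.
// Returns false if there is nothing to search for or nowhere to report to.
bool ReportMatches(const std::string& text, const std::string& word,
                   const MatchCallback& onMatch, std::vector<Match>* results)
{
    if (word.empty() || (!onMatch && results == nullptr))
        return false;

    for (std::size_t pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + 1))
    {
        Match m{word, pos};
        if (results != nullptr)
            results->push_back(m);   // array-style delivery
        else
            onMatch(m);              // message-style delivery
    }
    return true;
}
```

The callback style lets the receiver react to each match as it is found (for example, to highlight text incrementally), while the container style hands back all matches at once after the search completes.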

Demo Program Implementation Notes

The functionality of XSearch is contained in just one class, CXSearch, which invokes Brennecke's class that I already mentioned. Aside from that, there is no special code or custom controls used in the demo program. The edit control used for displaying the highlighted search text is a standard RichEdit control. All the other controls are also plain vanilla.

To keep track of word matches, the following struct is used:

    struct XSEARCH_WORD
    {
        XSEARCH_WORD()
        {
            eWordType = ALL;
            strWord   = _T("");
            nCount    = 0;
            nCharPos  = 0;
        }

        WORD_TYPE   eWordType;      // type of match
        CString     strWord;        // word or phrase to match
        int         nCount;         // number of matches found
        UINT        nCharPos;       // char starting position (0 - N)
    };
where WORD_TYPE is defined as
    enum WORD_TYPE
    {
        ALL = 0,
        EXACT_PHRASE,
        AT_LEAST_ONE,
        WITHOUT
    };

CXSearch APIs

Here are some of the functions provided by CXSearch:
  • Constructor - Construct uninitialized CXSearch object. Before using the object, AddWord() must be called.
    ////////////////////////////////////////////////////////////////////////
    //
    // CXSearch()
    //
    // Purpose:     Construct CXSearch object
    //
    // Parameters:  None
    //
    // Returns:     None
    //
  • AddWord() - Add search word to one of four internal arrays
    ////////////////////////////////////////////////////////////////////////
    //
    // AddWord()
    //
    // Purpose:     Add search word to one of four internal arrays
    //
    // Parameters:  lpszWord  - address of word/phrase string
    //              eWordType - type of word to add
    //
    // Returns:     BOOL - TRUE = success
    //
    // Notes:       AddWord adds the lpszWord string pointer to one of four
    //              internal CPtrArray objects.
    //
  • AddWords() - Add search word(s) from delimited string
    ///////////////////////////////////////////////////////////////////////
    //
    // AddWords()
    //
    // Purpose:     Add search word(s) from delimited string
    //
    // Parameters:  lpszWord   - address of word/phrase string
    //              eWordType  - type of word to add
    //              lpszDelims - pointer to string that contains word 
    //                           delimiter characters
    //
    // Returns:     BOOL - TRUE = success
    //
    // Notes:       AddWords adds words from the lpszWord string via 
    //              AddWord()
    //
  • DoSearch() - Perform search in lpszBuffer for words added via AddWord()
    ////////////////////////////////////////////////////////////////////////
    //
    // DoSearch()
    //
    // Purpose:     Perform search in lpszBuffer for words added via AddWord()
    //
    // Parameters:  lpszBuffer - address of buffer containing text to search
    //              pWnd       - handle to window that will receive match
    //                           notification
    //              pArray     - address of CPtrArray that matched word 
    //                           will be added to (an XSEARCH_WORD pointer)
    //
    // Returns:     BOOL - TRUE = success
    //
    // Notes:       Either pWnd or pArray may be NULL, but not both. If both
    //              are specified, only pArray will be used to return 
    //              matches.
    //
In addition to the above, the functions SetMatchCase(), SetWholeWords(), and SetFirstMatch() are available to set the search criteria.
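To illustrate what AddWords() is described as doing (breaking a delimited string into individual words, each of which is then added via AddWord()), here is a minimal sketch of the splitting step in standard C++. SplitWords() is a hypothetical helper, not the CXSearch code, and it assumes that runs of consecutive delimiters produce no empty words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits input into words separated by any of the characters in delims.
// Assumption: consecutive delimiters are skipped, so no empty words result.
std::vector<std::string> SplitWords(const std::string& input,
                                    const std::string& delims)
{
    std::vector<std::string> words;

    // Skip any leading delimiters to find the start of the first word.
    std::size_t start = input.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // The word runs up to the next delimiter, or to the end of the string.
        std::size_t end = input.find_first_of(delims, start);
        if (end == std::string::npos)
        {
            words.push_back(input.substr(start));
            break;
        }
        words.push_back(input.substr(start, end - start));

        // Skip the delimiter run to find the start of the next word.
        start = input.find_first_not_of(delims, end);
    }
    return words;
}
```

Each returned word would then be registered under the field's word type (ALL, AT_LEAST_ONE, or WITHOUT), while the EXACT PHRASE field would be added as a single string because it holds one phrase.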

How To Use

To integrate the CXSearch class into your app, you first need to add the following files to your project:

  • XSearch.cpp
  • XSearch.h
  • XStringSet.cpp
  • XStringSet.h

For details on how to use a CXSearch object, refer to the code in XSearchTestDlg.cpp.

Future Work

  • add support for double quotes (")
  • implement Unicode compatibility
  • remove dependence on MFC

Acknowledgments

Revision History

Version 1.0 - 2005 July 26

  • Initial public release

Usage

This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify it or extend it, please consider posting the new code here for everyone to share. This software is provided "as is" with no expressed or implied warranty. I accept no liability for any damage or loss of business that this software may cause.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Hans Dietrich
Software Developer (Senior), Hans Dietrich Software
United States
I attended St. Michael's College of the University of Toronto, with the intention of becoming a priest. A friend in the University's Computer Science Department got me interested in programming, and I have been hooked ever since.
 
Recently, I have moved to Los Angeles where I am doing consulting and development work.
 
For consulting and custom software development, please see www.hdsoft.org.







Comments and Discussions

 
General: A limitation (member windflying, 10 May '11 - 13:53)
Hi, Hans. I find it doesn't work if I want to search "Little Lit". It only can find "Lit". Can it find all matches in one traverse of the text?
General: c# code (member mehdizadeh3000, 16 Nov '10 - 5:25)
please if any one has c# code email it to me
my email :mehdizadeh3000@gmail.com
thank you for helping me
mehdizadeh
bye
Question: very nice a app, Is there one with Unicode Compatibility? (member neoosk, 28 May '07 - 18:23)
very nice a app, Is there one with Unicode Compatibility?

General: C# (member Raffee, 31 Jan '07 - 2:14)
If anyone has a C# version of this please let me know.
 
raffee.parseghian@gmail.com
Question: How To Search In A Large Text File (member XiaoYu, 22 Oct '06 - 17:28)
Hi Dear Hans Dietrich,
I can not open and search strings in a large text file,276395 lines,57.1MB.
This large file also can not be opened with
Notepad.exe, but opend by EditPlus.exe.
Would you please give me some advice?Or I will try to open the large file in Binary mode?
Thanks a million.
 
***
We are making progress everyday.

General: doubt (member bal@, 25 Sep '06 - 5:38)
u r article is very nice and nifty.
i want detailed about search engine algorithm.plz help me

Question: Have you got a C# version? (member wuxsh, 14 Aug '05 - 17:15)
I found source code is C++ version, Have you got a C# version?
General: simple optimization (member Joey Bloggs, 28 Jul '05 - 23:51)
Do the without any search first and skip the rest if a match is found. Easy to do ?

General: Re: simple optimization (member Hans Dietrich, 29 Jul '05 - 8:14)
That sounds like a good idea. I will try to work on this next week.
 
Best wishes,
Hans

General: Didn't find match. (member WREY, 27 Jul '05 - 0:12)
In the "exact phrase" window, I entered, "it is" (without the double quotes) and left "Match case" unchecked.
 
It came back showing '0' as the amount of times it found the selection. Yet I counted at least three times, "it is" is included in the text file.
 
 
William
 
Fortes in fide et opere!
General: Re: Didn't find match. (member bevpet, 27 Jul '05 - 6:34)
i tried what u said, got 7 hits for 'it is'
General: Re: Didn't find match. (member Hans Dietrich, 27 Jul '05 - 6:43)
Like bevpet, I also got 7 matches. I was able to reproduce what you saw by checking the Whole Words checkbox, and adding a space at the end of "it is ". Otherwise I could not reproduce what you saw.
 
I think there should be some whitespace trimming of the words. Thanks for pointing this out.

General: Re: Didn't find match. (member James R. Twine, 24 May '07 - 2:19)
   FWIW - I would suggest that you not do that.  There are legitimate reasons for wanting to search for trailing (or leading) whitespace.
 
   Peace!
 
-=- James
Please rate this message - let me know if I helped or not!
If you think it costs a lot to do it right, just wait until you find out how much it costs to do it wrong!
Avoid driving a vehicle taller than you and remember that Professional Driver on Closed Course does not mean your Dumb Ass on a Public Road!
See DeleteFXPFiles

General: nice (member bevpet, 26 Jul '05 - 8:05)
excellent piece of work.
 
Peter

Article Copyright 2005 by Hans Dietrich