XSearch - a class that implements a search engine-style advanced search






XSearch implements a search engine-style advanced search, including ALL, EXACT PHRASE, AT LEAST ONE, and WITHOUT words. XSearch is based on a multiple-substring search algorithm.
Introduction
CXSearch is a class that implements a search engine-style advanced search with four input fields, similar to the advanced search page of the Google search engine:
- the ALL field - all words in this field must be present for a successful search
- the EXACT PHRASE field - this field contains a single phrase* (one or more words); the phrase must be present for a successful search
- the AT LEAST ONE field - at least one of the words in this field must be present for a successful search
- the WITHOUT field - if any of the words in this field are present, the search fails
* In CXSearch, double quotes (") do not have a special meaning in any of the fields.
CXSearch is based on code from Scot Brennecke's article A Multiple Substring Search Class.
CXSearch In Action
The demo program shows how CXSearch can be used to mimic a search engine-style advanced search.
Demo Program Options
The File to search field specifies the file that you want to search. This can be any text file; the demo program has been set up to search for certain words found in moby.txt.
The four search input fields have been discussed above. For a successful search, the words and phrase of the ALL and the EXACT PHRASE fields must be present, and at least one of the words in the AT LEAST ONE field must be present. Also, none of the words in the WITHOUT field can be present.
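The way the four fields combine can be sketched in portable C++. This is only an illustration of the rule described above, not the code inside CXSearch: the SearchCriteria struct and the Matches() and Contains() functions are hypothetical names, matching is a plain case-sensitive substring test, and an empty field is assumed to impose no constraint.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the four input fields (not part of CXSearch).
struct SearchCriteria {
    std::vector<std::string> all;         // every word must be present
    std::string exactPhrase;              // must be present if non-empty
    std::vector<std::string> atLeastOne;  // one or more must be present
    std::vector<std::string> without;     // none may be present
};

// Plain case-sensitive substring test.
static bool Contains(const std::string& text, const std::string& word)
{
    return text.find(word) != std::string::npos;
}

// Returns true when the text satisfies all four fields.
// Assumption: an empty field imposes no constraint.
bool Matches(const std::string& text, const SearchCriteria& c)
{
    for (const auto& w : c.all)
        if (!Contains(text, w)) return false;

    if (!c.exactPhrase.empty() && !Contains(text, c.exactPhrase))
        return false;

    if (!c.atLeastOne.empty()) {
        bool any = false;
        for (const auto& w : c.atLeastOne)
            if (Contains(text, w)) { any = true; break; }
        if (!any) return false;
    }

    for (const auto& w : c.without)
        if (Contains(text, w)) return false;

    return true;
}
```

Note that a WITHOUT word can only be ruled out after the whole text has been examined, which is one reason a complete search has to scan to the end.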
In the current implementation, the double quote (") character gets no special handling. Most search engines use it to group words into a single phrase. For the most part this is a convenience feature that saves the user from re-entering (or cutting and pasting) words from the ALL field into the EXACT PHRASE field, although it might offer some marginal benefit in the AT LEAST ONE or WITHOUT fields.
After the four input fields, there are three options not normally provided by search engines:
- Match case - when selected, the case of the words in the four input fields must match the case of the search text.
- Whole words only - when selected, only whole words in the search text will be matched - i.e., only words preceded by and followed by non-word characters. A non-word character is anything other than letters (a-z), numerals (0-9), and the underscore character (_).
- First match in file - when selected, the search terminates on the first match, regardless of the type of match. With multiple ALL words, this may cause the search to fail, because the search stops before the remaining words are found. For simple searches, this option is typically used to improve performance, since the search stops without scanning the rest of the text. When this option is selected for the first time, a warning is displayed.
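To make the Match case and Whole words only options concrete, here is a minimal sketch in portable C++. It is not taken from CXSearch; FindWord(), IsWordChar(), and Fold() are hypothetical helpers, and the sketch assumes ASCII text, where a word character is a letter, a numeral, or an underscore as described above.

```cpp
#include <cctype>
#include <string>

// Word characters per the description above: letters, numerals, underscore.
// Assumption: ASCII text in the default "C" locale.
static bool IsWordChar(char ch)
{
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
}

// Lower-cases the character unless case must match.
static char Fold(char ch, bool matchCase)
{
    return matchCase
        ? ch
        : static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Returns the position of the first match of word in text, or npos.
std::string::size_type FindWord(const std::string& text,
                                const std::string& word,
                                bool matchCase,
                                bool wholeWords)
{
    if (word.empty() || word.size() > text.size())
        return std::string::npos;

    for (std::string::size_type i = 0; i + word.size() <= text.size(); ++i) {
        // Compare the candidate at position i character by character.
        std::string::size_type j = 0;
        while (j < word.size() &&
               Fold(text[i + j], matchCase) == Fold(word[j], matchCase))
            ++j;
        if (j != word.size())
            continue;

        if (wholeWords) {
            // Reject the match if a word character touches either end.
            const std::string::size_type end = i + word.size();
            if (i > 0 && IsWordChar(text[i - 1])) continue;
            if (end < text.size() && IsWordChar(text[end])) continue;
        }
        return i;
    }
    return std::string::npos;
}
```

Returning as soon as one position qualifies corresponds to the First match in file behavior for a single word; a complete search would keep scanning from the next position.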
Returning the Search Results
The demo program allows you to select the way the results are returned:
- SendMessage - a message is sent each time a match is found
- CPtrArray - matches are added to a CPtrArray that is passed by the caller.
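The two delivery styles can be illustrated without MFC. In the sketch below, a std::function callback stands in for SendMessage and a std::vector stands in for CPtrArray; the Match struct and ReportMatches() are hypothetical names, not part of CXSearch. The handling of the two sinks follows the notes in the DoSearch() comment block later in this article: at least one sink must be supplied, and if both are supplied only the array is used.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical match record, loosely modeled on XSEARCH_WORD.
struct Match {
    std::string word;     // word that matched
    std::size_t charPos;  // starting position of the match (0 - N)
};

using MatchCallback = std::function<void(const Match&)>;

// Finds every occurrence of word in text and delivers each match either
// through the callback or by appending to results.
// Returns false if neither sink is supplied.
// If both are supplied, only results is used.
bool ReportMatches(const std::string& text,
                   const std::string& word,
                   const MatchCallback& onMatch,
                   std::vector<Match>* results)
{
    if (!onMatch && results == nullptr)
        return false;
    if (word.empty())
        return true;

    for (std::size_t pos = text.find(word);
         pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        Match m{word, pos};
        if (results != nullptr)
            results->push_back(m);
        else
            onMatch(m);
    }
    return true;
}
```

The callback style suits a UI that highlights matches as they arrive, while the array style suits callers that want to post-process the complete result set.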
Demo Program Implementation Notes
The functionality of XSearch is contained in just one class, CXSearch, which invokes the Brennecke class mentioned above. Aside from that, the demo program uses no special code or custom controls. The edit control used for displaying the highlighted search text is a standard RichEdit control. All the other controls are also plain vanilla.
To keep track of word matches, the following struct is used:
struct XSEARCH_WORD
{
    XSEARCH_WORD()
    {
        eWordType = ALL;
        strWord   = _T("");
        nCount    = 0;
        nCharPos  = 0;
    }

    WORD_TYPE eWordType;   // type of match
    CString   strWord;     // word or phrase to match
    int       nCount;      // number of matches found
    UINT      nCharPos;    // char starting position (0 - N)
};

where WORD_TYPE is defined as

enum WORD_TYPE { ALL = 0, EXACT_PHRASE, AT_LEAST_ONE, WITHOUT };
CXSearch APIs
Here are some of the functions provided by CXSearch:
- Constructor - Construct an uninitialized CXSearch object. Before using the object, AddWord() must be called.
////////////////////////////////////////////////////////////////////////
//
// CXSearch()
//
// Purpose:     Construct CXSearch object
//
// Parameters:  None
//
// Returns:     None
//
- AddWord() - Add search word to one of four internal arrays
////////////////////////////////////////////////////////////////////////
//
// AddWord()
//
// Purpose:     Add search word to one of four internal arrays
//
// Parameters:  lpszWord  - address of word/phrase string
//              eWordType - type of word to add
//
// Returns:     BOOL - TRUE = success
//
// Notes:       AddWord adds the lpszWord string pointer to one of four
//              internal CPtrArray.
//
- AddWords() - Add search word(s) from delimited string
////////////////////////////////////////////////////////////////////////
//
// AddWords()
//
// Purpose:     Add search word(s) from delimited string
//
// Parameters:  lpszWord   - address of word/phrase string
//              eWordType  - type of word to add
//              lpszDelims - pointer to string that contains word
//                           delimiter characters
//
// Returns:     BOOL - TRUE = success
//
// Notes:       AddWords adds words from the lpszWord string via
//              AddWord()
//
- DoSearch() - Perform search in lpszBuffer for words added via AddWord()
////////////////////////////////////////////////////////////////////////
//
// DoSearch()
//
// Purpose:     Perform search in lpszBuffer for words added via AddWord
//
// Parameters:  lpszBuffer - address of buffer containing text to search
//              pWnd       - handle to window that will receive match
//                           notification
//              pArray     - address of CPtrArray that matched word
//                           will be added to (a XSEARCH_WORD pointer)
//
// Returns:     BOOL - TRUE = success
//
// Notes:       Either pWnd or pArray may be NULL, but not both. If both
//              are specified, only pArray will be used to return
//              matches.
//
SetMatchCase(), SetWholeWords(), and SetFirstMatch() are available to set the search criteria.
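The splitting step that AddWords() performs on a delimited string can be sketched in portable C++. This is an assumption about the general approach, not the actual CXSearch code: SplitWords() is a hypothetical helper, and the real class may treat empty tokens or delimiter sets differently. Each token produced this way would then be handed to something like AddWord().

```cpp
#include <string>
#include <vector>

// Splits input into words separated by any character found in delims.
// Runs of consecutive delimiters produce no empty words.
std::vector<std::string> SplitWords(const std::string& input,
                                    const std::string& delims)
{
    std::vector<std::string> words;

    // Skip leading delimiters to find the start of the first word.
    std::string::size_type start = input.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The word ends at the next delimiter, or at the end of input.
        std::string::size_type end = input.find_first_of(delims, start);
        words.push_back(input.substr(start, end - start));
        // Skip the delimiters that follow to find the next word.
        start = input.find_first_not_of(delims, end);
    }
    return words;
}
```

For example, splitting the ALL field on spaces and commas would turn a string such as "whale, ship" into two separate search words.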
How To Use
To integrate the CXSearch class into your app, you first need to add the following files to your project:
- XSearch.cpp
- XSearch.h
- XStringSet.cpp
- XStringSet.h
For details on how to use a CXSearch object, refer to the code in XSearchTestDlg.cpp.
Future Work
- add support for double quotes (")
- implement Unicode compatibility
- remove dependence on MFC
Acknowledgments
- Thanks to Scot Brennecke for A Multiple Substring Search Class
Revision History
Version 1.0 - 2005 July 26
- Initial public release
Usage
This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify it or extend it, please consider posting the new code here for everyone to share. This software is provided "as is" with no expressed or implied warranty. I accept no liability for any damage or loss of business that this software may cause.