A Multiple Substring Search Class: CIVStringSet

25 Apr 2010, CPOL
A string array class using MFC or STL that performs very fast multiple string searches
// StringSet.H: interface for the CIVStringSet class.
//
// Written 30 June 2002 by Scot T Brennecke
// Thanks to Moishe Halibard and Moshe Rubin for their article,
//    "A Multiple Substring Search Algorithm" in the June 2002
//    edition of C/C++ Users Journal.  This class is based on
//    the algorithm therein described, but extended to return
//    all strings and use MFC classes.

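#pragma once

#pragma warning(disable: 4100 4786)
#include <windows.h>  // WORD, DWORD, UINT, LPCTSTR (no-op if already pulled in via a precompiled header)
#include <utility>    // std::pair
#include <vector>
#include <string>
#include <list>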

class CIVStringSet : public std::vector<std::string>
{
    public:
        CIVStringSet( WORD wInitialWidth = 64 ) ;  // Initial width of FSM
        virtual ~CIVStringSet() ;

        bool        Add( LPCTSTR pszWord ) ;                     // a single word
        bool        Add( const std::string & rstrWord ) ;        // a single word
        int         Add( LPCTSTR pszWords, LPCTSTR pszDelims ) ; // multiple words, delimited with chars from pszDelims
        int         Add( LPCTSTR pszzWords, int nWords ) ;       // nWords words, each 0 term'd, with extra 0 at end
        int         Add( std::vector<std::string> astrWords ) ;  // all the elements of an array of strings
        int         Add( std::list<std::string> lstrWords ) ;    // all the elements of a list of strings
                    
        UINT        FindFirstIn( std::string strText, int & rnFirst ) ; // Begin iteration
        UINT        FindNext( int & rnNext ) ;                          // Continue iteration

        typedef std::pair<int,UINT>              CWordPosPair ;     // first is index of word in array, second is position in text
        typedef std::list< std::pair<int,UINT> > CWordPosPairList ; // list of pairs to be returned by FindAllIn
        size_t      FindAllIn( std::string strText, CWordPosPairList & rlstrWords ) ; // Iterate all at once and make list
        
    protected:
        DWORD    (* m_apnFSM)[128] ;   // Finite State Machine: array of rows, each holding 128 DWORD entries
        size_t      m_nCurDim ;        // Dimension of allocated width of FSM
        size_t      m_nUsedCols ;      // Used portion of allocated width
        WORD        m_wMaxUsedState ;  // largest state value used
        std::string m_strSearch ;      // Current search string
        UINT        m_nCurTextChar ;   // Current position in search string

        bool        InsertWord( LPCTSTR pszWord, WORD wIndex ) ; // put the new word into the FSM for given index
        bool        SetColDim( size_t nNewDim ) ;                // set the current width to at least nNewDim columns
} ;
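
The header above only declares the interface, so the following is a minimal, self-contained sketch of the general idea behind a table-driven multiple substring search: a state table with 128 columns, one row per state, walked from every starting position in the text. It is not the CIVStringSet implementation and not the exact algorithm from the C/C++ Users Journal article. The class name TinyStringSet and its members are hypothetical, and it assumes 7-bit ASCII input and standard C++ types rather than the Windows types used above.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the article's CIVStringSet.
class TinyStringSet {
    // One row per state, one column per 7-bit character code.
    // An entry of 0 means "no transition" (state 0 is the start state).
    using Row = std::array<std::size_t, 128>;

public:
    TinyStringSet() : next_(1), wordAt_(1, -1) {}  // start state only, all zeros

    // Insert a word into the table. Returns false for empty words,
    // characters outside 7-bit ASCII, or words that were already added.
    bool Add(const std::string& word) {
        if (word.empty()) return false;
        for (unsigned char c : word)
            if (c >= 128) return false;

        std::size_t state = 0;
        for (unsigned char c : word) {
            if (next_[state][c] == 0) {      // no transition yet: allocate a state
                next_.push_back(Row{});
                wordAt_.push_back(-1);
                next_[state][c] = next_.size() - 1;
            }
            state = next_[state][c];
        }
        if (wordAt_[state] != -1) return false;  // duplicate word
        wordAt_[state] = static_cast<int>(words_.size());
        words_.push_back(word);
        return true;
    }

    // Return (word index, position in text) for every occurrence of every word,
    // including words that are prefixes of other words.
    std::vector<std::pair<int, std::size_t>> FindAllIn(const std::string& text) const {
        std::vector<std::pair<int, std::size_t>> hits;
        for (std::size_t start = 0; start < text.size(); ++start) {
            std::size_t state = 0;
            for (std::size_t i = start; i < text.size(); ++i) {
                const unsigned char c = static_cast<unsigned char>(text[i]);
                if (c >= 128) break;          // outside the table's alphabet
                state = next_[state][c];
                if (state == 0) break;        // no word continues with this character
                if (wordAt_[state] >= 0)
                    hits.emplace_back(wordAt_[state], start);
            }
        }
        return hits;
    }

private:
    std::vector<Row> next_;           // the state table
    std::vector<int> wordAt_;         // word index accepted at each state, or -1
    std::vector<std::string> words_;  // words in insertion order
};
```

With the words "he", "she" and "hers" added in that order, searching "ushers" with this sketch reports "she" at position 1, then "he" and "hers" at position 2. The pair layout mirrors CWordPosPair above (word index first, text position second); everything else is an assumption made for illustration.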


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Scot Brennecke
Software Developer (Senior) Microsoft
United States
Scot is an Escalation Engineer for the Microsoft Developer Support Languages team. He helps software developers who are Microsoft customers find bugs in their own, or Microsoft's, code.
 
Scot spends most of his time writing, reading, or thinking about C++ software, thereby classifying him as a geek.

Article Copyright 2002 by Scot Brennecke