
Implementing a std::map Replacement that Never Runs Out of Memory and Instructions on Producing an ARPA Compliant Language Model to Test the Implementation

14 Dec 2008, CPOL, 9 min read
An article on extending STL containers with disk caching in order to lift memory limitations.
#ifndef __TOKENIZER_H__
#define __TOKENIZER_H__

#include <istream>
#include <string>

using namespace std;

// CTokenizer is a base class defining how tokenization should proceed.

class CTokenizer
{

public:

	// Constructor:

	// REQUIREMENTS:
	// An istream successfully opened for reading.
	// PROMISES:
	// The object will be ready to return tokens with GetNextToken() if HasMoreToken() returns true.

	CTokenizer(istream &inputStream) throw();

	// Destructor:

	// REQUIREMENTS:
	// None.
	// PROMISES:
	// None.

	virtual ~CTokenizer() throw();

	// GetNextToken():

	// REQUIREMENTS:
	// HasMoreToken() must have returned true for this call to return an actual token; otherwise, it returns an empty string.
	// PROMISES:
	// The next token from the input stream.

	virtual string GetNextToken() throw() = 0;

	// HasMoreToken():

	// REQUIREMENTS:
	// None.
	// PROMISES:
	// If the return value is true, GetNextToken() will return a token; otherwise, no more tokens are available from the input stream.

	virtual bool HasMoreToken() throw() = 0;

protected:

	istream &m_stream;
};

#endif
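
The header above only declares the interface, so a concrete subclass may help show how the HasMoreToken()/GetNextToken() contract can be met. The following is a minimal sketch, not the article's own tokenizer: `CWhitespaceTokenizer` and `JoinTokens` are hypothetical names introduced here for illustration. To keep the example compilable on its own, it restates the base class with an inline constructor and uses `noexcept` where the original header uses `throw()` (an assumption made for newer compilers; overrides written against the original header would carry `throw()` instead).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Restated copy of the CTokenizer interface so this sketch stands alone.
// Assumption: noexcept replaces the original throw() specifications.
class CTokenizer
{
public:
	explicit CTokenizer(std::istream &inputStream) noexcept : m_stream(inputStream) {}
	virtual ~CTokenizer() noexcept {}
	virtual std::string GetNextToken() noexcept = 0;
	virtual bool HasMoreToken() noexcept = 0;

protected:
	std::istream &m_stream;
};

// Hypothetical subclass: splits the input stream on whitespace.
class CWhitespaceTokenizer : public CTokenizer
{
public:
	explicit CWhitespaceTokenizer(std::istream &inputStream) noexcept
		: CTokenizer(inputStream), m_hasToken(false) {}

	// Reads one token ahead (if none is buffered) and reports whether it exists.
	bool HasMoreToken() noexcept override
	{
		if (!m_hasToken && (m_stream >> m_nextToken))
			m_hasToken = true;
		return m_hasToken;
	}

	// Returns the buffered token, or an empty string when the input is exhausted.
	std::string GetNextToken() noexcept override
	{
		if (!HasMoreToken())
			return std::string();
		m_hasToken = false;
		return m_nextToken;
	}

private:
	std::string m_nextToken; // token read ahead by HasMoreToken()
	bool m_hasToken;         // true while m_nextToken holds an unconsumed token
};

// Hypothetical helper: drains any tokenizer through the base-class interface
// and joins the tokens with '|' so the result is easy to inspect.
std::string JoinTokens(CTokenizer &tokenizer)
{
	std::string joined;
	while (tokenizer.HasMoreToken())
	{
		if (!joined.empty())
			joined += '|';
		joined += tokenizer.GetNextToken();
	}
	return joined;
}
```

Because `JoinTokens` only depends on the base class, any other tokenizer derived from `CTokenizer` could be passed to it unchanged. Buffering one token inside HasMoreToken() is one possible design choice; it lets the method answer truthfully even when only trailing whitespace remains in the stream.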


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Canada
Philippe Roy was a key contributor throughout his 20+ year career with many high-profile companies, such as Nuance Communications, IBM (ViaVoice and ProductManager), and VoiceBox Technologies, to name just a few. He is creative and proficient in OO coding and design, knowledgeable about the intellectual-property world (he owns many patents), trilingual, and passionate about being part of a team that creates great solutions.

Oh yes, I almost forgot to mention, he has a special thing for speech recognition and natural language processing... The magic of first seeing a computer transform something as chaotic as sound and natural language into intelligible and useful output has never left him.
