Click here to Skip to main content
12,454,477 members (52,213 online)
Click here to Skip to main content
Add your own
alternative version

Stats

64.1K views
440 downloads
13 bookmarked
Posted

Fast Textual Data Processing using Hashed Strings

, 6 Mar 2004 GPL3
Rate this:
Please Sign up or sign in to vote.
Managing a textual numbering system for hashing words into numbers

Introduction

Strings are considered heavy nesting loop generators. Comparisons, searching, and sorting strings are considered time consuming operations. Indexing textual data is a good step, however setting searching key as string yields bad performance for search algorithms.

In this article we introduce an ADT abstracting strings with hash-codes. Comparisons will no more be done with byte comparisons. Each words in the alphabet will be assigned an invertible hash-code.

Hashing Function

Hashing functions aim to represent a collection of data in a single code. Codes may overlap, so a good hash function aims to minimise overlapping between codes. Systems using hashing should also manage overflow problems. What we represent here could be called as hashing. However, it is invertible and assures uniquness. Actually, it manages a numbering system for textual data.

Consider the word "baby", calculating hash-code is simply as converting from binary to decimal numbering systems. In this case, we convert from 26th numbring system to decimal, where 26 is the number of english alphbet characters.

("baba")26=0x260+1x261+0x262+1x263=(17626)10

("babb")26=1x260+1x261+0x262+1x263=(17627)10

This means that "baba" is less than "babb". Comparison is done with numbers. No more nesting loops is needed for comparisons.

Class hashedstring

class hashedstring

{

public:
    hashedstring():_pdata(0), _hashkey(0), _size(0){}
    hashedstring(const hashedstring& hstr):_pdata(strdup(hstr._pdata)), _
        hashkey(hstr._hashkey), _size(hstr._size){}
    hashedstring(const char* pdata):_pdata(strdup(pdata)), 
        _hashkey(0), _size(strlen(pdata))
    {
        _computehashkey();
    }
   
    hashedstring operator+(const hashedstring& s) const 
    {
        hashedstring rtn=*this;
        strcat(rtn._pdata, s._pdata);
        rtn._hashkey+=_hashkey*pow(26, s._size);
        rtn._size+=s._size;
        return rtn;
    }

 
    hashedstring operator+=(const hashedstring& s)
    { 
        strcat(_pdata, s._pdata);
        _hashkey=_hashkey*pow(26, s._size)+s._hashkey;
        _size+=s._size;
        return *this;
    }
    
    hashedstring operator=(const hashedstring& s) 
    {
        clear();
        _pdata=strdup(s._pdata);
        _hashkey+=s._hashkey;
        _size=s._size;
        return *this;
    }

 
    void clear()
    {
        if(_pdata)
            delete []_pdata;
        _pdata=0;
        _size=0;
        _hashkey=0;
    }

 
    ~hashedstring(){}
    
    operator const double() const {return _hashkey;}
    
    
protected:
    

    double _computehashkey()
    {
        unsigned int i=0;
        _hashkey=0;
        for(i=_size-1; i!=-1; i--)
            _hashkey+=(tolower(_pdata[i])-'a')*pow(26, _size-1-i);
        return _hashkey;
    }
    
    char* _pdata;
    double _hashkey;
    unsigned int _size;

};

Conclusions

Maintaining alphabet numbering system provides faster string processing. Calculating hash codes, an offline process, should be added in indexing huge textual data. Then the system will enjoy fast comparisons, search, and sort routines.

hashedstring s1("baby");
hashedstring s2("babybaba");
hashedstring s3("babybabb");

vprint(s1);

s1+="baba";

vprint(s1);
vprint(s2);
vprint(s3);

vprint(s1<s2);
vprint(s2<s3);
vprint(s2==s3);

vprint(s1==s2);

Results

s1= 17626
s1= 8054676578
s2= 8054676578
s3= 8054676579
s1<s2= 0
s2<s3= 1
s2==s3= 0
s1==s2= 1

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)

Share

About the Author

cMoXIV
Engineer
Australia Australia
No Biography provided

You may also be interested in...

Pro
Pro

Comments and Discussions

 
GeneralVERY Very bad article Pin
SB200329-Dec-09 5:07
memberSB200329-Dec-09 5:07 
GeneralRe: VERY Very bad article Pin
Mohammed Hossny28-Sep-10 15:49
memberMohammed Hossny28-Sep-10 15:49 
GeneralTurkmenistana = Turkmenistanb Pin
Zegee12-Jul-07 1:05
memberZegee12-Jul-07 1:05 
QuestionWhere can I find a working implementation? Pin
Robert F.23-Sep-04 0:13
sussRobert F.23-Sep-04 0:13 
AnswerRe: Where can I find a working implementation? Pin
Mohammed Hossny23-Sep-04 0:22
memberMohammed Hossny23-Sep-04 0:22 
GeneralRe: Where can I find a working implementation? Pin
Robert F23-Sep-04 0:38
sussRobert F23-Sep-04 0:38 
GeneralRe: Where can I find a working implementation? Pin
Mohammed Hossny23-Sep-04 1:15
memberMohammed Hossny23-Sep-04 1:15 
GeneralRe: Where can I find a working implementation? Pin
Robert F.23-Sep-04 3:23
sussRobert F.23-Sep-04 3:23 
GeneralRe: Where can I find a working implementation? Pin
Mohammed Hossny23-Sep-04 12:16
memberMohammed Hossny23-Sep-04 12:16 
GeneralOptimizator : "Does it speaks to you ?" Pin
Kochise18-Mar-04 21:20
memberKochise18-Mar-04 21:20 
GeneralRe: Optimizator : &quot;Does it speaks to you ?&quot; Pin
Mohammed Hossny20-Mar-04 20:56
memberMohammed Hossny20-Mar-04 20:56 
GeneralYes, that's it, hashing at compile time... Pin
Kochise21-Mar-04 4:32
memberKochise21-Mar-04 4:32 
Generalaaa = aa = a = 0 Pin
Chris Roberts8-Mar-04 7:49
memberChris Roberts8-Mar-04 7:49 
GeneralRe: aaa = aa = a = 0 Pin
Mohammed Hossny8-Mar-04 23:27
memberMohammed Hossny8-Mar-04 23:27 
QuestionCould you fix the horizontal scrolling annoyance? Pin
WREY3-Mar-04 10:35
memberWREY3-Mar-04 10:35 
AnswerRe: Could you fix the horizontal scrolling annoyance? Pin
Mohammed Hossny7-Mar-04 11:45
memberMohammed Hossny7-Mar-04 11:45 
GeneralRe: Could you fix the horizontal scrolling annoyance? Pin
jongman7-Mar-04 17:09
memberjongman7-Mar-04 17:09 
GeneralNice, but... it does not work. Pin
anickol2-Mar-04 21:57
memberanickol2-Mar-04 21:57 
GeneralRe: Nice, but... it does not work. Pin
Jon2-Mar-04 23:02
memberJon2-Mar-04 23:02 
GeneralRe: Nice, but... it does not work. Pin
Mohammed Hossny5-Mar-04 9:40
memberMohammed Hossny5-Mar-04 9:40 
QuestionCollision space? Pin
sdoyle2-Mar-04 6:25
membersdoyle2-Mar-04 6:25 
AnswerRe: Collision space? Pin
Mohammed Hossny2-Mar-04 7:20
memberMohammed Hossny2-Mar-04 7:20 
GeneralNice concept... not practical.. Pin
Peter Mares2-Mar-04 2:53
memberPeter Mares2-Mar-04 2:53 
GeneralRe: Nice concept... not practical.. Pin
Mohammed Hossny2-Mar-04 3:04
memberMohammed Hossny2-Mar-04 3:04 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.160826.1 | Last Updated 7 Mar 2004
Article Copyright 2004 by cMoXIV
Everything else Copyright © CodeProject, 1999-2016
Layout: fixed | fluid