12,454,477 members (52,213 online)
alternative version

64.1K views
13 bookmarked
Posted

Fast Textual Data Processing using Hashed Strings

, 6 Mar 2004 GPL3
 Rate this:
Managing a textual numbering system for hashing words into numbers

Introduction

Strings are considered heavy nesting loop generators. Comparisons, searching, and sorting strings are considered time consuming operations. Indexing textual data is a good step, however setting searching key as string yields bad performance for search algorithms.

In this article we introduce an ADT abstracting strings with hash-codes. Comparisons will no more be done with byte comparisons. Each words in the alphabet will be assigned an invertible hash-code.

Hashing Function

Hashing functions aim to represent a collection of data in a single code. Codes may overlap, so a good hash function aims to minimise overlapping between codes. Systems using hashing should also manage overflow problems. What we represent here could be called as hashing. However, it is invertible and assures uniquness. Actually, it manages a numbering system for textual data.

Consider the word "baby", calculating hash-code is simply as converting from binary to decimal numbering systems. In this case, we convert from 26th numbring system to decimal, where 26 is the number of english alphbet characters.

("baba")26=0x260+1x261+0x262+1x263=(17626)10

("babb")26=1x260+1x261+0x262+1x263=(17627)10

This means that "baba" is less than "babb". Comparison is done with numbers. No more nesting loops is needed for comparisons.

Class hashedstring

```class hashedstring

{

public:
hashedstring():_pdata(0), _hashkey(0), _size(0){}
hashedstring(const hashedstring& hstr):_pdata(strdup(hstr._pdata)), _
hashkey(hstr._hashkey), _size(hstr._size){}
hashedstring(const char* pdata):_pdata(strdup(pdata)),
_hashkey(0), _size(strlen(pdata))
{
_computehashkey();
}

hashedstring operator+(const hashedstring& s) const
{
hashedstring rtn=*this;
strcat(rtn._pdata, s._pdata);
rtn._hashkey+=_hashkey*pow(26, s._size);
rtn._size+=s._size;
return rtn;
}

hashedstring operator+=(const hashedstring& s)
{
strcat(_pdata, s._pdata);
_hashkey=_hashkey*pow(26, s._size)+s._hashkey;
_size+=s._size;
return *this;
}

hashedstring operator=(const hashedstring& s)
{
clear();
_pdata=strdup(s._pdata);
_hashkey+=s._hashkey;
_size=s._size;
return *this;
}

void clear()
{
if(_pdata)
delete []_pdata;
_pdata=0;
_size=0;
_hashkey=0;
}

~hashedstring(){}

operator const double() const {return _hashkey;}

protected:

double _computehashkey()
{
unsigned int i=0;
_hashkey=0;
for(i=_size-1; i!=-1; i--)
_hashkey+=(tolower(_pdata[i])-'a')*pow(26, _size-1-i);
return _hashkey;
}

char* _pdata;
double _hashkey;
unsigned int _size;

};```

Conclusions

Maintaining alphabet numbering system provides faster string processing. Calculating hash codes, an offline process, should be added in indexing huge textual data. Then the system will enjoy fast comparisons, search, and sort routines.

```hashedstring s1("baby");
hashedstring s2("babybaba");
hashedstring s3("babybabb");

vprint(s1);

s1+="baba";

vprint(s1);
vprint(s2);
vprint(s3);

vprint(s1<s2);
vprint(s2<s3);
vprint(s2==s3);

vprint(s1==s2);
```

Results

```s1= 17626
s1= 8054676578
s2= 8054676578
s3= 8054676579
s1<s2= 0
s2<s3= 1
s2==s3= 0
s1==s2= 1
```

Share

 Engineer Australia
No Biography provided

You may also be interested in...

 Pro Pro

 First Prev Next
 VERY Very bad article SB200329-Dec-09 5:07 SB2003 29-Dec-09 5:07
 Re: VERY Very bad article Mohammed Hossny28-Sep-10 15:49 Mohammed Hossny 28-Sep-10 15:49
 Turkmenistana = Turkmenistanb Zegee12-Jul-07 1:05 Zegee 12-Jul-07 1:05
 Where can I find a working implementation? Robert F.23-Sep-04 0:13 Robert F. 23-Sep-04 0:13
 Re: Where can I find a working implementation? Mohammed Hossny23-Sep-04 0:22 Mohammed Hossny 23-Sep-04 0:22
 Re: Where can I find a working implementation? Robert F23-Sep-04 0:38 Robert F 23-Sep-04 0:38
 Re: Where can I find a working implementation? Mohammed Hossny23-Sep-04 1:15 Mohammed Hossny 23-Sep-04 1:15
 Re: Where can I find a working implementation? Robert F.23-Sep-04 3:23 Robert F. 23-Sep-04 3:23
 Re: Where can I find a working implementation? Mohammed Hossny23-Sep-04 12:16 Mohammed Hossny 23-Sep-04 12:16
 Optimizator : "Does it speaks to you ?" Kochise18-Mar-04 21:20 Kochise 18-Mar-04 21:20
 Re: Optimizator : "Does it speaks to you ?" Mohammed Hossny20-Mar-04 20:56 Mohammed Hossny 20-Mar-04 20:56
 Yes, that's it, hashing at compile time... Kochise21-Mar-04 4:32 Kochise 21-Mar-04 4:32
 aaa = aa = a = 0 Chris Roberts8-Mar-04 7:49 Chris Roberts 8-Mar-04 7:49
 Re: aaa = aa = a = 0 Mohammed Hossny8-Mar-04 23:27 Mohammed Hossny 8-Mar-04 23:27
 Could you fix the horizontal scrolling annoyance? WREY3-Mar-04 10:35 WREY 3-Mar-04 10:35
 Re: Could you fix the horizontal scrolling annoyance? Mohammed Hossny7-Mar-04 11:45 Mohammed Hossny 7-Mar-04 11:45
 Re: Could you fix the horizontal scrolling annoyance? jongman7-Mar-04 17:09 jongman 7-Mar-04 17:09
 Nice, but... it does not work. anickol2-Mar-04 21:57 anickol 2-Mar-04 21:57
 Re: Nice, but... it does not work. Jon2-Mar-04 23:02 Jon 2-Mar-04 23:02
 Re: Nice, but... it does not work. Mohammed Hossny5-Mar-04 9:40 Mohammed Hossny 5-Mar-04 9:40
 Collision space? sdoyle2-Mar-04 6:25 sdoyle 2-Mar-04 6:25
 Re: Collision space? Mohammed Hossny2-Mar-04 7:20 Mohammed Hossny 2-Mar-04 7:20
 Nice concept... not practical.. Peter Mares2-Mar-04 2:53 Peter Mares 2-Mar-04 2:53
 Re: Nice concept... not practical.. Mohammed Hossny2-Mar-04 3:04 Mohammed Hossny 2-Mar-04 3:04
 Last Visit: 31-Dec-99 18:00     Last Update: 29-Aug-16 18:31 Refresh 1

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Web01 | 2.8.160826.1 | Last Updated 7 Mar 2004