Hash algorithm question

Question

5.00/5 (2 votes)

I should add for clarity that the strings are guaranteed to be unique. I use a classic djb2 hash function; see
http://www.cse.yorku.ca/~oz/hash.html

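/* djb2 string hash: hash = hash * 33 + c, seeded with 5381.
   UINT32 and BYTE are the platform's typedefs (32-bit unsigned, 8-bit unsigned). */
UINT32 gHashCh(BYTE *str)
{
	UINT32 hash = 5381;
	int c;

	/* Consume bytes up to, but not including, the terminating NUL. */
	while ((c = *str++) != 0)
	{
		hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
	}

	return hash;
}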

Question #1. How do I calculate (mathematically) the probability that the hash will NOT be unique if calculated on a number of strings? I know the worst-case length (16 characters) and the maximum possible number of strings (let's say, 1000).
Question #2. Can someone give me an example of 2 different strings that produce an identical hash given the algorithm above?

Posted 10-Nov-11 12:21pm

michaelmel

Updated 10-Nov-11 17:43pm

Mohibur Rashid

v4

Comments

Sergey Alexandrovich Kryukov 10-Nov-11 18:52pm

You are not too trusting! Got my 5 for the question.
--SA

michaelmel 10-Nov-11 18:54pm

Well, I do work in a life-safety-related industry :)

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Peter_in_2780 · Accepted Answer · 2011-11-10T13:29:00

A1. An "ideal" hash will spread the strings uniformly at random over the hash space (N = 2^32 values). For n strings, the probability of at least one collision is then approximately 1 - exp[-n(n-1)/(2N)]. With n = 1000 that is 1 - exp[(- 1000 * 999)/(2 * 2^32)], which is small (about 1.16e-4, roughly 1 in 8600), but may not be negligible in your case. Your chosen hash may perform as well as that, or a bit worse, or a lot worse, depending on your input space.
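
If you would rather check the number than trust the approximation, here is a minimal sketch that multiplies out the exact "no collision" product. It assumes an ideal, uniformly random hash, so it says nothing about how djb2 behaves on your real strings. The function names collision_probability and strings_for_even_odds are made up for this illustration.

```c
/*
 * Exact probability that n values drawn uniformly at random from `space`
 * possible hash values contain at least one collision:
 *   p = 1 - (1 - 1/N) * (1 - 2/N) * ... * (1 - (n-1)/N)
 * Assumes an ideal hash; a real hash such as djb2 may do worse.
 */
double collision_probability(unsigned long n, double space)
{
	double p_unique = 1.0;
	unsigned long i;

	for (i = 1; i < n; i++)
		p_unique *= 1.0 - (double)i / space;

	return 1.0 - p_unique;
}

/* Smallest number of strings for which the collision probability reaches 50%. */
unsigned long strings_for_even_odds(double space)
{
	double p_unique = 1.0;	/* probability that n strings are all distinct */
	unsigned long n = 1;

	while (p_unique > 0.5) {
		p_unique *= 1.0 - (double)n / space;
		n++;
	}
	return n;
}
```

Calling collision_probability(1000, 4294967296.0) gives about 1.16e-4, in line with the approximation, and strings_for_even_odds(4294967296.0) comes out a little above 77000.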

A2. You can do this pretty quickly. From the reference below, if you pick a bit over 77000 strings, you will have a better than 50% probability of finding a collision. It'll take you a few minutes to write the code, and probably less than a second for it to find a collision.
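
For this particular hash you do not even need that many strings. Below is a rough sketch of a brute-force search over two-character strings only. It restates the hash with <stdint.h> types instead of UINT32 and BYTE (my assumption, to keep it self-contained), and find_two_char_collision is a hypothetical helper name, not anything from the original code.

```c
#include <stdint.h>

/* djb2 from the question, restated with <stdint.h> types (an assumption;
   the original uses the typedefs UINT32 and BYTE). */
uint32_t djb2(const unsigned char *str)
{
	uint32_t hash = 5381;
	int c;

	while ((c = *str++) != 0)
		hash = ((hash << 5) + hash) + (uint32_t)c; /* hash * 33 + c */

	return hash;
}

enum { LO = 'A', HI = 'z', SPAN = HI - LO + 1, COUNT = SPAN * SPAN };

/* Build the two-character string with the given index (0 .. COUNT-1). */
static void two_char(int index, char out[3])
{
	out[0] = (char)(LO + index / SPAN);
	out[1] = (char)(LO + index % SPAN);
	out[2] = '\0';
}

/*
 * Hypothetical helper: brute-force search over all two-character strings
 * drawn from 'A'..'z'. On success stores the colliding strings in a and b
 * and returns 1; returns 0 if no pair collides.
 */
int find_two_char_collision(char a[3], char b[3])
{
	static uint32_t hashes[COUNT];
	char s[3];
	int i, j;

	for (i = 0; i < COUNT; i++) {
		two_char(i, s);
		hashes[i] = djb2((const unsigned char *)s);
	}

	for (i = 0; i < COUNT; i++) {
		for (j = i + 1; j < COUNT; j++) {
			if (hashes[i] == hashes[j]) {
				two_char(i, a);
				two_char(j, b);
				return 1;
			}
		}
	}
	return 0;
}
```

One pair you can check by hand is "Az" and "BY": both hash to 5862176. Moving the first character up by one adds 33 to the result, and moving the second character down by 33 ('z' is 122, 'Y' is 89) takes it away again.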

[ref] http://en.wikipedia.org/wiki/Birthday_paradox

Cheers,
Peter

[edit]inserted missing left bracket in formula[/edit]
