12,452,083 members (63,027 online)
Article
Add your own
alternative version

123.3K views
2.7K downloads
68 bookmarked
Posted

# Fastest strstr-like function in C!?

, 9 Sep 2011 CPOL
 Rate this:
Please Sign up or sign in to vote.
Tuned function for searching a needle in a haystack.
This is an old version of the currently published article.

## Introduction

The C strstr-like function presented below (written by Georgi 'Kaze') is named Railgun_Quadruplet_6ppp and is based on well-known Boyer-Moore-Horspool and simplified Karp-Rabin algorithms. The main goal: to achieve fastest execution for all kind of pattern/needle and string/haystack lengths (starting from 2 bytes).

## Background

Function homepage:
http://www.sanmayce.com/Railgun/index.html

Very informative resource on strstr search techniques at:
http://www-igm.univ-mlv.fr/~lecroq/string/

#### Tests done with 16 patterns with 197MB haystack (being a pure English book-like text)

The machine is a mainstream laptop on 2.1GHz with Intel Merom CPU.

## Using the Code

The code is pretty simple and straightforward.

```// Here the strstr-like function Railgun_Quadruplet_6ppp is given,  written by Georgi 'Kaze', 2011-Sep-07.
//
// ### Mix(2in1) of Karp-Rabin & Boyer-Moore-Horspool algorithm [
// Caution: For better speed the case 'if (cbPattern==1)' was removed, so Pattern must be longer than 1 char.
char * Railgun_Quadruplet_6ppp (char * pbTarget,
char * pbPattern,
unsigned long cbTarget,
unsigned long cbPattern)
{
char * pbTargetMax = pbTarget + cbTarget;
register unsigned long  ulHashPattern;
unsigned long ulHashTarget;
//unsigned long count; //r.6+
signed long count;
//unsigned long countSTATIC; //r.6+
signed long countSTATIC;
//  unsigned long countRemainder;

/*
const unsigned char SINGLET = *(char *)(pbPattern);
const unsigned long Quadruplet2nd = SINGLET<<8;
const unsigned long Quadruplet3rd = SINGLET<<16;
const unsigned long Quadruplet4th = SINGLET<<24;
*/
unsigned char SINGLET;
unsigned long Quadruplet2nd;
unsigned long Quadruplet3rd;
unsigned long Quadruplet4th;

unsigned long  AdvanceHopperGrass;

long i; //BMH needed
int a, j, bm_bc[ASIZE]; //BMH needed
unsigned char ch; //BMH needed
//    unsigned char lastch, firstch; //BMH needed

if (cbPattern > cbTarget)
return(NULL);

// Doesn't work when cbPattern = 1
// The next IF-fragment works very well with cbPattern>1, OBVIOUSLY IT MUST BE UNROLLED(but crippled with less functionality) SINCE either cbPattern=2 or cbPattern=3!
if ( cbPattern<4) { // This IF makes me unhappy: it slows down from 390KB/clock to 367KB/clock for 'fast' pattern. This fragment(for 2..3 pattern lengths) is needed because I need a function different than strchr but sticking to strstr i.e. lengths above 1 are to be handled.
pbTarget = pbTarget+cbPattern;
ulHashPattern = ( (*(char *)(pbPattern))<<8 ) + *(pbPattern+(cbPattern-1));
//        countSTATIC = cbPattern-2;

if ( cbPattern==3) {
for ( ;; )
{
if ( ulHashPattern == ( (*(char *)(pbTarget-3))<<8 ) + *(pbTarget-1) ) {
if ( *(char *)(pbPattern+1) == *(char *)(pbTarget-2) ) return((pbTarget-3));
}
if ( (char)(ulHashPattern>>8) != *(pbTarget-2) ) pbTarget++;
pbTarget++;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else {
}
for ( ;; )
{
// The line below gives for 'cbPattern'>=1:
// Karp_Rabin_Kaze_4_OCTETS_hits/Karp_Rabin_Kaze_4_OCTETS_clocks: 4/543
// Karp_Rabin_Kaze_4_OCTETS performance: 372KB/clock
/*
if ( (ulHashPattern == ( (*(char *)(pbTarget-cbPattern))<<8 ) + *(pbTarget-1)) && !memcmp(pbPattern, pbTarget-cbPattern, (unsigned int)cbPattern) )
return((long)(pbTarget-cbPattern));
*/

// The fragment below gives for 'cbPattern'>=2:
// Karp_Rabin_Kaze_4_OCTETS_hits/Karp_Rabin_Kaze_4_OCTETS_clocks: 4/546
// Karp_Rabin_Kaze_4_OCTETS performance: 370KB/clock

/*
//For 2 and 3 [
if ( ulHashPattern == ( (*(char *)(pbTarget-cbPattern))<<8 ) + *(pbTarget-1) ) {
//         count = countSTATIC;
count = cbPattern-2;
//         while ( count && *(char *)(pbPattern+1+(countSTATIC-count)) == *(char *)(pbTarget-cbPattern+1+(countSTATIC-count)) ) {
while ( count && *(char *)(pbPattern+1) == *(char *)(pbTarget-2) ) { // Crippling i.e. only 2 and 3 chars are allowed!
count--;
}
if ( count == 0) return((pbTarget-cbPattern));
}
if ( (char)(ulHashPattern>>8) != *(pbTarget-cbPattern+1) ) pbTarget++;
//For 2 and 3 ]
*/

if ( ulHashPattern == ( (*(char *)(pbTarget-2))<<8 ) + *(pbTarget-1) )
return((pbTarget-2));
if ( (char)(ulHashPattern>>8) != *(pbTarget-1) ) pbTarget++;

// The fragment below gives for 'cbPattern'>=2:
// Karp_Rabin_Kaze_4_OCTETS_hits/Karp_Rabin_Kaze_4_OCTETS_clocks: 4/554
// Karp_Rabin_Kaze_4_OCTETS performance: 364KB/clock
/*
if ( ulHashPattern == ( (*(char *)(pbTarget-cbPattern))<<8 ) + *(pbTarget-1) ) {
count = countSTATIC>>2;
countRemainder = countSTATIC % 4;

while ( count && *(unsigned long *)(pbPattern+1+((count-1)<<2)) == *(unsigned long *)(pbTarget-cbPattern+1+((count-1)<<2)) ) {
count--;
}
//if (count == 0) {  // Disastrous degradation only from this line(317KB/clock when 1+2x4+2+1 bytes pattern: 'skillessness'; 312KB/clock when 1+1x4+2+1 bytes pattern: 'underdog'), otherwise 368KB/clock.
while ( countRemainder && *(char *)(pbPattern+1+(countSTATIC-countRemainder)) == *(char *)(pbTarget-cbPattern+1+(countSTATIC-countRemainder)) ) {
countRemainder--;
}
//if ( countRemainder == 0) return((long)(pbTarget-cbPattern));
if ( count+countRemainder == 0) return((long)(pbTarget-cbPattern));
//}
}
*/

pbTarget++;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else { //if ( cbPattern<4)
if (cbTarget<961) // This value is arbitrary(don't know how exactly), it ensures(at least must) better performance than 'Boyer_Moore_Horspool'.
{
pbTarget = pbTarget+cbPattern;
ulHashPattern = *(unsigned long *)(pbPattern);
//        countSTATIC = cbPattern-1;

//SINGLET = *(char *)(pbPattern);
SINGLET = ulHashPattern & 0xFF;
Quadruplet2nd = SINGLET<<8;
Quadruplet3rd = SINGLET<<16;
Quadruplet4th = SINGLET<<24;

for ( ;; )
{
AdvanceHopperGrass = 0;
ulHashTarget = *(unsigned long *)(pbTarget-cbPattern);

if ( ulHashPattern == ulHashTarget ) { // Three unnecessary comparisons here, but 'AdvanceHopperGrass' must be calculated - it has a higher priority.
//         count = countSTATIC;
//         while ( count && *(char *)(pbPattern+1+(countSTATIC-count)) == *(char *)(pbTarget-cbPattern+1+(countSTATIC-count)) ) {
//	       if ( countSTATIC==AdvanceHopperGrass+count && SINGLET != *(char *)(pbTarget-cbPattern+1+(countSTATIC-count)) ) AdvanceHopperGrass++;
//               count--;
//         }
count = cbPattern-1;
while ( count && *(char *)(pbPattern+(cbPattern-count)) == *(char *)(pbTarget-count) ) {
if ( cbPattern-1==AdvanceHopperGrass+count && SINGLET != *(char *)(pbTarget-count) ) AdvanceHopperGrass++;
count--;
}
if ( count == 0) return((pbTarget-cbPattern));
} else { // The goal here: to avoid memory accesses by stressing the registers.
if ( Quadruplet2nd != (ulHashTarget & 0x0000FF00) ) {
AdvanceHopperGrass++;
if ( Quadruplet3rd != (ulHashTarget & 0x00FF0000) ) {
AdvanceHopperGrass++;
if ( Quadruplet4th != (ulHashTarget & 0xFF000000) ) AdvanceHopperGrass++;
}
}
}

AdvanceHopperGrass++;

pbTarget = pbTarget + AdvanceHopperGrass;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else { //if (cbTarget<961)
//countSTATIC = cbPattern-2; //r.6+
//countSTATIC = cbPattern-2-3; //r.6++
countSTATIC = cbPattern-2-2; //r.6+++ This line fixes the bug from r.6++!
ulHashPattern = *(unsigned long *)(pbPattern);
/* Preprocessing */
for (a=0; a < ASIZE; a++) bm_bc[a]=cbPattern;
for (j=0; j < cbPattern-1; j++) bm_bc[pbPattern[j]]=cbPattern-j-1;

/* Searching */
//lastch=pbPattern[cbPattern-1];
//firstch=pbPattern[0];
i=0;
while (i <= cbTarget-cbPattern) {
ch=pbTarget[i+cbPattern-1];
//if (ch ==lastch)
//if (memcmp(&pbTarget[i],pbPattern,cbPattern-1) == 0) OUTPUT(i);
//if (ch == lastch && pbTarget[i] == firstch && memcmp(&pbTarget[i],pbPattern,cbPattern-1) == 0) return(i);  // Kaze: The idea(to prevent execution of slower 'memcmp') is borrowed from Karp-Rabin i.e. to perform a slower check only when the target "looks like".
//if (ch == pbPattern[cbPattern-1] && pbTarget[i] == pbPattern[0]) // r.6+
//if (ch == pbPattern[cbPattern-1] && *(long *)&pbTarget[i] == *(long *)&pbPattern[0]) // No problema here since we have 4[+] long pattern here. Overlapping (1 byte recompared) when length=4, grmbl.
if (ch == pbPattern[cbPattern-1] && *(long *)&pbTarget[i] == ulHashPattern) // No problema here since we have 4[+] long pattern here. Overlapping (1 byte recompared) when length=4, grmbl.
{
count = countSTATIC;
//while ( count && *(char *)(pbPattern+1+(countSTATIC-count)) == *(char *)(&pbTarget[i]+1+(countSTATIC-count)) ) { // r.6+
while ( count !=0 && *(char *)(pbPattern+1+3+(countSTATIC-count)) == *(char *)(&pbTarget[i]+1+3+(countSTATIC-count)) ) { // if pattern length is 4 or 5 we have count=-1 and count=0 respectively i.e. no need of comparing in-between chars.
count--;
}
if ( count <= 0) return(pbTarget+i);
}
i+=bm_bc[ch];
}
return(NULL);
} //if (cbTarget<961)
} //if ( cbPattern<4)
}
// ### Mix(2in1) of Karp-Rabin & Boyer-Moore-Horspool algorithm ]```

## Points of Interest

The interesting part about tests I have done: 'memcmp' and 'memcpy' when inlined (properly) give much better results. The thing which pop-ups again-and-again: Never take anything for granted. A bug (which doesn't affect the previous results, correct) was crushed, now r.6+++ is ready to go.

## History

This is revision 6+++ written 2011-Sep-07:

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

## About the Author

 Other Bulgaria
A Bulgarian old boy interested in search techniques, nothing special.

## You may also be interested in...

 Pro Pro

## Comments and Discussions

Discussions posted for the Published version of this article. Posting a message here will take you to the publicly available article in order to continue your conversation in public.

 First PrevNext
 Excellent Manikandan1018-May-14 3:09 Manikandan10 18-May-14 3:09
 Re: Excellent Sanmayce18-May-14 6:35 Sanmayce 18-May-14 6:35
 Re: Excellent Manikandan1020-May-14 22:55 Manikandan10 20-May-14 22:55
 Railgun_Swampshine_BailOut used as match finder in LZSS Sanmayce15-Apr-14 8:32 Sanmayce 15-Apr-14 8:32
 Round #2 Sanmayce17-Apr-14 5:46 Sanmayce 17-Apr-14 5:46
 Round #2, recuperation Sanmayce19-Apr-14 8:46 Sanmayce 19-Apr-14 8:46
 Round #3 Sanmayce21-Apr-14 6:35 Sanmayce 21-Apr-14 6:35
 Round #4 Sanmayce25-Apr-14 5:27 Sanmayce 25-Apr-14 5:27
 Round #5, secret weapon deployed Sanmayce27-Apr-14 4:43 Sanmayce 27-Apr-14 4:43
 Round #5, Aftermath: the supremacy of Nakamichi 'Kaidanji' Sanmayce2-May-14 5:58 Sanmayce 2-May-14 5:58
 Some question KarstenK4-Nov-13 22:43 KarstenK 4-Nov-13 22:43
 Re: Some question Sanmayce5-Nov-13 4:05 Sanmayce 5-Nov-13 4:05
 My vote of 4 Sanmayce3-Nov-13 7:19 Sanmayce 3-Nov-13 7:19
 My vote of 5 Volynsky Alex16-Oct-13 8:06 Volynsky Alex 16-Oct-13 8:06
 Re: My vote of 5 Sanmayce16-Oct-13 8:24 Sanmayce 16-Oct-13 8:24
 Re: My vote of 5 Volynsky Alex16-Oct-13 9:25 Volynsky Alex 16-Oct-13 9:25
 Where to find INSPIRATION? Sanmayce12-Oct-13 7:51 Sanmayce 12-Oct-13 7:51
 Superfast 16-threaded exact searcher: 16 Railguns in action and more Sanmayce9-Feb-13 6:16 Sanmayce 9-Feb-13 6:16
 Fastest 32bit hash function in C Sanmayce29-Sep-12 7:25 Sanmayce 29-Sep-12 7:25
 My latest Windows benchmark package Sanmayce20-Mar-12 7:26 Sanmayce 20-Mar-12 7:26
 Horspool vs BNDM Mischa Sandberg4-Mar-12 18:06 Mischa Sandberg 4-Mar-12 18:06
 Re: Horspool vs BNDM Sanmayce5-Mar-12 5:35 Sanmayce 5-Mar-12 5:35
 Re: Unix Mischa Sandberg5-Mar-12 14:40 Mischa Sandberg 5-Mar-12 14:40
 Re: Horspool vs BNDM Sanmayce7-Mar-12 3:23 Sanmayce 7-Mar-12 3:23
 BNDM_64 vs 'Tridentx64' Sanmayce7-Mar-12 5:05 Sanmayce 7-Mar-12 5:05
 The best practical on-line resource on String Matching Sanmayce7-Mar-12 5:41 Sanmayce 7-Mar-12 5:41
 Railgun_Doublet_x64: the fastest mini-memmem known to me Sanmayce26-Feb-12 4:47 Sanmayce 26-Feb-12 4:47
 Re: Railgun_Doublet_x64: the fastest mini-memmem known to me Mischa Sandberg4-Mar-12 17:52 Mischa Sandberg 4-Mar-12 17:52
 Re: Railgun_Doublet_x64: the fastest mini-memmem known to me Sanmayce5-Mar-12 5:02 Sanmayce 5-Mar-12 5:02
 Decision threshold for switching to 2byte horspool Mischa Sandberg20-Feb-12 19:38 Mischa Sandberg 20-Feb-12 19:38
 Re: Decision threshold for switching to 2byte horspool Sanmayce21-Feb-12 5:30 Sanmayce 21-Feb-12 5:30
 Try this test ... Mischa Sandberg17-Feb-12 10:14 Mischa Sandberg 17-Feb-12 10:14
 Re: Try this test ... Sanmayce18-Feb-12 6:07 Sanmayce 18-Feb-12 6:07
 Re: Try this test ... Mischa Sandberg19-Feb-12 12:43 Mischa Sandberg 19-Feb-12 12:43
 Hasherezade Mischa Sandberg1-Feb-12 10:45 Mischa Sandberg 1-Feb-12 10:45
 Re: Hasherezade Mischa Sandberg1-Feb-12 11:36 Mischa Sandberg 1-Feb-12 11:36
 Re: Hasherezade Sanmayce3-Feb-12 5:13 Sanmayce 3-Feb-12 5:13
 BNDM: more than a Top Gun Sanmayce7-Feb-12 6:25 Sanmayce 7-Feb-12 6:25
 'Gulliver' with Order 3 presence check Sanmayce9-Feb-12 1:20 Sanmayce 9-Feb-12 1:20
 GNU's strstr strikes back Sanmayce16-Feb-12 1:53 Sanmayce 16-Feb-12 1:53
 Re: GNU's strstr strikes back Mischa Sandberg16-Feb-12 11:39 Mischa Sandberg 16-Feb-12 11:39
 Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Sanmayce18-Feb-12 4:10 Sanmayce 18-Feb-12 4:10
 Re: Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Mischa Sandberg21-Feb-12 8:47 Mischa Sandberg 21-Feb-12 8:47
 Re: Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Sanmayce23-Feb-12 6:51 Sanmayce 23-Feb-12 6:51
 Re: Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Mischa Sandberg23-Feb-12 7:37 Mischa Sandberg 23-Feb-12 7:37
 Re: Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Sanmayce23-Feb-12 8:47 Sanmayce 23-Feb-12 8:47
 Re: Railgun_Quadruplet_r8Triplet fastest for 3bytes patterns Mischa Sandberg23-Feb-12 9:56 Mischa Sandberg 23-Feb-12 9:56
 Homo sapiens chromosome 1 under 'Hasherezade' fire Sanmayce21-Feb-12 4:25 Sanmayce 21-Feb-12 4:25
 32bit vs 64bit: my first knock-down Sanmayce24-Feb-12 7:24 Sanmayce 24-Feb-12 7:24
 Re: 32bit vs 64bit: my first knock-down Mischa Sandberg24-Feb-12 11:31 Mischa Sandberg 24-Feb-12 11:31
 Last Visit: 31-Dec-99 18:00     Last Update: 28-Aug-16 10:28 Refresh 123 Next »

General    News    Suggestion    Question    Bug    Answer    Joke    Praise    Rant    Admin

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Terms of Use | Mobile
Web01 | 2.8.160826.1 | Last Updated 9 Sep 2011
Article Copyright 2011 by Sanmayce
Everything else Copyright © CodeProject, 1999-2016
Layout: fixed | fluid