Introduction
The C strstr-like function presented below (written by Georgi 'Kaze') is named Railgun_Quadruplet_6ppp and is based on well-known Boyer-Moore-Horspool and simplified Karp-Rabin algorithms. The main goal: to achieve fastest execution for all kind of pattern/needle and string/haystack lengths (starting from 2 bytes).
Railgun_Quadruplet_6ppp is now obsolete due to arrival of Railgun_Quadruplet_7 which boosts the BMH fragment by more than a percent. See further below for its commentless and tabulated source.
Background
Function homepage:
http://www.sanmayce.com/Railgun/index.html
Note: In order to reproduce the tests shown on the picture (below) you need the test package (57.2 MB) downloadable at function's homepage, thanks Randor for pointing out.
Very informative resource on strstr search techniques at:
http://www-igm.univ-mlv.fr/~lecroq/string/
Tests done with 16 patterns with 197MB haystack (being a pure English book-like text)
The machine is a mainstream laptop on 2.1GHz with Intel Merom CPU.
Using the Code
The code is pretty simple and straightforward.
char * Railgun_Quadruplet_6ppp (char * pbTarget,
char * pbPattern,
unsigned long cbTarget,
unsigned long cbPattern)
{
char * pbTargetMax = pbTarget + cbTarget;
register unsigned long ulHashPattern;
unsigned long ulHashTarget;
signed long count;
signed long countSTATIC;
unsigned char SINGLET;
unsigned long Quadruplet2nd;
unsigned long Quadruplet3rd;
unsigned long Quadruplet4th;
unsigned long AdvanceHopperGrass;
long i; int a, j, bm_bc[ASIZE]; unsigned char ch;
if (cbPattern > cbTarget)
return(NULL);
if ( cbPattern<4) { pbTarget = pbTarget+cbPattern;
ulHashPattern = ( (*(char *)(pbPattern))<<8 ) + *(pbPattern+(cbPattern-1));
if ( cbPattern==3) {
for ( ;; )
{
if ( ulHashPattern == ( (*(char *)(pbTarget-3))<<8 ) + *(pbTarget-1) ) {
if ( *(char *)(pbPattern+1) == *(char *)(pbTarget-2) ) return((pbTarget-3));
}
if ( (char)(ulHashPattern>>8) != *(pbTarget-2) ) pbTarget++;
pbTarget++;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else {
}
for ( ;; )
{
if ( ulHashPattern == ( (*(char *)(pbTarget-2))<<8 ) + *(pbTarget-1) )
return((pbTarget-2));
if ( (char)(ulHashPattern>>8) != *(pbTarget-1) ) pbTarget++;
pbTarget++;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else { if (cbTarget<961) {
pbTarget = pbTarget+cbPattern;
ulHashPattern = *(unsigned long *)(pbPattern);
SINGLET = ulHashPattern & 0xFF;
Quadruplet2nd = SINGLET<<8;
Quadruplet3rd = SINGLET<<16;
Quadruplet4th = SINGLET<<24;
for ( ;; )
{
AdvanceHopperGrass = 0;
ulHashTarget = *(unsigned long *)(pbTarget-cbPattern);
if ( ulHashPattern == ulHashTarget ) { count = cbPattern-1;
while ( count && *(char *)(pbPattern+(cbPattern-count)) == *(char *)(pbTarget-count) ) {
if ( cbPattern-1==AdvanceHopperGrass+count && SINGLET != *(char *)(pbTarget-count) ) AdvanceHopperGrass++;
count--;
}
if ( count == 0) return((pbTarget-cbPattern));
} else { if ( Quadruplet2nd != (ulHashTarget & 0x0000FF00) ) {
AdvanceHopperGrass++;
if ( Quadruplet3rd != (ulHashTarget & 0x00FF0000) ) {
AdvanceHopperGrass++;
if ( Quadruplet4th != (ulHashTarget & 0xFF000000) ) AdvanceHopperGrass++;
}
}
}
AdvanceHopperGrass++;
pbTarget = pbTarget + AdvanceHopperGrass;
if (pbTarget > pbTargetMax)
return(NULL);
}
} else { countSTATIC = cbPattern-2-2; ulHashPattern = *(unsigned long *)(pbPattern);
for (a=0; a < ASIZE; a++) bm_bc[a]=cbPattern;
for (j=0; j < cbPattern-1; j++) bm_bc[pbPattern[j]]=cbPattern-j-1;
i=0;
while (i <= cbTarget-cbPattern) {
ch=pbTarget[i+cbPattern-1];
if (ch == pbPattern[cbPattern-1] && *(long *)&pbTarget[i] == ulHashPattern) {
count = countSTATIC;
while ( count !=0 && *(char *)(pbPattern+1+3+(countSTATIC-count)) == *(char *)(&pbTarget[i]+1+3+(countSTATIC-count)) ) { count--;
}
if ( count <= 0) return(pbTarget+i);
}
i+=bm_bc[ch];
}
return(NULL);
} } }
Stomp stomp I arrived, here comes the Railgun_Quadruplet revision 7:
char * Railgun_Quadruplet_7 (char * pbTarget, char * pbPattern, unsigned long cbTarget, unsigned long cbPattern)
{
char * pbTargetMax = pbTarget + cbTarget;
register unsigned long ulHashPattern;
unsigned long ulHashTarget;
signed long count;
signed long countSTATIC;
unsigned char SINGLET;
unsigned long Quadruplet2nd;
unsigned long Quadruplet3rd;
unsigned long Quadruplet4th;
unsigned long AdvanceHopperGrass;
long i;
int a, j, bm_bc[ASIZE];
unsigned char ch;
if (cbPattern > cbTarget) return(NULL);
if (cbPattern<4) {
pbTarget = pbTarget+cbPattern;
ulHashPattern = ( (*(char *)(pbPattern))<<8 ) + *(pbPattern+(cbPattern-1));
if (cbPattern==3) {
for ( ;; ) {
if ( ulHashPattern == ( (*(char *)(pbTarget-3))<<8 ) + *(pbTarget-1) ) {
if ( *(char *)(pbPattern+1) == *(char *)(pbTarget-2) ) return((pbTarget-3));
}
if ( (char)(ulHashPattern>>8) != *(pbTarget-2) ) pbTarget++;
pbTarget++;
if (pbTarget > pbTargetMax) return(NULL);
}
} else {
}
for ( ;; ) {
if ( ulHashPattern == ( (*(char *)(pbTarget-2))<<8 ) + *(pbTarget-1) ) return((pbTarget-2));
if ( (char)(ulHashPattern>>8) != *(pbTarget-1) ) pbTarget++;
pbTarget++;
if (pbTarget > pbTargetMax) return(NULL);
}
} else {
if (cbTarget<961) {
pbTarget = pbTarget+cbPattern;
ulHashPattern = *(unsigned long *)(pbPattern);
SINGLET = ulHashPattern & 0xFF;
Quadruplet2nd = SINGLET<<8;
Quadruplet3rd = SINGLET<<16;
Quadruplet4th = SINGLET<<24;
for ( ;; ) {
AdvanceHopperGrass = 0;
ulHashTarget = *(unsigned long *)(pbTarget-cbPattern);
if ( ulHashPattern == ulHashTarget ) {
count = cbPattern-1;
while ( count && *(char *)(pbPattern+(cbPattern-count)) == *(char *)(pbTarget-count) ) {
if ( cbPattern-1==AdvanceHopperGrass+count && SINGLET != *(char *)(pbTarget-count) ) AdvanceHopperGrass++;
count--;
}
if ( count == 0) return((pbTarget-cbPattern));
} else {
if ( Quadruplet2nd != (ulHashTarget & 0x0000FF00) ) {
AdvanceHopperGrass++;
if ( Quadruplet3rd != (ulHashTarget & 0x00FF0000) ) {
AdvanceHopperGrass++;
if ( Quadruplet4th != (ulHashTarget & 0xFF000000) ) AdvanceHopperGrass++;
}
}
}
AdvanceHopperGrass++;
pbTarget = pbTarget + AdvanceHopperGrass;
if (pbTarget > pbTargetMax) return(NULL);
}
} else {
countSTATIC = cbPattern-2-2;
ulHashPattern = *(unsigned long *)(pbPattern);
for (a=0; a < ASIZE; a++) bm_bc[a]=cbPattern;
for (j=0; j < cbPattern-1; j++) bm_bc[pbPattern[j]]=cbPattern-j-1;
i=0;
while (i <= cbTarget-cbPattern) {
if ( *(unsigned long *)&pbTarget[i] == ulHashPattern ) {
count = countSTATIC;
while ( count !=0 && *(char *)(pbPattern+1+3+(countSTATIC-count)) == *(char *)(&pbTarget[i]+1+3+(countSTATIC-count)) ) count--;
if ( count == 0) return(pbTarget+i);
}
i= i + bm_bc[(unsigned char)pbTarget[i+cbPattern-1]];
}
return(NULL);
}
}
}
The tweak (in BMH fragment) which speed-ups significantly the obsolete (already) r.6+++ is this:
countSTATIC = cbPattern-2-2; ulHashPattern = *(unsigned long *)(pbPattern);
for (a=0; a < ASIZE; a++) bm_bc[a]=cbPattern;
for (j=0; j < cbPattern-1; j++) bm_bc[pbPattern[j]]=cbPattern-j-1;
i=0;
while (i <= cbTarget-cbPattern) {
if ( *(unsigned long *)&pbTarget[i] == ulHashPattern ) {
count = countSTATIC;
while ( count !=0 && *(char *)(pbPattern+(countSTATIC-count)+4) == *(char *)(&pbTarget[i]+(countSTATIC-count)+4) ) { count--;
}
if ( count == 0) return(pbTarget+i);
}
i= i + bm_bc[(unsigned char)pbTarget[i+cbPattern-1]];
}
return(NULL);
In order to see what path I had to traverse to make this tweak you may see the discussion board (r.7m thread).
The tests below show NOT the full boost simply because they average times for all lenghts thus losing/diminishing the high speeds times (being 4-3-2 times lesser compared with pattern lenghts 2-3-4-5-6).
D:\_KAZE_new-stuff>strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox.exe
strstr_SHORT-SHOWDOWN, revision 7, written by Kaze.
Input Pattern(up to 19+2000 chars): frustration
Doing Search for Pattern(11bytes) into String(206908949bytes) as-one-line ...
WARNING! The next line works with BMH only for pattern 4[+] long, otherwise (for 2 and 3) other searcher takes over!
Railgun_6pp_hits/Railgun_6pp_clocks: 1355/101
Railgun_6pp performance: 2000KB/clock
Doing Search for Pattern(11bytes) into String(206908949bytes) as-one-line ...
WARNING! The next line works with BMH only for pattern 4[+] long, otherwise (for 2 and 3) other searcher takes over!
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1355/91
Railgun_Quadruplet_7 performance: 2220KB/clock
Note: Executing the next two tests 256 times i.e. the search is for 256x8x2 patterns!
Doing Search for 8x2 Patterns into String(206908949bytes) as-one-line ... Not-Counting-Hits-Just-Returns-First-One
Found ('an') at 157 position, Railgun_Quadruplet_6pp performance: 0KB/clock
Found ('to') at 853 position, Railgun_Quadruplet_6pp performance: 0KB/clock
Found ('TDK') at -4325408 position, Railgun_Quadruplet_6pp performance: 922KB/clock
Found ('the') at 511 position, Railgun_Quadruplet_6pp performance: 0KB/clock
Found ('fast') at 42381 position, Railgun_Quadruplet_6pp performance: 41KB/clock
Found ('easy') at 30163 position, Railgun_Quadruplet_6pp performance: 29KB/clock
Found ('grmbl') at -4325408 position, Railgun_Quadruplet_6pp performance: 1167KB/clock
Found ('email') at 69587982 position, Railgun_Quadruplet_6pp performance: 1078KB/clock
Found ('pasting') at 134600126 position, Railgun_Quadruplet_6pp performance: 1383KB/clock
Found ('amazing') at 1629806 position, Railgun_Quadruplet_6pp performance: 93KB/clock
Found ('underdog') at 137424498 position, Railgun_Quadruplet_6pp performance: 1698KB/clock
Found ('superdog') at -4325408 position, Railgun_Quadruplet_6pp performance: 1603KB/clock
Found ('participants') at 16060 position, Railgun_Quadruplet_6pp performance: 15KB/clock
Found ('skillessness') at -4325408 position, Railgun_Quadruplet_6pp performance: 1836KB/clock
Found ('I should have known') at 41831880 position, Railgun_Quadruplet_6pp performance: 2403KB/clock
Found ('human consciousness') at 3863 position, Railgun_Quadruplet_6pp performance: 3KB/clock
Railgun_Quadruplet_6pp 8x2 i.e. average performance: 1342KB/clock
ReallyTraversed: 310,477,846,528 bytes
Doing Search for 8x2 Patterns into String(206908949bytes) as-one-line ... Not-Counting-Hits-Just-Returns-First-One
Found ('an') at 157 position, Railgun_Quadruplet_7 performance: 0KB/clock
Found ('to') at 853 position, Railgun_Quadruplet_7 performance: 0KB/clock
Found ('TDK') at -4325408 position, Railgun_Quadruplet_7 performance: 990KB/clock
Found ('the') at 511 position, Railgun_Quadruplet_7 performance: 0KB/clock
Found ('fast') at 42381 position, Railgun_Quadruplet_7 performance: 41KB/clock
Found ('easy') at 30163 position, Railgun_Quadruplet_7 performance: 29KB/clock
Found ('grmbl') at -4325408 position, Railgun_Quadruplet_7 performance: 1069KB/clock
Found ('email') at 69587982 position, Railgun_Quadruplet_7 performance: 1078KB/clock
Found ('pasting') at 134600126 position, Railgun_Quadruplet_7 performance: 1383KB/clock
Found ('amazing') at 1629806 position, Railgun_Quadruplet_7 performance: 1591KB/clock
Found ('underdog') at 137424498 position, Railgun_Quadruplet_7 performance: 1698KB/clock
Found ('superdog') at -4325408 position, Railgun_Quadruplet_7 performance: 1422KB/clock
Found ('participants') at 16060 position, Railgun_Quadruplet_7 performance: 15KB/clock
Found ('skillessness') at -4325408 position, Railgun_Quadruplet_7 performance: 2149KB/clock
Found ('I should have known') at 41831880 position, Railgun_Quadruplet_7 performance: 2403KB/clock
Found ('human consciousness') at 3863 position, Railgun_Quadruplet_7 performance: 3KB/clock
Railgun_Quadruplet_7 8x2 i.e. average performance: 1354KB/clock
ReallyTraversed: 310,477,846,528 bytes
Doing Search for 8x2 Patterns into String(206908949bytes) as-one-line ...
Found ('an') 1987797 time(s), BM_HORSPOOL performance: 293KB/clock
Found ('to') 1076629 time(s), BM_HORSPOOL performance: 293KB/clock
Found ('TDK') 0 time(s), BM_HORSPOOL performance: 516KB/clock
Found ('the') 2114180 time(s), BM_HORSPOOL performance: 379KB/clock
Found ('fast') 5945 time(s), BM_HORSPOOL performance: 585KB/clock
Found ('easy') 5191 time(s), BM_HORSPOOL performance: 585KB/clock
Found ('grmbl') 0 time(s), BM_HORSPOOL performance: 759KB/clock
Found ('email') 1 time(s), BM_HORSPOOL performance: 756KB/clock
Found ('pasting') 2 time(s), BM_HORSPOOL performance: 918KB/clock
Found ('amazing') 323 time(s), BM_HORSPOOL performance: 1074KB/clock
Found ('underdog') 4 time(s), BM_HORSPOOL performance: 1069KB/clock
Found ('superdog') 0 time(s), BM_HORSPOOL performance: 1074KB/clock
Found ('participants') 147 time(s), BM_HORSPOOL performance: 1603KB/clock
Found ('skillessness') 0 time(s), BM_HORSPOOL performance: 1422KB/clock
Found ('I should have known') 1 time(s), BM_HORSPOOL performance: 1433KB/clock
Found ('human consciousness') 519 time(s), BM_HORSPOOL performance: 1820KB/clock
BM_HORSPOOL 8x2 i.e. average performance: 666KB/clock
Doing Search for 8x2 Patterns into String(206908949bytes) as-one-line ...
Found ('an') 1987797 time(s), Railgun_6pp performance: 678KB/clock
Found ('to') 1076629 time(s), Railgun_6pp performance: 643KB/clock
Found ('TDK') 0 time(s), Railgun_6pp performance: 1174KB/clock
Found ('the') 2114180 time(s), Railgun_6pp performance: 678KB/clock
Found ('fast') 5945 time(s), Railgun_6pp performance: 918KB/clock
Found ('easy') 5191 time(s), Railgun_6pp performance: 990KB/clock
Found ('grmbl') 0 time(s), Railgun_6pp performance: 1287KB/clock
Found ('email') 1 time(s), Railgun_6pp performance: 1167KB/clock
Found ('pasting') 2 time(s), Railgun_6pp performance: 1603KB/clock
Found ('amazing') 323 time(s), Railgun_6pp performance: 1603KB/clock
Found ('underdog') 4 time(s), Railgun_6pp performance: 1603KB/clock
Found ('superdog') 0 time(s), Railgun_6pp performance: 1820KB/clock
Found ('participants') 147 time(s), Railgun_6pp performance: 2149KB/clock
Found ('skillessness') 0 time(s), Railgun_6pp performance: 2126KB/clock
Found ('I should have known') 1 time(s), Railgun_6pp performance: 2557KB/clock
Found ('human consciousness') 519 time(s), Railgun_6pp performance: 2126KB/clock
Railgun_6pp 8x2 i.e. average performance: 1218KB/clock
Doing Search for 8x2 Patterns into String(206908949bytes) as-one-line ...
Found ('an') 1987797 time(s), Railgun_Quadruplet_7 performance: 643KB/clock
Found ('to') 1076629 time(s), Railgun_Quadruplet_7 performance: 678KB/clock
Found ('TDK') 0 time(s), Railgun_Quadruplet_7 performance: 1074KB/clock
Found ('the') 2114180 time(s), Railgun_Quadruplet_7 performance: 643KB/clock
Found ('fast') 5945 time(s), Railgun_Quadruplet_7 performance: 1074KB/clock
Found ('easy') 5191 time(s), Railgun_Quadruplet_7 performance: 1074KB/clock
Found ('grmbl') 0 time(s), Railgun_Quadruplet_7 performance: 1287KB/clock
Found ('email') 1 time(s), Railgun_Quadruplet_7 performance: 1167KB/clock
Found ('pasting') 2 time(s), Railgun_Quadruplet_7 performance: 1603KB/clock
Found ('amazing') 323 time(s), Railgun_Quadruplet_7 performance: 1603KB/clock
Found ('underdog') 4 time(s), Railgun_Quadruplet_7 performance: 1820KB/clock
Found ('superdog') 0 time(s), Railgun_Quadruplet_7 performance: 1603KB/clock
Found ('participants') 147 time(s), Railgun_Quadruplet_7 performance: 2557KB/clock
Found ('skillessness') 0 time(s), Railgun_Quadruplet_7 performance: 2126KB/clock
Found ('I should have known') 1 time(s), Railgun_Quadruplet_7 performance: 2557KB/clock
Found ('human consciousness') 519 time(s), Railgun_Quadruplet_7 performance: 2557KB/clock
Railgun_Quadruplet_7 8x2 i.e. average performance: 1232KB/clock
^C
D:\_KAZE_new-stuff>
In my opinion it is one of the seven commandments of speed: do not put the cart before the horse i.e. for adjacent (32?) memory accesses first extract the byte with lowest address.
strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox.exe, revision 7, some quick inputs:
D:\_KAZE_new-stuff>strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox.exe
strstr_SHORT-SHOWDOWN, revision 7, written by Kaze.
Input Pattern(up to 19+2000 chars): fast
Doing Search for Pattern(4bytes) into String(206908949bytes) as-one-line ...
WARNING! The next line works with BMH only for pattern 4[+] long, otherwise (for 2 and 3) other searcher takes over!
Railgun_6pp_hits/Railgun_6pp_clocks: 5945/215
Railgun_6pp performance: 939KB/clock
Doing Search for Pattern(4bytes) into String(206908949bytes) as-one-line ...
WARNING! The next line works with BMH only for pattern 4[+] long, otherwise (for 2 and 3) other searcher takes over!
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 5945/193
Railgun_Quadruplet_7 performance: 1046KB/clock
^C
...
Input Pattern(up to 19+2000 chars): from
Railgun_6pp_hits/Railgun_6pp_clocks: 83155/196
Railgun_6pp performance: 1030KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 83155/194
Railgun_Quadruplet_7 performance: 1041KB/clock
Input Pattern(up to 19+2000 chars): rabea
Railgun_6pp_hits/Railgun_6pp_clocks: 0/176
Railgun_6pp performance: 1148KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/162
Railgun_Quadruplet_7 performance: 1247KB/clock
Input Pattern(up to 19+2000 chars): makes
Railgun_6pp_hits/Railgun_6pp_clocks: 9281/172
Railgun_6pp performance: 1174KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 9281/162
Railgun_Quadruplet_7 performance: 1247KB/clock
Input Pattern(up to 19+2000 chars): monkey
Railgun_6pp_hits/Railgun_6pp_clocks: 1691/141
Railgun_6pp performance: 1433KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1691/139
Railgun_Quadruplet_7 performance: 1453KB/clock
Input Pattern(up to 19+2000 chars): punny
Railgun_6pp_hits/Railgun_6pp_clocks: 0/156
Railgun_6pp performance: 1295KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/155
Railgun_Quadruplet_7 performance: 1303KB/clock
Input Pattern(up to 19+2000 chars): funny
Railgun_6pp_hits/Railgun_6pp_clocks: 179/157
Railgun_6pp performance: 1287KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 179/156
Railgun_Quadruplet_7 performance: 1295KB/clock
Input Pattern(up to 19+2000 chars): ramjet
Railgun_6pp_hits/Railgun_6pp_clocks: 0/152
Railgun_6pp performance: 1329KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/137
Railgun_Quadruplet_7 performance: 1474KB/clock
Input Pattern(up to 19+2000 chars): fallen
Railgun_6pp_hits/Railgun_6pp_clocks: 2578/151
Railgun_6pp performance: 1338KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 2578/138
Railgun_Quadruplet_7 performance: 1464KB/clock
Input Pattern(up to 19+2000 chars): elephant
Railgun_6pp_hits/Railgun_6pp_clocks: 1198/124
Railgun_6pp performance: 1629KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1198/112
Railgun_Quadruplet_7 performance: 1804KB/clock
Input Pattern(up to 19+2000 chars): layalina
Railgun_6pp_hits/Railgun_6pp_clocks: 0/121
Railgun_6pp performance: 1669KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/110
Railgun_Quadruplet_7 performance: 1836KB/clock
Input Pattern(up to 19+2000 chars): I should have known
Railgun_6pp_hits/Railgun_6pp_clocks: 1/89
Railgun_6pp performance: 2270KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1/84
Railgun_Quadruplet_7 performance: 2405KB/clock
Input Pattern(up to 19+2000 chars): human consciousness
Railgun_6pp_hits/Railgun_6pp_clocks: 519/80
Railgun_6pp performance: 2525KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 519/75
Railgun_Quadruplet_7 performance: 2694KB/clock
Input Pattern(up to 19+2000 chars): But he's looking right through me
Railgun_6pp_hits/Railgun_6pp_clocks: 0/84
Railgun_6pp performance: 2405KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/76
Railgun_Quadruplet_7 performance: 2658KB/clock
Input Pattern(up to 19+2000 chars): I'm living in an age that calls darkness light
Railgun_6pp_hits/Railgun_6pp_clocks: 0/72
Railgun_6pp performance: 2806KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/67
Railgun_Quadruplet_7 performance: 3015KB/clock
Input Pattern(up to 19+2000 chars): Following the wanderings of a dream - a dream that keeps my soul alive
Railgun_6pp_hits/Railgun_6pp_clocks: 0/69
Railgun_6pp performance: 2928KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/63
Railgun_Quadruplet_7 performance: 3207KB/clock
Input Pattern(up to 19+2000 chars): I notice what matters and I got nothing to lose but darkness and shadows
Railgun_6pp_hits/Railgun_6pp_clocks: 0/66
Railgun_6pp performance: 3061KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/63
Railgun_Quadruplet_7 performance: 3207KB/clock
Input Pattern(up to 19+2000 chars): Beth Ditto
Railgun_6pp_hits/Railgun_6pp_clocks: 0/118
Railgun_6pp performance: 1712KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/107
Railgun_Quadruplet_7 performance: 1888KB/clock
Input Pattern(up to 19+2000 chars): SR71
Railgun_6pp_hits/Railgun_6pp_clocks: 0/175
Railgun_6pp performance: 1154KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/182
Railgun_Quadruplet_7 performance: 1110KB/clock
Input Pattern(up to 19+2000 chars): Apache
Railgun_6pp_hits/Railgun_6pp_clocks: 1/162
Railgun_6pp performance: 1247KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1/133
Railgun_Quadruplet_7 performance: 1519KB/clock
Input Pattern(up to 19+2000 chars): Fibonacci
Railgun_6pp_hits/Railgun_6pp_clocks: 0/106
Railgun_6pp performance: 1906KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/98
Railgun_Quadruplet_7 performance: 2061KB/clock
Input Pattern(up to 19+2000 chars): Tesla
Railgun_6pp_hits/Railgun_6pp_clocks: 0/175
Railgun_6pp performance: 1154KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/160
Railgun_Quadruplet_7 performance: 1262KB/clock
Input Pattern(up to 19+2000 chars): Albert
Railgun_6pp_hits/Railgun_6pp_clocks: 932/154
Railgun_6pp performance: 1312KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 932/137
Railgun_Quadruplet_7 performance: 1474KB/clock
Input Pattern(up to 19+2000 chars): Toshiba
Railgun_6pp_hits/Railgun_6pp_clocks: 0/130
Railgun_6pp performance: 1554KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 0/121
Railgun_Quadruplet_7 performance: 1669KB/clock
Input Pattern(up to 19+2000 chars): Orion
Railgun_6pp_hits/Railgun_6pp_clocks: 1/175
Railgun_6pp performance: 1154KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1/160
Railgun_Quadruplet_7 performance: 1262KB/clock
Input Pattern(up to 19+2000 chars): pharaoh
Railgun_6pp_hits/Railgun_6pp_clocks: 26/129
Railgun_6pp performance: 1566KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 26/122
Railgun_Quadruplet_7 performance: 1656KB/clock
Input Pattern(up to 19+2000 chars): lemon
Railgun_6pp_hits/Railgun_6pp_clocks: 33/178
Railgun_6pp performance: 1135KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 33/161
Railgun_Quadruplet_7 performance: 1255KB/clock
Input Pattern(up to 19+2000 chars): grammar
Railgun_6pp_hits/Railgun_6pp_clocks: 218/123
Railgun_6pp performance: 1642KB/clock
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 218/117
Railgun_Quadruplet_7 performance: 1727KB/clock
D:\_KAZE_new-stuff>
Grmbl, pattern 'SR71' compared to 'fast' gives me a hint how things are far from 'fastest'.
Points of Interest
The interesting part about tests I have done: 'memcmp' and 'memcpy' when inlined (properly) give much better results. The thing which pop-ups again-and-again: Never take anything for granted. A bug (which doesn't affect the previous results, correct) was crushed, now r.6+++ is ready to go.
As you can see from the C (commentless and tabulated) and ASM sources (see the top of the page for downloading):
The Microsoft 16.00.30319.01 compiled Railgun_Quadruplet_6ppp is 00c6b-00a00+5 = 624 bytes long.
The Quadruplet (where cbTarget<961) fragment loop is 00bc3-00b17+5 = 177 bytes long.
The Boyer-Moore-Horspool (where cbTarget>=961) fragment loop is 00c61-00c20+2 = 67 bytes long.
An interesting question for more experienced coders:
In case of reducing the BMH loop down by 3 bytes (i.e. to 64 bytes), can we obtain some additional/significant boost?
History
This is revision 6+++ written 2011-Sep-07.
Also revision 7 written 2011-Sep-25.