|
|
Comments and Discussions
|
|
 |
|

|
Last night (2012 Feb 25) achieved 13%-20% boost of my strstr-like (i.e. short strings/patterns cases) function compared to GNU's Berg's strstr function - both compiled as 64bit code using Microsoft 16 and Intel 12.1 compilers. For full benchmark see the commented lines further below. After some ups-and-downs experienced in Windows 7 64bit with Intel 12.1 64bit and Microsoft 16 64bit C compilers I revised Railgun in order to fit it better in the new environment which resulted in appearance of Railgun_Doublet. The function is tiny: with 00cc5-00c71+1+1 = 86 bytes main-loop.
GNU Berg's performance (compiled with Microsoft 2010 64bit compiler): 376KB/clock+400KB/clock+564KB/clock+346KB/clock+540KB/clock+395KB/clock+535KB/clock+401KB/clock+542KB/clock+455KB/clock+510KB/clock+468KB/clock+541KB/clock+469KB/clock+557KB/clock+482KB/clock=7581
GNU Berg's performance (compiled with Intel C++ 64 Compiler XE 12.1): 296KB/clock+320KB/clock+421KB/clock+276KB/clock+417KB/clock+323KB/clock+415KB/clock+326KB/clock+415KB/clock+367KB/clock+405KB/clock+378KB/clock+415KB/clock+378KB/clock+418KB/clock+388KB/clock=5958
Railgun_Doublet performance (compiled with Microsoft 2010 64bit compiler): 421KB/clock+476KB/clock+551KB/clock+393KB/clock+550KB/clock+534KB/clock+553KB/clock+550KB/clock+559KB/clock+557KB/clock+553KB/clock+562KB/clock+568KB/clock+573KB/clock+587KB/clock+594KB/clock=8581
Railgun_Doublet performance (compiled with Intel C++ 64 Compiler XE 12.1): 348KB/clock+397KB/clock+464KB/clock+328KB/clock+463KB/clock+452KB/clock+465KB/clock+463KB/clock+469KB/clock+468KB/clock+464KB/clock+470KB/clock+477KB/clock+479KB/clock+490KB/clock+494KB/clock=7191
BNDM_32 performance (compiled with Microsoft 2010 64bit compiler): 267KB/clock+290KB/clock+462KB/clock+255KB/clock+390KB/clock+379KB/clock+451KB/clock+395KB/clock+400KB/clock+433KB/clock+419KB/clock+420KB/clock+436KB/clock+447KB/clock+435KB/clock+441KB/clock=6320
BNDM_32 performance (compiled with Intel C++ 64 Compiler XE 12.1): 241KB/clock+262KB/clock+409KB/clock+234KB/clock+352KB/clock+343KB/clock+402KB/clock+359KB/clock+367KB/clock+392KB/clock+383KB/clock+384KB/clock+398KB/clock+407KB/clock+400KB/clock+405KB/clock=5738
Summary for all 16 patterns:
GNU Berg's performance: 7581 Microsoft / 5958 Intel
Railgun_Doublet performance: 8581 Microsoft / 7191 Intel
BNDM_32 performance: 6320 Microsoft / 5738 Intel
Using Microsoft:
Railgun_Doublet is (8581-7581)/7581*100% = 13% faster than GNU Berg's
Railgun_Doublet is (8581-6320)/6320*100% = 35% faster than BNDM_32
Using Intel:
Railgun_Doublet is (7191-5958)/5958*100% = 20% faster than GNU Berg's
Railgun_Doublet is (7191-5738)/5738*100% = 25% faster than BNDM_32
For full benchmark dumps: Railgun homepage.
// Notes on 80x86 and x64, 2012-Feb-25:
// Three compilers were used (first two on Windows 7 64bit, the third on Windows XP 32bit):
// Intel(R) C++ 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.1.258 Build 20111011
// Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
// Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
// I have been using x64 for more than 12 hours and quickly the picture has become clear: code written for 32bit must be replaced with dedicated 64bit counterpart, relying on former only is a gambling venture.
// For example Railgun_Doublet dominates when compiled as 64bit (both with Intel XE 2011 12.1 and Microsoft 2010 16.00.30319.01 for x64), but for 32bit it is inferior compared to Railgun_Quadruplet_8Triplet.
// Summary for strstr-like (i.e. memmem for short strings/patterns) usage:
// - for 32bit use Railgun_Quadruplet_8Triplet
// - for 64bit use Railgun_Doublet
// My interest has been shifted from strstr toward memmem, Railgun_Doublet will fill the gap (short strings/patterns cases) in Railgun_r8_Mimino_x64 whereas 'Trident2'+'Hasherezade' will deal with memmem part, 'Hasherezade' should be surely tuned for 64bit.
Get down get down get down get it on show love and give it up
What are you waiting on?
|
|
|
|

|
Just checking: "Railgun_Doublet" is the special case for cbPattern == 2? Or something else using a two-byte key?
"Dreams come true, not free" -- S.Sondheim
|
|
|
|

|
The second.
It is a 64bit (only) short patterns/strings micro-memmem, by micro (mini was not precise) I mean the lowest range possible. I am confused enough of all ranges and searchers, so I wanted to see A LOT of benchmarks on Sandy/Ivy bridge machines running Windows 7 64bit - I asked for help on two overclock forums but sadly the overclockers there are not interested.
Recently I have read that Ivy bridge runs on 3500MHz nominal reaching without problem 6-7GHz - I wanted to see how CPU clock and RAM latency along with RAM bandwidth affect current 64bit functions. My desire is to tune 'Mimino' for fastest machines on the PC market as step one, but misery holds me back these days.
Get down get down get down get it on show love and give it up
What are you waiting on?
|
|
|
|

|
As I mentioned elsewhere, I've modified the winstr routines based on railgun's approach to short-pattern scans: trust the hardware to do unaligned loads faster than software can shift-and-add bytes into words. The 2- and 4-byte pattern cases amount to 6-instruction loops; the 3-byte pattern case is a 9-instruction loop. That's a definite 30-40% win. Thanks.
I went through the thrash tests again, to see if I could triage algorithm choice based on (cbPattern, cbTarget), which have ranges (2..256) and (8..65536).
For "nice" input, where pbPattern[0] is rare in pbTarget, SSE2 has a ridiculous advantage for cbPattern = 2..16; likewise BNDM for cbPattern = 17..66. For cbPattern=67+, Elsiane is best for a larger and larger wedge between them (cbTarget=1K for cbPattern=67; cbTarget=2K..32K for cbPattern=130; ...).
However, what I'm looking for is consistent behaviour in the face of weighted nasty input
For cbPattern = 2, the SSE2 routine take 35% the time of SSE4.2 strstr ... and everything else (railgun, winstr, BNDM) is at least 250% the time of strstr. Looks like that's a dead horse for intelligent algorithms versus hardware, until you get to much larger cbTargets.
For cbPattern = 3..9, nothing beats strstr. For cbTarget=32K+, Gulliver is wildly variable (80%-275% the time of strstr). That suggest that it EVENTUALLY overtakes strstr, but that's beyond my current test range.
For cbPattern = 10..15, Gulliver is best for cbTarget=32K+.
For cbPattern = 16..32 bytes, wins is best for tiny targets (2..128), BNDM for midsize(256..16384) and Gulliver for 32K+
For cbPattern > 32, wins is consistently best for (2..512), Elsiane for (1K..16K) and Gulliver for 32K+.
Hasherezade is always > Gulliver*2.
To me, that suggests that you might consider (higher) thresholds for switching to 2byte Horspool and Hash. Just some thoughts.
"Dreams come true, not free" -- S.Sondheim
|
|
|
|

|
Thanks for the tests, I have some high hopes for 'Trident2'('Gulliver' reinforced by BNDM_32), by sacrificing some Speed-Performance in order to achieve stronger Skip-Performance 'Trident2' is my current favorite. I wonder whether the current 3 chars unrolling will suffice or it should be done for depth 4?! For me 'Gulliver' is a beaten card, the same goes for Horspool&Sunday. My quick DNA test (a post further below) prompts for mix of 'Trident2' and 'Hasherezade'. As for strstr-like (not memmem) I think Railgun_Quadruplet_8Triplet has to be left as it is.
For last some days an idea of compiling-all-into-one is popping on-and-on, I see it as Railgun_r8_Mimino (after one of my favorite movies), however I do no not intend to flood (until it gets thoroughly tested) the already present hitters with a new one.
>To me, that suggests that you might consider (higher) thresholds for switching to 2byte Horspool and Hash.
Yes, it should be 32+KB in my view.
Get down get down get down get it on show love and give it up
What are you waiting on?
|
|
|
|

|
For pattern = "ABCDE", match against a target string that repeats "ABCD" 10000 times, with a final "ABCDE" to prove that matching happens.
Now try the same thing with pattern = "ABCDEF", and target string is "ABCDE" repeated 8000 times, with a final "ABCDEEF".
Edge cases are interesting ...
"Dreams come true, not free" -- S.Sondheim
|
|
|
|

|
Mischa I couldn't see your point.
The first test:
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>dir OSHO.TXT
02/18/2012 06:55 PM 40,005 osho.txt
1 File(s) 40,005 bytes
0 Dir(s) 3,317,592,064 bytes free
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>type OSHO.TXT
ABCD...ABCDABCDE
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox.exe
strstr_SHORT-SHOWDOWN, revision 8Triplet_7Trident_7Hasherezade_vs_7Gulliver2_vs_7Elsiane_vs_7sun_vs_7_vs_7sunhorse_vs_7deuce_vs_BMF, written by Kaze.
Full credits to: R.S. Boyer, J.S. Moore, R.N. Horspool, D.M. Sunday.
Usage: strstr_SHORT-SHOWDOWN.exe [anystring] [anystring]
Example1(keyboard test): strstr_SHORT-SHOWDOWN.exe
Example2(DUMBO 8x2 test): strstr_SHORT-SHOWDOWN.exe go
Example3(6x2 and 52 tests): strstr_SHORT-SHOWDOWN.exe go go
Input Pattern(up to 19+2000 chars): ABCDE
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7sun_hits/Railgun_Quadruplet_7sun_clocks: 1/1
Railgun_Quadruplet_7sun performance: 39KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1/6
Railgun_Quadruplet_7 performance: 6KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7Trident_hits/Railgun_Quadruplet_7Trident_clocks: 1/4
Railgun_Quadruplet_7Trident performance: 9KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
BNDM_32_hits/BNDM_32_clocks: 1/5
BNDM_32 performance: 7KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7sunhorse_hits/Railgun_Quadruplet_7sunhorse_clocks: 1/3
Railgun_Quadruplet_7sunhorse performance: 13KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7deuce_hits/Railgun_Quadruplet_7deuce_clocks: 1/4
Railgun_Quadruplet_7deuce performance: 9KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7Elsiane_hits/Railgun_Quadruplet_7Elsiane_clocks: 1/5
Railgun_Quadruplet_7Elsiane performance: 7KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7Gulliver_hits/Railgun_Quadruplet_7Gulliver_clocks: 1/5
Railgun_Quadruplet_7Gulliver performance: 7KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Railgun_Quadruplet_7Hasherezade_hits/Railgun_Quadruplet_7Hasherezade_clocks: 1/4
Railgun_Quadruplet_7Hasherezade performance: 9KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Skip-Performance(bigger-the-better): 400%, 10001 skips/iterations
Boyer_Moore_Flensburg_hits/Boyer_Moore_Flensburg_clocks: 1/5
Boyer_Moore_Flensburg performance: 7KB/clock
Doing Search for Pattern(5bytes) into String(40005bytes) as-one-line ...
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Two-Way_hits/Two-Way_clocks: 1/4
Two-Way performance: 9KB/clock
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>
The other test:
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>dir OSHO.TXT
02/18/2012 06:32 PM 40,006 OSHO.TXT
1 File(s) 40,006 bytes
0 Dir(s) 3,317,620,736 bytes free
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>type OSHO.TXT
ABCDE...ABCDEABCDEF
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>strstr_SHORT-SHOWDOWN_Microsoft_v16_Ox.exe
strstr_SHORT-SHOWDOWN, revision 8Triplet_7Trident_7Hasherezade_vs_7Gulliver2_vs_7Elsiane_vs_7sun_vs_7_vs_7sunhorse_vs_7deuce_vs_BMF, written by Kaze.
Full credits to: R.S. Boyer, J.S. Moore, R.N. Horspool, D.M. Sunday.
Usage: strstr_SHORT-SHOWDOWN.exe [anystring] [anystring]
Example1(keyboard test): strstr_SHORT-SHOWDOWN.exe
Example2(DUMBO 8x2 test): strstr_SHORT-SHOWDOWN.exe go
Example3(6x2 and 52 tests): strstr_SHORT-SHOWDOWN.exe go go
Input Pattern(up to 19+2000 chars): ABCDEF
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7sun_hits/Railgun_Quadruplet_7sun_clocks: 1/2
Railgun_Quadruplet_7sun performance: 19KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7_hits/Railgun_Quadruplet_7_clocks: 1/4
Railgun_Quadruplet_7 performance: 9KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7Trident_hits/Railgun_Quadruplet_7Trident_clocks: 1/5
Railgun_Quadruplet_7Trident performance: 7KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
BNDM_32_hits/BNDM_32_clocks: 1/4
BNDM_32 performance: 9KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7sunhorse_hits/Railgun_Quadruplet_7sunhorse_clocks: 1/6
Railgun_Quadruplet_7sunhorse performance: 6KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7deuce_hits/Railgun_Quadruplet_7deuce_clocks: 1/4
Railgun_Quadruplet_7deuce performance: 9KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7Elsiane_hits/Railgun_Quadruplet_7Elsiane_clocks: 1/3
Railgun_Quadruplet_7Elsiane performance: 13KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7Gulliver_hits/Railgun_Quadruplet_7Gulliver_clocks: 1/4
Railgun_Quadruplet_7Gulliver performance: 9KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Railgun_Quadruplet_7Hasherezade_hits/Railgun_Quadruplet_7Hasherezade_clocks: 1/3
Railgun_Quadruplet_7Hasherezade performance: 13KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Skip-Performance(bigger-the-better): 500%, 8001 skips/iterations
Boyer_Moore_Flensburg_hits/Boyer_Moore_Flensburg_clocks: 1/5
Boyer_Moore_Flensburg performance: 7KB/clock
Doing Search for Pattern(6bytes) into String(40006bytes) as-one-line ...
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Skip-Performance(bigger-the-better): 100%, 40001 skips/iterations
Two-Way_hits/Two-Way_clocks: 1/4
Two-Way performance: 9KB/clock
D:\Downloads\_2012-Feb-18\_KAZE_strstr_SHORT-SHOWDOWN__7Trident2_7Hasherezade_7Gulliver2>
I fixed a time bug (I knew about it but didn't bother until now because of OSHO.TXT never being traversed for under 1 clock):
(long)(TotalRoughSearchTime)>>4
should be
1+((long)(TotalRoughSearchTime)>>4)
in order to avoid division by zero for small strings (under 1 clock).
Get down get down get down get it on show love and give it up
What are you waiting on?
|
|
|
|

|
(sigh) nevermind, the apparent sharp drop in performance was dute to a bug in my code that manufactured the target.
"Dreams come true, not free" -- S.Sondheim
|
|
|
|

|
It might be worth starting a new thread per version.
"Windows-Linux trading places" isn't exactly descriptive any more
For the realworld tests, 7H is always better than 7G, though not by a big fraction.
For the thrash tests, the setup overhead seems to just kill it.
My realword tests are strings of 2..140 bytes against each line of a log file
with lines of 150..1350 bytes.
Just a reminder that I do not run tests against osho.txt the same way you do:
I refactor the railgun function to return on match, not just count hits.
Finding all strings that match in one pass is an unusual use case;
the only thing that comes to mind is when tokenizing a string based on a multibyte
separator. "strstr" and "memmem" don't work that way; hence my testing protocol.
So it looks like you're taking railgun in the direction of a search function
that has a 'compile' and an 'execute' part, and moving away from competing with "strstr". "strstr" is hard to compete with, because the interface does not LET you amortize compile time.
In future tests, I'll separate compile and execute times.
I've added another algorithm to the test set: BNDM ("Backward Nondeterministic DAWG Matching") or, as I like to think of it, "suffix shift-and". It tends to be faster than Horspool (any flavour) for pattern up to the bitwidth of "int", which suggests putting in as an intermediate case between the "<961byte" and the horspool/hasherezade code, for such patterns. As usual, please pardon my condensed coding style.
char *bndm(char *target, int tgtlen, char *pattern, int patlen)
{
uint8_t *tgt = (uint8_t*)target, *pat = (uint8_t*)pattern;
int i, j;
uint32_t mask, maskv[256] = {0};
for (i = 0; i < patlen; ++i)
maskv[pat[i]] |= 1 << (patlen - 1 - i);
for (i = 0; i < tgtlen - patlen; i += j) {
mask = maskv[tgt[patlen - 1 + i]];
for (j = patlen; mask;) {
if (!--j) return target + i;
mask = (mask << 1) & maskv[tgt[i + j - 1]];
}
}
return NULL;
}
I'm just messing with the 128-bit version using XMM registers. The search is faster, but the setup time is (sigh) too high -- from which you can divine that MY objective is faster algorithms that plug into generic API's. Naively, it has to initialize a 256*16 byte table.
"Dreams come true, not free" -- S.Sondheim
|
|
|
|
 |
|
|
General News Suggestion Question Bug Answer Joke Rant Admin
|
Tuned function for searching a needle in a haystack
| Type | Article |
| Licence | CPOL |
| First Posted | 9 Sep 2011 |
| Views | 40,449 |
| Downloads | 650 |
| Bookmarked | 16 times |
|
|