Click here to Skip to main content
Click here to Skip to main content

Fastest Exact/Fuzzy/Wildcard Full-text Searcher!?

, 10 Dec 2013
Rate this:
Please Sign up or sign in to vote.
Optimized multi-threaded console full-text searcher

Introduction

This article is for those who appreciate speed.

Instead of "losing" time to learn/tweak/benchmark Exact/Fuzzy/Wildcard searching techniques, here come my three etudes heavily tested and thoroughly optimized, written in C, in a working environment, i.e. as practical implementations.

You can always buy books with algorithms while to obtain a working (at superfast speeds) source code is an entirely different case. Why such misalignment? So many people well versed in mathematics & programming and so much work half-done! Money, money, money, surprisingly after years of wondering the real cause (hidden behind the money) has been locked - the lack of faith. My belief is that one exuberant way to follow is to preserve the spirit of Nikola Tesla, an idol of mine, for whom the driving force was not money but innocence.

Suppose you need to extract all lines with a specific IP or filename from a 42GB log file, fast!

Or, suppose you need to find 'edelweiss' while you search for 'edelvais', 'ON THE FLY'!

Who ya gonna call?

  • Ghostbusters won't do.
  • GREP, nah-nah, I said fast.
  • SQL, skip it, it is so wondrous that makes of dummies "PRO programmers". I love amateurism, [French, from Latin amator, lover, from amare, to love.], where the innocence has not faded away.

    Ahh
    Where do we go now?
    I don't know
    Innocence over
    Fading fast
    You're still promising perfection, perfection
    With empty words
    With empty words
    With empty words
    With empty words
    And it's hard to break a habit
    You're lost inside it
    Who do you love now?
    A moment of coldness
    Cuts through me (cuts through me)
    Closing down the way I feel
    How come I don't see so clearly, so clearly
    Fighting to breathe
    Fighting to breathe
    Fighting to breathe
    Fighting to breathe
    /Dannii Minogue/

    Here comes the FASTEST full-text searcher known to me, Kazahana.

Background

Useful Links

Kazahana at a glance:

  • An open-source project;
  • Unmatched speed performance;
  • Always 16 threads enforced via OpenMP;
  • Utilizes the I/O read bandwidth at its fullest, e.g. at 234.3MB/s on SSD with average 236.6MB/s Linear Read, 1MB block;
  • Dumps physical lines (LF ending) holding your pattern;
  • Works under Linux/Windows both as 32/64bit code;
  • Exact matching via 'Railgun_Sekireigan_Wolfram' - at superiour speed;
  • Highly optimized fuzzy string matching via Wagner–Fischer algorithm, 3 Kaze's heuristics used;
  • EXHAUSTIVE fuzzy string matching via Wagner–Fischer algorithm mode, powerful;
  • Wildcard matching with firm grip, 9 wildcards available;
  • FASTEST Wildcard matching with "classical" 2 wildcards available, traversal speed at 75+MB/s PER THREAD on Core 2 T7500;
  • Adjustable small (usually half the size of the last level CPU cache) master buffer, 1-7MB, sometimes 350+MB;
  • 100% FREE;
  • Written in C by machinely yours Kaze.

Using the Code

The compilation is simple and straightforward, just use respectively:
Intel, Windows:

icl /O3 /Qunroll Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  /FAcs /FeKazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12 
  /Qopenmp /Qopenmp-link:static -DCommence_OpenMP -D_icl_mumbo_jumbo_

MinGW, Windows:

gcc -O3 -funroll-loops -static -o Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_GCC_472 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  -fopenmp -DCommence_OpenMP -D_gcc_mumbo_jumbo_

GCC, Linux:

gcc -O3 -funroll-loops -static -o Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD_GCC_471 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c 
  -fopenmp -D_FILE_OFFSET_BITS=64 -DCommence_OpenMP -D_gcc_mumbo_jumbo_
// Change appropriately in 'Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM.c':
#define _WIN32_ENVIRONMENT_
//#define _POSIX_ENVIRONMENT_

The three regimes of searching Fuzzy/Wildcard/Exact in that order are given below:

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-
  ++fix+nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 4 
  "beautifability" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,065,535 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/179,525/18
Kazahana: Performance: 34 KB/clock
Kazahana: Performance: 2,890 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 110/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 18,577,955 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
dutiability
eatability
bearability
equitability
exuviability
fatigability
breathability
buffability
certifiability
practicability
quantifiability
realisability
realizability
rectifiability
refutability
reputability
negotiability
satiability

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 
  "*fathom*" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,196,607 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/17
Kazahana: Performance: 60 KB/clock
Kazahana: Performance: 5,046 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 63/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 11,228,173 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
[*fathom*] fathom /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomable /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomage /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomed /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomers /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathometer /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathometers /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathoming /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomless /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomlessly /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathomlessness /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] fathoms /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomability /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomable /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomableness /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomably /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[*fathom*] unfathomed /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM_HEXADECAD-Threads_IntelV12.exe" 
  "molyb" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram, copyleft Kaze 2013-Nov-15.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,185,042 bytes/clock
Kazahana: Dumped xgrams: 24
Kazahana: Performance: 223 KB/clock
Kazahana: Performance: Total/fread() clocks: 17/16
Kazahana: Performance: I/O time, i.e. fread() time, is 94 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 10,902,419 ticks
Kazahana: Done.

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>type Kazahana.txt
[molyb] ferrimolybdite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] ferromolybdenum /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdaena /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdates /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdena /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenosis /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenous /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdenum /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdoenzyme /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdomancy /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdomenite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdophosphate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdophyllite /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdopterin /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdosis /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] molybdous /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] phosphomolybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] phosphomolybdic /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[molyb] polymolybdate /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM>

As the old Chinese proverb goes, two pictures are worth 2000 words:

Picture #1

683665/Bonboniera_rag.png

For above full-text search for "metal fatigue" in Wikipedia file (performed on my laptop 'Bonboniera'), the stats are:

Total time vs I/O time: 183941 clocks vs 164638 clocks, meaning the Search time (the time spent in 'Railgun_Sekireigan_Wolfram') equals 183941-164638 = 19303 clocks, or all the occurrences of the needle "metal fatigue" were found at 45,261,093,005 bytes / 19303 clocks = 2236MB/s, "normally" such needle is sought at 3500MB/s as one monolith block, here the master buffer was 7168KB, divided into 16 chunks gives (43164MB/7168KB)*16 = 98660 'WolfRAM' invocations, I call this MUTSI performance. The more precise timing confirms that I/O time is 359617617625 ticks / 2200000000 ticks = 163 seconds.

Picture #2

683665/Results_Core2_T7500_small.png

The above benchmark has been done on my laptop with Core 2 T7500 2200MHz (with 5454MB/s Memory Read and 4014MB/s Memory Copy according to EVEREST). The cause for such chocked results (1.2-1.4GB/s) is the need to find/extract the lines containing the patterns, however the real brake is not that logic, 'memchr()' partially, but the 'memcpy()'. Some benchmarks on 'Ivy Bridge' gave 42GB/s memory read while the memory copy struggled to reach 19-20GB/s. Since the haystack is traversed indirectly, i.e., memcpy() is involved, it takes 2-3 seconds to copy the haystack to master buffer. These two brakes take 75% (on my laptop with Core 2) of the whole time.

All the 16 occurrences of the needle "metal fatigue" were found at 1024 Megabytes / (781-574) clocks = 4946MB/s, muffinesque!

Kazahana: Dumped xgrams: 16
Kazahana: Performance: 1,342 KB/clock
Kazahana: Performance: Total/fread() clocks: 781/574
Kazahana: Performance: I/O time, i.e. fread() time, is 73 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,331,116,253 ticks

Yum-yum, the ratio 4946MB/s:2236MB/s or 2.2:1 shows more distinctively how cool is 'WolfRAM'.
Double-check: 1,331,116,253 ticks / 2200000000 ticks = 0.6 seconds ~ 574 clocks.

Back to those 42GB, if the machine is a Beast, e.g. has 64GB Main RAM (system cache could house the whole file) and 16 threads you can expect |???| 'Hana Senpu' 'Flower Whirlwind' the very meaning of Kazahana.

Seeing, how even the most basic tasks as finding some text creep, makes me sick.
Full-text processing is important, unable to extract off-line needed Wikipedia's lines within a second shouts how jammed are the hardware & software of today.

On a daily basis no-no on an hourly basis, I need ALL sentences/paragraphs containing specified word/phrase, e.g. I want all the 338 occurrences of "metal fatigue" within Wikipedia in form:

[metal fatigue] The [[ICE 1]] trains were equipped with single-cast [[wheelset 
  (rail transport)|wheelsets]], known as [[monobloc]] wheels. Once in service it soon became apparent 
  that this design could, as a result of [[metal fatigue]] and out-of-round conditions, result in [[resonance]] 
  and vibration at cruising speed. Passengers noticed this particularly in the restaurant car, where there 
  were reports of loud vibrations in the dinnerware and of glasses "creeping" across tables. /enwiki-20130904-pages-articles.xml/
[metal fatigue] At Eschede in Lower Saxony, Germany a high speed passenger train became derailed in 1998, 
and 101 persons died. The primary cause was the fracture from metal fatigue of a wheel tyre; the train failed 
to negotiate two sets of points and struck the pier of an overbridge. It was the most serious railway accident 
in Germany, and also the most serious on any high speed (over {{convert|200|kph|mph}}) line. 
Ultrasonic testing had failed to reveal the incipient fracture.<ref name = eschede>Erich Preuß, 
''Eschede, 10 Uhr 59. Die Geschichte einer Eisenbahn-Katastrophe'', GeraNova Zeitschriftenverlag, 2002, 
ISBN 3-932785-21-5</ref> See [[Eschede train disaster]]. /enwiki-20130904-pages-articles.xml/
[metal fatigue] * The [[De Havilland Comet#Comet disasters of 1954|De Havilland Comet disasters of 1954]], 
later determined to be structural failures due to unanticipated metal fatigue at the corners of square 
windows used by the Comet 1. /enwiki-20130904-pages-articles.xml/
...

In my view, nowadays PCs are a thing of the past already, that is, they will be replaced by much faster CHEAPER mainstream computers - it is prophesied by Slava Sevryukova a revered by me Bulgarian ... prophetess.
Simply, the nearing big technological jump will blow [away] our narrowmindedness, Kazahana is only one petite typhoon-class forerunner.

"The modern form of 'typhoon' was influenced by a borrowing from the Cantonese variety of Chinese, namely the word 'taaîfung', and respelled to make it look more like Greek. 'Taaîfung', meaning literally "great wind," was coincidentally similar to the Arabic borrowing and is first recorded in English guise as 'tuffoon' in 1699."
/Heritage/

Zephyr
Knowing you is a pleasure
But I hear the storms
Calling me
Calling me
Zephyr
They're calling me
So bring on the fire
Bring on the ice
Give me all you got
I'm through rolling dice
Steady my guns
Steady my mind
Once I make my play
There's no rewinding
Zephyr
A taste would be like heaven
I'll fight the fight
To feel you once again
Again, again, again, again
/Conjure One - Zephyr, 'Exilarch' album/

Since it is Open Project, I hope to improve Kazahana with the help of more experienced coders and finally a search tool matching incoming speedy hardware to become freely available.

An Add-on from 2013-Nov-22

Last thing first. Very disappointed I am by the combination "gcc+OpenMP+recursion", my advice is to avoid this trio unless you are versed in all the three. Why, I lost 2 hours to wrestle with "MinGW+OpenMP+recursion" and "Linux+OpenMP+recursion", simply crash "Segmentation fault" happens when wildcard search is performed (with long lines) and gcc+OpenMP are in use?!
I won't waste more time in this "trolldom". One way to end the crisis is to replace OpenMP with PTHREADS, but this is not my priority ATM.
Don't make a mistake about it, the problem is not in Kazahana but in the trio, also even MONAD (single-threaded) ELF/EXE ragdolls all the GREPesque creepers, that is, it walks compared to them.
Intel's executables work just fine - both the 32/64bit multi-threaded EXEs fly in the very same cases where the trio crashes.

Yesterday, a minor change was done in order to allow fuzzy searching into lines of all lengths, further below a search into Wikipedia dump is given, the change itself:

//if (m>MaxLineLength)
//{ printf( "\nKazahana: Incoming xgram exceeding the limit.\n" ); exit( 2 ); }
// Above two commented lines are too severe, changed with next line allowing to search into lines bigger than our needs:
if (m<=MaxLineLength)

It was done this way because I needed fuzzy only for specific x-gram files. This change opens the possibility to look into Wikipedia file fuzzily, uh-huh. The bigger the Levenshtein Distance, the heavier calculations are needed, i.e., the threads are fed well (in contrast to EXACT mode where mostly Main Memory I/O is stressed), let's traverse the Wikipedia 1024MB dump for LD = 18 which will give all the "mutations" of our string, not by the way, such type of searching is very useful when you need all variants of a given SENTENCE when you need to see how plausible are the existing variants within your LD. In other words, Levenshtein is extremely useful not only for WORDS & PHRASES (so-called x-grams) but also for SENTENCES and ... PARAGRAPHS, wut-wut! Imagine a composite line (Wikipedia is built from such lines): a physical line composed of several sentences which appear as one paragraph after the visual wrapping. That's where the 16 threads scream, FUZZY & WILDCARD, for 2 threads they almost double the performance, for 4 threads likewise - quadruple.

Two tests, one under Linux the other under Windows are given below. The three search modes are "tortured".

Test #1, Linux 32bit, Kazahana as 32bit single-threaded ELF (gcc 4.7.1; options: -O3 -funroll-loops -static):

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 
  "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,001,204 bytes/clock
Kazahana: Dumped xgrams: 16
Kazahana: Performance: 1 KB/clock
Kazahana: Performance: Total/fread() clocks: 900,001/550,000
Kazahana: Performance: I/O time, i.e. fread() time, is 61 percents
Kazahana: Done.

real 0m0.934s
user 0m0.380s
sys  0m0.552s

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 
  "*[[*Category:*pai*nters *" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,014 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 0 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 75,640,001/560,000
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Done.

real 1m15.750s
user 1m15.169s
sys  0m0.584s

time ./Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD_GCC_471 18 
  "[[Category:Italian painters of the 20th century]]" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,089 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 0 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 11,980,001/550,000
Kazahana: Performance: I/O time, i.e. fread() time, is 4 percents
Kazahana: Done.

real 0m12.332s
user 0m11.613s
sys  0m0.716s

Also, I wanted to put on the map the performance of the three types of searching side-by-side of Intel's 32bit & 64bit code, a quick shootout follows:

Test #2, Windows 7 64bit, Kazahana as 32/64bit multi-threaded EXE (Intel 12.1; options: /O3 /Qunroll /Qopenmp /Qopenmp-link:static):

D:\Kazahana_C_ELF_EXE\EXE>dir kaz*

11/22/2013  03:44 AM               389 Kazahana_compile_WolfRAM+_Intel.bat
11/22/2013  04:07 AM           779,675 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+.c
11/22/2013  04:11 AM         7,543,701 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_32bit.cod
11/22/2013  04:09 AM         8,296,283 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_64bit.cod
11/22/2013  04:11 AM           466,944 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_HEXADECAD-Threads_IntelV12_32bit.exe
11/22/2013  04:09 AM           555,520 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_HEXADECAD-Threads_IntelV12_64bit.exe
11/22/2013  04:11 AM           153,088 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD-Thread_IntelV12_32bit.exe
11/22/2013  04:09 AM           183,808 Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+_MONAD-Thread_IntelV12_64bit.exe

D:\Kazahana_C_ELF_EXE\EXE>dir en*

11/20/2013  03:12 AM     1,073,741,824 enwiki-20130904-pages-articles.7z.001

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,001,183,695 bytes/clockKazahana: Dumped xgrams: 16
Kazahana: Performance: 1,157 KB/clock
Kazahana: Performance: Total/fread() clocks: 906/671
Kazahana: Performance: I/O time, i.e. fread() time, is 74 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,397,776,523 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe "*[[*Category:*pai*nters *" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,031,162 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 30 KB/clock
Kazahana: Performance: 271 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 34,523/684
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,420,836,593 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_32bit.exe 18 "[[Category:Italian painters of the 20th century]]" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,000,146,546 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 142 KB/clock
Kazahana: Performance: 1,276 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 7,349/730
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,426,790,820 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe "metal fatigue" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,001,125,318 bytes/clockKazahana: Dumped xgrams: 16
Kazahana: Performance: 1,100 KB/clock
Kazahana: Performance: Total/fread() clocks: 953/717
Kazahana: Performance: I/O time, i.e. fread() time, is 75 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,380,011,061 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe "*[[*Category:*pai*nters *" enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK

/; 00,000,029,966 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 29 KB/clock
Kazahana: Performance: 261 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 35,913/509
Kazahana: Performance: I/O time, i.e. fread() time, is 1 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,426,409,831 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  _HEXADECAD-Threads_IntelV12_64bit.exe 18 "[[Category:Italian painters of the 20th century]]" 
  enwiki-20130904-pages-articles.7z.001 1536 
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, 
  r. 1-++fix+nowait_critical_nixFIX_Wolfram+, copyleft Kaze 2013-Nov-21.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,174,890 bytes/clockKazahana: Total/Checked/Dumped xgrams: 9,382,307/2,005,354/69
Kazahana: Performance: 170 KB/clock
Kazahana: Performance: 1,522 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,163/647
Kazahana: Performance: I/O time, i.e. fread() time, is 10 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,436,490,308 ticks
Kazahana: Done.

D:\Kazahana_C_ELF_EXE\EXE>type kazahana.txt 
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Films set in the 13th century]]
[[Category:West Indian cricketers of the 21st century]]
[[Category:Films set in the 13th century]]
[[Category:Italian painters of the 13th century]]
[[Category:Italian painters of the 14th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian Ministers of the Interior]]
[[Category:Films set in the 23rd century]]
[[Category:Italian people of Irish descent]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 18th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Novels set in the 18th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Chilean Ministers of the Interior]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian people of Greek descent]]
[[Category:Italian painters of the 20th century]]
[[Category:Italian painters of the 21st century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 24th century]]
[[Category:Films set in the 24th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 15th century]]
[[Category:Films set in the 16th century]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian people of French descent]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian people of the Risorgimento]]
[[Category:Films set in the 23rd century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Italian painters of the 17th century]]
[[Category:Italian painters of the 18th century]]
[[Category:Films set in the 19th century]]
[[Category:Films set in the 17th century]]
[[Category:Italian painters of the 19th century]]
[[Category:Italian painters of the 20th century]]
[[Category:Novels set in the 19th century]]
[[Category:Films set in the 14th century]]
[[Category:Italian people of Swedish descent]]
[[Category:Italian people of French descent]]
[[Category:Italian people of the Risorgimento]]
[[Category:Italian painters of the 15th century]]
[[Category:Italian painters of the 16th century]]
[[Category:Novels set in the 19th century]]
[[Category:Films set in the 20th century]]
[[Category:Italian people of French descent]]
[[Category:Films set in the 16th century]]
[[Category:Films set in the 23rd century]]
[[Category:Films set in the 10th century]]

D:\Kazahana_C_ELF_EXE\EXE>

Hm, I expected to see 'French/Dutch/Russian/... painters', oh it is only 1/42 of the whole Wikipedia. As you can see, all the 69 hits within Levenshtein Distance 18 for string "[[Category:Italian painters of the 20th century]]" are dumped at speed:

* 32bit code: 9,382,307/7,349 = 1,276,678 lines per second.
* 64bit code: 9,382,307/6,163 = 1,522,360 lines per second.

Converted to more human readable format, to see these 69 occurrences it takes 7.4/6.2 seconds on my laptop. Mmm... more threads are wanted.

In short, speaking of 2 threads, the EXACT search regime is bound to Main RAM copy and shows no significant boost while FUZZY & WILDCARD modes run twice as fast.

Finally, it would be nice someone knowing the reason for gcc's buggy behavior to share it with us. In the meantime, the source plus the single-threaded ELF and multi-threaded EXEs are available at download section, enjoy!

An Add-on from 2013-Nov-24

I'm sorry, fixing a stupid bug forced me to recompile and to reupload.

The bug was that arrays below were declared as 'unsigned int' in Kazahana while in Galadriel they were 'int', the fix itself:

int LevenshteinT1[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
...
int LevenshteinTf[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]

Galadriel source is included in the ZIP file, the whole Wagner-Fischer section was initially written in my tool Galadriel and just then copied-and-pasted in Kazahana, during the transfer stupidly I had prefixed 'int' with 'unsigned' (values cannot be negative but they determine the subtraction), the original fragment is given:

int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
...
if ( 0 < wrdlen && wrdlen < MaxLineLength +1+1)
{
    wrd[ wrdlen ] = 0;
    if ( wrd[ wrdlen-1 ] == 13 ) //CR
        wrd[ --wrdlen ] = 0;
// A simple heuristic #1: Don't enter the nasty loops unless MaximumLevenshteinDistance >= ABS(m-n).
m = wrdlen; // strlen(wrd);
if (m>MaxLineLength)
{ printf( "Galadriel: Incoming xgram exceeding the limit.\n" ); return( 1 ); }
#if defined(_WIN32ASM_)
if (AtMostLevenshteinDistance >= abs_AF(m-n))
#else
if (AtMostLevenshteinDistance >= MAX(m,n)-MIN(m,n)) // This is the only add-on for r.1+
#endif
{
// LD [
// A simple heuristic #3: Don't recalculate rows for identical leftmost chars we already have.
// Caution: heuristic #3 works in pair with heuristic #2
// (the bail out position is less or equal to cached chars), whereas heuristic #1 is standalone.
StartingPosition = 1;
while (    wrdCACHED[StartingPosition-1] && wrdCACHED[StartingPosition-1]==wrd[StartingPosition-1])
// No need of && wrd[StartingPosition-1] 
    StartingPosition++;
// The bail out 'i' value (heuristic #2) affects our cached value here, 'StartingPosition' cannot be greater than 'i':
StartingPosition = MIN(StartingPosition, i);
// Printing Matrix [
// printf("\n%s", wrd);
// Printing Matrix ]
    WordsChecked++;
    SkipHeuristic=0;
    for(i=StartingPosition;i<=m;i++) { // StartingPosition is in range 1..
// Printing Matrix [
// printf("\n");
// Printing Matrix ]
    for (j=1;j<=n;j++) {
        if(wrd[i-1] == argv[2][j-1])
            Levenshtein[i][j] = Levenshtein[i-1][j-1];
        else
#if defined(_WIN32ASM_)
            Levenshtein[i][j] = min_AF(Levenshtein[i-1][j]+1, 
              Levenshtein[i][j-1]+1, Levenshtein[i-1][j-1]+1); // Variant 1: 237,270 xgrams/s
#else
            Levenshtein[i][j] = MIN(MIN((Levenshtein[i-1][j]+1),(Levenshtein[i][j-1]+1)),
              (Levenshtein[i-1][j-1]+1)); // Variant 2: This MS code is faster than above jumpless code! // 358,327 xgrams/s
            //{Levenshtein[i][j] = MIN(MIN(Levenshtein[i-1][j],Levenshtein[i][j-1]),
            //  Levenshtein[i-1][j-1]); --Levenshtein[i][j];} // Variant 3: This compound
            // line is much slower than above inc-inc-inc code! // 237,270 xgrams/s
#endif
// Printing Matrix [
// printf("%s ", _ui64toaKAZEzerocomma(Levenshtein[i][j], llTOaDigits, 10)+(26-2) );
// Printing Matrix ]
        }

        // A simple heuristic #2: Discontinue the nasty vertical loop (i.e. m)
        // when the LD in cell in the last column minus the remaining vertical cycles is greater than our MAX LD:
        if ( Levenshtein[i][n] - (m-i) >= 0 && AtMostLevenshteinDistance < 
          Levenshtein[i][n] - (m-i) ) {SkipHeuristic=1; break;}
          // Caution: Levenshtein[i][n] can be less than (m-i), this changes nothing the logic is the same.
        //          C  C  C  C  C  C  C  C  C  C  C  C  C
        //          o  o  o  o  o  o  o  o  o  o  o  o  o
        //          l  l  l  l  l  l  l  l  l  l  l  l  l
        //          u  u  u  u  u  u  u  u  u  u  u  u  u
        //          m  m  m  m  m  m  m  m  m  m  m  m  m
        //          n  n  n  n  n  n  n  n  n  n  n  n  n
        //          0  0  0  0  0  0  0  0  0  1  1  1  1
        //          1  2  3  4  5  6  7  8  9  0  1  2  3
        //          p  s  y  c  h  e  d  l  i  c  i  z  e
        // p Row01: 00 01 02 03 04 05 06 07 08 09 10 11 12 !Caution!
        // Here the last cell Levenshtein[i][n] - (m-i) = 12 - (14-1) = -1
        // s Row02: 01 00 01 02 03 04 05 06 07 08 09 10 11
        // y Row03: 02 01 00 01 02 03 04 05 06 07 08 09 10
        // c Row04: 03 02 01 00 01 02 03 04 05 06 07 08 09
        // h Row05: 04 03 02 01 00 01 02 03 04 05 06 07 08
        // e Row06: 05 04 03 02 01 00 01 02 03 04 05 06 07
        // d Row07: 06 05 04 03 02 01 00 01 02 03 04 05 06
        // e Row08: 07 06 05 04 03 02 01 01 02 03 04 05 05
        // l Row09: 08 07 06 05 04 03 02 01 02 03 04 05 06
        // i Row10: 09 08 07 06 05 04 03 02 01 02 03 04 05
        // c Row11: 10 09 08 07 06 05 04 03 02 01 02 03 04
        // i Row12: 11 10 09 08 07 06 05 04 03 02 01 02 03
        // z Row13: 12 11 10 09 08 07 06 05 04 03 02 01 02
        // e Row14: 13 12 11 10 09 08 07 06 05 04 03 02 01
    }
    if (SkipHeuristic==0 && AtMostLevenshteinDistance >= Levenshtein[m][n]) {
        fprintf( fp_outLINE, "%s\n", wrd); DumpedLines++;
        if ((DumpedLines & 0xff) == 0xff)
            fflush(fp_outLINE); // Not sure: CTRL+C doesn't flush?!
    }
    // The bail out 'i' value (heuristic #2) affects our cached value here, 'i' is the needed one:
    memcpy(wrdCACHED, wrd, wrdlen+1); // +1 because we need the ASCII 000 termination
// LD ]
}            
}

And to highlight this mistake:

#define MaxLineLength 126+100

int main(int argc, char **argv) {

//unsigned int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]
unsigned int AtMostLevenshteinDistance;
int i,j,m,n;

AtMostLevenshteinDistance = 2;
m=14;
i=1;
n=12;
Levenshtein[i][n]=12;

//if ( Levenshtein[i][n] - (m-i) >= 0 && AtMostLevenshteinDistance
// < Levenshtein[i][n] - (m-i) ) {SkipHeuristic=1; break;}
// Caution: Levenshtein[i][n] can be less than (m-i), this changes nothing the logic is the same.

printf("'Levenshtein[i][n] - (m-i) >= 0' equals %d\n", Levenshtein[i][n] - (m-i) >= 0); 
printf("'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals %d\n", 
  AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)); 

exit (0);
}

// When 'unsigned int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]':
// 
// D:\_KAZE>bug.exe
// 'Levenshtein[i][n] - (m-i) >= 0' equals 1
// 'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals 1
// 
// When 'int Levenshtein[MaxLineLength+1][MaxLineLength+1]; // declare int d[0..m, 0..n]':
// 
// D:\_KAZE>bug.exe
// 'Levenshtein[i][n] - (m-i) >= 0' equals 0
// 'AtMostLevenshteinDistance < Levenshtein[i][n] - (m-i)' equals 1

Well, 12 - (14-1) = -1 i.e. -1 >= 0 should be 0 but due to "shadowy" conversion it becomes 1. "Elementary" bugs like this show how C basics are not to be taken lightly, some slippery stones exist.

Now, it's time for something exciting and extremely ... SLOW - Fuzzy Exhaustive Searching.

My intention is to add this must-have feature in revision 1 of Kazahana, traversing (fuzzily searching) at each possible position i.e. FULLY all the incoming lines, immediately an example:

Kazahana "*edelweiss*" enwiki-20130904-pages-articles.7z.001 1537
D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "*edelweiss*" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/33

The above WILDCARD search resulted in 33 hits, among them 3 different cases: edelweiss/Edelweiss/EDELWEISS

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "edelweiss" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 5

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "Edelweiss" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 25

D:\_KAZE>"Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fix_HEXADECAD-Threads_IntelV12_32bit.exe" "EDELWEISS" enwiki-20130904-pages-articles.7z.001 1537
Kazahana: Dumped xgrams: 3

Okay, EXACT searches gave:
edelweiss 5 occurrences
Edelweiss 25 occurrences
EDELWEISS 3 occurrences

One of those 33/5 lines is a 46 bytes long string: |Edelweiß || [[edelweiss]] || edelweiss flower

Let's search fuzzily for "edelvais", pretending we don't know the right spelling:

To find "edelweiss" we need at least Levenshtein distance 3 (1 insertion and 2 substitutions): 'edelvais' vs 'edelweiss':

  • the mispelled one has one character less;
  • the mispelled one has 2 wrong characters: 'v' & 'a' instead of 'w' & 'e'.

    The present fuzzy mode is non-exhaustive, now we search n bytes long string (the pattern) in m bytes long stringS (the incoming lines) looking to fall under this case:

if (AtMostLevenshteinDistance >= MAX(m,n)-MIN(m,n)) {
//...
}

The above condition limits our hits, it has its own niche and usefulness, while the EF (Exhaustive Fuzzy) is to complement the three existing regimes.

// EXHAUSTIVE [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[
// Here the old fuzzy is complemented with exhaustive one, using BBs of the incoming string (here 'm').
// We need to factorize 'm' down to all 'n+AtMostLevenshteinDistance'
// long strings/BBs and to search into them ONE-BY-ONE - a gruelling task indeed!
// Example:
// One of those 33/5 lines is a 46 bytes long string, i.e. m = 46:
// |Edelweiß || [[edelweiss]] || edelweiss flower
// Let's search fuzzily for "edelvais", pretending we don't know the right spelling:
// To find "edelweiss" we need at least Levenshtein distance 3, i.e. n = 8, AtMostLevenshteinDistance = 3:
// 'edelvais' vs 'edelweiss':
// - the mispelled one has one character less;
// - the mispelled one has 2 wrong characters: 'v' & 'a' instead of 'w' & 'e'.
// From above we need Building-Blocks of 46 bytes order 8+3.
// Inhere we are using order 11, 'm - Order + 1' is the number of total BBs for text 'm' bytes long: 46-11+1 = 36
// The 36 BBs:
// |Edelweiß |
// Edelweiß ||
// ...
// weiss flowe
// eiss flower
    for (l=0; l < m-(n+AtMostLevenshteinDistance)+1; l++) {
    // Here we throw 'edelvais' against '|Edelweiß |', ... 'eiss flower':
    // ...
    }
// Even more gruelling, 'edelvais' vs 'EDELWEISS' needs LD = 9 (1 insertion and 8 substitutions) as a minimum.
// EXHAUSTIVE ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
To be done and benchmarked.

edelweiss
n.
An alpine plant (Leontopodium alpinum), native to Europe and having leaves covered with whitish down and small flower heads surrounded by conspicuous whitish bracts.
[German : edel, noble (from Middle High German edele, from Old High German edili) + weiss, white (from Middle High German wiz, from Old High German wiz, hwiz; see kweit- in Indo-European roots).]

endure
v. tr.
1. To carry on through, despite hardships; undergo: endure an Arctic winter.
2. To bear with tolerance: "We seek the truth, and will endure the consequences" (Charles Seymour). See Synonyms at bear1.
v. intr.
1. To continue in existence; last: buildings that have endured for centuries.
2. To suffer patiently without yielding.
[Middle English enduren, from Old French endurer, from Latin indurare, to make hard : in-, against, into ; see en-1 + durus, hard; see deru- in Indo-European roots.]

One of my favorite flowers - an ENDURER - edelweiss endures freezing winds suffering patiently without yielding.

Deep nobility/beauty/wisdom is based on ENDURANCE, no doubt about it.
Likewise, in that spirit, Kazahana's 4th mode will be battle against time, I am already scared & thrilled how heavily amazing it is gonna be - extremely slow & handy in the same time.

An add-on from 2013-Nov-29

WILDCARD mode has been enriched with one additional FAST wildcard sub-mode.

Seeing, three-four days ago, the Mr. Handy's function - a fellow CodeProject member - I postponded all my activities in an attempt to come up with some gem.

I already have had a wildcard searcher working just fine, a very versatile one, but slow. Therefore I added a FAST add-on to my 3-in-1 searcher Kazahana thus allowing VERSATILE (9 wildcards) and FAST (the classic 2 wildcards) modes. I also tested mine vs his using 2 threads, in short they are really fast, the 2 threads utilize 180-192% the CPU achieving 140-170MB/s TOTAL traversal speed, see further below.

The FAST mode uses only the TWO classic wildcards '*' and '?' (where mine are '&' and '+' respectively) as follows:

...
Note9a: Nine SLOW wildcards are available:
        wildcard '*' any character(s) or empty,
        wildcard '.' any ALPHA character(s) or empty,
        wildcard '`' any NON-ALPHA character(s) or empty,
        wildcard '@'/'#' any character {or empty}/{and not empty},
        wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
        wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
Note9b: Two FAST wildcards are available:
        wildcard '&' any character(s) or empty,
        wildcard '+' any character and not empty.
Note9c: Don't mix SLOW and FAST, the SLOW overrides the FAST, i.e. presence of at least one of the 9 wildcards cancels FAST mode.
...

Unfortunately, his function is faster than mine, the only two things that make my etude to trump his:

  • my function is PLAINer & BEAUTIFULer & SHORTer;
  • my function is licenseless i.e. 100% FREE.

    I will need help to speed it up, the source is given below:

// Igor Pavlov's recursive variant modified (and converted to iterative) by Kaze, 2013-Nov-28:
//static boolean Wildcard_IP(const char *mask, int maskPos, const char *name, int namePos)
//{
//int maskLen = maskGLOBALlen - maskPos;
//int nameLen = nameGLOBALlen - namePos;
//if (maskLen == 0)
//    if (nameLen == 0)
//        return true;
//    else
//        return false;
//if (mask[maskPos] == '*') {
//    if (Wildcard_IP(mask, maskPos + 1, name, namePos))
//        return true;
//    if (nameLen == 0) 
//        return false;
//    return Wildcard_IP(mask, maskPos, name, namePos + 1);
//} else if (mask[maskPos] == '?') {
//    if (nameLen == 0) 
//        return false;
//    return Wildcard_IP(mask, maskPos + 1, name, namePos + 1);
//} else {
//    if (mask[maskPos] != name[namePos])
//        return false;
//    return Wildcard_IP(mask, maskPos + 1, name, namePos + 1);
//}
//}
int WildcardMatch_Iterative_Kaze(const char* mask, const char* name) {
const char* maskSTACK;
const char* nameSTACK;
int BacktrackFlag = 0;
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            if (!BacktrackFlag) return 0;
            name++;
            goto Backtrack;
        }
    } 
} 
while (*maskSTACK == '*') ++maskSTACK;
return (!*maskSTACK);
}

The big benchmark, searching all lines in Wikipedia 1024MB dump:
My function is used in executable: Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe
His function is used in executable: Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe
The runs are, my wildcards '&'/'+' stand for '*'/'?':

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+
  &" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&Traven&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "&Traven&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_JH.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Speed results for pattern "&karolina&wydra&":
D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,160,591 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 156 KB/clock
Kazahana: Performance: 1,401 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,694/654
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,334,917,342 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.717 =    9%
User Time    =    13.041 =  178%
Process Time =    13.759 =  188%
Global Time  =     7.298 =  100%

D:\_KAZE>timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+
  fixITER_HEXADECAD-Threads_IntelV12_JH.exe "&karolina&wydra&" 
  enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,167,227 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 163 KB/clock
Kazahana: Performance: 1,459 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 6,428/639
Kazahana: Performance: I/O time, i.e. fread() time, is 9 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,308,754,183 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.748 =   11%
User Time    =    12.230 =  181%
Process Time =    12.979 =  192%
Global Time  =     6.729 =  100%

Here, to avoid flooding only the highlights are given:
Speed results for pattern "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&":

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 184 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 186 KB/clock

Speed results for pattern "&Traven&":

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 145 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 152 KB/clock

Speed results for pattern "+++++++++ +++++++++ & influence&":

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 189 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 189 KB/clock

Speed results for pattern "&metal+fat&u+&":

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 152 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 159 KB/clock

Speed results for pattern "&[+&Category:&pai&nte++ &+++++&":

Mine:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 153 KB/clock
His:
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 159 KB/clock

And some results from 'WildBench.zip' given at download section:

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>TEST.BAT

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12.exe" 
  "&a&e+f++&e" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1024
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1024KB ... OK
-; 00,000,149,796 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/11
Kazahana: Performance: 105 KB/clock
Kazahana: Performance: 8,831 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 36/5
Kazahana: Performance: I/O time, i.e. fread() time, is 13 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 5,433,065 ticks
Kazahana: Done.

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>"Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER_HEXADECAD-Threads_IntelV12_jh.exe" 
  "&a&e+f++&e" MASAKARI_General-Purpose_Grade_English_Wordlist.wrd 1024
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
  searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER, copyleft Kaze 2013-Nov-29.
Enforcing FAST wildcard mode ...
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1024KB ... OK
-; 00,000,149,796 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 317,919/317,919/11
Kazahana: Performance: 105 KB/clock
Kazahana: Performance: 8,831 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 36/0
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 5,974,419 ticks
Kazahana: Done.

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>type Kazahana.txt
[&a&e+f++&e] afterfame /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] afterfuture /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] arsenferratose /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] azelfafage /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] brazenface /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] laissezfaire /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] malperformance /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] salesforce /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] schadenfreude /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] waterfree /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[&a&e+f++&e] zauberflote /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>

And speed results from WildBench (Dezhi Zhao's bench modified) given at download section:

// Results on my laptop Core 2 T7500 2200MHz:
/*
D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>_compile_Intel.bat

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>icl /O3 wildbench.c /FAcs /Fewildbench_Intel12
Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1.1.258 Build 20111011
Copyright (C) 1985-2011 Intel Corporation.  All rights reserved.

wildbench.c
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:wildbench_Intel12.exe
wildbench.obj

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>wildbench_Intel12.exe
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.272s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.959s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 18.237s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 46.035s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 14.711s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 21.403s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.164s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 16.302s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 18.595s, r = 350000000

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>_compile_VS2010.bat

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>cl /Ox wildbench.c /FAcs /Fewildbench_VS2010
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

wildbench.c
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:wildbench_VS2010.exe
wildbench.obj

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>wildbench_VS2010.exe
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 63.835s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.130s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.567s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 62.384s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.005s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.068s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 63.040s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.036s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 26.333s, r = 350000000

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark\WildBench>
*/

And from Intel's 'wildbench.cod' included in the benchmark, you can see:
_WildcardMatch_Iterative_JackHandy: 000ce-00000+2= 208 bytes long
_WildcardMatch_Iterative_Kaze: 000a2-00000+14= 176 bytes long

Strange, I did my next-to-better and failed to outperform Mr. Handy's etude. Hm, what I miss?

An add-on from 2013-Nov-30

FREEDOM strikes a blow!

Since last night, the FASTEST wildcard function is called 'WildcardMatch_Iterative_Kaze' revision 2 and it is 100% FREE, as always.

No more hms, lucky was I again, now the FASTEST wildcard function is not Mr.Handy's but Kaze's, the source/benchmarks of revision 2 are given below. I figured out, very quickly, the cause for the delays in yesterday's revision - one unnecessary branching, now removed making the function to hover.

The new tweak is quite the same as his first 'while' and it may seem that I have "stolen" his idea but of course it is not the case, I simply saw how three 'if's are nastily nested when the third is completely unnecessary, namely:

if (!BacktrackFlag) return 0;

This third 'if' was shouting for removal but yesterday time was not enough, so to remove it one more 'for' was needed in order to set the 'BacktrackFlag' out with the main 'for'.

Here I want to thank Mr.Handy for his work, seeing how his function reigned since the 2001 is awesome, however the time for dethronement has come.

The final showdown, both for short strings (filename-like) and long strings (Wikipedia's lines):

For short strings, I used again the modified Dezhi Zhao's WildBench, updated at download section:

// Results on my laptop Core 2 T7500 2200MHz:
/*
D:\_KAZE\WildBench>RUNME.bat

D:\_KAZE\WildBench>wildbench_Intel12.exe
FREEDOM strikes a blow!
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 46.503s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.241s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 15.008s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 44.351s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 15.038s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 14.633s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 48.782s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 14.976s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 15.413s, r = 350000000

D:\_KAZE\WildBench>wildbench_VS2010.exe
FREEDOM strikes a blow!
Running three times, for charm ...

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.152s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 25.896s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.776s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.167s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.068s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.386s, r = 350000000

50000000 runs of WildcardMatch_Recursive_DezhiZhao  = 61.682s, r = 350000000
50000000 runs of WildcardMatch_Iterative_JackHandy  = 26.146s, r = 350000000
50000000 runs of WildcardMatch_Iterative_Kaze       = 22.620s, r = 350000000

D:\_KAZE\WildBench>
*/

For long strings, I used again the 1024MB dump of Wikipedia:

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&karolina&wydra&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 162 KB/clock Kaze's revision 2
Kazahana: Performance: 163 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "++++++++++ &AUTHENTICITY&EMPOWERS&YOU+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/0
Kazahana: Performance: 186 KB/clock Kaze's revision 2
Kazahana: Performance: 186 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&Traven&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/617
Kazahana: Performance: 152 KB/clock Kaze's revision 2
Kazahana: Performance: 152 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "+++++++++ +++++++++ & influence&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/540
Kazahana: Performance: 190 KB/clock Kaze's revision 2
Kazahana: Performance: 189 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&metal+fat&u+&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/18
Kazahana: Performance: 158 KB/clock Kaze's revision 2
Kazahana: Performance: 159 KB/clock Jack Handy's

timer Kazahana_r1-++fix+nowait_critical_nixFIX_WolfRAM+fixITER+_HEXADECAD-Threads_IntelV12.exe 
  "&[+&Category:&pai&nte++ &+++++&" enwiki-20130904-pages-articles.7z.001 1536 >>Results.txt
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/7,914,526/56
Kazahana: Performance: 162 KB/clock Kaze's revision 2
Kazahana: Performance: 159 KB/clock Jack Handy's

And the source:

// Revision 2, 2013-Nov-30:
int WildcardMatch_Iterative_Kaze(const char* mask, const char* name) {
const char* maskSTACK;
const char* nameSTACK;
//int BacktrackFlag = 0; // No need of it in rev.2
/*
    // Simplest heuristic with SUPEROVERHEAD enforced: trying to skip the
    // whole wildcard section by comparing the two arrays order 1 of mask&name.
    int i;
    unsigned char maskOrder1[256];
    unsigned char nameOrder1[256];
    for (i='a'; i <= 'z'; i++) { maskOrder1[i]=0; nameOrder1[i]=0;}
    for (i=0; i < strlen(mask); i++) maskOrder1[mask[i]]=1;
    for (i=0; i < strlen(name); i++) nameOrder1[name[i]]=1;
    // Assuming the incoming strings are already lowercased (as it should for speed) and
    // if we don't have matching alphabet parts (from mask side) means
    // we don't need to compare any further i.e. the match fails.
    for (i='a'; i <= 'z'; i++) if ( maskOrder1[i] == 1 && nameOrder1[i] == 0 ) return 0;
*/
/*
    int i;
    for (i=0; i < strlen(mask); i++) {
        if ( mask[i] != '*' && mask[i] != '?' )
            if ( !memchr(name,mask[i],strlen(name)) ) return 0;
    }
*/
/*
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            if (!BacktrackFlag) return 0; // Stupid branching, SLOW!
            name++;
            goto Backtrack;
        }
    } 
} 
*/
// Here, outside the main/second 'for', in order to avoid branching
// we need to set the old/obsolete BacktrackFlag i.e. we need first occurrence of '*':
for (name, mask; *name; ++name, ++mask) {
    if (*mask == '*') {
        goto Backtrack;
    }
    //else if (*mask == '?') {
    //} else {
    else if (*mask != '?') {
        if (*name != *mask) { 
            return 0;
        }
    } 
}
// We are entering the main/second 'for' with mask pointing
// to '*' as if BacktrackFlag is already set in the very first iteration at first condition:
Backtrack:
for (nameSTACK = name, maskSTACK = mask; *nameSTACK; ++nameSTACK, ++maskSTACK) {
    if (*maskSTACK == '*') {
        mask = maskSTACK+1;
        if (!*mask) return 1;
        name = nameSTACK;
        //BacktrackFlag = -1;
        goto Backtrack;
    }
    //else if (*maskSTACK == '?') {
    //} else {
    else if (*maskSTACK != '?') {
        //if (tolower(*nameSTACK) != tolower(*maskSTACK)) {
        // These 'tolower's are outrageous, they hurt speed BADLY,
        // in real-world usage both should have been lowercased outwith the 'for'.
        if (*nameSTACK != *maskSTACK) { 
            //if (!BacktrackFlag) return 0; // Stupid branching, SLOW!
            name++;
            goto Backtrack;
        }
    } 
}
while (*maskSTACK == '*') ++maskSTACK;
return (!*maskSTACK);
}

And from Intel's 'wildbench.cod' included in the benchmark, you can see:
_WildcardMatch_Iterative_JackHandy: 000ce-00000+2= 208 bytes long
_WildcardMatch_Iterative_Kaze: 000a3-00000+13= 176 bytes long

Very close I am to releasing Kazahana's revision 1.

An Add-on from 2013-Dec-07

The EXHAUSTIVE FUZZY mode has been added, entering it happens when LD on command line is postfixed with 'e' as in next lines:

After 304,017,070 comparisions, 1 hit:
D:\_KAZE>Kazahana 0e "miesha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist
After 893,637,435 comparisions, 1 hit:
D:\_KAZE>Kazahana 1e "mieshaA tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist
After 930,871,807 comparisions, 1 hit:
D:\_KAZE>Kazahana 1e "misha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
* [[Miesha Tate]], Mixed martial artist
After 1,489,713,146 comparisions, 3 hits:
D:\_KAZE>Kazahana 2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
D:\_KAZE>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist
As it is shown, all lines containing your pattern within your 'AtMostLevenshteinDistance' (which is 2 in last example) are dumped.

When one doesn't know the right spelling the combinations are at least 4x3: Meesha/Meescha/Mischa/Misha Tayt/Taith/Tate

And showing one heavy EF "knee-bender" example scanning 1GB in 31,876 seconds using 1 thread:
D:\_KAZE>Kazahana 13e "Project Icarus, a design study of an 
interstellar spacecraft based on Project Daedalus" enwiki-20130904-pages-articles.7z.001 1536
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
(Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX, copyleft Kaze 2013-Dec-05.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,062 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,527,965,825/1
Kazahana: Performance: Total/fread() clocks: 17,282,584/734
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 1,777,296,620 ticks
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =    10.358 =    0%
User Time    = 31866.433 =  184%
Process Time = 31876.791 =  184%
Global Time  = 17283.201 =  100%
Pattern, your fuzzy text, 86 bytes long:
Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
Haystack, a line occurring within Wikipedia 1024MB dump, 126 bytes long:
* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft based on Project Daedalus
In order to match above twos we need at least Levenshtein Distance 13, why, that's why:
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE0: tudy)]], a design study of an interstellar spacecraft based on Project Daedalus

Insert 6 chars 'Projec':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE1: Projectudy)]], a design study of an interstellar spacecraft based on Project Daedalus

Insert 1 char ' ':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE2: Project udy)]], a design study of an interstellar spacecraft based on Project Daedalus

Substitute 6 chars 'udy)]]' with 'Icarus':
GOAL  : Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
PHASE3: Project Icarus, a design study of an interstellar spacecraft based on Project Daedalus
13=6+1+6

The logic is simple (the classical Wagner-Fischer is wrapped in two more cycles in order to throw our pattern at all Building-Blocks orders 86-13..86+13 of the incoming line), the only scary thing is the number of comparisions, for above example: 2,527,965,825
The scanned lines are: 9,382,307

One quick example:
We need to factorize the incoming lines down to Building-Blocks order 86-13=73 as a start:
BB order 86-13=73: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of"
BB order 86-12=74: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of "
...
BB order 86-01=85: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interste"
BB order 86-00=86: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstel"
BB order 86+01=87: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstell"
...
BB order 86+12=98: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecra"
BB order 86+13=99: "* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraf"

Those above are the first loop, second loop is sliding within the line one position at a time:

BB order 86-13=73: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of "
BB order 86-12=74: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of a"
...
BB order 86-01=85: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstel"
BB order 86-00=86: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstell"
BB order 86+01=87: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstella"
...
BB order 86+12=98: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraf"
BB order 86+13=99: " [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft"

One useful way for automatical exhaustive fuzzy search is to start with Levenshtein Distance 0 (same as Exact insensitive search) and gradually increase the LD by 1 until a hit occurs.

This auto-mode would be powerful, just giving your pattern and waiting for hits, they are guaranteed, when LD reaches the length of pattern or even exceeds then all kinds of hits will flood you.

Here, 16 threads are "Bare Necessities", for hardcore text explorers like me 128 threads would save the day, literally.

As I see things, no matter how extensive (or rather intensive) the calculations are, the EXHAUSTIVE FUZZY mode is a must-have, do-have already.

An Add-on from 2013-Dec-10

Major rearrangement in Exhaustive Fuzzy section resulted in several times faster execution. Galadriel's heuristics #1 & #3 are back in the game allowing 'caching' thus avoiding unnecessary LD matrix updates. Also I downloaded the latest Intel v14.0 compiler for free evaluation, and I have recompliled Kazahana as SSE and AVX executables, given in Download Section. Thank you Intel.

The goal is to widen the nasty memcpy() bottleneck by replacing the slow rep movsd with 256bit intrinsic, had I had a modern CPU I would compile for 512bit copy as well.

The actual boost in latest revision 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+ is given:

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer "Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe" 
  2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance 
  (Wagner-Fischer) searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX, copyleft Kaze 2013-Dec-05.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,004,486 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/1,489,713,146/3
Kazahana: Performance: 4 KB/clock
Kazahana: Performance: 39 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 239,876/944
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,407,651,279 ticks
Kazahana: Done.

Kernel Time  =    27.750 =   11%
User Time    =   450.656 =  187%
Process Time =   478.406 =  198%
Global Time  =   240.506 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer "Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX+_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe" 
  2e "meescha tate" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, 
  r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+, copyleft Kaze 2013-Dec-10.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,006,880 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/1,489,713,186/3
Kazahana: Performance: 6 KB/clock
Kazahana: Performance: 59 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 156,439/1,095
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,436,846,588 ticks
Kazahana: Done.

Kernel Time  =    18.796 =   11%
User Time    =   294.390 =  186%
Process Time =   313.187 =  198%
Global Time  =   157.485 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
*  1977   &ndash; [[Ameesha Patel]], Indian actress and producer
* [[Ameesha Patel]]
* [[Miesha Tate]], Mixed martial artist

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>timer Kazahana_r1-++fix+
  nowait_critical_nixFIX_WolfRAM+fixITER+EX+_HEXADECAD-Threads_IntelV14_SSE2_32bit.exe 13e 
  "Project Icarus, a design study of an interstellar spacecraft based 
  on Project Daedalus" enwiki-20130904-pages-articles.7z.001 1536
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) 
    searcher, r. 1-++fix+nowait_critical_nixFIX_Wolfram+fixITER+EX+, copyleft Kaze 2013-Dec-10.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 1536KB ... OK
/; 00,000,000,518 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 9,382,307/2,527,966,497/1
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 4 xgrams/clock
Kazahana: Performance: Total/fread() clocks: 2,075,345/1,049
Kazahana: Performance: I/O time, i.e. fread() time, is 0 percents
Kazahana: Performance: RDTSC I/O time, i.e. fread() time, is 2,655,364,558 ticks
Kazahana: Done.

Kernel Time  =   106.125 =    5%
User Time    =  3867.406 =  186%
Process Time =  3973.531 =  191%
Global Time  =  2076.243 =  100%

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>type Kazahana.txt
* [[Project Icarus (Interstellar Probe Design Study)]], a design study of an interstellar spacecraft based on Project Daedalus

D:\_KAZE\_KAZE_GOLD\Kazahana_source_EXEs_Benchmark>

The 'caching' delivers strong boost 17,283s vs 2,076s, still it is slow, I cannot see how to speed it up, Wikipedia being 42GB reminds that full-traversal will take 42x2000s, scary!

Points of Interest

Finally, I thank Igor Pavlov, maladetz. Also I am thankful to Fantasy and Mr.Eiht who helped me to calibrate MokujIN and FNV1A_YoshimitsuTRIAD which in turn were used in Kazahana. Greediness (not need) for more speed is a virtue, if you ask me, it is a powerful driving force, the only vice one should be afraid of is ungratefulness.

History

  • 2013-Nov-17: The day Kazahana hits the mainstream

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Sanmayce
Other
Bulgaria Bulgaria
A Bulgarian old boy interested in search techniques, nothing special.
Follow on   Twitter

Comments and Discussions

 
QuestionWhat about UTF-8 and UTF-16 support? PinmemberMember 42896136-Jan-14 2:34 
AnswerRe: What about UTF-8 and UTF-16 support? PinmemberSanmayce8-Jan-14 9:20 
QuestionGallowwalker - the free GUI tool using Kazahana PinmemberSanmayce12-Dec-13 6:37 
Questioncompiling Kazahana with mingw, working now PinmemberRichB200029-Nov-13 6:57 
AnswerRe: compiling Kazahana with mingw, working now PinmemberSanmayce29-Nov-13 7:34 
QuestionWhat 'Kazahana' really means? PinmemberSanmayce22-Nov-13 5:00 
Questioncompiling Kazahana with mingw [modified] PinmemberRichB200018-Nov-13 14:25 
AnswerRe: compiling Kazahana with mingw PinmemberSanmayce20-Nov-13 7:08 
AnswerRe: compiling Kazahana with mingw PinmemberSanmayce20-Nov-13 8:58 
AnswerRe: compiling Kazahana with mingw PinmemberSanmayce22-Nov-13 5:02 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web01 | 2.8.140827.1 | Last Updated 10 Dec 2013
Article Copyright 2013 by Sanmayce
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid