Click here to Skip to main content
13,596,798 members
Click here to Skip to main content

Stats

163.3K views
3K downloads
69 bookmarked
Posted 9 Sep 2011
Licenced CPOL

Fastest strstr-like function in C!?

, 22 Aug 2016
Tuned function for searching a needle in a haystack
Nakamichi_Kintaro++_source_executables_64bit_(GCC510-vs-Intel150)_(TW-vs-RG)_BENCHMARK
MokujIN 224 prompt.lnk
Railgun_Trolldom_28-pages.pdf
Railgun_Swampshine_BailOut_source_(Two-Way_benchmark)
// ''Sekirei''|鶺鴒| are superpowered beings with a different code similar to a human's. 
// wagtail -  Any of various chiefly Old World birds of the family Motacillidae, having a slender body with a long tail that constantly wags.
// wag - A humorous or droll person; a wit.
// wag -
// v. intr.
// 1. To move briskly and repeatedly from side to side, to and fro, or up and down.
// 2. To move rapidly in talking. Used of the tongue.
// 3. To walk with a clumsy sway; waddle.
// 4. Archaic: To be on one's way; depart.
// v. tr.
// To move (a body part) rapidly from side to side or up and down, as in playfulness, agreement, admonition, or chatter.
// /HERITAGE/
// ''Kusano''|草野|Sekirei #108, commonly referred to as "Kuu-chan" or "Ku", is the youngest of Minato's Sekirei, and is also known as the "Green Girl"|緑の少女|Midori no Shōjo by other Sekirei. At the beginning of the story, she was hiding in a botanical garden after being traumatized when Mikogami attempted to forcibly wing her. Kusano communicated with Minato telepathically and led him through the garden until he found her. Kusano refers to Minato as ''Onii-chan'' (big brother), and is the most attached to him. She does not like fighting or quarreling and she can be seen stopping them when they start. She is also very impressionable and often copies Musubi, Tsukiumi and Kazehana's mannerisms. She VERY much wants to be Minato's wife when she grows up, and is highly possessive of him at times, even biting her fellow sekirei when they start to get too close to him.
// Apart from its unusual [[plumage]] pattern and habitat, the Forest Wagtail differs from its ''Motacilla'' relatives in its strange habit of swaying its tail from side to side, not wagging it up and down like other wagtails. The Japanese name ''Jokofury-sekirei'' (=sideways-swinging wagtail) ) is based on this habit.
// ''Kazehana''|風花|Sekirei #03 is a tall and extremely buxom Sekirei whose first encounter with Minato was to stumble into his bed in a drunken stupor. 
// Kazehana displays an extremely relaxed personality, preferring to spend most of her time lounging and drinking ''[[sake]]'', and often becomes giddy when discussing matters of love. However, when she becomes serious she displays a lot of power and lets nobody (with the exception of Minaka, Miya, and Minato) talk down to her.
// Kazehana has the ability to control and manipulate wind, which also grants her a limited ability of flight. 
// Her known attacks is the |Flower Whirlwind|花旋風|Hana Senpū ...
// ... who teaches him the Sekireigan (Eye of the Sekirei). With this he is able to move extremely fast and attack from many angles simultaneously.
// His sword techniques are unnamed, but he uses the Sekireigan, which was taught to him by Anri. Yukimura can utilize the Sekireigan to access extreme speed.
// /Wikipedia/
// Version: This is Railgun_Sekireigan revision 2, copyleft 2013-Oct-16, Kaze
// Purpose: This is optimized strstr-like (memmem in fact) function for short needles up to 255 bytes ... and beyond.
// Caution: For better speed the case 'if (cbPattern==1)' was removed, so Pattern must be longer than 1 char.
#define _rotl_KAZE(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define HaystackThresholdSekirei 961 // Quadruplet works up to this value, if bigger then BMH2 takes over.
#define NeedleThresholdBIGSekirei 12+40 // Should be bigger than 'HasherezadeOrder'. BMH2 works up to this value (inclusive).
#define HashTableSizeSekirei 17-1 // In fact the real size is -3, because it is BITwise, when 17-3=14 it means 16KB, (17-1)-3=13 it means 8KB.
char * Railgun_Sekireigan_Bari (char * pbTarget, char * pbPattern, uint32_t cbTarget, uint32_t cbPattern)
{
	char * pbTargetMax = pbTarget + cbTarget;
	register uint32_t ulHashPattern;
	register uint32_t ulHashTarget;
	signed long count;
	signed long countSTATIC;

	unsigned char SINGLET;
	uint32_t Quadruplet2nd;
	uint32_t Quadruplet3rd;
	uint32_t Quadruplet4th;

	uint32_t AdvanceHopperGrass;

	long i; //BMH needed
	int a, j;
	unsigned char bm_Horspool_Order2[256*256]; // BMHSS(Elsiane) needed, 'char' limits patterns to 255, if 'long' then table becomes 256KB, grrr.
	uint32_t Gulliver; // or unsigned char or unsigned short

	unsigned char bm_Hasherezade_HASH[1<<(HashTableSizeSekirei-3)];
	uint32_t hash32;
	uint32_t hash32B;
	uint32_t hash32C;

	if (cbPattern > cbTarget) return(NULL);

	if ( cbPattern<4 ) { 

        	pbTarget = pbTarget+cbPattern;
		ulHashPattern = ( (*(char *)(pbPattern))<<8 ) + *(pbPattern+(cbPattern-1));
		if ( cbPattern==3 ) {
			for ( ;; ) {
				if ( ulHashPattern == ( (*(char *)(pbTarget-3))<<8 ) + *(pbTarget-1) ) {
					if ( *(char *)(pbPattern+1) == *(char *)(pbTarget-2) ) return((pbTarget-3));
				}
				if ( (char)(ulHashPattern>>8) != *(pbTarget-2) ) { 
					pbTarget++;
					if ( (char)(ulHashPattern>>8) != *(pbTarget-2) ) pbTarget++;
				}
				pbTarget++;
				if (pbTarget > pbTargetMax) return(NULL);
			}
		} else {
		}
		for ( ;; ) {
			if ( ulHashPattern == ( (*(char *)(pbTarget-2))<<8 ) + *(pbTarget-1) ) return((pbTarget-2));
			if ( (char)(ulHashPattern>>8) != *(pbTarget-1) ) pbTarget++;
			pbTarget++;
			if (pbTarget > pbTargetMax) return(NULL);
		}

	} else {
		if (cbTarget<HaystackThresholdSekirei) { // This value is arbitrary (don't know how exactly), it ensures (at least must) better performance than 'Boyer_Moore_Horspool'.

		pbTarget = pbTarget+cbPattern;
		ulHashPattern = *(uint32_t *)(pbPattern);
		SINGLET = ulHashPattern & 0xFF;
		Quadruplet2nd = SINGLET<<8;
		Quadruplet3rd = SINGLET<<16;
		Quadruplet4th = SINGLET<<24;
		for ( ;; ) {
			AdvanceHopperGrass = 0;
			ulHashTarget = *(uint32_t *)(pbTarget-cbPattern);
			if ( ulHashPattern == ulHashTarget ) { // Three unnecessary comparisons here, but 'AdvanceHopperGrass' must be calculated - it has a higher priority.
				count = cbPattern-1;
				while ( count && *(char *)(pbPattern+(cbPattern-count)) == *(char *)(pbTarget-count) ) {
					if ( cbPattern-1==AdvanceHopperGrass+count && SINGLET != *(char *)(pbTarget-count) ) AdvanceHopperGrass++;
					count--;
				}
				if ( count == 0) return((pbTarget-cbPattern));
			} else { // The goal here: to avoid memory accesses by stressing the registers.
				if ( Quadruplet2nd != (ulHashTarget & 0x0000FF00) ) {
					AdvanceHopperGrass++;
					if ( Quadruplet3rd != (ulHashTarget & 0x00FF0000) ) {
						AdvanceHopperGrass++;
						if ( Quadruplet4th != (ulHashTarget & 0xFF000000) ) AdvanceHopperGrass++;
					}
				}
			}
			AdvanceHopperGrass++;
			pbTarget = pbTarget + AdvanceHopperGrass;
			if (pbTarget > pbTargetMax) return(NULL);
		}

		} else { //if (cbTarget<HaystackThresholdSekirei)

	if ( cbPattern<=NeedleThresholdBIGSekirei ) { 

			countSTATIC = cbPattern-2-2;
			ulHashPattern = *(uint32_t *)(pbPattern); // First four bytes
			//ulHashTarget = *(unsigned short *)(pbPattern+cbPattern-1-1); // Last two bytes
			i=0;
			//for (a=0; a < 256*256; a++) {bm_Horspool_Order2[a]= cbPattern-1;} // cbPattern-(Order-1) for Horspool; 'memset' if not optimized
			//for (j=0; j < cbPattern-1; j++) bm_Horspool_Order2[*(unsigned short *)(pbPattern+j)]=j; // Rightmost appearance/position is needed
			for (a=0; a < 256*256; a++) {bm_Horspool_Order2[a]=0;}
			for (j=0; j < cbPattern-1; j++) bm_Horspool_Order2[*(unsigned short *)(pbPattern+j)]=1;
			while (i <= cbTarget-cbPattern) {
				Gulliver = 1;
				if ( bm_Horspool_Order2[*(unsigned short *)&pbTarget[i+cbPattern-1-1]] != 0 ) {
					//if ( Gulliver == 1 ) { // Means the Building-Block order 2 is found somewhere i.e. NO MAXIMUM SKIP
						if ( *(uint32_t *)&pbTarget[i] == ulHashPattern) {
							count = countSTATIC; // Last two chars already matched, to be fixed with -2
							while ( count !=0 && *(char *)(pbPattern+(countSTATIC-count)+4) == *(char *)(&pbTarget[i]+(countSTATIC-count)+4) )
								count--;
							if ( count == 0) return(pbTarget+i);
						}
					//}
					// Trying to strengthen the Skip performance... here ONLY one additional lookup, for better/longer jumps more such lookups, unrolled to be added. 
					if ( bm_Horspool_Order2[*(unsigned short *)&pbTarget[i+cbPattern-1-1-2]] == 0 ) Gulliver = cbPattern-(2-1)-2;
				} else Gulliver = cbPattern-(2-1);
				i = i + Gulliver;
				//GlobalI++; // Comment it, it is only for stats.
			}
			return(NULL);

	} else { // if ( cbPattern<=NeedleThresholdBIGSekirei )
// MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() [
			countSTATIC = cbPattern-2-2;

			ulHashPattern = *(uint32_t *)(pbPattern); // First four bytes
			//ulHashTarget = *(unsigned short *)(pbPattern+cbPattern-1-1); // Last two bytes
		
			i=0;

			for (a=0; a < 1<<(HashTableSizeSekirei-3); a++) {bm_Hasherezade_HASH[a]= 0;} // to-do: 'memset' if not optimized
			// cbPattern - Order + 1 i.e. number of BBs for 11 'fastest fox' 11-8+1=4: 'fastest ', 'astest f', 'stest fo', 'test fox'
			for (j=0; j < cbPattern-12+1; j++) {
				hash32 = (2166136261 ^ *(uint32_t *)(pbPattern+j+0)) * 709607;        
				hash32B = (2166136261 ^ *(uint32_t *)(pbPattern+j+4)) * 709607;        
				hash32C = (2166136261 ^ *(uint32_t *)(pbPattern+j+8)) * 709607;        
				hash32 = (hash32 ^ _rotl_KAZE(hash32C,5) ) * 709607;
				hash32 = (hash32 ^ _rotl_KAZE(hash32B,5) ) * 709607;
				hash32 = ( hash32 ^ (hash32 >> 16) ) & ( (1<<(HashTableSizeSekirei))-1 );
				bm_Hasherezade_HASH[hash32>>3]= bm_Hasherezade_HASH[hash32>>3] | (1<<(hash32&0x7));
			}

			while (i <= cbTarget-cbPattern) {
				Gulliver = 1; // Assume minimal jump as initial value.
				// The goal: to jump when the rightmost 8bytes (Order 8 Horspool) of window do not look like any of Needle prefixes i.e. are not to be found. This maximum jump equals cbPattern-(Order-1) or 11-(8-1)=4 for 'fastest fox' - a small one but for Needle 31 bytes the jump equals 31-(8-1)=24
				//GlobalHashSectionExecution++; // Comment it, it is only for stats.
					hash32 = (2166136261 ^ *(uint32_t *)(pbTarget+i+cbPattern-12+0)) * 709607;        
					hash32B = (2166136261 ^ *(uint32_t *)(pbTarget+i+cbPattern-12+4)) * 709607;        
					hash32C = (2166136261 ^ *(uint32_t *)(pbTarget+i+cbPattern-12+8)) * 709607;        
					hash32 = (hash32 ^ _rotl_KAZE(hash32C,5) ) * 709607;
					hash32 = (hash32 ^ _rotl_KAZE(hash32B,5) ) * 709607;
					hash32 = ( hash32 ^ (hash32 >> 16) ) & ( (1<<(HashTableSizeSekirei))-1 );
					if ( (bm_Hasherezade_HASH[hash32>>3] & (1<<(hash32&0x7))) ==0 )
						Gulliver = cbPattern-(12-1);

				if ( Gulliver == 1 ) { // Means the Building-Block order 8/12 is found somewhere i.e. NO MAXIMUM SKIP
					if ( *(uint32_t *)&pbTarget[i] == ulHashPattern) {
						count = countSTATIC;
						while ( count !=0 && *(char *)(pbPattern+(countSTATIC-count)+4) == *(char *)(&pbTarget[i]+(countSTATIC-count)+4) )
							count--;
						if ( count == 0) return(pbTarget+i);
					}
				}
					i = i + Gulliver;
				//GlobalI++; // Comment it, it is only for stats.
			} // while (i <= cbTarget-cbPattern)
			return(NULL);
// MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() MEMMEM() ]
	} // if ( cbPattern<=NeedleThresholdBIGSekirei )
		} //if (cbTarget<HaystackThresholdSekirei)
	} //if ( cbPattern<4 )
}

By viewing downloads associated with this article you agree to the Terms of Service and the article's licence.

If a file you wish to view isn't highlighted, and is a text file (not binary), please let us know and we'll add colourisation support for it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Sanmayce
Other
Bulgaria Bulgaria
A Bulgarian old boy interested in search techniques, nothing special.

You may also be interested in...

Pro
Permalink | Advertise | Privacy | Cookies | Terms of Use | Mobile
Web03-2016 | 2.8.180621.3 | Last Updated 22 Aug 2016
Article Copyright 2011 by Sanmayce
Everything else Copyright © CodeProject, 1999-2018
Layout: fixed | fluid