Click here to Skip to main content
15,867,594 members
Please Sign up or sign in to vote.
5.00/5 (1 vote)
Hi All

I'm developing my own CString class. With the idea to make it "much" faster than the standard CString, easily upgradeable, and to include into it some non-standard parsing functions I often use in my work, those will be much better to be part of the CFString object itself, instead of being separate functions.
And in deed I have some noticeable achievements for the basic functions my CFString uses:

My copy() is 35% faster then strcpy()!

My lengh() is 30% faster than strlen().
BUT note that the optimizer often pre-calculates strlen() calls and replaces them with respective numbers!
For example int i = strlen("abc") will eventually generate mov EAX, 3 instaed of generating the asm code of strlen() itself, that happens for hardcoded strings! so in the cases when strlen() isn't replaced by a number, my function is 30% faster

My compare() is 20% faster than strcmp().

I used the inline assembly option of the Visual C++ compiler, the "__asm" command to achieve this.

I also developed an equalizing mechanism, so when I write str1 = str2 the "operator =" actually doesn't copy the data from str2 to str1
the copy occurs ONLY in specific cases when it's needed.
And that mechanism provides about 35 - 40% faster equalizing betweend CFString objects in comparison to the CString equalizing!

So far so good! But in the case with FindOneOf the standard CString beated me badly! CString::FindOneOf is almost twise faster than my CFString::FindOneOf, and my FindOneOf IS written in assembly too. I did everithing I could to tweak it up, but I only reduced the time from 33000ms to 28000ms and CString::FindOneOf does the job for 18000ms, and my function suppose to be faster, now I'll be happy just to equal the score of CString::FindOneOf!
This is the test code I use:
<br />	CString/CFString str1("abcdefghijklmnoprs0987654321");<br /><br />	int start = GetTickCount();<br />	for(int pos = 0; pos; 100000000; pos++)<br />	{<br />		index = 0;<br />		index = str1.FindOneOf("1234567890");<br />	}<br />	int time = GetTickCount() - start;<br /><br />	CString strTime;<br />	strTime.Format("%d, %d", time, index);<br />	m_editResult.SetWindowText(strTime); //this is the CEdit object I use for output<br />

I looked at the disassembly of the CString::FindOneOf but couldn't find a single loop, I didn't understan anything! there are a lot of stack operations (PUSH POP) a lot of CALLs and they suppose to be slow, I don't have any idea how can this function beat mine!!!
This is the algorithm I use written in C, although it's written in assembly and I believe optimised in the actual function:
<br />        const char* pstrTBuffer = m_pstrBuffer; //the actual string<br />	const char* pstrTSeek   = pstrSeek; //the charset<br />	do<br />	{<br />		while(*pstrTSeek)<br />		{<br />			if(*pstrTBuffer == *pstrTSeek++)<br />			{<br />				return pstrTBuffer - m_pstrBuffer;<br />			}<br />		}<br />		pstrTSeek = pstrSeek;<br />	}<br />	while(*++pstrTBuffer);<br />	return -1;<br />

In case the code isn't clear enough:
The idea is as simple as possible, compare each character in the main string m_pstrBuffer with each character in the charset pstrSeek
if there is a match return its index if not return -1

I assumed that the problem isn't in my coding technique, it must be generally in the algorithm I use!!!
Does somebody know a better algorithm to implement FindOneOf??? :)
I will appreciate any help - thank you!!! :)

Sorry for the prolonged question, but I wanted to be as clear as I can in my description.

modified on Monday, June 22, 2009 10:20 AM
Posted

Member 4399131 wrote:
I assumed that the problem isn't in my coding technique, it must be generally in the algorithm I use!!!


Yep.

Member 4399131 wrote:
Does somebody know a better algorithm to implement FindOneOf???


Looking at their code (strpbrk - actually implemented using strspn), Microsoft do :)

They build a table (it's only 256 bits == 32 bytes) from the set of characters you're looking for and do a table lookup for each character of the string you're searching in. That means that their algorithm is O(n) (n=search string length) rather than O(n*m) (n=search string length, m=number of chars you're looking for).

Oh, and they implement that in assembly language (different implementations for x86 and x64). If I were you, I'd do myself a favour and use the MS C run-time function in this case...

 
Share this answer
 
Without knowing the logic behind the standard implementation, it is impossible to know why your algorithm is being beat. Perhaps the standard string stores data about specific searches so that, in your loop, it loads known results instead of recalculating them (ie, you are searching the same string for the same charset, so why not cache the result?). Perhaps it realized that your set of input characters is consecutive, and instead of iterating through them it tests each character against a range. Perhaps the standard implementation starts from the end of the string and searches forward. Perhaps the compiler realized that the standard call to FindOneOf in your loop could be optimized away since you are not using any values of that method call except the final one, but your inline-assembly could not be optimized away. The moral is, without doing more testing I have no idea why your algorithm is slower (or your test method is inappropriate). I would only recommend additional test cases, or to make one or more of the modifications that I said were possibilities of what the standard code does. You could also try reordering the loops, but I doubt you are going to see significant timing differences resulting from such a change.

[BEGIN EDIT]
The lookup table idea is a great one for ASCII strings... I was only thinking in terms of unicode.
[END EDIT]

 
Share this answer
 
I think Stuart's advice to use the C runtime in this case is the best alternative.

An algorithm can be optimized for different purposes and can also be better or worse in some scenarios. I guess the standard C runtime implementation is rather generic and predictable.

Try out the following algorithm and you'll see what I mean.
It is twice as fast as CString::FindOneOf() with the strings you're using for testing and measurement when it's release-built. On the other hand, adding a 'u' in the search string will make the algorithm 50% slower than CString::FindOneOf(). Surely you can optimize it further, but it will still have its weaknesses.
The key in the algorithm below is to do as little as possible with each char in the string to be searched.
int FindOneOf( const char* pString, const char* pSearch )<br />{<br />    register char cMask = -1;<br />    register char cPattern = *pSearch;<br />    register int nPos;<br />    int nSPos;<br />    int nRet = -1;<br /><br />    for( nPos = 1; pSearch[nPos]; ++nPos )<br />    {<br />        cMask &= ~(pSearch[nPos - 1] ^ pSearch[nPos]);<br />        cPattern |= pSearch[nPos];<br />    }<br />    // cMask now contains '1's in bits that have the same value for every char in pSearch string<br /><br />    // Check each char in pString<br />    for( nPos = 0; pString[nPos]; ++nPos )<br />    {<br />        // Find out which bits have the same values by '~(pString[nPos] ^ cPattern)'<br />        // Mask the bits of interest and compare with the mask<br />        if( (~(pString[nPos] ^ cPattern) & cMask) == cMask )<br />        {<br />            // We have a possible match, now walk through the search string<br />            for( nSPos = 0; pSearch[nSPos]; ++nSPos )<br />            {<br />                if( pString[nPos] == pSearch[nSPos] )<br />                {<br />                    // A match was found, update return value and get out<br />                    nRet = nPos;<br />                    break;<br />                }<br />            }<br />            if( nRet != -1 )<br />            {<br />                break;<br />            }<br />        }<br />    }<br /><br />    return nRet;<br />}

Beware! The above is just an example at the top of my head! ;-)

 
Share this answer
 


CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900