|
|
Comments and Discussions
|
|
 |

|
Hi Jim,
Another hidden gem...
Jeff
|
|
|
|

|
Million thanks
-Jim
|
|
|
|

|
Perhaps I am silly but the template declaration doesn't look like legal C++ code to me. In particular, this:
doesn't seem legal to me. Does this compile?
Thanks.
|
|
|
|

|
You are right! My article has indeed some strange typos in the template declarations of its embedded code. Thank you for helping me to notice that. Nevertheless the code contained in the SearchNTestCode.zip[^] file, that is included in the article, is correct and has been compiled usefully in several compilers. I will include the fix of this problem in the next update of my article.
Best Regards,
Jim Xochellis
-------------------
When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|

|
Try moving the creation of the following variables outside the for loops for even better performance
__lookAhead
__remainder
__backTrack
__prevRemainder
The compilers optimiser may already do this for you, but just to be sure...
PS, Great article by the way!
Regards,
Simon Hughes
E-mail: simon@hicrest.net
Web: www.hicrest.net
|
|
|
|

|
Hi Simon,
Simon Hughes wrote:
Try moving the creation of the following variables outside the for loops for even better performance
__lookAhead
__remainder
__backTrack
__prevRemainder
This is a very interesting remark, thank you very much! Nevertheless, the (potential) performance problem is located only in the declaration of the __backTrack variable and I believe that moving the declarations outside of the loop is not the appropriate thing to do. In particular:
1) All the above variables are stack allocated, hence we doesn't have multiple memory allocations. (nor the corresponding overhead)
2) In case we are dealing with primitive types, (i.e. pointers and ints) or types that doesn't have any constructors, then no harm is done.
3) When the variables are initialized by assignment on declaration, the assignment (or copy construction) is replacing the typical construction and no harm is done, even if we have constructors.
4) The __backTrack variable is not initialized by assignment on declaration, consequently its constructor will be called once for every matching element! Here we do have a performance problem, which can be solved by changing the expression _RandomAccessIter __backTrack; to _RandomAccessIter __backTrack = __lookAhead - 1; (And removing the corresponding expression from the for loop that follows)
5) In case we move the declarations of the variables outside the loop, then we will have redundant constructor calls and we may loose some of the advantages that the copy construction provides, specially when reference counting or "lazy" copy is being used. (Not very probable in the particular situation but we never know...)
Thanks again for your remark, my code will be corrected very soon, to eliminate any potential performance problem.
Best regards,
Jim Xochellis
-------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|

|
Hi again,
In my previous e-mail I have completely forgotten all about the destructors!! It seems now that I have to move the declaration of the variables outside the loop, exactly like you proposed...
Best Regards,
Jim Xochellis
-------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|

|
When using forward iterators, you will need to perform I+N increments,
when I is the initial position of the sequence and N the sequence length. You could save (I+N)*(N-1/N) comparisons by comparing only after every Nth increment, but then you would need to keep a counter and compare that every increment. So the straightforward implementation would increase the number of comparisons. The counter comparison probably is cheaper, though ( same vale repeatedly, will be in register).
If you had bidirectional iterators, you could easily backtrack once you've found one item. This means that bidrectional iterators would benefit from their own algorithm. With forward iterators, you need to keep a previous iterator which was you last mismatching comparison, N items back from your last comparison. When you find a match, you need to run from this iterator forwards. If you're lucky, this will take avg(N/2) operations to find the start of the sequence, which is just as expensive as with bidirectional iterators. However, had you found a single item, the forward iterators would cost you N comparisons to deduce this, while bidirectional iterators could do this in 2 operations (one forward, one back).
|
|
|
|

|
Hi,
Thank you very much for your thoughts. I'll keep them in mind when I'll try to implement an enhanced search_n for forward iterators.
Best Regards,
Jim Xochellis
-------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|

|
Any article on improving the efficiency of code is welcome!
Education is never a waste of time!
INTP
|
|
|
|

|
5,of course!
But I still have a question.
How can I get the exactly _count? That is: when there are additional matched values next to the subsequence you found,how can I discard this particular subseqence?
Such as: given {abbcddbbbbbcbbbd},while _count=3,value=b
The value return I hope is 13,but not 7.
ho ho
I love codeproject!
|
|
|
|

|
Hi Yixin,
Yixin Liu wrote:
5,of course!
Many thanks for supporting this article.
But I still have a question.
How can I get the exactly _count? That is: when there are additional matched values next to the subsequence you found,how can I discard this particular subseqence?
Such as: given {abbcddbbbbbcbbbd},while _count=3,value=b
The value return I hope is 13,but not 7.
I didn't quite understand your question, I am afraid.
Nevertheless, my best guess is:
You need to search for any subsequence that contains n1 >= n items equal to b and you need to know the start and the end of this subsequence. Am I right?
If that is the case, you have to use the search_n first to find the start of the subsequence and the find_if afterwards, in order to find the end of the subsequence.
For example, the following snippet will locate the first substring of the string "abbcddbbbbbcbbbd" that contains 3 or more chars equal to 'b' and will replace all the chars of that substring with '*'. Consequently the result will be: "abbcdd*****cbbbd".
char seq[] = "abbcddbbbbbcbbbd";
char *seqStart = seq;
char *seqEnd = seq + sizeof seq;
int n = 3;
char *subSeqStart = search_n(seqStart, seqEnd, n, 'b');
char *subSeqEnd = seqEnd;
if (subSeqStart != seqEnd)
{
subSeqEnd = find_if(subSeqStart + n, seqEnd,
bind2nd(not_equal_to<char>(), 'b'));
fill(subSeqStart, subSeqEnd, '*');
}
I hope I helped a little
Best Regards,
Jim Xochellis
-------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|
|

|
Great job and good article! 5+
~Nitron.
ññòòïðïðB A start
|
|
|
|

|
And very well presented. Thanks.
|
|
|
|

|
Well it's quite funny that a search for N elements in a row is done element by element instead of every N elements.
I'm not really a genius but this sounds quite obvious to me..!!
Nice article anyway.
|
|
|
|

|
Sometimes the obvious is the hardest to explain. This guy has focused on one of the simplest functions of the STL and provided a multi-page article, complete with diagrams, correct references, benchmarks etc, about it. He discussed different STL implementations, and compared them really well. If that does not impress you then you must be a genius!
|
|
|
|

|
you've got me wrong: the article in very interesting and is well done. I was just saying that it is rather funny that such a silly implementation was in the STL
|
|
|
|

|
For some reason, people think that STL is the end all when it comes to implementation. Look at such creations as auto_ptr and the export keyword and you realized these guys are far from infallible.
STL is some really great work. However, it should NEVER be assumed that it is the best solution for all problems.
Tim Smith
I'm going to patent thought. I have yet to see any prior art.
|
|
|
|

|
I find it similar in concept to the string-search algorithm presented on Thomas Wang's website. It's a common-sense idea that's been overlooked by many people with more training than actual intelligence: when checking a sequence, don't do more work than is needed. I gave your article a 5. Don't be so self-deprecating-- to achieve an 2 to 8 times speedup in such a well-examined area is really something of which you should be very proud, and I don't doubt that it's going to be very useful to people who need all the performance they can get.
Regards,
Jeff Varszegi
|
|
|
|

|
Hi Jeff,
Jeff Varszegi wrote:
I find it similar in concept to the string-search algorithm presented on Thomas Wang's website.
Yes, that is true. Wang's algorithm seems that is based on the (notable) Boyer-Moore search algorithm, which has also given me the inspiration to implement this search_n version.
I gave your article a 5. Don't be so self-deprecating-- to achieve an 2 to 8 times speedup in such a well-examined area is really something of which you should be very proud, and I don't doubt that it's going to be very useful to people who need all the performance they can get.
Thanks for your kind and supportive words and your voting.
Best Regards,
Jim Xochellis
------------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the good way, the bad way, the greek way and my way...
|
|
|
|

|
Jim Xochellis wrote:
Yes, that is true. Wang's algorithm seems that is based on the (notable) Boyer-Moore search algorithm, which has also given me the inspiration to implement this search_n version.
To be a bit more specific, both these are really closer to the Boyer-Moore-Horspool algorithm. Given the degenerate circumstances (I.e. we're only looking for runs of identical elements) it would also be hard to distinguish from Sunday's variant of B-M-H.
I'm sure you're aware of it already, but for those looking on who want to look up, when it comes to implementing a version for forward-iterators, the Knuth-Morris-Pratt algorithm would be an obvious possibility.
The Universe is a figment of its own imagination.
Later,
Jerry.
|
|
|
|

|
Hi Jerry,
Jerry Coffin wrote:
To be a bit more specific, both these are really closer to the Boyer-Moore-Horspool algorithm. Given the degenerate circumstances
You are right, both algorithms are closer to Horspool algorithm than to Boyer-Moore (Horspool is a simplification of the B-M). But there are also important differences. For instance, Boyer-Moore and Horspool algorithms perform the comparisons from right to left, whereas my algorithm compares both from right to left and from left to right and Thomas Wang's algorithm compares from left to right. (there are some other differences too)
I'm sure you're aware of it already, but for those looking on who want to look up, when it comes to implementing a version for forward-iterators, the Knuth-Morris-Pratt algorithm would be an obvious possibility.
Unfortunately, I can't see how the Knuth-Morris-Pratt algorithm can be used to optimize search_n (for Forward of even Random access iterators), but then again I have some research to do, until I can say something positive here.
By the way, thanks for this very interesting discussion.
Best Regards,
Jim Xochellis
-------------------
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the right way, the wrong way, the Greek way and my way...
|
|
|
|

|
Great article. 5/5.
I believe the problem with STL algorithms is that they are not customizable enough. For instance, std::sort() allows the user to pick the sorting criteria, but not the algorithm itself. It would be nice if some STL algorithms had an additional parameter for customizing the implementation.
Anyway, thanks for the nice article.
|
|
|
|

|
Hi Nemanja,
Nemanja Trifunovic wrote:
Great article. 5/5.
Thank you very much. That is a great honor for me, since I am aware of your great articles in CodeProject.
I believe the problem with STL algorithms is that they are not customizable enough. For instance, std::sort() allows the user to pick the sorting criteria, but not the algorithm itself. It would be nice if some STL algorithms had an additional parameter for customizing the implementation.
I agree, but even without the extra customization facilities, there is still room for improvements.
Thanks again,
Jim Xochellis
1. When it comes to programming, I would like to think that the developers believe in the good source (code), but unfortunately many of them still believe in the good force!
2. When it comes to solving a problem, there is always the good way, the bad way, the greek way and my way...
|
|
|
|
 |
|
|
General News Suggestion Question Bug Answer Joke Rant Admin
|
This article is discussing the efficiency of the most popular search_n implementations. Furthermore, it is introducing a new search_n specialization for random access iterators, which outruns by far the most commonly used implementations.
| Type | Article |
| Licence | CPOL |
| First Posted | 11 Apr 2004 |
| Views | 87,532 |
| Downloads | 374 |
| Bookmarked | 36 times |
|
|