 |
|
 |
Thank you for the code! I've added a parameter to the FindComparable to return the item without another 'get' by index.
Thanks.
|
|
|
|
 |
|
 |
What is the advantage of using an AVL tree as an indexed container that has O(n log(n)) access time, as opposed to an array which has O(1) access time?
Also, your indexing scheme seems like it has a flaw, in that, it inserts duplicates nodes into the tree even though only the latest inserted item is actually searchable. This will decrease performance over time because more unecessary hidden nodes will be visited.
Thanks for sharing you fine code.
-y
|
|
|
|
 |
|
 |
Thanks
So to the point:
You can use this as an indexed container with O(log(n)) access time for each element if you only need to visit one element per user interaction, i.e. a text editor. Indeed, visiting all nodes takes O(n log(n)) time. This container also has O(log(n)) insertion time for arbitrary indices.
The vector, on the other hand, has O(1) visiting time for any element and O(n) visiting time for all elements, which is considerably better. However, when you insert elements at the end of the vector, time can be either O(1) or O(n) depending on your luck, with average time guaranteed to be O(log(n)). If you have a batch job, use a vector. It's much better, because you don't need all member functions to execute in the same time constraints, rather you need smaller time for the whole task. If you have an interactive system, this container is better because it doesn't cause unintended pauses, but for normal everyday tasks it's probably an overkill. For real-time system, something like this container could work but there must be extra work done on memory allocation so that it executes within 100% predictable time constraints.
As for insertion:
If you insert by index, the indices of all older elements change implicitly because of the underlying child counting mechanism. If you insert comparable elements, you should manually check if that element is already present, if you don't want any duplicates. This makes insertion time take O(2 * log(n)) time, which is again O(log(n)) anyway. Your insertion code will then look like this:
if (tree.FindComparable(x) == tree.Size()) tree.InsertComparable(x);
because FindComparable(...) returns Size() if the element is not present. Maybe I should add another method that checks for that, and call it something like InsertComparableUnique() or something... but it would be a one-liner. Nice idea, so thanks a lot!
-- modified at 11:43 Thursday 5th January, 2006
|
|
|
|
 |
|
 |
Thanks, this is just what I needed. Sorry for the slow response, I was out of town.
Scott
|
|
|
|
 |
|
 |
It would be nice if you could add an option to preserve the sequence in which objects with duplicate key values were added. Presently they are stored in reverse order.
Scott
|
|
|
|
 |
|
 |
I will see what I can do. Check my site http://myjavaserver.com/~dimitertg these days for updates. I will try to do this as soon as I get some time, maybe tonight (it's 2:31 PM now here)
|
|
|
|
 |
|
 |
You mean when you use InsertComparable, right? Because if you use indices that's pretty much by design...
|
|
|
|
 |
|
|
 |
|
 |
Hi,
You can get the latest version here:
http://www.myjavaserver.com/~dimitertg/
|
|
|
|
 |
|
 |
I know you are very busy, so I doubt you will have time to consider these, BUT...
- An implmentaion that does NOT use stl. As mentioned, stl has containers that perform pretty well, though perhaps not as well as this one. However there are times when stl in simply unavailable, and it would be great to have good containers in these cases (of course, no MFC either).
- Internal locking - in multi thread applications quite often we need to lock containers, and this implementaion (like most others) forces a gross external lock.
|
|
|
|
 |
|
 |
This implementation doesn't use STL by default, just the examples do. There will be a new version very soon, because I just found out that FindComparable returns the wrong index Stay in touch...
|
|
|
|
 |
|
 |
OK, the bugfix is done. Mailing it now...
|
|
|
|
 |
|
 |
Fix mailed. Now FindComparable actually works.
|
|
|
|
 |
|
 |
Well, don't feel bad about the bug, but do please test even more that you have.
We had a home grown R/B tree that we THOUGHT worked for years, until we found a delete bug. I belive the code was based on Sedgewick, which didn't have a delete function. We ended up borrowing someone elses code.
Don't give up - it is a worth while endevor.
"This implementation doesn't use STL by default"
Ok, then do us all a favor - don't put parts of the example into the impelmention. In other words, HeightBalancedTree.h (which I presume is the implemetion) should not include
#include
#include
#include
If it doesn't use them. The other stuff related to your example should be else where as well.
This will greatly enable resuse. Otherwise users need to chop stuff up, and when you have an update the chopping will have to all be re-done.
|
|
|
|
 |
|
 |
Thanks. That's what I will do. I hope the fix gets published soon. Otherwise, for the next version I will remove any includes from the header file and all references to stl there, just like you suggested.
|
|
|
|
 |
|
 |
OK, all done. No need to try to send the updated code to codeproject, I think they ignore updates. Take it from my site instead: http://www.myjavaserver.com/~dimitertg/ . Again, thanks for the feedback!
|
|
|
|
 |
|
 |
Did you like about this method when implementing this object?
|
|
|
|
 |
|
 |
Actually, no, deletion is always immediate. I had another implementation that used memory pools and was much, much more complicated, but since it was based on Ammeraal's implementation, I scrapped it and wrote this from scratch as a proof-of-concept. I will add memory pools and such things but only when I start using this in my yet-unfinished text editor (as an array of lines)... that may never see the light of day if it keeps going like this... there just isn't enough time for all that... I'd love to see something like what you suggest.
|
|
|
|
 |
|
 |
map and set are ordered structures (red-black trees usually i think)
they have same features and enjoy STL's robustness and standardization.
Iftah.
|
|
|
|
 |
|
 |
We must consider the difference between the facts and the myths:
STL is a specification with a number of different implementations.
The only thing STL specifications says about set/map is that they must be sorted and associative.
If I write myself a class having the same std::map interface and that store data in a reallocated array at each insertion, I’m still providing and STL compliant map class. Of course, about efficiency, we’ll have lot to discuss.
Moral: you can say STL is standard, not that it is robust.
You can say that a particular STL implementation is robust (or better: more robust than another).
About the PJ Plauger STL implementation provided by Microsoft with its compliers, yes … it uses black/red trees. Like every sorted contained based on binary trees, it has the same external behaviour as AVL, but the different algorithm used to keep them balanced make them differently performant in different situation.
B/R performs better in case of frequent insertion/deletions, AVL in case of frequent visit.
About robustness: if everyone use only existing code because is “robust” no new code can be provided and no emprovement in the IT science can exist.
Test it, correct it, and it’ll become robust as well everything else.
If you don’t have the time to do this … don’t use it, but don’t ask why invent it …
In a metaphor: if anyone marries only experienced girl to have children, no children can be done in after one generation.
Out of metaphor, if you have the presumption to write “for ever software” you're right.
If you think your software can –sooner or later- die, why never test something different in new software development?
2 bugs found.
> recompile ...
65534 bugs found.
|
|
|
|
 |
|
 |
This is not completely correct: the STL has specific requirements for all its containers and algorithms, e.g. from 'Generic Programming and the STL' (Austern) one can read that associative containers must meet:
- Average complexity for find and equal range is most logarithmic
- Average complexity for count is at most O(log(size()) + count(k))
- Average complexity for erase key is at most O(log(size()) + count(k))
etc...
A red/black tree full fills this requirement and if your implementation does the same, you may call it an STL compliant map. It's btw funny that the SGI implementation just used the red/black tree description of "Introduction to Algorithms" form CLR, except I believe they had also make an exception for insertion or erasion due to iterator stability requirements of the STL.
However because of current processor architectures, often hashing and sorted vectors perform better than balanced binary trees. Alexandrescu has therefore written a sorted vector, which can easily out perform a balanced tree if searching is more frequent than insertion/deletion.
|
|
|
|
 |
|
 |
"However because of current processor architectures, often hashing and sorted vectors perform better than balanced binary trees. Alexandrescu has therefore written a sorted vector, which can easily out perform a balanced tree if searching is more frequent than insertion/deletion."
Could you please explain this in more detail?
What is it about "current processor architectures" that make this so.
Are you saying that a hash will out perform this AVL ?
You say "often", so I ask under what conditions is this the case ?
|
|
|
|
 |
|
 |
I believe it is item 23 of 'Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library' from Scott Meyers. Caches of current processors can operate on processor speeds. If a processor can look it up in the cache, it's much faster that looked up in normal memory (which is much faster than a page fault btw). The cache is filled with data from recently addressed memory locations. Because vectors operate on a continous memory array, it is more likely that a cache hit will take place when oiperating on its elements than with binary trees where its nodes can be scattered over memory. Hash containers are in between those two extremes.
|
|
|
|
 |
|
 |
Ok, I see.
Well, I wouldn't be the farm on that one. This may be true today, but not tomorrow - and is there is a good chance that on an real SMP machine (hyper-hype doesn't count) you may find it completely opposite - if two threads running on different processors are using the container, to maintain coherency between the processors cache is flushed much more frequently when the memory address are cloasly related - therefore non-locality could be an advantage.
Historically, counting on specific processor features over proper design and use usually backfires in the long run.
As for page faults, if you find out how to control that your better than the NT design team.
I'll have to mention this to Scott.
|
|
|
|
 |
|
 |
Of course you are right that this argument does not have to hold, but I guess starting from about 1985 processor speeds have grwon faste than memory speeds. So I guess this will hold a while. Of course a good design just uses an interface: now it can be a sorted vector, in the future replace it with set/map again. For the company I work, this speed advantage really not an issue.
|
|
|
|
 |