|
|
I read through this thread and really cannot understand what's the "problem". Where the voodoo comes in. There is very little that surprises me. Or do I mistunderstand what you are talking about?
Say you've got 4kiByte blocks, 32bit key values, then an index block holds up to 500 downwards pointers (the 96 bytes are for horizontal pointers and other sorts of management info). Root + 1 level provide pointers to a quarter million blocks. If your data records are small enough to fit 40 to the block (i.e. less than 100 bytes), this might be enough for 10 million records: Search the root index (use binary search if you prefer!), follow associade pointer to next index level. Search this root index (again binary search if you like), and there is youv'e got the pointer to the block where your data are located. The data block may have a small index with key/offset pairs to each record; this is so small that a sequential seach probably is as fast as a binary.
If blocks are at minimum filling, you may, for ten million 100 byte records, end up with two index levels below the root. If keys are not integer, but require more space, you may not be able to pack 330-500 key/pointer pairs at the index levels; this may also cause a second level to be established. But still: Three small binary searches to get to the right block isn't much work. If this is a disk structure: The root will definitely remain i cache throughout, and unless you are extremely tight on RAM, all or much of the first level below will as well. Often you use sorted structures because the records are frequently processed in sorted order. So while the second index level is not necessarily all in RAM, the relevant parts of it may be, in those cases where you do a more or less sequential processing. (If the sequential ordering means nothing to your use of the data, then you should use a hash method, not B-tree!)
You quote a single figure for both insert and removal. When you make your timings, you should make a large number of calls, quoting minimum, maximum, and average (or even more statistics-oriented figures, if that's within you field of knowledge). If you enter a million records, most of them will be super-fast; there is still room in that block for accomodating another one (key, at index level, or record, at leaf level). Every now and then, a block fills up and must be split (more often at the leaf level than at the index level, if data itself is kept in the block, not just pointers). How often this happens depends on the key size (for index blocks) and record size (for the data blocks), and the block size.
At rare occasions, the entire tree is full and the number of levels must be added. That could be quite a lot more time consuming; if the tree is to be balanced according to B-tree rules all the time, this might require some shuffeling stuff around.
Remember that B-tree rules doesn't absolutely require you to immediately float a block split up to the index level above. A search may have to skip to the next "horizontal" block, until it finds a key higher than the candidate key. Some implementations are rather lax, leaving split blocks for quite some time before updating the index. This might reduce the average add time, at the expense of (asynchronous) "cleanup" routines and somewhat higher average search times.
On the average, B-trees are quite fast for searching. But one thing that holds for all sorts of index structures: If you've got ten million new records for inserting into the tree, the right way to it is NOT one-by-one, in unsorted order. Any B-tre implementation should provide a "batch insert", sorting those ten million records by an nlogn method before inserting them into the tree, and do this in sequential key order, filling up leaf blocks from one end, one block at a time, without worrying about the higher index levels, linking in new blocks through horizontal pointers as needed. Once the bottom layer is done (without concern for higher level indexes), you go up one level to insert pointers all the new blocks, completing that level before ascending further, level by level, up to the root.
|
|
|
|
|
first of all, it's a collection class, not a DBMS. Yes, B-trees can be used for DBMS systems but if I was going to go through all the trouble of optimizing my algorithm doing batch inserts and what have you, then I'd be doing it on a B Plus tree instead
At least if my goal was to make a dbms.
My concern in my OP was not about the performance of my class(es)
It was about the performance of Microsoft's classes.
Namely, a SortedDictionary (red-black tree) beating out a B-tree after 10,000 entries.
But then i think i already explained that. Maybe i didn't. i don't remember and don't much care.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Apparently I implemented a B+ tree for no reason because it gives you zero benefit unless you're accessing slow I/O like a disk, and it's tied to your nodes, which means implementing some kind of file driver and a file format for storing nodes at the very least.
So basically, a database.
I mean, it's cool code, but it's useless byself. On the shelf it goes.
*headdesk*
At least my regular B Tree works. That's kinda cool.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Make it 2D, you got the fabled QuadTrees!
Although, most (in memory) implementation (including mine ) seems to be simple (2D) binary tree instead of being B+Tree
Could be useful for making graphics application (and doing in memory hit test), like the one I have in limbo for the last 2 years!
|
|
|
|
|
interesting. right now i don't know much about graphics as far as the maths go.
but i can do some basic trig and compute the distance between points, that kind of thing.
i've never much needed to do polygon hit testing though i do know that btrees or b+trees are used in them. I'm not sure how.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Simple, when you do an hit test you don't want to hit test all the polygon in your application, since it's gonna be a computationally expensive operation.
So all your polygons are in a Quad Tree and with their bounding rectangle.
And the QuadTree has a very fast hit test implementation that would return the list of polygon worth checking in more details!
|
|
|
|
|
I get the gist at least.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
|
As for hit test, there some simple well know implementation that works well for polygon. (the winding number one being my favourite)
My grief is that I work with shape, i.e. unlike polygon which have straight lines between vertices for boundary, shape have Bezier curves, much harder
|
|
|
|
|
also i remember it being more complicated for polys that have concave bits to them.
I could be wrong. It has been years since i read the black book.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
winding method work for all polygons!
the only "downside" is that self intersecting polygon would have holes.. where they self intersect and odd number of times... Which is more or less what people have come to expect....
|
|
|
|
|
Make it 3D, you get OctTrees!
Make it 4D you get HoneyWitchTrees!
|
|
|
|
|
Now go implement an AVL Tree.
#SupportHeForShe
Government can give you nothing but what it takes from somebody else. A government big enough to give you everything you want is big enough to take everything you've got, including your freedom.-Ezra Taft Benson
You must accept 1 of 2 basic premises: Either we are alone in the universe or we are not alone. Either way, the implications are staggering!-Wernher von Braun
|
|
|
|
|
i just did
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Now make a visual representation of that AVL tree that shows it rebalancing in realtime.
#SupportHeForShe
Government can give you nothing but what it takes from somebody else. A government big enough to give you everything you want is big enough to take everything you've got, including your freedom.-Ezra Taft Benson
You must accept 1 of 2 basic premises: Either we are alone in the universe or we are not alone. Either way, the implications are staggering!-Wernher von Braun
|
|
|
|
|
i might scrap my AVL tree as my B-tree implementation seems to outperform it, which kind of surprised me.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
It's probably the rebalancing code that makes it slower on insert. But should be faster on lookup.
#SupportHeForShe
Government can give you nothing but what it takes from somebody else. A government big enough to give you everything you want is big enough to take everything you've got, including your freedom.-Ezra Taft Benson
You must accept 1 of 2 basic premises: Either we are alone in the universe or we are not alone. Either way, the implications are staggering!-Wernher von Braun
|
|
|
|
|
Yep. I think it's a good rule of thumb that the more optimized your search algo is, the faster it's search is relative to its inserts (and especially removes!)
What surprised me some about my results is again after 30,000 rows or so if memory serves, the SortedBTreeDictionary was beating out SortedDictionary on inserts AND deletes, not just searches. I was impressed.
Having a much harder time getting my AVL tree working despite the algorithm being simpler. The problem is I was recomputing the heights all the time which made it atrociously slow and the code just didn't want to accept my optimizations. Problems kept coming up. I've finally got it inserting and deleting 10,000,000 rows without much complaint which means I can throw it back into my cheesy little test harness and beat it up. woo.
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
Good times! I look forward to hearing about your results!
#SupportHeForShe
Government can give you nothing but what it takes from somebody else. A government big enough to give you everything you want is big enough to take everything you've got, including your freedom.-Ezra Taft Benson
You must accept 1 of 2 basic premises: Either we are alone in the universe or we are not alone. Either way, the implications are staggering!-Wernher von Braun
|
|
|
|
|
I needed to download JDK 1.8 so I googled for it. Google takes me to Oracle download site. After clicking the download link I have to create an Oracle account which wants Company name and work phone and address, etc.
You suck Oracle!
Social Media - A platform that makes it easier for the crazies to find each other.
Everyone is born right handed. Only the strongest overcome it.
Fight for left-handed rights and hand equality.
|
|
|
|
|
That's what I use my spam magnet account for and that is the actual user name of it.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
I have one of those: "DontSendMeAnyCrap@..."
It's an auto-delete everything account.
Or there is Mailinator if they want a confirmation email.
Sent from my Amstrad PC 1640
Never throw anything away, Griff
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
|
That actually is where I downloaded it from. Once you check to accept the terms and then click the Hyperlink it then takes you to a login page where you have to login to the Oracle site.
Social Media - A platform that makes it easier for the crazies to find each other.
Everyone is born right handed. Only the strongest overcome it.
Fight for left-handed rights and hand equality.
|
|
|
|