
Both previous times this was posted, it garnered 0 comments:

https://news.ycombinator.com/item?id=4317545

https://news.ycombinator.com/item?id=7243568

When I read this I immediately recognized it as a PDF that I have in GoodReader (where I store all my technical PDFs). So I'm curious if it will start a good conversation about b-trees this time :)

To get the conversation started: a couple of years ago, I decided to write a b-tree data structure to refresh my skills a bit. I wrote it in C since I've been doing a lot of Java for the last 10 years or so. I only got as far as an in-memory b-tree and had started working on getting it stored to disk before I abandoned the project (I think I was working on it over the Christmas holidays, and the holidays ended).

I subsequently read that in-memory b-trees can actually outperform other tree data structures like rb-trees due to cache coherency. I never bothered to performance test my b-tree vs. an rb-tree I also wrote for fun.

Anyway, that's my story about b-trees!



Cache coherency is something else.

But ya, B-Trees will (sometimes significantly) outperform BSTs for small keys because they incur fewer cache misses. Also interesting is that the idea that RB-Trees are faster than AVL trees for insert/delete may be outdated as well. I've seen benchmarks of AVL trees beating std::set in both libstdc++ and libc++ for both of those operations. Again, this is probably due to fewer cache misses.

My B-Tree story is actually a B+Tree story. B-Trees (and B+Trees) can actually be used for more than just ordered sets/maps: I've implemented a C++ sequence container[1] that allows O(log n) random access insertion and deletion. It is _much_ faster than a similar container[2] implemented using AVL trees, but at the cost of iterator stability.

[1] https://github.com/det/segmented_tree_seq

[2] http://avl-array.sourceforge.net


> Cache coherency is something else.

Ugh, yes, sorry about that. I meant that they are more cache friendly.

Not to disagree with anything you've said; just to back up what I was trying to get at, here's an SO answer:

http://stackoverflow.com/questions/647537/b-tree-faster-than...

The part I'm talking about is:

  Also, unless you play nasty allocation tricks with the red-
  black trees, it seems reasonable to conjecture that B-trees
  have better caching behavior (it accesses an array, not
  pointers strewn about all over the place, and has less
  allocation overhead increasing memory locality even more),
  which might help it in the speed race.

That's not exactly the reference I was thinking of, but it says what I remember about it.

Again, not taking away from what you're saying and not disagreeing with you in any way, more backing up what I originally said ;)


I did my final year project of my degree on B-Trees (Cache Conscious Optimisation of STL Data Structures).

The short version is that you want to tune the node size to the page size; inserts/reads tend to be faster than with RB-Trees, but deletes are slower. This was true across a variety of chips and CPU architectures.


Then this library from Google would interest you https://code.google.com/p/cpp-btree/

It exposes the standard STL container interface. Although it was primarily intended to be used as a map or a set, if I remember correctly they also had B-tree-backed large vectors that outperformed std::vector in random reads. I seem to have lost the URL for that post/comment.



