
> If somehow the database design requires joins, Mongo is fast enough to run two queries and then let you work with them in code.

That's definitely not true. What if you were planning on filtering after the join? You may find yourself pulling millions of records, and the bandwidth alone would bring you down. I work with MongoDB, and once in a while I really miss joins. You can't emulate a join in Mongo in any reasonable amount of time.
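A minimal sketch of the problem, in plain Python with in-memory lists standing in for Mongo collections (the `orders`/`users` names and sizes are made up for illustration): when the filter spans both sides of the join, the application can't discard anything until it has pulled every candidate record.

```python
# Plain Python lists stand in for two Mongo collections; with pymongo the
# two reads would be find() calls, but the shape of the problem is the same.
orders = [{"_id": i, "user_id": i % 100, "total": i * 10} for i in range(1000)]
users = [{"_id": u, "country": "DE" if u % 2 else "US"} for u in range(100)]

# The filter depends on BOTH sides (user's country AND order total), so the
# application must fetch every order before it can throw any of them away.
user_by_id = {u["_id"]: u for u in users}      # read 1: the whole users side
joined = [
    (o, user_by_id[o["user_id"]])              # read 2: every single order
    for o in orders
    if user_by_id[o["user_id"]]["country"] == "US" and o["total"] > 5000
]
```

With real collections the two reads cross the network in full, which is exactly the bandwidth problem described above.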



+1, joins are done efficiently by databases, not by applications. At least for any reasonably sized dataset.

Joining in the application only works for a few thousand records.


It depends on what you are joining. Selecting five rows from one (sharded) table and then looking up the five matching rows from another (sharded) table by primary key is almost as fast from the application as it would be inside the database.
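That "five rows, then five point lookups" pattern can be sketched in plain Python (dicts keyed by primary key stand in for the two sharded tables; `posts`/`authors` are invented names):

```python
# Each table is a dict keyed by primary key, so every lookup is O(1) --
# the application-side join costs roughly what the database's would.
posts = {i: {"id": i, "author_id": i % 3, "title": f"post {i}"} for i in range(50)}
authors = {i: {"id": i, "name": f"author {i}"} for i in range(3)}

recent = [posts[i] for i in range(45, 50)]      # query 1: five rows
authors_of = {p["id"]: authors[p["author_id"]]  # then five point lookups by PK
              for p in recent}
```

Against a real sharded store each point lookup is one key-routed request, so the total cost stays proportional to the handful of rows selected, not to the table sizes.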

Additionally, many NoSQL databases let you store an arbitrary amount of data in a row, so you don't need (big) intermediate join tables.

However, I agree that a database can be much faster when you want to join two huge datasets, because then it has all the fancy ways of joining quickly, like sort-merge join or hash join. But if you need to join huge datasets, you're screwed anyway, because such joins don't scale at all and are still really, really slow: they need at least one sequential scan (and typically three) over each full dataset. Definitely not something you want to do in your OLTP app.
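For concreteness, here is a toy version of the hash join mentioned above (the function and the `depts`/`emps` data are illustrative, not any database's actual implementation). It makes the cost visible: one full scan of the smaller side to build the hash table, one full scan of the larger side to probe it.

```python
def hash_join(left, right, left_key, right_key):
    # Build phase: one full scan of the (smaller) left side.
    buckets = {}
    for row in left:
        buckets.setdefault(row[left_key], []).append(row)
    # Probe phase: one full scan of the right side.
    out = []
    for row in right:
        for match in buckets.get(row[right_key], []):
            out.append((match, row))
    return out

depts = [{"dept_id": d, "name": f"dept {d}"} for d in range(3)]
emps = [{"emp_id": e, "dept_id": e % 3} for e in range(9)]
pairs = hash_join(depts, emps, "dept_id", "dept_id")
```

Even this best case touches every row of both inputs once, which is why joining two genuinely huge datasets stays slow no matter who executes it.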

I saw an OLTP app where joining a few thousand rows (yeah, thousand, not million!) killed performance completely, to the point that a single user had to wait more than 10 seconds for a page refresh. The app was probably written by someone who thought joins are free.


> That's definitely not true. What if you were planning on filtering after the join? You may find yourself pulling millions of records, and the bandwidth alone would bring you down. I work with MongoDB, and once in a while I really miss joins. You can't emulate a join in Mongo in any reasonable amount of time.

I agree that joins should be done by Mongo (as a feature, similar to RethinkDB), but they can get you in trouble when you convert a collection from a local one you join against into one sharded across multiple servers. You'd then incur a huge network load that could bring your system down.

If you've decided that the data in some particular collection is potentially huge, just don't join against it (except really once in a while).


"What if you were planning on filtering after the join? You may find yourself pulling millions of records."

Then always filter before doing the join. Problem solved.
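A sketch of that filter-first pattern in plain Python (the helper and the `users`/`orders` data are made up; this only works when each predicate touches one side of the join on its own): push each side's predicate down before matching, so only the survivors cross the wire.

```python
def join_filtered(left, right, key, left_pred, right_pred):
    # Filter each side independently BEFORE joining.
    small_left = [r for r in left if left_pred(r)]
    small_right = [r for r in right if right_pred(r)]
    by_key = {r[key]: r for r in small_left}
    return [(by_key[r[key]], r) for r in small_right if r[key] in by_key]

users = [{"uid": u, "active": u % 2 == 0} for u in range(1000)]
orders = [{"uid": o % 1000, "total": o} for o in range(5000)]
pairs = join_filtered(users, orders, "uid",
                      lambda u: u["active"],
                      lambda o: o["total"] > 4990)
```

Instead of materializing 5,000,000 candidate pairs and filtering afterwards, the join only ever sees the handful of rows that pass each side's own predicate.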


Are we pretending that there aren't filters that could depend on both halves of the join?



