> We sincerely apologize for the inconvenience that this situation has caused. This type of incident is extremely rare in the web hosting industry.
Why are they speaking of the "industry" as a whole when they are to blame?
It's even crazier that they aren't explaining the source of the data loss or why the "replication systems" didn't help.
IMHO they are trying to sweep this event under the carpet. They should instead explain why they should be trusted in the future and why this would not occur again.
yeah, I worked for several years at a hosting provider, and I can tell you for a fact that this wouldn't have happened there.
They're 100% virtualized and keep backups of all those machines. In addition, you can purchase a package so that YOUR backups are automatically backed up to 2 different datacenters. Between the two of those solutions, there would be a way forward.
I don't really know what Gandi is so I can't speak to them directly, but this is a solvable problem.
Replication is not a backup as was already mentioned. A great example of this is when the KDE project almost lost all of their Git repos because they were mirroring a corrupted copy of the data. https://www.phoronix.com/scan.php?page=news_item&px=MTMzNTc
Fortunately, git is a DVCS, so anyone who checks out a repo has a complete copy of it.
Now, granted, it'd be a huge pain to track down all the people who had copies of the 1,500 different repos and try to find the most up-to-date version possible of each, but I doubt they got anywhere close to potentially losing all their source code.
Incidentally this shows why it's a good idea to sync your repo to GitHub, even if the canonical repo is elsewhere: in addition to the usual reasons of incentivizing some contributors by giving them "GitHub credit", and increasing visibility of your project's code, GitHub can serve as a backup!
Also, on a side-note, 1,500 separate repositories?! That sounds way overkill. I wonder if they'd benefit from having a monorepo.
That's a long article; please quote the part you're referring to so we're all looking at the same text.
> a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event
Since a "replica" is a copy, that seems technically correct.
Let's say a typical admin of a small shop wants to back up his Postgres database.
The first thing he'll use is probably pg_dumpall, piping its output to some storage.
No replication involved. The backup is just a bunch of SQL statements to recover the last known state of the database. It's a different kind of format, however, which, by definition, isn't a replica anymore.
(And this process has several caveats, one of which is that it can produce unusable dumps in some rare cases, and it isn't complete: users, triggers, etc. aren't dumped IIRC... could be wrong there.)
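To make the "different format" point concrete, here's a minimal sketch using SQLite from Python's standard library as a stand-in for Postgres (the table is hypothetical): the logical dump is just SQL text that replays the last known state, not a byte-for-byte replica of the database.

```python
import sqlite3

# Build a tiny database (stand-in for the admin's Postgres instance).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO accounts (name) VALUES (?)", [("alice",), ("bob",)])
src.commit()

# The "backup" is plain SQL statements, analogous to pg_dumpall output.
dump = "\n".join(src.iterdump())
print(dump.splitlines()[0])  # BEGIN TRANSACTION;

# Restoring means replaying those statements into a fresh database.
dst = sqlite3.connect(":memory:")
dst.executescript(dump)
rows = dst.execute("SELECT name FROM accounts ORDER BY id").fetchall()
print(rows)  # [('alice',), ('bob',)]
```

If any statement in that text fails to replay (locale issues, missing objects, version skew), the restore breaks even though the "copy" looked fine on disk.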
> It's a different kind of format however, which -by definition- isn't a replica anymore.
All non-trivial replication has to cross machine boundaries. To transmit to another machine, you have to use a serial format since there are no pointers on the wire. So insisting that a replica must be the same format prohibits the concept of replication in practice.
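A small sketch of that argument: anything sent over the wire has to be serialized into bytes, and the reconstructed copy has the same content but is a different in-memory representation (JSON here is just an illustrative choice of serial format).

```python
import json

# In-memory state on machine A: Python objects with pointer identity.
state = {"users": [{"id": 1, "name": "alice"}]}
alias = state["users"][0]  # a second pointer to the same object

# Anything crossing a machine boundary must be serialized;
# there are no pointers on the wire, only bytes.
wire_bytes = json.dumps(state).encode("utf-8")

# Machine B reconstructs the content from the serial format.
replica = json.loads(wire_bytes)

print(replica == state)              # True: same content
print(replica["users"][0] is alias)  # False: a different representation
```

So if "same format" is the test, no copy that ever crossed a network would count as a replica.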
So, that admin dumped his database but didn't know that the data was using a custom locale which makes recovery troublesome.
How is it a copy if it can't recover the original in some cases?
We're not talking about a compressed archive here; it's an (incomplete) step-by-step set of instructions to recreate the data. If anything out of the norm happens, it's going to fail, possibly silently.
> How is it a copy if it can't recover the original in some cases?
Do you think a gun is not a gun if it sometimes jams?
> We're not talking about a compressed archive here
I think we have two camps. Mine is considering "copy", "backup", "replica" to be broad categories that are distinguished by simple mathematical or technical properties. For instance, I'd consider a device that copied a single bit to be "copying," even though it's arguably just a wire.
The other camp has very specific products and tasks in mind. A replica is associated with distributed computing, while a backup is something a systems administrator makes as part of disaster recovery.
> (And this process has several caveat's-one of which is that it can produce unusable dumps in some rare cases and isn't complete. users, triggers etc aren't dumped iirc.. could be wrong there)
Triggers are dumped, users need pg_dumpall (as they live across multiple databases, same with tablespaces).
In the last few years, I have seen many people confuse replication with backups. People see them as the same thing, but they really aren't. Even with snapshots, if the devices are the same, they might have the same firmware bug, etc.
Just to explore that a bit further: would you say replication adds independent copies against failures of media, while backup adds copies made by independent software against failures of process, software, or media?
Replication increases the risk of data loss when implemented incorrectly, because the added resources increase the probability of bit errors. This applies to both replicated disks (RAID) and replicated servers. Replicated servers must use ECC memory as well as checksum blocks and periodically scrub data to ensure integrity (e.g. what ZFS does for you). If they don't, then a bit error corrupts the data on all servers, because you have no way of knowing which copies are pristine or how to piece together pristine parts.
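To make the scrub idea concrete, here's a minimal sketch of the principle (not how ZFS is actually implemented; the replica set is hypothetical): each block is stored with a checksum taken at write time, so a periodic scrub can tell pristine replicas from corrupted ones and repair the latter.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Content hash used to verify a replica against its write-time checksum."""
    return hashlib.sha256(block).hexdigest()

# Three replicas of one block, plus the checksum recorded at write time.
good = b"account ledger, page 1"
written_sum = checksum(good)
replicas = [bytearray(good) for _ in range(3)]

# A bit error silently corrupts one replica.
replicas[1][0] ^= 0x01

# A scrub recomputes checksums, identifies pristine copies,
# and repairs corrupt replicas from one of them.
pristine = [r for r in replicas if checksum(bytes(r)) == written_sum]
for i, r in enumerate(replicas):
    if checksum(bytes(r)) != written_sum:
        replicas[i] = bytearray(pristine[0])

print(all(checksum(bytes(r)) == written_sum for r in replicas))  # True
```

Without the write-time checksum, the scrub has no ground truth: all three replicas disagree-free or not, you can't tell which one to trust.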