Kinds of data
With our new infrastructure based on Kubernetes, we want to standardize everything.
We are trying to run only 2 databases: MongoDB for RocketChat and PostgreSQL for everything else (maybe some MySQL if needed, we’ll see). For data, we’ll try to use Object Storage exclusively, and disks only if needed. S3-compatible Object Storage already works with Discourse, RocketChat and Nextcloud.
So for completeness, I’d say we have 3 types of data to take care of:
- databases
- disks (blocks)
- Object Storage (S3 compatible) buckets
Data conservancy theory
For an Internet service provider like us, we need 3 distinct things:
- High Availability (HA)
- Point in time recovery (PITR)
- Disaster Recovery (DR)
High Availability
At any moment in time, if a disk or a server fails, we are still able to serve our users. This should be transparent to the user.
Point in time recovery
In case of a mistake, we are able to go back one day or one week and restore the data as it was at that time. Ideally, users could self-serve here.
Disaster Recovery
Something really bad happened, and we want to recover from that too. Typically this involves keeping a copy in a remote place, some hundreds of kilometers away.
Possible events leading to data loss
What could happen that would make us lose your data? And what is the probability of each of those events?
- an admin’s computer gets hacked
- an admin makes a mistake and deletes data
- there is a bug in some software and data is lost
- we lose 3 disks at the same time
- the datacenter explodes
Whatever you do, there is always a risk. I think the highest probability here is that we make a mistake. We have to find ways to mitigate that risk but it can happen.
We use Ceph in our Kubernetes cluster. Let’s have a look at the plan for each kind of data.
Databases
The tooling around databases is now amazing.
We started to use wal-g and we’ll generalize it to all our databases.
You can send an encrypted stream of data to a remote Object Storage (S3 compatible) bucket, and send the diff (based on the binlog, oplog or WAL) at the interval you want.
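As a rough illustration of that flow, here is a minimal Python sketch for the PostgreSQL case. It assumes wal-g is installed, encryption uses a libsodium key, and the S3 credentials are already in the environment. The bucket prefix, endpoint and helper names (`walg_env`, `base_backup_cmd`, `push_base_backup`) are made up for the example; this is not our actual configuration.

```python
import os
import subprocess


def walg_env(bucket_prefix, endpoint, libsodium_key):
    """Build the environment wal-g reads its settings from.

    All three values are placeholders chosen by the caller.
    """
    env = dict(os.environ)
    env.update({
        "WALG_S3_PREFIX": bucket_prefix,      # e.g. s3://example-backups/pg-main
        "AWS_ENDPOINT": endpoint,             # S3-compatible endpoint URL
        "WALG_LIBSODIUM_KEY": libsodium_key,  # key for client-side encryption
    })
    return env


def base_backup_cmd(pgdata):
    """Command line for a full (base) backup of a PostgreSQL data directory."""
    return ["wal-g", "backup-push", pgdata]


def push_base_backup(pgdata, env):
    """Run the base backup; raises CalledProcessError if wal-g fails."""
    subprocess.run(base_backup_cmd(pgdata), env=env, check=True)
```

The diff part is usually not scripted at all: PostgreSQL ships each WAL segment itself when `archive_command = 'wal-g wal-push %p'` is set in its configuration, and `archive_timeout` bounds how long a segment can wait before being sent.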
For the biggest instances, we replicate 3 times at the database level, directly on NVMe (PG replicas and so on).
For the smaller ones, we use a Ceph disk that is replicated 3 times.
Moreover, there is not that much data in databases. Most data will be on disks or Object Storage.
In conclusion, it is easy to cover the 3 cases of data conservancy for databases, and we are really happy there.
Disks
We can now take snapshots with Ceph, directly from Kubernetes. With this feature, we cover HA and PITR.
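To give an idea of what a snapshot taken from Kubernetes looks like, here is a small Python sketch that builds a `VolumeSnapshot` manifest. It assumes the cluster has a CSI driver with snapshot support and an existing snapshot class. The snapshot, claim and class names are invented for the example.

```python
import json


def volume_snapshot(name, pvc_name, snapshot_class, namespace="default"):
    """Return a VolumeSnapshot manifest for an existing PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical names: a daily snapshot of a Nextcloud data volume.
manifest = volume_snapshot("nextcloud-data-daily", "nextcloud-data", "ceph-rbd-snapclass")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Running it on a schedule with a different name each time gives you the points in time you can go back to.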
But what about DR?
We use rbd to export the diff and stream it to borg backup.
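Here is a minimal sketch of that pipeline in Python, assuming two snapshots of the same image already exist and a borg repository is reachable from the host. The pool, image, snapshot and repository names are placeholders, and the helper names are invented for the example.

```python
import subprocess


def export_diff_cmd(pool, image, from_snap, to_snap):
    """rbd command that writes the changes between two snapshots to stdout."""
    return ["rbd", "export-diff", "--from-snap", from_snap,
            f"{pool}/{image}@{to_snap}", "-"]


def borg_create_cmd(repo, archive):
    """borg command that stores whatever arrives on stdin as one archive."""
    return ["borg", "create", f"{repo}::{archive}", "-"]


def stream_diff_to_borg(pool, image, from_snap, to_snap, repo):
    """Pipe `rbd export-diff` into `borg create` without a temporary file."""
    archive = f"{image}-{from_snap}-to-{to_snap}"
    rbd = subprocess.Popen(export_diff_cmd(pool, image, from_snap, to_snap),
                           stdout=subprocess.PIPE)
    borg = subprocess.Popen(borg_create_cmd(repo, archive), stdin=rbd.stdout)
    rbd.stdout.close()  # so rbd gets SIGPIPE if borg exits early
    borg_rc = borg.wait()
    rbd_rc = rbd.wait()
    if rbd_rc != 0 or borg_rc != 0:
        raise RuntimeError(f"backup failed: rbd={rbd_rc}, borg={borg_rc}")
```

A call such as `stream_diff_to_borg("rbd", "vol1", "monday", "tuesday", "ssh://backup@dr.example.org/./repo")` would then send only the blocks that changed between the two snapshots to the remote site.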
Object Storage (S3 compatible)
We can version buckets so we cover HA and PITR. What about DR? Well, we can s3fuse and borg it elsewhere.
What about use cases? Let’s just discuss 3 of them:
- RocketChat: we think that it is not that much of a big deal if you lose your cat emojis on the chat.
- Discourse: it might be a bit more problematic if you lose images here.
- Nextcloud: this is probably already a copy of data that you have on your laptop or elsewhere.
Ecological cost of Disaster recovery
We’ll set up a disaster recovery site anyway in the coming months. We are already looking for hardware and a location. But each bit you write on our servers is already replicated 3 times on the Ceph cluster. If you add disaster recovery, that makes 6 copies. So we need to double the number of CPUs, RAM, disks, SSDs and so on.
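The arithmetic behind that doubling can be written down in a few lines of Python. The sketch assumes the disaster recovery site keeps the same 3 replicas as the main cluster, which is what the 6 copies imply. The 10 TB of user data is an arbitrary example figure.

```python
def raw_copies(replicas_per_site, sites):
    """Number of physical copies of each bit across all sites."""
    return replicas_per_site * sites


def raw_capacity_tb(logical_tb, replicas_per_site, sites):
    """Raw storage needed to hold `logical_tb` of user data."""
    return logical_tb * raw_copies(replicas_per_site, sites)


# Example figure only: 10 TB of user data.
without_dr = raw_capacity_tb(10, replicas_per_site=3, sites=1)
with_dr = raw_capacity_tb(10, replicas_per_site=3, sites=2)
print(without_dr, with_dr)  # 30 60
```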
The main question is now: what is the durability of data with and without disaster recovery?
This is hard to calculate, and I don’t have a PhD in statistics. But I could say that roughly:
- 99.99% durability without disaster recovery
- 99.99999% with disaster recovery
- 99.9999999999% if the disaster recovery is in your hands
99.99% is quite high already, but I promise that if we do hosting for the next 50 years, a loss will happen. And if you are in the 0.01%, it is 100% data loss for you. The same goes for each percentage.
This is what is called extreme risk. Even Google lost data (it was hard to find on Google, strangely…).
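To get a feel for these numbers, here is a small Python sketch. It reads each durability figure as a per-user, per-year probability of keeping the data, and treats years as independent. Both are assumptions of the sketch, as the figures above are only rough estimates. The 10,000 users are a made-up number for the example.

```python
def loss_probability(durability, years):
    """Chance that one user suffers at least one loss over `years` years."""
    return 1 - durability ** years


def expected_losses(durability, users, years):
    """Expected number of loss events across all users over `years` years."""
    return users * (1 - durability) * years


# Assumption: durability is per user and per year; 10,000 users is invented.
for label, durability in [("without DR", 0.9999), ("with DR", 0.9999999)]:
    p = loss_probability(durability, 50)
    n = expected_losses(durability, 10_000, 50)
    print(f"{label}: {p:.5%} per user over 50 years, about {n:.2f} losses expected")
```

Under these assumptions, a single user without disaster recovery has roughly a 0.5% chance of a loss over 50 years, while across 10,000 users we would expect around 50 losses. This is why it is unlikely for you, and close to certain for us.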
So this post is to warn you about this risk, and possible solutions to overcome it.
And here is the strategy for our different users:
- Our plan is that by default, you won’t get disaster recovery.
- If you join the professional support, you’ll get it.
- But in both cases, we recommend that you set it up on your side. This is free for everybody. We’ll even explain how to do it if you need!