PostMortem - Cloud Migration

status
postmortem


Following the "Maintenance - 29-30th of September" window, which did not go as we wanted, we could not achieve a full migration of cloud.indie.host as planned. We had already migrated several other Nextcloud instances to Kubernetes successfully, in an elegant and native way, but the scale of this instance changed a few parameters.

I had to make several attempts to move the cloud to our new infrastructure, which led to several unplanned maintenance periods.

We used Stash, the backup system on our Kubernetes cluster, to carry out the migration. The plan was: make a backup of the cloud, restore it, put the cloud in maintenance mode, make a new backup, restore again, then redirect traffic.
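The sequence above can be sketched as an ordered plan that stops at the first failing step. This is only an illustration: the step names mirror the list above, while `Step`, `run_plan`, and the placeholder action are hypothetical and do not reflect the actual Stash tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


def run_plan(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    done: List[str] = []
    for step in steps:
        if not step.action():
            break
        done.append(step.name)
    return done


def placeholder() -> bool:
    # Stand-in for the real work (hypothetical); always succeeds.
    return True


MIGRATION_PLAN = [
    Step("initial backup", placeholder),
    Step("initial restore", placeholder),
    Step("enable maintenance mode", placeholder),
    Step("final backup", placeholder),
    Step("final restore", placeholder),
    Step("redirect traffic", placeholder),
]
```

Stopping at the first failure keeps traffic from being redirected to an instance whose restore did not complete.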

Transferring the cloud data was a very long procedure which, in the end, failed. We then decided to use rsync in order to skip the restore step. The idea was to transfer the data directly into the Kubernetes volume that Nextcloud would use.
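As a rough sketch of what such a direct transfer could look like, the snippet below only builds and prints an rsync command line. The paths and host name are hypothetical, and the flags are a plausible choice for this kind of copy, not necessarily the ones we used.

```python
import shlex
from typing import List


def build_rsync_command(source: str, destination: str) -> List[str]:
    """Build an rsync invocation that copies the contents of a data directory."""
    return [
        "rsync",
        "--archive",      # recurse and preserve permissions, ownership, times, symlinks
        "--numeric-ids",  # keep uid/gid numbers instead of mapping them by name
        "--partial",      # keep partially transferred files so a retry can resume
        source.rstrip("/") + "/",  # trailing slash: copy the contents, not the directory itself
        destination,
    ]


# Hypothetical paths and host, for illustration only.
cmd = build_rsync_command("/srv/nextcloud/data", "newhost:/mnt/nextcloud-volume/data/")
print(shlex.join(cmd))
```

Because rsync skips files that are already up to date on the destination, running the same command again once the instance is in maintenance mode should only transfer what changed since the first pass.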

In the meantime, we transferred the database the same way. Restoring the database took an unexpectedly long time, and also failed on the first try. Some tuning of the database was needed.

Finally, once all the data had been transferred and the database restored during Saturday night, there was an issue with the Apache configuration that took some time to figure out. Everything was back on Sunday afternoon.

Sorry about the disturbances, and mostly for the lack of information. We could have postponed the migration to another week and given it a bit more preparation. But it was becoming critical and had already dragged on too long, so we made the hard call to go for a long maintenance, which unfortunately took longer than expected.


Maintenance - 29-30th of September