A recent Run As Radio podcast covered “Disaster Recovery in the Cloud”. Many of the suggestions weren’t brand new, of course, but it was great to have them validated by two of the best people in the field, and anecdotes from personal experience livened up the conversation. I’ve made some notes here, but do listen to the podcast if you can.
The main theme I would pick out is that you need a disaster recovery process that is well rehearsed and, for the most part, automated. Anyone in the team, not just the best people, should be able to put the plan into action by working from the written instructions.
The “In Case Of” plan (a term which I liked) should provide a full list of everything that needs to be done, right down to phone numbers of people who need to be contacted. In the past, host Richard Campbell has had to deal with a data centre burning down, so he speaks with authority when he says that this kind of plan is a big help in overcoming the shock of a disaster. Things should run on autopilot as much as possible.
It takes time to put all this together, and there are barriers to getting started. The cloud helps overcome some of these. An on-premises Disaster Recovery environment tends to involve a lot of redundancy: hardware and software that add nothing of value unless there’s a disaster. That is expensive, of course. Using the cloud removes most of the cost of this idle redundancy, so management are more likely to agree to give you what you need.
Evolving a DR system takes time, so the work needs to start as soon as possible. Designing it with cloud services is still a significant effort, but at least you avoid the hardware procurement phase and the ongoing maintenance. To bed in the DR processes and the documentation, you can use the cloud to orchestrate physical processes as well as to conduct full-on DR tests. There’s no temptation to “sweat the assets” by letting production processes encroach on the DR environment, as there might be if you owned the physical assets.
Evolving and testing the process will uncover nasty surprises that you wouldn’t want to come across in an emergency. In one part of the conversation, Richard and his guest discuss problems getting the payroll working – not something you’d want to forget! There’s another great anecdote about backup tapes being stored on top of a machine, which was then stolen. If you’ve worked in IT for any time, you know that this kind of thing can happen anywhere. A DR exercise might have uncovered the risk.
It’s worth looking at other podcasts on the same site – apart from the technical material, the snippets of industry gossip are brilliant.