Guest Column | October 3, 2022

How To Simplify Your Infrastructure Disaster Recovery Plan

By Devon Rutherford, Leaseweb USA


Whether you run an e-commerce site, work in ad serving, or operate in any other industry that does the bulk of its business online, unplanned downtime caused by an outage, a ransomware attack, or a natural disaster can be one of the largest threats to your organization’s online success.

According to the Uptime Institute’s 2021 Global Data Center Survey, 69% of data center owners and operators suffered an outage within the past three years. These service interruptions are having a material impact on their business, with roughly half causing significant financial and reputational damage.

Disasters rarely strike when expected, and many system failures are due to factors outside of your control. It is therefore important to have an effective disaster recovery plan and a fault-tolerant solution in place, one that enables your organization to restore business-critical operations with minimal downtime and data loss.

Don’t Bet On A Single Horse

Every disaster recovery (DR) plan should identify the possible sources of failure and pair each one with a mitigation and remediation strategy. One such DR solution is failover, which protects an organization from a catastrophic systems failure by sustaining critical operations on an alternative or redundant platform during an outage of its primary systems.

For example, virtual servers and databases can be replicated to additional server infrastructure, located in a secondary data center – entirely independent of the primary site. Data on the recovery system mirrors the data on the primary system. In the event of an outage, recovery is initiated from the secondary site.

Depending on the nature of the incident, an organization can fail over to the latest system image or to a specific recovery image. Replicating system images frequently between the primary and failover infrastructure keeps the data on the two synchronized, which minimizes the potential for data loss and keeps the recovery site as up to date as possible. There are various approaches to accomplishing this type of redundancy, from cloning virtual environments to SAN storage replication and custom scripting.
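The choice between the latest image and a specific recovery image can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any particular replication product: the RecoveryImage type, the image names, and the function names are all hypothetical, and a real platform would read image timestamps from its own replication tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryImage:
    """One replicated system image held at the failover site (hypothetical model)."""
    name: str
    taken_at: datetime


def choose_image(images: Sequence[RecoveryImage],
                 not_after: Optional[datetime] = None) -> RecoveryImage:
    """Return the newest image, or the newest one taken at or before `not_after`.

    Passing `not_after` models failing over to a specific recovery image,
    for example one taken before data was corrupted or encrypted.
    """
    candidates = [img for img in images
                  if not_after is None or img.taken_at <= not_after]
    if not candidates:
        raise ValueError("no recovery image satisfies the requested cutoff")
    return max(candidates, key=lambda img: img.taken_at)


def data_loss_window(image: RecoveryImage, outage_at: datetime) -> timedelta:
    """Time between the image and the outage; changes made in it are not in the image."""
    return max(outage_at - image.taken_at, timedelta(0))
```

With images taken at 00:00, 06:00, and 12:00 and an outage at 13:30, choose_image returns the 12:00 image and the data loss window is 90 minutes. If the incident is a ransomware infection known to have started around 10:00, passing that time as not_after selects the 06:00 image instead. Replicating more often shrinks the window in both cases.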

When it comes to replicating your databases, choose database software whose feature set can support your redundancy requirements (master/replica, master/master, or even multi-master clustering).

Make Your Recovery Site Network-Independent

Even when an organization designs its platform to be fully redundant by using a secondary data center or different availability zones, using the same network or carrier for both can still leave you with a single point of failure. If both data centers are on the same network and that network suffers a major issue, both your primary and your failover site can be affected. For this reason, you should choose a secondary site that is network-independent of your primary data center.
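One way to make this check concrete is to list the network dependencies of each site and look for overlap. The sketch below assumes you maintain such an inventory yourself; the carrier names and function names are made up for illustration, and the code cannot discover dependencies that are missing from the lists you give it.

```python
from typing import Iterable, Set


def _normalize(names: Iterable[str]) -> Set[str]:
    """Compare names case-insensitively and ignore stray whitespace."""
    return {name.strip().casefold() for name in names}


def shared_network_dependencies(primary: Iterable[str],
                                secondary: Iterable[str]) -> Set[str]:
    """Return the networks or carriers that both sites depend on."""
    return _normalize(primary) & _normalize(secondary)


def is_network_independent(primary: Iterable[str],
                           secondary: Iterable[str]) -> bool:
    """True only when the two sites have no network or carrier in common."""
    return not shared_network_dependencies(primary, secondary)


# Hypothetical inventory: the names are placeholders, not real carriers.
primary_site = ["Carrier A", "Carrier B"]
candidate_same_provider = ["carrier b", "Carrier C"]
candidate_other_provider = ["Carrier D"]
```

Here the first candidate shares "carrier b" with the primary site, so a failure of that carrier could affect both sites, while the second candidate has nothing in common with it.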

Your existing hosting provider may be able to offer a network-independent secondary data center, which may also include failing over your public IP addresses to the secondary site. If your current provider’s secondary data center does not offer this independence, it would be advisable to consider using an alternative hosting provider for your recovery site.

Note that working with multiple providers to achieve network independence carries a risk: your public IP addresses may not be routable to your failover site. To address this, you can change the relevant DNS entries (A records) to include the disaster recovery IP addresses ahead of time. Then, when it becomes necessary to switch over to the recovery environment, the failover DNS records will already have propagated across the internet, and access to your platform can be restored almost immediately.
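Before relying on this, it is worth confirming that the disaster recovery addresses really are being returned for your hostnames. The following is a rough sketch using only Python's standard library; the hostname and IP addresses are placeholders taken from the documentation ranges, the function names are illustrative, and the result reflects only what the resolver on the machine running the check sees.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_a_records(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for `hostname`."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def missing_dr_addresses(
    hostname: str,
    dr_addresses: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_a_records,
) -> Set[str]:
    """Return the disaster recovery addresses NOT yet visible for `hostname`."""
    return set(dr_addresses) - resolver(hostname)


if __name__ == "__main__":
    # Placeholder values: substitute your own hostname and recovery IPs.
    missing = missing_dr_addresses("www.example.com", ["198.51.100.10"])
    if missing:
        print("DR addresses not yet published:", sorted(missing))
    else:
        print("All DR addresses are resolvable.")
```

The resolver is passed in as a parameter so the check can be exercised without touching the network, and so a different lookup method can be swapped in later. Running it from several locations gives a better picture than a single vantage point.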

Test Early And Often

Disaster recovery is one of those things that needs to work right the first time, and every time, you use it. Yet many organizations overlook a crucial step in the process: testing their disaster recovery plan to ensure that both failing over and “failing back” go smoothly.

So, once you have built out your recovery site and are confident you have worked out each of these elements, perform failover testing to make sure everything works as anticipated before a real emergency strikes. This way, any problems that come up during testing can be addressed without the risk of hampering your production site. Once the desired result has been achieved, test regularly to ensure continuity of service.
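A failover test is easier to repeat when its checks are written down as a script. The sketch below is one possible shape for such a drill, not a complete test plan: port_open and run_drill are illustrative names, the hostname is a placeholder, and a real drill would add application-level checks such as logging in or running a test transaction.

```python
import socket
from typing import Callable, Dict, List


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_drill(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named check and return the names of those that failed.

    A check that raises an exception is counted as a failure, so one
    broken check does not stop the rest of the drill.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Placeholder target: point these at your own recovery site.
    failures = run_drill({
        "web server reachable": lambda: port_open("dr.example.com", 443),
        "database reachable": lambda: port_open("dr.example.com", 5432),
    })
    print("Failed checks:", failures or "none")
```

Keeping the list of checks in one place also makes it straightforward to run the same drill after failing back to the primary site.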

Depending on your system’s design, and the rate at which it changes, it can be helpful to perform maintenance and testing multiple times per year. Ideally, any changes to your infrastructure, network, hardware, or software should be tested after deployment, rather than waiting for the next annual test. 
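The rule of thumb in this paragraph can be written as a small decision function. This is a sketch under stated assumptions: the 180-day default interval is an arbitrary stand-in for "multiple times per year", and the function and parameter names are illustrative.

```python
from datetime import date, timedelta
from typing import Optional


def dr_test_due(
    today: date,
    last_test: date,
    last_change: Optional[date] = None,
    max_interval: timedelta = timedelta(days=180),  # assumed interval
) -> bool:
    """Decide whether a failover test should be scheduled now.

    A test is due when the infrastructure has changed since the last test,
    or when the last test is older than `max_interval`.
    """
    if last_change is not None and last_change > last_test:
        return True
    return today - last_test >= max_interval
```

For example, with a last test on March 1 and a network change deployed on April 10, a test is due immediately, even though the regular interval has not yet elapsed.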

Designing and implementing a disaster recovery solution doesn’t need to be complex, nor does it require extensive network knowledge. Failing over to a curated copy of your primary production system is a simple and cost-effective approach that, when tested properly, can provide you and your organization with peace of mind.

About The Author

Devon Rutherford is Pre-Sales Manager at Leaseweb USA.