IT management, virtualized infrastructure teams, and service providers should offer disaster-recovery-as-a-service (DRaaS), make DR environments easy to operate, and focus on enhancement services that add value to replication technology and streamline the DR process.
By Matt Sprague, Infrastructure Services Manager, Computer Design & Integration LLC
Hollywood and the media sometimes leverage the shock value of unpleasant events such as fires, floods, blizzards, and hurricanes. However, we may need to remind ourselves that disasters are real and happen every day, and that we offer solutions that actually help businesses and families recover from these tragedies.
Imagine your business lost its servers, inventory, and customer database. You would want to protect your assets from unforeseen events and natural disasters too. FEMA estimates that up to one-half of all businesses close within a year after a major disaster. Losing data is not something we want to think about but, ironically, it’s only when customers rethink the high cost of losses that they come looking to us as VARs and MSPs for solutions.
While our customers develop their business continuity plans, we, as providers, also need to prepare for disaster recovery (DR) service offerings. I’d like to share my eight simple steps.
Step 1: Recognize The Power Of Virtualization
Virtualization has been a major force of change in IT over the last decade and no service has benefited as greatly as DR. Replicating the data and functionality of a production environment was once the hardest part of DR. In today’s 100 percent virtualized environments, it’s as easy as replicating VM data and settings to a DR site. We also enjoy a plethora of software and methods for replicating a copy of VM data and configurations, and for keeping them in sync.
Today, the hard part is over and providers who recognize this shift are offering DRaaS. They have switched their focus to enhancement services that add value to replication technology by streamlining recovery and making DR environments simple to operate. Two tips we can learn from them are:
Step 2: Mimic The Production Environment
Here is a simple, uncompromising philosophy that lets you provide streamlined DR for mission-critical environments without added complexity: model the virtual DR environment on the actual production environment, detail by detail.
The guest VM must not detect whether it’s running in the production or DR environment. Treat the guest VM as sacred in the DR environment. No aspect of DR should require login to a VM. Nothing can be changed. No IP changes, no DNS changes, no configuration changes. This is the best way to guarantee the functionality is the same in DR as it is in production. In addition to promoting good DR functionality this approach offers several other benefits:
Step 3: Plan Ahead For Internal Networking Detail
Comprehensive DR design planning is not optional. Successful implementations do not skimp on the planning phase. If we maintain the philosophy I am suggesting, network discussions are clear and simple. For example, all of the following tips apply:
Step 4: Virtualize The Physical Networking Devices
Consider virtualizing your networking devices, including firewalls, load balancers, IDS/IDP systems, and VPN devices. If you use virtual devices, your DR plan should treat them like the virtual machines they are and simply replicate them. Otherwise, the DR environment has to reproduce what the physical devices do, because the replicated VMs need to think they are still in the production environment when they boot up.
Use the following tips for mimicking all the functionality provided by the production devices in the DR environment:
Step 5: Test Your DR Setup
Testing is critical since we’re building an environment to match production rather than replicating the production environment. It’s the only way to know for sure that our setup is right.
Another benefit of this streamlined approach is that it requires no separate network to bubble test. If we were changing VM IP addresses to a dedicated DR network that might have connectivity back to production networks, we might want to have another separate, isolated network for testing purposes. This network would allow the VMs to boot up so we could verify functionality, but would not interfere with production environments. Since our DR environment is an exact copy of production and the two networks are isolated, there is no need for a separate network to bubble test.
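Part of such a test can be a simple comparison of what each guest VM sees in production against what it sees in DR. The sketch below is a minimal illustration, not a vendor tool: the `VmNetworkConfig` record, its fields, and the `find_drift` function are hypothetical names, and it assumes both inventories can be exported as a mapping of VM name to settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmNetworkConfig:
    """Guest-visible network settings for one VM (hypothetical inventory record)."""
    name: str
    ip_address: str
    gateway: str
    dns_servers: tuple
    vlan_id: int


# Settings that must be identical in production and DR.
COMPARED_FIELDS = ("ip_address", "gateway", "dns_servers", "vlan_id")


def find_drift(production, dr):
    """Return a sorted list of differences between two inventories.

    Both arguments map VM name -> VmNetworkConfig.
    An empty list means the DR copy matches production for every VM.
    """
    problems = []
    for name, prod_cfg in production.items():
        dr_cfg = dr.get(name)
        if dr_cfg is None:
            problems.append(f"{name}: missing from DR")
            continue
        for field in COMPARED_FIELDS:
            prod_value = getattr(prod_cfg, field)
            dr_value = getattr(dr_cfg, field)
            if prod_value != dr_value:
                problems.append(
                    f"{name}: {field} is {dr_value!r} in DR, "
                    f"{prod_value!r} in production"
                )
    return sorted(problems)
```

An empty result supports the claim that nothing inside the guest had to change. Any entry in the list points at a detail where the DR environment does not yet mimic production.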
Step 6: Try Partial Failover And Layer 2 Stretch
If partial failover to DR is required, stretch a layer 2 segment/VLAN between sites to allow VMs to come up in DR and communicate with VMs still running in production. Stretching layer 2 segments across physical boundaries is ordinarily considered a bad idea, and I agree; however, a disaster situation is generally a temporary one, so it can be tolerated for a short time.
As a compromise, use a virtualized stretch like Layer 2 VPN or Cisco Overlay Transport Virtualization (OTV) to allow the DR and production environments to communicate. Often this grants us more control over the communication and mitigates a lot of the issues created by stretching a VLAN to DR. And when there is no physical connectivity between the production and DR sites, this is the only option.
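To decide which segments actually need a stretch during a partial failover, it can help to list the VLANs that would end up with VMs in both sites. The following is a small sketch under stated assumptions: `vlans_to_stretch` and its inputs are hypothetical, and it only flags segments split across sites, not routed traffic between different VLANs.

```python
def vlans_to_stretch(vm_vlans, failing_over):
    """Return the sorted VLAN IDs that would have VMs in both sites.

    vm_vlans: dict mapping VM name -> VLAN ID (hypothetical inventory).
    failing_over: set of VM names that will run in DR.
    """
    # VLANs that will have at least one VM running in DR.
    dr_vlans = {vlan for vm, vlan in vm_vlans.items() if vm in failing_over}
    # VLANs that will still have at least one VM running in production.
    prod_vlans = {vlan for vm, vlan in vm_vlans.items() if vm not in failing_over}
    # Only segments present in both sites need to be stretched.
    return sorted(dr_vlans & prod_vlans)
```

A full failover returns an empty list, which matches the point that stretching is only a concern when part of the environment stays in production.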
Step 7: Preconfigure Public Networking
Maintaining the same public networking between production and DR sites is typically impossible. An exception occurs when an organization owns a large enough IP block to advertise via BGP to the Internet. In most cases, you are changing your public-facing IPs in DR. Here are some additional networking tips:
Step 8: Automate
No longer just another buzzword, automation is a key aspect of DR and ultimately requires that all the thinking and planning are done ahead of time. With workflow, process, and asset automation, even in an emergency situation, we just have to press the panic button, watch failover, and wait for our DR environment to come back online.
During difficult high-stress disasters, it may not be feasible to rely on a technician to follow a procedure to recover a system in DR. Keep automation in mind to empower you to implement these DR methodologies.
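The "panic button" idea can be pictured as a runbook whose steps were decided ahead of time and are executed in order without a technician making choices under stress. This is a minimal sketch only: `run_failover` and the step names in the usage note are hypothetical, and real steps would call whatever replication or orchestration tooling you use.

```python
from typing import Callable, List, Tuple

# A step is a label plus a function that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first failure.

    Returns a log with one entry per attempted step.
    """
    log = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:
            # A step that raises is treated as failed; later steps do not run.
            log.append(f"FAILED {name}: {exc}")
            break
        if not ok:
            log.append(f"FAILED {name}")
            break
        log.append(f"OK {name}")
    return log
```

A caller might pass steps such as `("power on replicated VMs", power_on_vms)` and `("switch public DNS", switch_dns)`, where both functions are placeholders for your own tooling. Stopping at the first failure keeps a half-finished failover from running later steps against an environment that is not ready.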
The Last Word on Disaster Recovery
DR environments are easier to implement thanks to strong virtualization trends. Technology empowers organizations to ship VM data to DR environments. Providers must focus on adding value to the services surrounding VM replication. DR environments can be streamlined to enable efficient and effortless failover during a disaster by adhering to a philosophy of not making changes to the guest VMs that are replicating from production.
I encourage you to leverage the tips in this article and apply them to the shared vision of a simple DR panic button with automated (or at least clearly defined) failover that gives your organization and customers a strong return on their DR services, investments, and business continuity solutions.
Matt Sprague is Manager of Infrastructure Services at Computer Design & Integration LLC (CDI LLC), a hybrid IT cloud and managed services provider, where he manages a cross-functional team of infrastructure engineers who maintain datacenter and service offering infrastructure. An accomplished team leader with expertise in designing, implementing, and managing complex IT environments, Mr. Sprague has a proven track record of implementing sound, practical IT solutions designed to aid businesses in reaching their growth and profitability objectives. He is highly skilled at creating strategies that align technology and operational methodologies with organizational needs to maximize uptime and productivity.