The necessity of disaster recovery planning is now widely understood. According to the National Climatic Data Center, there have been 50 disasters with an economic impact of over $1B each since 1990, all affecting organizations’ ability to do business. Yet natural disasters represent only a small fraction of the causes of IT system downtime. When hardware failures, software failures, and human errors are taken into account, not having a functional disaster recovery plan is truly reckless and fraught with severe, adverse business implications. For example, the statistics on the following page, derived from Contingency Planning Research & Strategic Research Corporation data, illustrate the hourly financial impact of downtime in different industries.
At the same time, having a disaster recovery plan on paper or even installing disaster recovery software does not guarantee the ability to recover either the data or the service within a reasonable time if the primary system is impacted. To ensure that the disaster recovery plan is functional, it needs to be tested annually or every time information systems are materially modified.
One IT system that broadly affects enterprises’ ability to do business across many vertical industries is Microsoft Exchange. Today’s organizations rely on messaging and collaboration for internal communications, to link with their ecosystem of suppliers and resellers, and to communicate with customers on sales and customer support issues. Downtime of the Microsoft Exchange messaging system is simply not acceptable.
Therein lies an IT manager’s dilemma. To ensure that the DR plan for Microsoft Exchange works and that a disaster will not cause downtime, the DR plan needs to be adequately tested. But to test the DR plan, an IT manager needs to create Exchange downtime by taking down the primary system, risking even more significant downtime if there are any problems with failover of the existing systems.
Backup and Recovery Group Testing/Recovery
Let us consider a Microsoft Exchange recovery-from-backup scenario and what it means for testing the DR plan. We will follow a procedure that minimizes the risk of compromising live users’ data and service. For detailed step-by-step instructions on recovering a Microsoft Exchange server from backup to a new server, see method No. 3 in Microsoft Knowledge Base article http://support.microsoft.com/kb/823176. In outline:
1. The first step of the plan is to create a full backup of the Exchange server system state and the storage group to be recovered.
2. Second, a server needs to be identified to which this Exchange server will be restored. To restore the server from backup, the recovery server must have the same name as the original server. However, two Exchange servers with the same name cannot exist in the same Active Directory (AD) forest at the same time. Therefore, the recovery server needs to be in a separate AD forest, such as a lab environment, with DC and DNS servers available. The server also needs to be substantially identical to the original Exchange server and have the same version of the Windows operating system installed on the same volume and path.
3. The system state needs to be restored from backup to the recovery server using the "disaster recovery" switch.
4. After recovery, the administrator needs to make sure that the organization, administrative group, database name, storage group name, and LegacyExchangeDN remain the same as on the original Exchange server. These parameters can be verified using System Manager and the LDIFDE utility.
5. Now, the backup of information stores can be used to recover the database and the mailboxes. The administrator needs to make sure that the checkbox allowing the backup to overwrite the content of the information stores is checked.
6. Disconnect the mailboxes hosted on the Exchange server being tested from the corresponding user records in Active Directory.
7. Power down the source Exchange server.
8. Move the recovered Exchange server from the lab environment, with its separate Active Directory forest, into the production environment and the forest that used to host the source Exchange server.
9. Ensure that the LAN layout at the recovery site, where the recovery Exchange server is located, allows the server to keep the same IP address; otherwise, modify the DNS records to point to the recovered Exchange server.
10. Connect mailboxes on the recovered Exchange server to the user accounts.
11. Verify mailboxes have all the data and users are able to connect.
12. Repeat steps 1 through 6 with the roles of the Exchange servers reversed to get users back on their original Exchange server.
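The comparison in step 4 lends itself to a small script. The sketch below is only an illustration and is not part of the Microsoft procedure: it assumes the administrator has already collected the identity attributes of each server (for example, by reading them from System Manager or an LDIFDE export) into plain dictionaries. The key names and sample values are hypothetical.

```python
# Identity attributes that step 4 says must match on both servers.
# These key names are hypothetical labels chosen for this sketch.
REQUIRED_FIELDS = (
    "organization",
    "administrative_group",
    "storage_group",
    "database",
    "legacy_exchange_dn",
)


def find_mismatches(original, recovered):
    """Return {field: (original_value, recovered_value)} for every required
    field whose values differ or are missing on either server."""
    mismatches = {}
    for field in REQUIRED_FIELDS:
        a = original.get(field)
        b = recovered.get(field)
        # A missing value counts as a mismatch; otherwise compare the
        # values exactly, ignoring only surrounding whitespace.
        if a is None or b is None or a.strip() != b.strip():
            mismatches[field] = (a, b)
    return mismatches


# Made-up sample values, for illustration only.
original = {
    "organization": "Example Org",
    "administrative_group": "First Administrative Group",
    "storage_group": "First Storage Group",
    "database": "Mailbox Store (EXCH01)",
    "legacy_exchange_dn": "/o=Example Org/ou=First Administrative Group",
}
recovered = dict(original, database="Mailbox Store (EXCH02)")

for field, (a, b) in find_mismatches(original, recovered).items():
    print(f"MISMATCH {field}: original={a!r} recovered={b!r}")
```

An empty result means the recorded attributes agree; any reported field should be corrected before the information stores are restored in step 5.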
Transaction-Based Replication Testing/Recovery
Some alternative disaster recovery solutions are more conducive to testing whether the recovery plan works. One critical element in mitigating the risk of DR testing is employing a solution that uses an active/active configuration, thereby eliminating the uncertainty of taking the primary system off-line to test DR.
One example of such a solution for Microsoft Exchange is transaction-based replication. Transaction-based replication software scans selected users’ mailboxes for e-mail and other messaging transactions and replicates them in real time to active shadow mailboxes located on another live Exchange server at a different physical location. If the administrator initiates failover, the software directs one or more users to temporarily connect to their shadow mailboxes. A shadow mailbox has all of the user’s up-to-date information but is hosted on a server unaffected by the problem. Once the problem has been addressed, the user is redirected back to the original mailbox, which also receives all of the user’s new messages, appointments, and other Exchange items.
With this solution, both the primary and recovery sides are active at the same time. Thus, there is no need to bring down the entire site’s Exchange server just to verify that the recovery site will come up. Further, the solution enables granular failover of as few as one or two mailboxes. If desired, an administrator can create dedicated test mailboxes, or fail over their own mailbox and those of other IT staff, to verify that the data has been replicated accurately. This also allows them to test that the end-user experience when operating against the shadow mailbox does not differ from the experience in normal operating mode, including the ability to use Exchange Web access and mobile devices.
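The data-accuracy check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor’s API: it supposes the administrator has exported, for each test mailbox, the set of item identifiers found on the primary server and on the shadow server. The function name, mailbox names, and item IDs are all hypothetical.

```python
def replication_gaps(primary, shadow, test_mailboxes):
    """For each test mailbox, list the item IDs present on the primary
    server but absent from its shadow copy. Mailboxes with no missing
    items are left out, so an empty result means no gaps were found."""
    gaps = {}
    for mailbox in test_mailboxes:
        primary_items = set(primary.get(mailbox, ()))
        shadow_items = set(shadow.get(mailbox, ()))
        missing = primary_items - shadow_items
        if missing:
            gaps[mailbox] = sorted(missing)
    return gaps


# Made-up inventories for two dedicated test mailboxes.
primary = {
    "dr-test-1": {"msg-001", "msg-002", "appt-001"},
    "dr-test-2": {"msg-010"},
}
shadow = {
    "dr-test-1": {"msg-001", "appt-001"},
    "dr-test-2": {"msg-010"},
}

print(replication_gaps(primary, shadow, ["dr-test-1", "dr-test-2"]))
```

Because only the listed test mailboxes are examined, a check like this can run while production users stay on the primary server, which is the point of granular failover testing.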
When evaluating a disaster recovery solution, it is important to consider the difficulties of planning and implementing periodic testing. Not testing the recovery plan calls into question your ability to recover data and service within the recovery time objective (RTO) for the application. Testing must be manageable in terms of time requirements, and testing that requires bringing down the entire server may itself create downtime and a risk of losing data in the process. Solutions that mitigate these risks by facilitating active/active granular recovery make testing easier, so it can be done more often, and greatly increase confidence in effective disaster recovery.
Renata Budko is a director of product management at Cemaphore Systems, the leading provider of transaction-based replication solutions for Microsoft Exchange. Prior to Cemaphore, Budko had eight years of experience in the messaging software and disaster recovery space at companies such as VMware, StarVox, and Hewlett-Packard.
"Appeared in DRJ's Summer 2007 Issue"