Managed DR: Reporting

Post by becks » Tue Oct 23, 2012 2:54 pm

Hi Folks,

I would like to understand which basic reports any DR solution should provide, especially for virtual workloads. Some of the key reports would be the RPO/RTO of each workload; are there any other reports that are used primarily for compliance and auditing? Also, how much are these reports leveraged by service providers offering managed DR?

Regards
bekz
becks
Reader
Posts: 3
Joined: Mon Sep 24, 2012 5:45 am

Re: Managed DR: Reporting

Post by grewjac » Thu Nov 08, 2012 11:16 am

If by "managed DR" you mean you've contracted with an IT recovery services vendor to host your recovery solution, you either "rent" their hardware or pay $$ per square foot to place your hardware on their floor. Either way, only two items are of primary importance, and both are results of your regular recovery plan exercises:

1. Did executing the plan deliver system availability to users within the RTO?
2. Did executing the plan deliver the data without loss exceeding the RPO?
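As a rough illustration, those two questions can be turned into a per-system exercise report. This is a minimal sketch, not any vendor's reporting format: the `ExerciseResult` record, its field names, and the sample times are all hypothetical. It measures downtime from the start of the simulated outage to when users regain access, and the data-loss window from the newest recovered write back to the outage start.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExerciseResult:
    """One system's outcome from a recovery plan exercise (hypothetical record)."""
    system: str
    outage_start: datetime          # moment the simulated disaster began
    users_restored: datetime        # moment users regained access to the system
    last_recovered_write: datetime  # newest transaction present after recovery
    rto: timedelta                  # recovery time objective from the BIA
    rpo: timedelta                  # recovery point objective from the BIA


def evaluate(result: ExerciseResult) -> dict:
    """Answer the two questions for one system: was the RTO met, and was the RPO met?"""
    downtime = result.users_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_recovered_write
    return {
        "system": result.system,
        "downtime": downtime,
        "rto_met": downtime <= result.rto,
        "data_loss_window": data_loss_window,
        "rpo_met": data_loss_window <= result.rpo,
    }


# Usage: recovery took 3.5 hours (inside a 4-hour RTO) but lost 20 minutes
# of data (outside a 15-minute RPO).
start = datetime(2012, 11, 8, 9, 0)
report = evaluate(ExerciseResult(
    system="orders-db",
    outage_start=start,
    users_restored=start + timedelta(hours=3, minutes=30),
    last_recovered_write=start - timedelta(minutes=20),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
))
print(report["rto_met"], report["rpo_met"])  # True False
```

Running one of these per system after every exercise gives a simple pass/fail record for each RTO and RPO, which is the kind of evidence an auditor would ask to see.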

You see, when you did your business impact analysis (BIA), you established the RTO and RPO for every system. From that, strategies to meet the RTOs and RPOs were selected. Then, plans were written to implement those strategies, meaning all the necessary arrangements and resources, including assigned recovery staff, hardware, networks, application source code, backup data (whether on tape, streamed, or both), licenses/keys, etc., are made available to support the recovery.

So, when you "test," you should see recovery staff following the plans to a successful recovery within the RTOs and RPOs. Sure, it's a simple thing, but I can't tell you how many clients and companies I've seen simply line everything up and go through the motions of recovery without taking into account the difference between that and a "real" disaster, when half the staff can't respond because roads and/or bridges are out, or the data center where they are running the exercise scenario is on fire. Does the plan address that? How would they exercise that scenario? They don't. Are they vulnerable? Absolutely.

Hope this helps.
grewjac
Global Moderator
Posts: 447
Joined: Fri Oct 01, 2004 11:38 am
Location: Westlake Village, CA USA

Re: Managed DR: Reporting

Post by becks » Thu Nov 08, 2012 4:56 pm

Thanks for the reply, grewjac. Yes, by managed DR I meant solutions like DRaaS (disaster recovery as a service) or IT recovery as a service, where the DR solution provider manages the DR-related activities for a client.

I would like to know how the RPO that providers publish actually gets tested. If there are any recovery plan execution report templates that can be used, that would be really helpful. Are there any other report metrics we should look for in order to optimize the RTO or RPO?
becks
Reader
Posts: 3
Joined: Mon Sep 24, 2012 5:45 am

Re: Managed DR: Reporting

Post by grewjac » Fri Nov 09, 2012 11:06 am

Excellent question. That's where contracted services get... interesting. First, you need to be clear on what RTO and RPO really mean, because the definitions carry HUGE potential for loss IF you don't scrutinize the contract language.

RTO is the point in time when the impact of an outage becomes unacceptably high, i.e., loss of revenue, market share, stakeholder and stockholder confidence, et cetera. The BIA data should nail this down clearly. If a given business operation can't be down for more than a millisecond, a minute, an hour, a day or a week, the solution that restores availability of the application(s) to users in that operation MUST be designed to meet that time limit. The same goes for the RPO, which defines your storage requirements. If the business operation has zero tolerance for data loss, e.g., banking or stock market transactions, the storage solution MUST include streaming data to a secure location geographically near enough to meet that RPO. These are pricey solutions, but banks and brokers happily pay for them because the alternative is far worse.

Now here is where it gets interesting: recovery service providers' contracts define RTO and RPO differently, simply by adding a clause like: "RTO and RPO is measured from the point in time the client formally declares the disaster to the [recovery services vendor]." That little phrase can add minutes to hours, depending on the nature of the event causing the outage, and if you have really low RTOs and/or RPOs, the vendor is off the hook IF you don't insist on an architecture that meets YOUR requirements despite that legal language. That means automatic hot failover and streamed data, if that's what's needed.
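To see how much that clause can cost, here is a minimal sketch of the RTO side only, assuming the simplest possible model: the downtime your users experience is the time to declare, plus the time for the vendor to accept the declaration, plus the contracted RTO. The function names and the delay figures are hypothetical, chosen only to show the arithmetic.

```python
from datetime import timedelta
from typing import Optional


def business_downtime(declaration_delay: timedelta,
                      acceptance_delay: timedelta,
                      contracted_rto: timedelta) -> timedelta:
    """Downtime users actually see when the vendor's RTO clock only starts
    once the disaster is declared and the vendor has accepted the declaration."""
    return declaration_delay + acceptance_delay + contracted_rto


def max_contracted_rto(bia_rto: timedelta,
                       declaration_delay: timedelta,
                       acceptance_delay: timedelta) -> Optional[timedelta]:
    """Largest contracted RTO that still meets the BIA RTO, or None if the
    declaration and acceptance delays leave no time at all."""
    budget = bia_rto - declaration_delay - acceptance_delay
    return budget if budget > timedelta(0) else None


# Usage: a 4-hour BIA RTO, 1 hour to declare, 2 hours for the vendor to accept.
bia_rto = timedelta(hours=4)
actual = business_downtime(timedelta(hours=1), timedelta(hours=2), timedelta(hours=4))
print(actual, actual <= bia_rto)  # 7:00:00 False
print(max_contracted_rto(bia_rto, timedelta(hours=1), timedelta(hours=2)))  # 1:00:00
```

In this made-up case a contract that "promises" a 4-hour RTO delivers 7 hours of real downtime, and only a contracted RTO of 1 hour or less would satisfy the BIA.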

I hope this helps.
grewjac
Global Moderator
Posts: 447
Joined: Fri Oct 01, 2004 11:38 am
Location: Westlake Village, CA USA

Re: Managed DR: Reporting

Post by becks » Fri Nov 09, 2012 11:33 am

Sure, grewjac... all these points are helping me. As you said, I've read definitions of RTO and RPO all over the place for DR. But what concerns me is how to verify the RPO a service provider publishes, especially since such RPO tests might even require verifying database states or files.

I have to rely only on the reports the service provider supplies for RPO validation, but the actual RPO checks need to happen on the applications running at the recovery site. Hot sites can be actively monitored to track this, so I feel the RPO there can be verified, or even monitored continuously.
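For a hot site, continuous RPO monitoring could be summarized into a report like the one below. This is a minimal sketch under stated assumptions: it presumes some probe (hypothetical here, for example a heartbeat timestamp written at the primary and read back at the recovery site) can report when the newest applied transaction was written. The lag between the probe time and that timestamp is the data that would be lost if the primary failed at that instant.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def rpo_report(samples: List[Tuple[datetime, datetime]], rpo: timedelta) -> dict:
    """Summarize replication-lag samples taken at a hot recovery site.

    Each sample is (checked_at, last_applied_at): when the probe ran, and the
    timestamp of the newest transaction already applied at the recovery site.
    """
    if not samples:
        raise ValueError("no samples to report on")
    lags = [checked_at - last_applied_at for checked_at, last_applied_at in samples]
    within = sum(1 for lag in lags if lag <= rpo)
    return {
        "samples": len(lags),
        "within_rpo": within,
        "compliance_pct": 100.0 * within / len(lags),
        "worst_lag": max(lags),
    }


# Usage: three probes against a 5-minute RPO; the third shows an 11-minute lag.
t = datetime(2012, 11, 9, 9, 0)
samples = [
    (t, t - timedelta(minutes=2)),
    (t + timedelta(minutes=5), t + timedelta(minutes=4)),
    (t + timedelta(minutes=10), t - timedelta(minutes=1)),
]
summary = rpo_report(samples, timedelta(minutes=5))
print(summary["within_rpo"], summary["worst_lag"])  # 2 0:11:00
```

The worst observed lag is arguably more useful than the average, since a single long lag is exactly when a real disaster would blow the RPO.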

But for cold sites and dormant recovery-site VMs, RPO verification at the application level could add to the RTO; that is probably a DR SLA decision each organization has to make. But it poses a key question: how can we rely on the RPO published for cold sites/dormant VMs?
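One way to frame that SLA decision is as a time budget: does application-level RPO verification still fit inside the RTO once the other recovery steps are counted? A minimal sketch, where the step names and durations are hypothetical placeholders for your own exercise measurements:

```python
from datetime import timedelta
from typing import Dict


def verification_budget(rto: timedelta,
                        recovery_steps: Dict[str, timedelta],
                        verification_time: timedelta) -> timedelta:
    """Time left in the RTO after the recovery steps plus application-level
    RPO verification; a negative result means verification breaks the RTO."""
    total = sum(recovery_steps.values(), timedelta()) + verification_time
    return rto - total


# Usage: hypothetical cold-site estimates against a 48-hour RTO.
steps = {"restore_data": timedelta(hours=30), "power_on_vms": timedelta(hours=2)}
slack = verification_budget(timedelta(hours=48), steps, timedelta(hours=6))
print(slack)  # 10:00:00
```

If the slack comes out negative, you either shorten the verification (for example, by checking only the most critical data) or accept that the published RPO goes unverified until after users are back online.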
becks
Reader
Posts: 3
Joined: Mon Sep 24, 2012 5:45 am

Re: Managed DR: Reporting

Post by grewjac » Wed Nov 14, 2012 12:10 pm

Becks:

Remember that the driving force in setting RTOs and RPOs is the BIA process. If the BIA indicates that a high-availability solution is needed to meet the RTO, whoever designs the solution must ensure the RTO will be met. Likewise, the recovery service vendor must ensure the data protection solution meets the RPO requirement. When you have a cold site solution, that suggests the RTOs and RPOs are longer; perhaps recovery is "timely" one or two weeks after the outage begins.

I'm not sure what you mean by "...how to rely on the RPO published for the cold sites/dormant vms." If by "published" you mean what the recovery services contract schedule states, that goes back to my earlier post: vendors include language asserting that the "clock" for their RTO and RPO doesn't start until your company formally declares a disaster, AND the vendor usually doesn't consider the declaration "formal" until they "accept" it. That means they want to confirm that the facility where your solutions are located isn't being used for testing by another customer, or whether a major regional event has their phones ringing with multiple declarations. So they build this delay into their contract. One way around this is to design, build, and manage your own hot failover solution. Frankly, it's a lot less costly, despite the initial capital outlay, because about 65% of hotsite fees are for test time.

Gregg Jacobsen, CBCP, MBCI (grewjac is just the first part of my email address.)
grewjac
Global Moderator
Posts: 447
Joined: Fri Oct 01, 2004 11:38 am
Location: Westlake Village, CA USA

