FICON and Mainframe Disaster Recovery Insourcing
- Published on January 31, 2008
- Written by Mike McClain, Senior Web Designer & Site Manager
As many companies saw during the blackouts, hurricanes, and terrorist attacks, the hot site was not the disaster recovery life insurance policy that was promised. The methodology works fine for a contained event that does not affect a large geographic area. In a widespread event such as 9/11, however, the hot site is quickly overwhelmed by multiple companies declaring a disaster at the same time, and it simply cannot accommodate everyone in the facility they are accustomed to using.
Couple the triage approach that a shared-site strategy falls back on when disasters are declared with growing regulatory concerns, and we see a trend toward bringing disaster recovery in-house. This phenomenon is referred to as DR insourcing.
FICON technology makes it possible to insource DR far more cost-effectively than ESCON does. In addition, FICON’s performance advantages over ESCON make it the technology of choice for meeting recovery point objective (RPO) and recovery time objective (RTO) targets.
This approach, bringing disaster recovery back in-house, addresses many of the complaints that surround the traditional hot site recovery scenario:
- Money – We spend a decent amount of money on a hot site and do not see any sustained benefit
- Success – Let’s face it, most tape-based recoveries performed at a hot site fail. There will be a signature on paper declaring the test a success, but in most cases files were missing or applications could not run successfully. And that is in a controlled test where great care was taken to checkpoint all the data; what would happen in a real disaster?
- Use – Shouldn’t there be a way to get use out of disaster recovery money instead of it just being an insurance policy you hope you never need?
- Guarantee – Even though we pay regularly for the right to use a hot site, there is no guarantee we will be able to recover where we normally test
- Prohibitive Cost – A hidden cost in all hot site contracts is the “declaration fee,” charged when the client declares a disaster and wants to use the hot site facilities. This fee often keeps an organization from declaring a disaster for a single application or a small group of applications
For these reasons, and others, many companies are now looking to insource DR. The methodology is straightforward. Either use existing facilities or leverage the abundant datacenter floorspace that is available to deploy a disaster recovery solution that is owned and managed internally, using today’s technology to get use out of the equipment when there is no disaster. Finally, weigh the cost-benefit trade-offs and evaluate whether to build a new data center geared toward insourcing DR.
The greater bandwidth and distance capabilities FICON has over ESCON are making it an essential and cost-effective component of HA/DR/BC solutions. As mentioned earlier, since Sept. 11, 2001, more and more companies are insourcing DR. Those that are doing so are building the mainframe piece of their new DR/BC datacenters with FICON rather than ESCON, and increasingly that includes cascaded FICON.
Cascaded FICON refers to an implementation of FICON in which one or more FICON channel paths are defined over two FICON directors that are connected to each other by an Inter-Switch Link (ISL). The processor interface is connected to one director, while the storage interface is connected to the other. This configuration is supported for both disk and tape, with multiple processors, disk subsystems, and tape subsystems sharing the ISLs between the directors.
Before FICON cascading, the FICON architecture was limited to a single domain (a single director between channel and control unit) because of the single-byte link addressing it inherited from ESCON. FICON cascading gives the end user a greater maximum distance between sites: up to an unrepeated distance of 36 km at 2 Gb/sec bandwidth.
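The addressing limitation can be pictured with a small model. The sketch below assumes that the single-byte form names only a port, while the two-byte form used for cascading prefixes the port with the address of the director that owns it. The class and function names are hypothetical and are not IOCP keywords or any vendor API.

```python
import string
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkAddress:
    """Hypothetical model of a FICON destination link address."""
    switch: Optional[int]  # director (switch) address byte; None in the one-byte form
    port: int              # port address byte on that director


def parse_link_address(text: str) -> LinkAddress:
    """Parse a hex link address: 2 digits = port only, 4 digits = switch + port."""
    if len(text) not in (2, 4) or any(c not in string.hexdigits for c in text):
        raise ValueError(f"expected 2 or 4 hex digits, got {text!r}")
    value = int(text, 16)
    if len(text) == 2:
        return LinkAddress(switch=None, port=value)
    return LinkAddress(switch=value >> 8, port=value & 0xFF)


def destination_director(entry_director: int, addr: LinkAddress) -> int:
    """Return the director that owns the destination port."""
    # A one-byte address has no switch field, so it can only name a port on
    # the director the channel is attached to: a single domain.
    if addr.switch is None:
        return entry_director
    # A two-byte address can name a port on a different director.
    return addr.switch


def crosses_isl(entry_director: int, addr: LinkAddress) -> bool:
    """True if the path must leave the entry director over an ISL."""
    return destination_director(entry_director, addr) != entry_director


if __name__ == "__main__":
    # Channel attached to (hypothetical) director 0x61, storage behind director 0x6A.
    print(crosses_isl(0x61, parse_link_address("6A4C")))
    print(crosses_isl(0x61, parse_link_address("4C")))
```

Only the two-byte form can point the channel at the second director, which is what lets the storage interface sit on the far side of the ISL.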
Sept. 11, 2001, underscored how critical it is for an enterprise to be prepared for disaster, and even more so for large enterprise mainframe customers. A complete paradigm shift has occurred since 9/11 in how we discuss DR/BC. Disaster recovery is no longer limited to problems such as fires or a small flood. Companies now need to consider and plan for the possible destruction of their entire data center, and possibly of the people who work in it. A great many articles, books, and other publications have discussed the IT “lessons learned” from Sept. 11, 2001:
1) To maintain business continuity it is absolutely critical to maintain geographical separation of facilities and resources. Any resource your enterprise has that cannot be replaced from external sources within your recovery time objective (RTO) should be available within the enterprise. It is also preferable to have these resources in multiple locations. We’re talking about buildings, hardware, software, data, and staff. Cascaded FICON provides the geographical separation that post-9/11 business requires; ESCON does not.
2) The most successful DR/BC implementations are oftentimes based on as much automation as possible. Sept. 11 proved that key staff and skills may no longer be present after a disaster strikes.
3) Financial, government, military, and other enterprises now have critical RTOs measured in seconds or minutes rather than hours or days. For these end users it has become increasingly necessary to implement an in-house (insourced) DR solution. Cascaded FICON allows for considerable cost savings compared with ESCON when insourcing DR/BC/HA.
There are some items that need to be addressed when attempting DR insourcing. First and foremost are licensing concerns. Particularly on a mainframe, the costs can be prohibitive. However, there are increasingly creative licensing deals in which emergency-use licenses stay low cost until they are enabled at the time of a disaster. On a mainframe, the test/dev partitions can be moved to the DR site while a DR partition waits, unused until a test or an actual event.
A second item that must be addressed is the distance from the main production site. How far away should the recovery site be? What methods are there to get data to it? How much cost saving and performance efficiency will I gain by using FICON rather than ESCON as the protocol for extension?
While on the surface the cost of bringing disaster recovery in-house looks prohibitive, think of the amount of money being spent on a hot site every year. A company I worked with recently confided that it spent about $36,000 a month on its hot site contract. That equates to more than $400,000 a year just for insurance.
We haven’t even discussed the cost associated with sending tape off site. If that can be cut down or eliminated we are talking about a significant amount of money which can be used to fund a new DR strategy. Even though the initial entry fee to mirror data and install additional equipment at the insourced DR site would be easily four to five times the yearly hot site cost, there are tangible benefits that cannot be ignored:
- Control – You own the site and the equipment and you decide how or when to use it
- Similar Costs – Admittedly the initial outlay will cause some sticker shock, but once the site is deployed the monthly cost savings will allow for a break-even point in two to three years. From that point on, technology refreshes will be on par with the monthly hot site costs
- Increased Testing – No longer will you have to wait to test your disaster recovery plan and spend extra money to fly, feed, and house your employees. Since the site is always connected to your primary facility, a disaster test can be more spontaneous and closer to “real world” than the “staged” tests at a hot site
- Ability to reduce tape costs – Technology is such that tape is being replaced by low-cost disk (virtual tape), and that solution can allow for mirroring the data between sites. Additionally, tape costs can be reduced by making tape perform the role it was meant to …
- Ability to build a data center at lower cost with FICON – The performance gains of FICON over ESCON have allowed FICON adopters to significantly consolidate not only their channel environment but also their disk and tape storage, moving it onto fewer footprints while improving performance. FICON DASD will typically yield improvements of 40 percent or better in subsystem response times, while allowing the end user to consolidate “X” TB onto fewer DASD array frames.
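The break-even claim is simple payback arithmetic, sketched below. The $36,000 monthly hot site fee comes from the example above; treating the initial outlay as 4.5 times the yearly hot site cost is an assumption (the midpoint of “four to five times”), and the calculation ignores financing, discounting, and running costs at the new site.

```python
import math


def payback_months(initial_outlay: float, monthly_savings: float) -> int:
    """Simple (undiscounted) payback: months until savings cover the outlay."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return math.ceil(initial_outlay / monthly_savings)


def required_monthly_savings(initial_outlay: float, target_months: int) -> float:
    """Monthly savings needed to break even within target_months."""
    return initial_outlay / target_months


HOT_SITE_MONTHLY = 36_000                # figure quoted in the article
HOT_SITE_YEARLY = 12 * HOT_SITE_MONTHLY  # 432,000 per year
OUTLAY = 4.5 * HOT_SITE_YEARLY           # assumption: midpoint of "four to five times"

if __name__ == "__main__":
    # Payback if the hot site fee were the only saving.
    print(payback_months(OUTLAY, HOT_SITE_MONTHLY))
    # Savings needed for a two- or three-year break-even.
    for months in (24, 36):
        print(months, required_monthly_savings(OUTLAY, months))
```

Under these assumptions, eliminating only the hot site fee pays back in 54 months; a two-to-three-year break-even needs roughly $54,000 to $81,000 a month in combined savings, so the tape and consolidation savings listed above would have to make up the difference.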
In much the same way that initially deploying a disaster recovery strategy can be daunting, so too can the process of DR insourcing. First things first: if you have not already done so, categorize applications into tiers of recovery. In today’s world, recovering everything at the same time is not feasible.
After the applications are categorized, both a technology strategy and a secondary site need to be developed. Paramount in this effort is understanding the RTO and RPO for each tier, so that each tier is matched with the proper technology. The RTO and RPO will dictate both the technology used to replicate data and how far away the recovery site can be. The lower the RPO, the closer the site has to be in order to ensure the recovered data is close to the point of interruption.
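Tiering can start as a simple lookup from each application’s objectives to a tier. In this sketch the tier boundaries, application names, and the rule that the stricter of RTO and RPO decides the tier are all illustrative assumptions; real boundaries come out of your own business impact analysis.

```python
from dataclasses import dataclass

# Hypothetical tier boundaries: (tier, upper bound in hours). Tier 1 is the most demanding.
RTO_LIMITS = [(1, 1.0), (2, 24.0), (3, 72.0)]
RPO_LIMITS = [(1, 0.25), (2, 4.0), (3, 24.0)]
LOWEST_TIER = 4  # anything slower than the limits above


@dataclass
class Application:
    name: str
    rto_hours: float  # how quickly it must be running again
    rpo_hours: float  # how much data loss (in time) is tolerable


def band(value: float, limits) -> int:
    """Return the first tier whose upper bound covers the value."""
    for tier, upper in limits:
        if value <= upper:
            return tier
    return LOWEST_TIER


def assign_tier(app: Application) -> int:
    # The stricter objective decides: a tight RPO alone can force a top tier.
    return min(band(app.rto_hours, RTO_LIMITS), band(app.rpo_hours, RPO_LIMITS))


def group_by_tier(apps):
    """Map tier number -> application names, in input order within each tier."""
    tiers = {}
    for app in apps:
        tiers.setdefault(assign_tier(app), []).append(app.name)
    return dict(sorted(tiers.items()))


if __name__ == "__main__":
    portfolio = [
        Application("payments", 0.5, 0.0),
        Application("ledger", 24, 4),
        Application("email", 48, 24),
        Application("reporting", 168, 168),
    ]
    print(group_by_tier(portfolio))
```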
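One way to see why a low RPO pulls the recovery site closer is to estimate the delay that distance adds to each write, assuming the top tiers use synchronous replication, in which a write completes only after the remote copy acknowledges it. The sketch uses the common rule of thumb of about 5 microseconds per kilometer one way in optical fiber and counts propagation only; director, ISL, and storage protocol overheads are ignored, so real penalties are higher. The function name and the `round_trips` parameter are hypothetical.

```python
# Rule of thumb (assumption): light in fiber travels about 200,000 km/s,
# i.e. roughly 5 microseconds per kilometer in each direction.
FIBER_US_PER_KM_ONE_WAY = 5.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to each synchronously mirrored write, in ms.

    round_trips: how many request/acknowledge exchanges one write needs.
    Only fiber propagation is counted; equipment and protocol overheads
    are ignored, so real penalties are higher.
    """
    one_way_us = distance_km * FIBER_US_PER_KM_ONE_WAY
    return 2 * one_way_us * round_trips / 1000.0


if __name__ == "__main__":
    for km in (10, 36, 100):
        print(km, "km:", sync_write_penalty_ms(km), "ms per write")
```

At the 36 km cascaded FICON distance, propagation alone adds roughly a third of a millisecond to every mirrored write, and the penalty grows linearly with distance, which is why the lowest-RPO tiers constrain how far away the recovery site can be.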
Don’t try to boil the ocean. My recommendation is to take sections of the disaster recovery strategy and implement them in pieces over time. For instance, consider placing a remote tape system (and, of course, a recovery system) at the recovery site without changing the RTO/RPO initially. This allows the organization to begin its DR insourcing strategy without incurring data replication costs beyond the remote tape solution. The second phase can introduce data replication for the top tier, and subsequent phases can enable enhanced recovery for all other tiers over time. By deploying in phases, an organization can spread out the costs while implementing a DR insourcing strategy and reducing its RTO and RPO.
Mainframe DR insourcing may not be right for your organization. However, there are benefits that demand serious consideration. If, for instance, you can keep your run rate at a similar level, or increase it only slightly, and get use out of your disaster recovery equipment, then the cost/benefit analysis will look that much better. In fact, DR insourcing changes the disaster recovery model from an insurance policy to a dual-use investment.
Bear in mind that if an organization decides this is a path it wants to investigate, the analysis alone could take months. Beyond that, there are political issues to contend with when trying to change the current disaster recovery objectives. Some business units will not want to hear that their application doesn’t merit a place in the top tier of recovery. With due diligence and a solid costing strategy, the conversation is much easier to have. A business unit can argue that its application belongs in a higher tier, and it can get into that tier if it is willing to pay the higher costs associated with that recovery objective. Many times, when presented with the bill, the business unit will accept a lower tier rather than take the hit to its bottom line.
Steve Guendert is McDATA’s FICON principal consultant and is regarded as one of the mainframe industry’s FICON experts. You may reach him at email@example.com.
Rick Boyd is McDATA’s technology recovery principal consultant and is a trusted advisor for planning both mainframe and open systems BC/DR throughout the financial industry. You can e-mail him at firstname.lastname@example.org.
"Appeared in DRJ's Winter 2006 Issue"