When It Rains It Freezes: Canadian Company Battles Northern Exposure
Written by Judith L. Eckles
Thursday, 15 November 2007 14:45
Days of non-stop rain and temperatures hovering around freezing left southern Quebec and eastern Ontario blanketed by several inches of solid ice, halting virtually all travel, shutting down businesses, cutting off power to more than three million residents, and socking the Canadian economy with business losses estimated at $1.1 billion, or 0.2 percent of GDP.
It could’ve been worse.
"On January 5, we lost electrical power in the building," says Guy Chamberland, Corporate Director - IT for Domco Inc.
Domco is a leading North American manufacturer of vinyl floor coverings for commercial and residential markets. The company is headquartered about 30 miles southeast of Montreal in Farnham, Quebec.
Domco has a production facility in Farnham, as well as two more in the U.S., half a dozen distribution centers operated by its Domcor division in Canada, and a major customer service operation in Alabama. In all, there are more than a dozen sites in the U.S. and Canada, and all of them are networked into an AS/400 in Farnham.
Like most businesses in Farnham, Domco practically closed its doors when the ice storm hit. For the few employees who could navigate the icy streets — which were littered with stranded cars and fallen trees and utility poles — a cold office awaited because, of course, the power had been cut off. And besides, the prospect of a month or more with no electricity at home was more than enough for most employees to worry about.
However, Chamberland, who also leads Domco’s disaster recovery team, was prepared for a power outage. "We had a generator working, so we were still able to operate." And indeed, 500 users were still able to access the company’s AS/400, which handles everything from production planning to order entry and fulfillment, and without which, Chamberland says matter-of-factly, Domco would be "out of business."
What Chamberland wasn’t prepared for — what no one in the province was prepared for — was the accumulation of ice.
January 7: The ice tightens its grip
"We had a telecom wire running from the street to our building," Chamberland says, "and we expected that to be up." However, as the ice built up, it became clear that that expectation might require revision. The telecommunications line is critical because it links order entry, shipping, and invoicing from Domco’s six distribution locations across Canada to the AS/400 in Farnham. The line is also Domco’s principal connection to major operations in Florence, Alabama, and Houston, Texas. In fact, Chamberland was in Florence the week the disaster struck and had great difficulty reaching his disaster team in Farnham. He barely managed to get back to Farnham on Saturday, January 10.
Domco had installed a platform to support the telecommunications wire running from the street into its building. But a chain is only as strong as its weakest link, which in this case was the telephone pole. And by January 8, under the weight of the ice, telephone poles in Farnham were snapping like toothpicks.
January 9: The inevitable
Losing the telecommunications line seemed inevitable, and it was: a day later, on January 9, the line went down. That evening, Jean-Guy Lafond, a member of Domco’s technical support staff, called SunGard Recovery Services and declared a disaster.
"We made full system backups," says Chamberland, "and Jean-Guy Lafond and Patrick Dubois drove these and our data backups to the airport." But not the Montreal airport, even though that’s the designated airport in their recovery plan.
Nothing was flying in or out of Montreal, says Chamberland, so "they kept driving until they reached an airport that hadn’t been closed by the ice storm. They finally took off from Burlington, Vermont, about an hour and a half from Montreal, and flew to Philadelphia."
January 10: The recovery begins
By the time Domco’s disaster recovery team arrived in Philadelphia Saturday evening, SunGard’s own recovery team had already been at work most of the day.
"We assigned an AS/400 for Domco to use for the recovery," says Bob Parker, SunGard’s Supervisor of Operations for IBM AS/400 and RS/6000 platforms. "We initialized the operating system, initialized 60 gigabytes DASD, and established the necessary communications lines."
Parker and the rest of the SunGard team assigned to Domco’s recovery had a head start. "I did a workshop with Domco," says Parker, "and we ran a very successful test with them in March ’97, so we were familiar with their system and their objectives."
By 8:15 p.m., Lafond and Dubois had completed overlaying Domco’s own operating system and microcode. They then started loading data, and by 5:30 the next morning, they’d finished restoring the system. Domco IPL’d the system at noon on Sunday, January 11 — less than 36 hours after declaring a disaster.
January 11: The second full day of the recovery
Sunday started a full week of 9 a.m. to 9 p.m. shifts for the two Domco employees, who turned over the reins to SunGard’s recovery team for the stretch between 9 p.m. and 9 a.m.
The next step for the recovery team was to establish communications with Domco’s Florence and Houston operations.
"The communications setup went smoothly," says Parker. "There was a minor problem with controllers, but these were resolved very quickly."
The problem was with Domco’s Frame Relay Access Device, or FRAD, says Charles Ernst, SunGard’s Supervisor of Network Operations. "They had a dedicated 56k frame relay circuit coming into the FRAD from MCI, and they ran two com ports off the FRAD to the AS/400. But initially there was some trouble with the FRAD because some recent changes in Domco’s production environment hadn’t been accounted for in their recovery configuration."
The dial-ups were working right away, so while Ernst and his team worked through the FRAD problem, Domco had its people dial in.
"We changed resource names for the frame relay and shipped a workstation controller to Domco’s Farnham office," says Ernst, "but by the time a technician arrived from Toronto, the controller had already been fixed."
January 12: Houston, we don’t have a problem
It took Memotec about a day to get the FRAD configured and set up, and from that point on — the morning of Jan. 12 — Domco was able to run production out of SunGard’s recovery facility.
Chamberland recalls that by midday Domco was "up and running at several sites. Monday afternoon, sites were coming back, one after another. One thing that was special, though, was that although our other sites could reach SunGard, we couldn’t reach SunGard from Farnham for the first week." That’s because the telecom system in Montreal was in tatters. Domco’s plan was to dial in for access to the frame relay and let local users work off the local AS/400.
January 16: Homeward bound
By the Friday following the disaster declaration, the recovery at SunGard was going so smoothly that Domco recalled its two recovery specialists from SunGard’s Philadelphia facility. However, because power back in Farnham was still unreliable, Domco continued running operations off SunGard’s AS/400.
Starting January 16, says Chamberland, "we were running the AS/400 remotely using ‘PC Anywhere.’"
"We called them at the beginning of each shift," Parker says. "We’d let Domco know who was going to be on duty for us and whether there were any issues." SunGard also ran a daily backup routine for Domco’s data.
January 31: The thaw commences
"Electrical power was back in Farnham by the end of January," Chamberland says, "so we were without our main power from January 6 to the end of the month. During that time, we operated by generator. But those generators aren’t geared to operate for days at a time, so ours was breaking down and requiring constant maintenance. Our IT was probably running around 90 percent in Farnham, but total operations were probably at 50 percent of normal business. However, starting January 12, locations in the United States and across the rest of Canada were able to operate through SunGard without any problems.
"We had the recovery team for the first two weeks only," he says. "After that, we were operating remotely from Farnham with SunGard’s help. And by February 7 or 8, we were able to bring computer operations back to Farnham."
March 10: The aftermath
"I have to raise my hat to the recovery team," says Chamberland from his (fully electrified) office in Domco’s Farnham headquarters. "They left their families to take care of the recovery, and they’ve been very willing to help."
Now, he says, "We’re probably back to normal operations — normal life. It’s behind us. It’s really been an interesting experience. Everybody understands this has been quite an experience.
"When we do our debriefing, we’ll ask ourselves if we could have done anything differently," Chamberland says. "But I think it would have been difficult to convince management that we needed to prepare for a massive ice storm." In other words, they responded flawlessly to the unforeseeable.
When a recovery goes this smoothly, the disaster often goes unnoticed, which is good. Chamberland says that few people outside his group realized that Domco was operating in disaster mode.
However, the recovery efforts didn’t go completely unnoticed. "Upper management was very appreciative," says Chamberland, "and Domco’s president has personally rewarded several employees from the IT department for their contributions during the outage."
Judith Eckles is Director of Marketing Communications for SunGard Recovery Services and has been with the company since 1990. She is the immediate past Chairperson for the Disaster Recovery Journal’s Editorial Advisory Board and was recently appointed to the Board of Directors of the Disaster Recovery Institute International, serving on a newly formed marketing committee.