Shuffle The Tapes And Rack
- Published on Friday, 26 October 2007 15:54
The situation was tense. Twenty-four tape drives, all wanting to be fed, while each of us stood with 10 tape cartridges under each arm, unable to answer the call...
This describes the situation in which we found ourselves during a recent recovery exercise. Before this, participants in a disaster recovery exercise would notify the Data Storage Technicians of which tapes to send to the Recovery Center, normally two days before the exercise began. We had just started a disaster recovery tape rotation program in which our production systems sent their recovery tapes to our remote vault every day, and this exercise was our first opportunity to use the new rotation process. We anticipated that it would allow any of our customers to recover to a point within 24 to 36 hours of a system failure. Although the new rotation pattern quadrupled the number of tapes sent to the recovery center, we had it all planned out. Or so we thought. No problems!
The process began in an orderly enough fashion. Each day the Data Storage Technician would receive pick lists for the several vaulting patterns to be sent to the remote vault. These lists were generated systematically by our tape management system. The tapes were picked and packed for shipment in numerical sequence, then scanned and double-checked to verify that the shipment was correct.
The tapes were then sent to the remote vault where they were to remain in the cartons, as packed, for a fixed number of days. At the end of the rotation period, the tapes would be returned to the data center as scratch for reuse.
For a recovery exercise, or in the event of an actual disaster, one call to the remote vault would send all our recovery tapes to the recovery center. The tapes were already picked, packed, awaiting the call, and one hundred percent correct. Everything anyone would need to restore the system to "yesterday" would be in the remote vault.
When we arrived to conduct our recovery exercise, approximately 35 containers, each holding about 100 tape cartridges, were waiting for us. We unpacked each container and placed its tapes in the tape racks as a separate group. The restore began, the tape drives began calling for tapes, and about 30 minutes later we knew we had big problems.
Looking through 35 different little stacks to find a tape was somewhat inconvenient, and we did some grumbling. The real trouble came when a drive finished with a tape and unloaded it to be put back in the rack. Then the blaring question came out: "Where did the tape go?" Since we had not marked the cartridges with the stack each one came from, we had no idea where to re-rack them. As used cartridges began to accumulate here, there and everywhere, a couple of suggestions were made.
We knew we had to put the tapes in some kind of order to avoid being buried in used tapes. So began an ordeal of some 14 hours, in which two or three members of our recovery team hand-sorted the tapes into one numerical sequence while trying to keep up with more than 20 hungry tape drives. Our hands were full of tapes: picking, racking, and stacking.
At the end of those 14 hours, we had sorted the tapes into one ascending sequence. We were an exhausted but happy little group. The remainder of the exercise at the recovery center was fairly uneventful. Oh, a couple of tapes were missing because of mis-entries in the tape management system (TMS), a learning point for our TMS folks; however, we were quite pleased with our progress. Then the other shoe dropped.
When the time came to pack up and go home, we realized that we were not prepared to undo the 14-hour hand sort. We began picking through all the pick lists we had cast aside earlier in the exercise as no longer needed. Not every list was marked with the container it belonged to, and not every container had held a pick list, so we found ourselves guessing which tapes went with which container. Six hours later, we felt pretty good. After all, we only had about 75 tapes left over.
We had no idea what the tapes contained or into which containers they should be packed.
We did the only thing we could in desperation: we placed the leftover tapes in the leftover container and shipped it back to the remote vault, keeping our fingers crossed that we would not need them before the rotation period expired.
Somehow we survived what turned out to be a valuable recovery exercise. Vowing never to let this happen to us again, we returned home with a strong personal resolve to fix the problem. After asking a few questions, we learned that the file the Data Storage Technicians used to pull the tapes could be downloaded onto a diskette. We could create a diskette for each day's tapes going to the vault, and we would make sure that each container held not only tapes but also a diskette and a pick list of those tapes.
Having learned the record layout of the file, we used spreadsheet software to write a system of macros that read in the data from the diskettes, built a collective list of all tapes, sorted the tape cartridges into numerical sequence (keeping track of the container each one came from), and assigned slot numbers in the tape racks. We also had to allow for the little things that spring up, such as, "Oh, by the way, where did that container come from? It wasn't there before." It worked! We reduced our time to rack the tapes from 14 hours to a little over three hours. When it was time to pack and go home, we had two options: reverse the sort, or use the original packing lists.
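The logic of those macros can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original spreadsheet macros: it assumes each container's diskette yields a list of fixed-width, six-character volume serials (volsers), so that string order matches numeric order, and the names build_rack_map, repack_lists, and slots_per_rack are hypothetical. The real record layout came from the tape management system's pick-list file and is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tape:
    volser: str      # cartridge volume serial (hypothetical field name)
    container: str   # container the tape arrived in (hypothetical field name)


def build_rack_map(manifests, slots_per_rack=100):
    """Merge per-container manifests into one sorted list with rack slots.

    `manifests` maps a container ID to the volsers read from that
    container's diskette (an assumed input shape). Returns a dict
    mapping volser -> (rack number, slot number, source container).
    """
    tapes = [Tape(v, c) for c, vols in manifests.items() for v in vols]

    # A volser appearing in two containers points to a pick-list error.
    seen = set()
    for tape in tapes:
        if tape.volser in seen:
            raise ValueError(f"duplicate volser {tape.volser}")
        seen.add(tape.volser)

    # One ascending sequence; fixed-width volsers sort correctly as strings.
    tapes.sort(key=lambda t: t.volser)

    rack_map = {}
    for i, tape in enumerate(tapes):
        rack, slot = divmod(i, slots_per_rack)
        # Racks and slots are numbered from 1; the container is remembered.
        rack_map[tape.volser] = (rack + 1, slot + 1, tape.container)
    return rack_map


def repack_lists(rack_map):
    """Reverse the sort: group the tapes back by their original container."""
    by_container = defaultdict(list)
    for volser, (_, _, container) in rack_map.items():
        by_container[container].append(volser)
    return {c: sorted(vols) for c, vols in by_container.items()}
```

For example, with manifests = {"C01": ["000120", "000007"], "C02": ["000050"]} and slots_per_rack=2, tape 000007 goes to rack 1, slot 1; 000050 to rack 1, slot 2; and 000120 to rack 2, slot 1, while repack_lists returns the tapes grouped under C01 and C02 again. A container that turns up late is handled by adding its manifest and rerunning build_rack_map, at the cost of shifting the slot numbers of every tape that sorts after its tapes.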
Over the past couple of exercises, we have refined our process into a smooth procedure. With multiple hands helping, managing our daily contingency tapes became easier, faster, and much more peaceful. Looking back, we can smile, or rather laugh, at ourselves. Now, we have a sense of achievement.
All the tapes are racked and ready to go before the restore process reaches the point where it calls for them. One more learning point on the recovery path has been mastered.
We feel good that we met, recognized, and conquered this rude and startling issue in an exercise rather than first meeting it in a real disaster.
Gary G. Wyne, CDRP, is a Business Continuity Planning Coordinator with Eli Lilly and Company.