What Is ILM
Definitions of information lifecycle management (ILM) vary, but here ILM is defined as a data archiving process that automatically moves data to the most cost-effective storage media available, based on prescribed policies for accessibility, security, and long-term storage. This automatic transfer of data requires no manual intervention and reduces hardware and real estate costs, so ILM vendors are able to promise a significant return on investment (ROI).
The data generated by an enterprise can be placed into two categories:
Critical information is the data that is used for day-to-day operations and is located within the enterprise’s primary storage system, allowing for fast access.
Important information is the data that can be archived to secondary storage, typically lower cost disks or tapes at an off-site location. This information is historical, legal, and regulatory.
Critical data is accessed frequently, but over time a file is accessed more sporadically, and its status changes from critical to important. A prescribed policy can also set a length of time after which a file ceases to be critical, such as 90 days. The ILM solution then automatically archives this data to secondary storage, without manual assistance from IT personnel, and creates a “pointer” that contains the metadata for every file it has moved. If the file ever returns to critical status, the pointer directs the user straight to the file’s new location so that it can be retrieved for use.
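The age-based policy and pointer mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ILM product works: the names FileEntry, Pointer, archive_stale_files, and read_file are hypothetical, storage tiers are modeled as plain dictionaries, and the 90-day window is simply the example figure from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed policy value: the 90-day example from the text.
CRITICAL_WINDOW = timedelta(days=90)


@dataclass
class FileEntry:
    """A file that still lives in primary storage (hypothetical model)."""
    name: str
    data: bytes
    last_accessed: datetime


@dataclass
class Pointer:
    """Metadata stub left in primary storage once a file is archived."""
    name: str
    location: str  # key of the file in secondary storage
    size: int


def archive_stale_files(primary, secondary, now, window=CRITICAL_WINDOW):
    """Move files not accessed within `window` to secondary storage.

    Each moved file is replaced in `primary` by a Pointer.
    Returns the names of the files that were moved.
    """
    moved = []
    for name, entry in list(primary.items()):
        if isinstance(entry, Pointer):
            continue  # already archived
        if now - entry.last_accessed > window:
            location = f"archive/{name}"
            secondary[location] = entry.data
            primary[name] = Pointer(name, location, len(entry.data))
            moved.append(name)
    return moved


def read_file(primary, secondary, name):
    """Return file contents, following a pointer if the file was archived."""
    entry = primary[name]
    if isinstance(entry, Pointer):
        return secondary[entry.location]
    return entry.data
```

In this sketch a user who calls read_file never needs to know which tier holds the file, which is the role the pointer plays in the description above.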
The efficacy of ILM can be compared with the systems libraries have used to manage the thousands of books in their collections. It is fairly easy and cheap to buy books, yet expensive to manage their storage so that you know where each book is at any point in time. A system is also needed to manage the movement of these books manually, along with a system for categorizing them. As new books are added to the collection (the critical data), they need to be categorized and stored correctly. As books decrease in demand, they are filed away in an archive (the important data). An ILM system would automatically categorize and store the new books, and re-shelve the low-demand books elsewhere, removing the need for such time-consuming management.
Where Does A Problem Arise With Backup?
Enterprises are recognizing, through the media “hype” surrounding ILM, that it is worth investing in, and are quite rightly looking to this new concept to improve the efficiency of their data storage management. In doing so, however, enterprises can forget to take their existing back-up system into account and fail to ensure that stored data is not duplicated.
A typical back-up system copies files from primary (critical status) storage to a low-cost disk or tape on a daily basis. As long as a given file remains critical, it continues to be backed up this frequently.
ILM archiving is distinct from back-up operations: archiving moves operational data that is no longer critical into long-term storage, whereas backup protects critical data before it can be archived.
Back-up systems that are not ILM-aware will continue to store backed-up files on tape or secondary disks regardless of whether that data has already been archived elsewhere. This is an important oversight, as both sets of data must now be managed, which increases costs and reduces efficiency. The result is a lower return on ILM investment than IT directors would have expected.
Referring back to the library analogy, this duplication problem could occur if a library decided to ensure that its bestsellers were always available for borrowing by making a copy of a bestseller each time it was loaned out. The benefit is that the book is always available. However, once the book is no longer a bestseller (no longer critical data) and all the copies have been returned, the librarian would have to ensure there is space on the archive shelves for all these duplicates. Although an important process has been put in place, it has proved costly to the library. In the same way, a back-up system that is not ILM-aware can waste valuable storage space.
How To Counteract The Problem
A realistic and efficient solution to this major failing of backup is to implement an ILM-aware backup, such as distributed backup. Distributed backup removes entirely the need for daily backups of critical data onto costly tapes, thereby automatically reducing the level of storage management required by an enterprise.
A distributed back-up system collects the data from the network clients and sends it to offsite disk storage in a compressed and encrypted format. When the data is needed for a restore, the system retrieves it as required. The process is fully automated and ensures fast, frequent backups without duplication. The back-up process is efficient and the user can be assured of achieving the anticipated ROI.
This ILM-aware distributed backup makes efficient use of ILM’s archive pointers by retaining only one copy of a file, on either back-up or secondary storage. The pointers enable the backup to determine which files have been archived and allow it to automatically remove these excess files from the back-up disks. This improves cost-efficiency by removing the problem of file duplication and uneconomical use of storage space.
ILM-aware distributed backup does this by locating and recognizing a given file’s pointer in the back-up data received from the client. It then automatically searches the back-up disk for the original file, deletes it, and saves the pointer in its place.
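The pointer-handling step can be illustrated with a short Python sketch. It is a simplified model under stated assumptions rather than any vendor's actual implementation: the back-up disk is a dictionary keyed by file name, the Pointer type and the ingest function are hypothetical names, and compression and encryption are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Hypothetical ILM archive pointer as it appears in client back-up data."""
    name: str
    location: str  # where ILM archived the file in secondary storage


def ingest(backup_store, client_items):
    """Apply one back-up run from a client to the back-up store.

    backup_store: dict mapping file name -> bytes (full copy) or Pointer
    client_items: dict mapping file name -> bytes (live file) or Pointer

    When the client sends a Pointer, the file has been archived by ILM,
    so any full copy on the back-up disk is dropped and only the pointer
    is kept. Returns the number of bytes freed on the back-up disk.
    """
    freed = 0
    for name, item in client_items.items():
        if isinstance(item, Pointer):
            previous = backup_store.get(name)
            if isinstance(previous, bytes):
                freed += len(previous)  # the duplicate copy being removed
        # Store the pointer, or the current contents of a live file.
        backup_store[name] = item
    return freed
```

After a run, the back-up store holds full copies only of files that are still in primary storage, so the archived file exists once, in secondary storage, with a pointer on the back-up disk.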
A librarian could use pointers in the same way in order to solve the problem of having to store multiple copies of a bestseller each time one is made. A stamp (a pointer), for example, on the original copy would automatically tell the librarian that any other returned copies of the same book are not this original. The librarian can then discard these excess versions each time they are returned to the library so that they don’t have to find storage space on the shelf for more than one copy. The library’s indexing system will automatically detect the stamp on the original book and ensure that it is shelved accordingly.
This system means that only current data in primary storage is backed up to disk, minimizing disk size and cost. Distributed backup results in faster, more frequent backups and simpler restore operations, while reducing hardware and storage costs and the need for daily administration.
It is important to realize that a back-up file has a life cycle of its own, separate and distinct at every stage: from when it is created, to when it is kept on different tiers of storage media, to when it is deleted.
Eran Farajun is executive vice president for Asigra Inc., the multi-site backup/recovery specialist. His role at Asigra includes marketing and strategic business development. Farajun holds a law degree from the University of Sheffield in the UK.