
What is Driving the Data Explosion?
The Internet, comprehensive application software, and new computing and storage technologies have made it easier to create, collect, and store all types of data. Data can be managed and stored in structured relational databases; in semi-structured file systems, such as e-mail; and as unstructured fixed content, like documents and graphic files. Companies rely on this enterprise data to improve decision-making and to gain a competitive advantage; they must also retain data to comply with government-mandated retention requirements.
Data Growth Explodes Across Industries
Comprehensive CRM, ERP, and mission-critical applications capture, create, and process increasing volumes of data to keep businesses operating and profitable. Companies depend on the availability of, and fast access to, this data 24x7. Today it would be difficult to find an organization in any industry – insurance, financial services, healthcare, pharmaceuticals, telecommunications, utilities, retail, and manufacturing, among others – that has not collected huge quantities of data for some or all of its business processes.
The federal government is another prime example. Millions of tax returns enter Internal Revenue Service systems each year, and each year the data grows as the population filing those returns increases.
In other areas of both federal and local governments, enormous amounts of data are collected daily in online databases to support thousands of users across government agencies.
Similarly, the financial sector, regulated by the Securities and Exchange Commission (SEC), is coping with a dramatic increase in data associated with stock, bond, and mutual fund transactions. Various SEC rulings require audit trails and record archiving and impose more stringent reporting requirements (to both the SEC and the customer), which increases both data output and retention requirements.
Data Retention Requirements Intensify Demand for Storage
Compounding the data growth challenges are new laws requiring that different types of data across industries be saved for longer periods of time. Each type of data has unique data retention and compliance challenges, including costs for storage and easy access. Companies may need to protect corporate interests by retrieving historical financial transactions to satisfy audit inquiries and resolve claims. In other cases, corporate policy or government regulations dictate that data be accessible for years after it is collected and after the original computer systems have been retired. These new requirements include strict penalties for non-compliance, driving the demand for cost-effective data management and storage solutions.
Data Duplication Multiplies Data Growth Numbers
Data duplication also contributes to the data growth statistics. It is not uncommon for organizations to maintain several back-up copies of critical data or to implement mirrored databases that provide assurance against data loss. Disaster recovery plans often require data duplication in order to store critical data in an alternate location. Data duplication also exists in application development and testing environments, where there are often several clones of the original production database. As data is duplicated, storage and maintenance costs increase proportionally. Unless companies can find more cost-effective ways to manage the source copy of the data and its duplicates, these multiplied costs will continue to be prohibitive.
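To make the proportional relationship concrete, the following is a minimal arithmetic sketch. The function names, the 500 GB source size, and the count of four copies are hypothetical values chosen for illustration, and the sketch assumes every copy is a full-size clone of the source. It also ignores the space taken by the archived data itself, which would normally sit on a cheaper medium.

```python
def total_storage_gb(source_gb, copies):
    """Total footprint of one source database plus its full-size copies."""
    return source_gb * (1 + copies)


def storage_saved_gb(source_gb, copies, archived_fraction):
    """Space freed across the source and all copies when a fraction of the
    source data is archived out of the production database (assumption:
    every copy shrinks by the same fraction)."""
    before = total_storage_gb(source_gb, copies)
    after = total_storage_gb(source_gb * (1 - archived_fraction), copies)
    return before - after
```

Under these assumptions, a 500 GB production database with four copies (for example, a backup, a mirror, a disaster recovery copy, and a test clone) occupies 2,500 GB in total, and archiving half of the source frees 1,250 GB across the whole set.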
Why is Database Growth a Problem?
Since most of today’s business-critical applications depend on large relational databases, it is absolutely necessary to control continuous database growth; yet, relational data is the most complex type of data to manage. Unchecked database growth has an impact on both disaster recovery plans and daily business operations. From a recovery perspective, database growth results in increased costs and slower recovery time. Companies deploy various strategies for implementing disaster recovery plans to ensure that their business can recover from a disaster and resume operations in an acceptable time period.
Key factors affecting disaster recovery plans include the recovery time objective (RTO), functions needed to operate the business in emergency mode, available funds, and government regulations. Some companies deploy a mirrored site available in near real-time, while others deploy a recovery plan at an alternate location. When a “hot site” is deployed, the secondary location is configured to support all business operations. Complete copies of the data exist at the primary and secondary locations, and these databases must be managed at both locations, doubling costs in some cases. With alternate sites, the RTO is critical, and any time spent on non-essential tasks that delay the resumption of operations could cost millions of dollars per hour of downtime. Why recover an entire database containing years of rarely accessed historical data when doing so directly lengthens the recovery time?
From a day-to-day perspective, as database size increases, the performance of mission-critical applications deteriorates. Larger databases take longer to load, unload, search, reorganize, index, and optimize. Response time slows. Access to decision-making information becomes more difficult. Service levels decline. Larger databases require more maintenance. The corporate data explosion has stretched back-up and reorganization windows to the point where database and application availability are seriously threatened. Without a way to safely remove older, rarely accessed data from production databases, this problem will only worsen over time.
Purchasing capacity upgrades to address performance problems may appear to be cost-effective; however, over time, upgrades are needed more often to keep pace with database growth. The demand for high performance is more critical than ever, requiring more frequent and larger increases in capacity to satisfy demand. As a result, IT organizations often spend millions of dollars in hardware and software license fees per year just to expand server, storage, and CPU capacity. Most importantly, this short-term solution does not address the root of the problem.
Advantages of Database Archiving
Database archiving streamlines critical databases by removing rarely accessed data and saving it to an Archive File that can be stored on the most cost-effective medium. Retaining referential integrity and providing easy access are key requirements when archiving data from a relational database. Companies must ensure that archived data retains its business context and can be quickly researched and easily restored when necessary. In the meantime, application performance, availability, and reliability are greatly improved, and planned increases in capacity can be deferred – with the potential to save millions of dollars.
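As a rough illustration of what retaining referential integrity means in practice, the sketch below moves old parent rows together with their child rows, inside a single transaction, so that archived records keep their business context. It is not any vendor's implementation. The table names (orders, order_items), the date cutoff, and the use of a second SQLite database standing in for the Archive File are all assumptions made for the example.

```python
import sqlite3


def build_demo():
    """Create a hypothetical production schema plus an attached archive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    for schema in ("main", "archive"):
        conn.execute(
            f"CREATE TABLE {schema}.orders "
            "(order_id INTEGER PRIMARY KEY, order_date TEXT)"
        )
        conn.execute(
            f"CREATE TABLE {schema}.order_items "
            "(item_id INTEGER PRIMARY KEY, "
            "order_id INTEGER REFERENCES orders(order_id), sku TEXT)"
        )
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, "2001-03-15"), (2, "2004-06-01")],
    )
    conn.executemany(
        "INSERT INTO main.order_items VALUES (?, ?, ?)",
        [(10, 1, "A"), (11, 1, "B"), (12, 2, "C")],
    )
    conn.commit()
    return conn


def archive_old_orders(conn, cutoff_date):
    """Move orders dated before cutoff_date, with their items, to the archive.

    Returns the number of parent (order) rows moved.
    """
    old_ids = "SELECT order_id FROM main.orders WHERE order_date < ?"
    with conn:  # one transaction: everything moves, or nothing does
        # Copy parents first, then children, so relationships stay intact.
        conn.execute(
            "INSERT INTO archive.orders "
            "SELECT * FROM main.orders WHERE order_date < ?",
            (cutoff_date,),
        )
        conn.execute(
            "INSERT INTO archive.order_items "
            f"SELECT * FROM main.order_items WHERE order_id IN ({old_ids})",
            (cutoff_date,),
        )
        # Remove children before parents from the production tables.
        conn.execute(
            f"DELETE FROM main.order_items WHERE order_id IN ({old_ids})",
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM main.orders WHERE order_date < ?", (cutoff_date,)
        )
        return cur.rowcount
```

Calling archive_old_orders(build_demo(), "2003-01-01") moves the 2001 order and both of its line items out of production. Restoring is the same operation in the opposite direction, which is why the parent and child rows are kept together in the archive.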
By separating mission-critical data from non-critical data, many companies can safely reduce the size of overloaded databases by up to 50 percent or more during the initial archive. Significant improvements in application performance and availability are realized immediately. On-going database archiving (daily, weekly or monthly) helps manage database growth and keeps applications operating at peak performance. Response time is faster and access to current business data is easier. Service levels improve, and productivity is enhanced. Expensive upgrades and maintenance fees can be deferred or eliminated, with the potential to save millions of dollars. Most importantly, companies can satisfy data retention requirements and still retain easy access to the archived data when needed.
By keeping databases streamlined, database archiving can reduce the time and resources needed to rebuild the database when disaster strikes (see Figure 1). IT organizations can maintain databases at a size that allows them to meet their disaster recovery service level agreements. In the event of a disaster, the key strategy is to get mission-critical systems operational as quickly as possible. With overloaded databases, all the data (including years of rarely accessed data) must be restored just to get business-critical data back online. Having to restore large volumes of historical data can slow the recovery process by hours or even days. With streamlined databases, a company can recover business-critical data first and quickly resume operations. Archived data can be restored on an as-needed basis. This approach dramatically reduces the recovery time when every second counts. Streamlined databases also reduce the total cost and resources required when companies deploy a “hot site” recovery plan.
According to the Meta Group, increasing data replication requirements, coupled with escalating data growth, can negatively impact disaster recovery processes and recovery times. With routine database archiving, the primary database contains much less data because the archived data has been removed. A smaller primary production database means a smaller production database at a “fail-over hot site.” As a result, companies can reduce the total “fail-over hot site” requirements. In addition, many disaster recovery plans have service level agreements (SLAs) requiring that the recovery plan be completed within a specified timeframe. Database archiving will help IT organizations meet these requirements by reducing the time needed to complete the recovery of critical systems.
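The effect on recovery time can be estimated with simple arithmetic. The sketch below assumes that restore time scales linearly with database size at a fixed restore throughput. That is a simplification, and both the function name and the figures used in the example are hypothetical.

```python
def restore_hours(database_gb, throughput_gb_per_hour):
    """Estimated hours to restore a database, assuming a constant
    restore throughput (a simplifying assumption)."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return database_gb / throughput_gb_per_hour
```

At an assumed 200 GB per hour, a 2,000 GB database takes about 10 hours to restore. If archiving has cut it to 1,000 GB, the business-critical data is back in about 5 hours, and the archived data can follow later as needed.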
Summary
Database archiving keeps operational databases streamlined and enhances disaster recovery plans, enabling companies to recover operational data and rebuild alternate databases in much less time. After the critical systems are operational, a phased recovery plan can be used to restore archived data as needed, based on the business value of the data.
Jim Lee is the vice president of product marketing at Princeton Softech. With more than 15 years of experience in application development and consulting, Lee’s background includes application development, short- and long-term product planning, risk assessment, cost-benefit analysis, customer consulting, and evaluating emerging technologies.




