Networked Storage Simplified
From the start, we will agree not to lead into a discussion about storage networks using acronyms without first explaining each one. Second, we will further unmask the “storage mystique” by categorizing all of the storage implementation options into one of two areas.
Whether it is Network Attached Storage (NAS) or a Storage Area Network (SAN), Fibre Channel (FC) or Internet Protocol (IP), the only alternative to Direct Attached Storage is Networked Storage.
A Brief History Of Storage Connectivity Options
In the beginning, the earth was dark and without form. Shortly thereafter, mainframes dominated the planet, and storage was handled by simply plugging disk and tape subsystems into a mainframe channel. Eventually, open systems such as UNIX, OS/2, Novell, and Windows NT began to be deployed in great numbers, and a new high-performance interface, the Small Computer System Interface (SCSI), was developed to deal with the massive 140 Megabyte (MB) disk drives of the day. Initially, SCSI-1 had a meager 5 MB/second throughput rate, but over time SCSI throughput doubled with each new version, and we now have SCSI throughput speeds of 160 MB/second. The concept of attaching the storage device directly to the server is often referred to as Direct Attached Storage (DAS) or Server Attached Storage (SAS).
So now you are probably wondering, “Heck, that’s faster than storage networks. Why in the world mess with putting storage on the network in the first place?” An excellent question indeed.
You see, direct attached SCSI, or DAS, has significant limitations. There is a limit to the number of devices that can be attached to a SCSI Host Bus Adapter (commonly referred to as an HBA): 7 to 15 devices, depending on the bus width. There is also a limit to the distance that a SCSI device can be separated from its server (6 to 25 meters, depending on the signaling type). Deploying a DAS device also meant that in order to access it, you had to attach the storage unit (e.g., RAID array, disk jukebox, or tape library) to the server that was managing that device, which can be an administrative nightmare. There had to be a better way.
Enter Network Attached Storage
In 1987, Auspex Systems introduced the world’s first Network Attached Storage (NAS) server, a high-powered, thin file server with large storage capacity for the growing demand of networked users sharing files. Taking advantage of the growing popularity of Sun Microsystems’ Network File System (NFS), Auspex offered companies the ability to place storage directly onto the network, where it could easily be shared by users without attaching to a general-purpose server. The response was dramatic, and the first networked storage architecture was firmly ensconced in the vernacular of the storage market.
The fundamentals of NAS have changed little over the years. There is now support for Microsoft’s Common Internet File System (CIFS), and the Network Data Management Protocol (NDMP) is now used to move data to a backup device, but the basics are the same. A NAS solution is essentially an appliance consisting of a special-purpose operating system and processor, optimized to serve and store data at the file level across a TCP/IP network.
Storage Area Networking
SCSI-based storage and NAS-based configurations are both important ways of bringing storage to the network, but they are best utilized in situations where there is a relatively low volume of data traversing the network. This is because the movement of large amounts of data between the server and the storage device can gobble up available network bandwidth and degrade LAN performance. In short, storage-to-server file transfers hog the network’s pipeline, shutting out or limiting its availability to users.
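A rough back-of-the-envelope calculation makes the point; the backup size and LAN speed below are illustrative assumptions, not figures from the text:

```python
# Sketch: how long a bulk storage transfer can saturate a shared LAN.
# The numbers are illustrative assumptions.
backup_size_gb = 50          # nightly backup set
lan_speed_mbit = 100         # shared Fast Ethernet segment

lan_speed_mb = lan_speed_mbit / 8              # megabits/s -> megabytes/s
transfer_seconds = backup_size_gb * 1024 / lan_speed_mb

print(f"{backup_size_gb} GB over a {lan_speed_mbit} Mb/s LAN ties up "
      f"the wire for about {transfer_seconds / 3600:.1f} hours")
# about 1.1 hours
```

During that window, ordinary user traffic is competing with the transfer for the same bandwidth, which is exactly the contention that networked storage on a separate fabric avoids.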
Large enterprises that want the ability to store and manage vast amounts of information while maintaining an overall high-performance network environment now have another option: the Storage Area Network (SAN).
In a SAN environment, storage devices such as tape libraries and RAID arrays are connected to a storage switch and can communicate with servers running on different platforms. The communication paths between server(s) and storage device(s) run over a high-speed interconnect, such as Fibre Channel (FC), or over Internet Protocol (IP)-based approaches such as Internet SCSI (iSCSI) or Storage over IP (SoIP™). These setups allow for any-to-any communication among all devices on the SAN. A SAN also provides alternative paths from server to storage device: if a particular server is unavailable, another server on the SAN can access the storage device. A SAN also makes it possible to mirror data, making multiple copies available. The high-speed interconnect that links servers and storage devices essentially creates a separate, external network that is connected to the LAN but acts as an independent network.
SANs offer a number of advantages. They allow for the addition of bandwidth without burdening the messaging network, or LAN. They make it easier to perform online backups without users feeling the bandwidth pinch, and they provide a method for scaling up storage capacity without interrupting network operations.
As if things are not confusing enough, a new storage connection interface is being developed: InfiniBand. InfiniBand is not just a fancier storage area network; in fact, the InfiniBand documentation stresses that it is a system area network. It supports not only storage devices but also other system peripherals, including input, video, graphics, and output devices. InfiniBand merges storage area networks and system area networks and gets the PCI bus out of the way. Computers will still have an internal path to memory for communication within the box, but InfiniBand interfaces talk directly to the memory controller, bypassing the PCI bus. This is the same principle as the old mainframe’s Direct Memory Access (DMA) channel. The end nodes in an InfiniBand network can be computers, routers, or I/O devices (such as SCSI disks, Fibre Channel networks, or even video boards).
InfiniBand grew out of two separate initiatives aimed at eliminating the limitations of the PCI bus. Intel announced Next-Generation I/O (NGIO) in 1998. Compaq Computer, Hewlett-Packard, IBM, and 3Com developed a competing standard called Future I/O. The two standards were remarkably similar: both specified switched-fabric, channel-based communication that bypassed the traditional I/O bus. In fact, in the early stages the preliminary designs were difficult to distinguish from one another. In 1999, the two groups got together and decided to merge their proposals into System I/O, which became InfiniBand. InfiniBand appears to be a general solution, combining aspects of storage area networks, system area networks, and I/O buses. Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun lead the InfiniBand Trade Association.
Positioning Storage Options
In order to position the various technologies, we need to expand on the storage model in figure 1.
Direct Attached Storage
Direct Attached Storage (DAS or SAS) in the open systems market is currently limited to two options. Option 1 is good old SCSI; the second choice is Fibre Channel. But wait! Isn’t Fibre Channel synonymous with SAN? Not in a direct-attached, point-to-point configuration, as shown in figure 2.
So, are server-attached storage configurations still viable? Absolutely! In situations that require high-speed, network-free, “server-to-storage” access, DAS is still an excellent alternative. DAS also makes sense for budget-constrained sites that are not swayed by the lower Total Cost of Ownership (TCO) and improved Return on Investment (ROI) that networked storage offers. DAS is also a good choice for remote locations that have a small number of servers with light user loads. Finally, some peripherals such as tape drives do not offer an FC interface, so unless a SCSI-to-Fibre Channel router is purchased, DAS may be the only viable option.
So when do you use SCSI vs. Fibre Channel as the interface to a DAS configuration? The answer depends on your technology game plan.
Some sites are slow adopters of technology and are most comfortable doing things the “tried and true” way they have always been done. These are generally smaller organizations or departments, but they may still need some large-capacity storage. In these instances, a SCSI-based solution is best. Fibre Channel DAS, on the other hand, makes sense for customers that have a SAN in mind for the future and want to ease into the technology slowly. Fibre Channel is also the option of choice for attaching multiple servers to a shared enterprise disk RAID system in a multi-hosted point-to-point configuration. Figure 3 is an example of this design.
Whichever interface is employed, a StorNet study found that about 85 percent of all storage deployed at our customers is DAS.
Once a potential user has accepted the limitations of the DAS storage model and decides to go with a networked storage alternative, things get interesting in a hurry. Once again, there are currently only two methods of designing and configuring a storage network: NAS or SAN. Which one is best depends on the organization’s or site’s needs. Regardless, the benefits of implementing a storage network are immediate and obvious.
Put another way, the options for implementing storage networks are really quite simple. A potential user can choose either a file-level, shared storage resource such as a NAS solution, or a high-performance, switched-fabric, block-level approach such as a SAN. Figure 4 expands on these views of networked storage.
If we then look at storage networks as being inclusive of both NAS and SAN, why do storage vendors make it an either/or decision? Therein lies the question that has potential end users scratching their heads and wondering what to do next. Rather than make a decision, many companies simply continue buying DAS from their server vendors. This is unfortunate, as they lose the opportunity to implement an improved enterprise storage solution.
There is, however, an alternative. An experienced storage solutions and services integrator can be consulted – even if it is only for the service of an assessment and design document to demonstrate the pros and cons of the different storage options and configuration choices.
Network Attached Storage
The standards behind NAS are strong indeed. There are two networking standards for accessing network-attached data: the Network File System (NFS), the de facto standard for the UNIX community, and the Common Internet File System (CIFS), the standard for all flavors of the Windows operating system. NAS devices provide the ability to support true file sharing between NFS and CIFS clients.
In a NAS configuration, the actual file system resides on the NAS device itself, freeing the application server’s CPU from having to manage file-system I/O. In a nutshell, NAS servers off-load all of the functions of organizing and accessing directories and managing data on disk, as well as managing the cache. NAS can also be employed to consolidate file-serving applications from distributed UNIX and Windows NT servers onto a single NAS platform.
Another ideal application for NAS is in technical engineering environments such as geoseismic or pharmaceutical research, where multiple engineers or researchers may simultaneously access a large file or group of files. Software development, document imaging, and CAD/CAM design are all good places to recommend a NAS solution.
In summary, NAS is the best choice for UNIX and Windows NT data sharing applications, consolidated file service applications, technical and scientific applications, and other file-based storage needs.
The appliance model of NAS (a.k.a. filer)
An appliance is a device that performs a single function very well. A popular and accelerating trend in networking has been to use appliances instead of general-purpose computers to provide common services. For instance, special-purpose routers from companies like Cisco Systems and Nortel Networks have almost entirely replaced general-purpose computers for packet routing, even though general-purpose computers originally handled all routing functions. Similarly, modern printers are more likely to plug into the network than into a general-purpose computer. Other examples of network appliances include network terminal concentrators, network FAX servers, and network backup servers.
Appliances have been successful because they are easier to use, more reliable, and offer better price/performance than general-purpose computers. These benefits arise because appliances can be optimized specifically for their single function, without the compromises necessary to meet the many conflicting requirements of a general-purpose system.
For example, a typical NAS appliance may have fewer than 50 commands in its operating system and no proprietary hardware, instead using popular off-the-shelf components. Thus, NAS filers can be quickly added to existing networks.
Additional individual appliance “boxes” can be added as storage needs grow, without the hassle of upgrading the general-purpose server or DAS; this is another reason network administrators have embraced the NAS concept.
Network Appliance is the current NAS market leader, and its network storage appliance (a.k.a. filer) brings the advantages of an appliance to the Windows and UNIX market. Filers cannot run applications and do not run a general-purpose operating system like UNIX or Windows NT. Filers offer ease of use, price/performance, and upward scalability that cannot be matched.
However, a filer is designed to have a single brain (CPU) and large amounts of disk capacity behind it. This approach has created some scalability and failure issues that have been addressed with a newer clustered NAS approach.
Clustered NAS is similar to the filer approach in every respect except one: scalability. Rather than a single CPU with large amounts of disk space behind it, clustered NAS offers relatively small chunks of storage capacity, each with its own processor. These “storage blocks” can then be connected together much like Lego blocks (see figure 5).
The benefit of this approach is that as capacity is added to the NAS pool, incremental processing, cache, and connectivity are also added. The end result is high scalability without sacrificing performance. This approach is quickly gaining enthusiastic support.
Storage Area Networks
Though often seen as competing technologies, SAN and NAS in reality complement each other very well, providing access to different types of data. SANs are optimized for high-volume, block-oriented data transfers, while NAS is designed to provide data access at the file level. Both technologies satisfy the need to remove direct storage-to-server connections and facilitate more flexible storage access.
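The file-level vs. block-level distinction can be sketched in miniature with ordinary local I/O; here a temporary local file stands in for the storage device, and the block size and offsets are illustrative:

```python
import os
import tempfile

BLOCK_SIZE = 512

# A local temp file stands in for the storage device: two 512-byte blocks.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
    path = f.name

# File-level access (NAS-style): the client names a file and reads it;
# the storage system's own file system decides where the bytes live.
with open(path, "rb") as f:
    file_level = f.read(BLOCK_SIZE)

# Block-level access (SAN-style): the client addresses raw blocks by
# offset itself; the file-system logic lives on the host, not the device.
fd = os.open(path, os.O_RDONLY)
block_level = os.pread(fd, BLOCK_SIZE, 1 * BLOCK_SIZE)  # read block #1
os.close(fd)
os.unlink(path)

print(file_level[0:1], block_level[0:1])  # b'A' b'B'
```

The point of the sketch is only where the file system lives: on a NAS device it runs on the appliance, while on a SAN the host’s own file system (or database) issues block addresses directly.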
A storage area network (SAN) is a high-performance subnet, based on Fibre Channel or IP, whose primary purpose is the transfer of data between computer systems and storage devices, and among multiple storage elements (e.g., direct disk-to-tape transfer). One can think of a SAN as an extended, shared storage bus. A SAN consists of a communication infrastructure, which provides the physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust. While there is debate among industry insiders, a switch is generally required in the configuration for it to qualify as a SAN. Until recently, the only viable means of switching data paths to a storage device was through a Fibre Channel switch; the emergence of IP storage protocols such as iSCSI and SoIP has extended this capability to traditional IP networking switches as well.
Because SANs are optimized to transfer large blocks of data between servers and storage devices, they are ideal for applications such as:
• Mission-critical database applications – where predictable response time, availability, and scalability are essential
• Centralized storage backups – where performance, data integrity, and reliability ensure that critical data is secure
• High-availability and application failover environments – to ensure very high levels of application availability at reduced costs
• Scalable storage virtualization – which detaches storage from direct host attachments and enables dynamic storage allocation from a centralized pool
• Improved disaster tolerance – which provides high performance over extended distance between host server and connected devices
Fibre Channel vs. IP Networked SAN
Once the decision has been made to implement a SAN to reap the maximum performance benefits of networked storage, the next step is to decide whether it will be an FC-based SAN, an IP-based SAN, or a combination of both. Note that the term IP storage is used here rather than iSCSI or SoIP; there is still no standard in place that precludes the use of either protocol, although iSCSI has the lion’s share of attention in the market today.
As heated as the debate is between the NAS and SAN vendors, it pales in comparison to the rhetoric surrounding which topology is best suited to implement a SAN. Figure 6 shows SAN option choices.
The Case For Fibre Channel
Fibre Channel was designed specifically to address server-to-storage interface limitations. At SAN sites, Fibre Channel interfaces are delivering measurable operational benefits not previously possible with standard connections such as direct-attached SCSI. For example, connecting RAID to the back end of a server over a Fibre Channel link provides higher bandwidth, yielding quicker I/O transfers over longer distances than a standard interface allows. The RAID and Fibre Channel combination also improves storage subsystem reliability through fault-tolerant storage array operations and redundant data paths.
In an FC-AL (Arbitrated Loop) based SAN, up to 126 nodes can be connected per loop, and multiple loops can be added as needed. Switch-based SANs provide virtually unlimited scalability. This modular scaling capability provides a sound infrastructure for long-term growth. The Fibre Channel link supports multiple protocols and has a current bandwidth limit of 200 MB/second, which it can sustain at distances of up to 10 kilometers. Each storage unit on an FC network is a peer node, allowing for direct storage-device-to-storage-device communication via either arbitrated loop or switched fabric.
Most Fibre Channel devices are dual ported. Using both ports in a dual-loop configuration provides a redundant path to/from the device, guaranteeing access should one path fail. This high-availability configuration is ideal for mission-critical applications. FC interfaces provide the performance required to meet an array of bandwidth intensive storage management functions like backup, remote vaulting, and hierarchical storage.
Fibre Channel switches and hubs provide for simplified storage device scalability, hot plugging of storage devices, and isolation between functions. This translates into easily scalable bandwidth and improved subsystem availability.
The Case For Internet Protocol
Fibre Channel’s shortcomings are that it requires new skill sets for building and managing the storage component, and that the price per SAN port is up to five times that of a standard Ethernet IP port. IP storage approaches make use of the existing network infrastructure and capacity, eliminating the need for new expertise (training) while holding down additional SAN port costs.
The Internet SCSI (iSCSI) protocol stores and retrieves data to and from any SCSI storage device over an Ethernet port that connects to the existing IP core infrastructure. Alternatively, a second, Fibre Channel-wired port can be used to connect directly to a storage device (or to a switch for connectivity to a storage device). iSCSI is economical because it leverages the existing site networking infrastructure and requires no additional storage management training.
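At the wire level, iSCSI encapsulates ordinary SCSI command descriptor blocks (CDBs) in TCP/IP. As a simplified sketch, here is how a standard SCSI READ(10) CDB (the kind of command iSCSI transports) is laid out; the iSCSI PDU header and session machinery that would wrap it are omitted:

```python
import struct

def read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block (CDB).

    iSCSI carries CDBs like this one over TCP/IP instead of a parallel
    SCSI bus; the iSCSI PDU framing around the CDB is omitted here.
    """
    return struct.pack(
        ">BBIBHB",
        0x28,        # operation code: READ(10)
        0x00,        # flags byte (RDPROTECT/DPO/FUA all clear)
        lba,         # 32-bit logical block address, big-endian
        0x00,        # group number
        num_blocks,  # 16-bit transfer length, in blocks
        0x00,        # control byte
    )

cdb = read10_cdb(lba=2048, num_blocks=8)
print(len(cdb), cdb.hex())  # 10 bytes, opcode 0x28 first
```

Because the command set is unchanged, existing SCSI storage devices and drivers need no modification; only the transport underneath them moves from a parallel bus to TCP/IP.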
Storage over IP (SoIP), a competing IP protocol, combines wire-speed Gigabit Ethernet performance with support for SCSI, Fibre Channel, iSCSI, and all types of NAS storage interfaces. This interoperability enables the building of standards-based, manageable IP storage networks. SoIP reduces the CPU overhead commonly associated with iSCSI’s creation and reassembly of TCP/IP packets by replacing server-based drivers with an in-switch conversion process.
The foregoing storage discussion is but the tip of the iceberg when making an enterprise data storage and protection decision. The storage networking debate is not an either/or issue. The answer is “it depends”: it depends on your company and site, your application, and your budget. Figure 7 shows the storage options that are available and is an excellent illustration for presenting storage alternatives to your organization.
Derek Gamradt is the chief technology officer at StorNet, Inc. (www.stornet.com). Gamradt joined the company in 1990 and has extensive knowledge of all aspects of storage management.