It's been over a year since we took a look at how organisations can create more scalable data storage solutions. Since then, most organisations' needs for storage space have expanded due to more and larger files, including the increasing use of media files, more sophisticated document and presentation files, larger and more complex databases, and so forth.
In addition, government regulations and court precedents are forcing companies to retain more data and to do it for longer periods of time. For example, changes to US federal rules that went into effect earlier this month require parties involved in federal litigation to provide electronic information as part of the discovery process. This means once a lawsuit has been filed (or perhaps even anticipated), deleting or overwriting files that pertain to it will be as illegal as shredding paper documents.
But there's more to it than storage space. A truly scalable storage solution requires not just the ability to store more and more data, but also the ability to organise and find that data; otherwise the stored data is not very useful. The security of stored data is another big concern — especially in regulated industries where HIPAA, the GLB Act and the like require that the privacy of certain information be protected. Last, but certainly not least, reliability is of utmost importance. Loss of important data can result in irreparable harm to your organisation.
Creating a fast, user-friendly, reliable, secure and scalable storage strategy is getting more difficult all the time, but luckily, new technologies are being developed to address these challenges. Let's take a look at some scalable storage solutions for organisations of all sizes.
Scalability and expandability
The first element of a scalable storage strategy is easy expandability, so that as the organisation grows and the amount of data increases, you can expand storage space to meet your needs without interfering with the ability of users to access previously stored data.
This can be accomplished in different ways, depending on your budget, the size of your organisation and the amount of data you need to store. Many small companies can still get by with a file server with SATA or SCSI RAID arrays. Disk capacity has increased significantly over the last few years, with a corresponding drop in cost per gigabyte. "Hot swappable" disks allow you to replace or add a disk while the rest of the system goes on functioning. SATA hot swap RAID cases can be purchased starting at a few hundred pounds.
As the organisation's needs grow, network-attached storage (NAS) becomes a more viable solution. Using a fibre channel connection allows you to place the storage subsystem a greater distance from the server. An example is the Aberdeen XDAS NAS system.
The next step up is a storage area network (SAN), which is a separate back-end network of storage devices. SANs are designed to provide better performance and disk utilisation.
Enterprises with huge amounts of data can implement extremely high capacity and high bandwidth systems such as those offered by Sun's StorageTek systems, which can provide up to 330 TB of storage with over eleven hundred 300 GB disk drives.
Scalability and performance
Faster access to data results in increased productivity, and better performance becomes more important as files grow larger. New technologies address this need for speed.
Fibre channel is the traditional answer to the bandwidth problem, and is typically used by SANs. More recently, SANs have begun operating over gigabit Ethernet, or using iSCSI (SCSI over IP) for high performance communication between network devices, both of which rely on the Ethernet and IP networking traditionally used by NAS. The lines between the two are thus becoming more blurred as time goes on.
The Sun StorageTek system mentioned earlier supports an aggregate data bandwidth of 68 GB per second for super high performance.
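To put these bandwidth figures in perspective, here is a rough back-of-the-envelope sketch in Python. It compares how long a 1 TB data set would take to move over a single gigabit Ethernet link and at the 68 GB per second aggregate figure quoted above. The data set size is an arbitrary example, and the calculation assumes decimal units and ignores protocol overhead, so real-world times will be longer.

```python
def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Idealised transfer time: size divided by sustained bandwidth."""
    return size_gb / bandwidth_gb_per_s

# 1 gigabit per second is 1/8 of a gigabyte per second (theoretical maximum).
GIGABIT_ETHERNET_GB_PER_S = 1 / 8
# Aggregate bandwidth figure quoted in the article.
AGGREGATE_GB_PER_S = 68

# Hypothetical data set size, chosen only for illustration.
dataset_gb = 1000

for label, bandwidth in [
    ("Gigabit Ethernet link", GIGABIT_ETHERNET_GB_PER_S),
    ("68 GB/s aggregate", AGGREGATE_GB_PER_S),
]:
    seconds = transfer_seconds(dataset_gb, bandwidth)
    print(f"{label}: {seconds:,.0f} seconds")
```

Under these idealised assumptions the gigabit link needs about 8,000 seconds (over two hours), against roughly 15 seconds at the aggregate rate.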
Scalability and reliability
As the storage system grows, your fault tolerance and backup solutions must keep up. Remember that fault tolerance and backup are two separate issues and you should not use fault tolerance as a substitute for backup. Fault tolerance should be built into any well designed scalable storage solution. RAID is the most common disk fault tolerance technology. Most levels of RAID, though not all, provide fault tolerance either by keeping a complete copy of the data (mirroring) or by writing parity information that can be used to reconstruct data that's distributed across disks if one or more of those disks should fail.
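The parity idea is easier to see in a small example. The following Python sketch illustrates single parity of the kind used by RAID 5: the parity block is the XOR of the data blocks, so any one missing block can be rebuilt by XORing the survivors with the parity. The block contents and the function name are made up for illustration, and a real controller works on fixed-size stripes in hardware or in the driver, not on Python byte strings. Surviving two failed disks needs a second, independently calculated parity, which this sketch does not cover.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk in the stripe.
data_blocks = [b"ACCT", b"2006", b"DATA"]

# The parity block would be written to a fourth disk.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_blocks[1])
```

The same XOR that produced the parity also recovers the lost block, which is why a single extra disk is enough to protect the whole stripe against one failure.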
New NAS systems support advanced RAID configurations such as RAID 6, a dual parity scheme that tolerates two simultaneous drive failures without data loss or downtime. Because it gives up only two drives' worth of capacity to parity, it costs less per usable gigabyte than RAID 1 (disk mirroring) in larger arrays, especially with large capacity drives.
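The cost difference comes down to how much raw capacity each scheme leaves for data. The Python sketch below compares the two under simple assumptions: RAID 1 is treated as a set of two-way mirrored pairs, RAID 6 loses exactly two drives' worth of space to parity, and the eight 300 GB drives are an example configuration only. The function name and level labels are hypothetical.

```python
def usable_capacity_gb(level, disks, disk_gb):
    """Usable space for an array of identical disks (simplified model)."""
    if level == "raid1":
        # Assumes two-way mirrored pairs: half the disks hold copies.
        if disks < 2 or disks % 2:
            raise ValueError("mirrored pairs need an even number of disks")
        return (disks // 2) * disk_gb
    if level == "raid6":
        # Two disks' worth of capacity goes to the dual parity.
        if disks < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (disks - 2) * disk_gb
    raise ValueError(f"unsupported level: {level}")

# Example configuration: eight 300 GB drives.
for level in ("raid1", "raid6"):
    usable = usable_capacity_gb(level, disks=8, disk_gb=300)
    print(f"{level}: {usable} GB usable of {8 * 300} GB raw")
```

With eight drives, mirroring leaves 1,200 GB usable while RAID 6 leaves 1,800 GB. With only four drives the two schemes come out equal, so the advantage of RAID 6 grows with the size of the array.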
Tape backup, although relatively low cost, is slow and makes it difficult to search for specific files. Disk based backup is thus becoming more popular. Products such as Microsoft's Data Protection Manager, a disk-based backup and recovery server, are making it easier to implement. For more information on how it works, see http://www.microsoft.com/systemcenter/dpm/default.mspx.
Scalability and security
Securing stored data requires a multi-level strategy that includes physical security of the devices on which the data is stored, security of the data residing on a disk (for example, by using file level or disk level encryption), and securing the data as it travels across the network between the server or network storage device where it's stored and the workstation of the user accessing it (for example, by implementing IPsec).
Scalability and usability
Usability depends on mechanisms for organising large amounts of data and finding what you need quickly. Tiered storage architectures (also known as Hierarchical Storage Management or HSM) divide data based on performance, availability, and recovery requirements. Data that's accessed often is stored on fast and readily accessible media, whereas data that's less likely to be needed can be stored on lower cost but slower, less easily accessible media.
For example, files can be migrated from high cost, high performance fibre channel SAN devices to slower and less expensive SATA RAID arrays when the need for them is less immediate, then later to cheap tape as they become even less likely to be accessed.
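A migration policy of this kind can be expressed as a few simple rules. The Python sketch below picks a tier for each file based on how long ago it was last accessed. The 30 day and one year thresholds, the tier names and the file names are all hypothetical; commercial HSM products apply their own, usually richer, policies.

```python
from datetime import datetime, timedelta

# Hypothetical policy: maximum age since last access for each tier,
# ordered from fastest and most expensive to slowest and cheapest.
TIER_RULES = [
    (timedelta(days=30), "fibre-channel-san"),
    (timedelta(days=365), "sata-raid"),
]
ARCHIVE_TIER = "tape"

def choose_tier(last_accessed, now):
    """Return the first tier whose age limit the file still meets."""
    age = now - last_accessed
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER

# Made-up files and last-access dates for illustration.
now = datetime(2006, 12, 31)
files = {
    "budget.xls": datetime(2006, 12, 20),
    "q1-report.ppt": datetime(2006, 6, 1),
    "audit-2003.mdb": datetime(2004, 1, 15),
}

placement = {name: choose_tier(accessed, now) for name, accessed in files.items()}
for name, tier in placement.items():
    print(f"{name} -> {tier}")
```

Keeping the rules in a single ordered list means thresholds can be adjusted, or further tiers added, without touching the selection logic.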
Summary
New trends in storage make it easier than ever for companies to expand their data capacity, but scaling up your data storage requires that you also think about the scalability of performance, reliability, security, and usability. The basics of storage management haven't changed a lot over the past year, but new products and technologies have made it easier to address these peripheral issues, thereby making extreme scalability in data storage much more feasible.






