The reliability of the hardware and software can also be verified through customer references and reports from industry analysts. Beyond that, you should consider performing what I call an empirical component reliability analysis. This involves the following steps:
- Review and analyze problem management logs.
- Review and analyze supplier logs.
- Acquire feedback from operations personnel.
- Acquire feedback from support personnel.
- Acquire feedback from supplier repair personnel.
- Compare experiences with other shops.
- Study reports from industry analysts.
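As a rough illustration of the first step, the sketch below tallies failures per component from a problem management log and derives two simple figures: the mean time between failures and the mean repair time. The log format, the component names, and the `summarize` function are all hypothetical, invented for this example; a real problem management system will export its own fields, and mean time between failures is only one of several ways to read such a log.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical problem-log entries: (component, failure time, hours to repair).
# These values are made up for illustration; they are not real measurements.
PROBLEM_LOG = [
    ("disk-array-1", "2024-01-03 02:15", 4.0),
    ("disk-array-1", "2024-02-14 11:40", 1.5),
    ("disk-array-1", "2024-03-01 08:05", 2.5),
    ("web-server-2", "2024-01-20 17:30", 0.5),
    ("web-server-2", "2024-03-10 09:00", 1.0),
]


def summarize(log):
    """Return, per component, the failure count, the mean hours between
    consecutive failures, and the mean hours spent on each repair."""
    by_component = defaultdict(list)
    for component, stamp, repair_hours in log:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        by_component[component].append((when, repair_hours))

    summary = {}
    for component, events in by_component.items():
        events.sort()  # order failures chronologically
        times = [when for when, _ in events]
        # Gaps between consecutive failures, in hours.
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
        summary[component] = {
            "failures": len(events),
            # A single failure gives no gap to average, so report None.
            "mean_hours_between_failures": sum(gaps) / len(gaps) if gaps else None,
            "mean_repair_hours": sum(h for _, h in events) / len(events),
        }
    return summary


if __name__ == "__main__":
    for component, stats in summarize(PROBLEM_LOG).items():
        print(component, stats)
```

Components with short gaps between failures or long repair times are the natural candidates to raise with suppliers and support personnel in the later steps.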
Repairability is the relative ease with which service technicians can repair or replace failing components. Two common metrics used to evaluate this trait are how long the actual repair takes and how often the repair work needs to be repeated. In more sophisticated systems, repairs can be handled from remote diagnostic centers, where failures are detected and circumvented and arrangements are made for permanent resolution with little or no involvement of operations personnel.
Recoverability
Recoverability refers to the ability to overcome a momentary failure in such a way that there is no impact on end-user availability. The recovery could be as small as a portion of main memory correcting a single-bit memory error, or as large as an entire server system switching over to its standby system with no loss of data or transactions. Recoverability also includes retries of attempted reads and writes to disk or tape, as well as retries of transmissions over network lines.
Responsiveness
Responsiveness is the sense of urgency all people involved with high availability need to exhibit. This includes having well-trained suppliers and in-house support personnel who can respond to problems quickly and efficiently. It also pertains to how quickly the automated recovery of resources, such as disks or servers, can be enacted.
Robustness
The final characteristic of high availability is robustness, which describes the overall design of the availability process. A robust process will be able to withstand a variety of forces -- both internal and external -- that could easily disrupt and undermine availability in a weaker environment. Robustness puts a high premium on documentation and training to withstand technical changes as they relate to platforms, products, services, and customers; personnel changes as they relate to turnover, expansion, and rotation; and business changes as they relate to new direction, acquisitions, and mergers.
Understanding and applying these seven characteristics of high availability can help transform the continuous uptime of your infrastructure into what may be the most significant R of all, a reality.






