The seven Rs of high availability

ANALYSIS

Reliability
The reliability of the hardware and software can also be verified from customer references and industry analysts. Beyond that, you should consider performing what I call an empirical component reliability analysis. This requires the following steps:
  1. Review and analyze problem management logs.
  2. Review and analyze supplier logs.
  3. Acquire feedback from operations personnel.
  4. Acquire feedback from support personnel.
  5. Acquire feedback from supplier repair personnel.
  6. Compare experiences with other shops.
  7. Study reports from industry analysts.
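As a rough illustration of step 1, a problem-log review could be sketched as follows. This is only a minimal sketch under stated assumptions: the record fields (supplier, product, failed_at, repaired_at), the function names, and the sample entries are all hypothetical, and a real problem management system will have its own schema and export format. It groups failures by supplier to get failure frequency and average time to repair, and counts failures by hour of day to surface time-of-day patterns.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class ProblemRecord:
    """One entry from a problem management log (hypothetical schema)."""
    supplier: str
    product: str
    failed_at: datetime
    repaired_at: datetime

    @property
    def hours_to_repair(self) -> float:
        # Elapsed time between failure and repair, in hours.
        return (self.repaired_at - self.failed_at).total_seconds() / 3600


def summarize_by_supplier(records):
    """Return {supplier: (failure_count, mean_hours_to_repair)}."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.supplier].append(record.hours_to_repair)
    return {supplier: (len(hours), mean(hours)) for supplier, hours in grouped.items()}


def failures_by_hour(records):
    """Count failures per hour of day to expose time-of-day patterns."""
    counts = defaultdict(int)
    for record in records:
        counts[record.failed_at.hour] += 1
    return dict(counts)


# Made-up sample entries, for illustration only.
sample_log = [
    ProblemRecord("SupplierA", "router",
                  datetime(2010, 3, 1, 7, 0), datetime(2010, 3, 1, 8, 0)),
    ProblemRecord("SupplierA", "router",
                  datetime(2010, 3, 2, 7, 30), datetime(2010, 3, 2, 10, 30)),
    ProblemRecord("SupplierB", "disk array",
                  datetime(2010, 3, 2, 14, 0), datetime(2010, 3, 2, 18, 0)),
]

summary = summarize_by_supplier(sample_log)
hourly = failures_by_hour(sample_log)
print(summary)
print(hourly)
```

The same grouping could be repeated by product or using department. A cluster of failures at the same hour each morning would be worth raising with the operators on that shift.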
An analysis of problem logs should reveal any unusual patterns of failure. You should study them by supplier, product, using department, time and day of failures, frequency of failures, and time to repair. Suppliers often keep on-site repair logs you can use to conduct a similar analysis. You'll find that feedback from operations personnel can often be candid and revealing as to how components are truly performing. This can especially be the case for off-site operators. For example, they may be resetting a particular network component numerous times every morning prior to start-up, but they may not bother to log the resets since the component always comes up. Similar conversations with support personnel such as systems administrators, network administrators, and database administrators may elicit similar revelations. You might think that feedback from suppliers' repair personnel would be biased, but in my experience they can be just as candid and revealing about the true reliability of their products as the people using them. This then becomes another valuable source of information for evaluating component reliability, as is comparing experiences with other shops. Shops that are closely aligned with your own in terms of platforms, configurations, services offered, and customers can be especially helpful. Reports from reputable industry analysts can also be used to predict component reliability.

Repairability
Repairability is the relative ease with which service technicians can repair or replace failing components. Two common metrics used to evaluate this trait are how long it takes to do the actual repair and how often the repair work needs to be repeated. In more sophisticated systems, repairs can be handled from remote diagnostic centers, where failures are detected and circumvented and arrangements are made for permanent resolution with little or no involvement of operations personnel.

Recoverability
Recoverability refers to the ability to overcome a momentary failure in such a way that there is no impact on end-user availability. It can be as small as a portion of main memory recovering from a single-bit memory error, or as large as an entire server system switching over to its standby system with no loss of data or transactions. Recoverability also includes retries of attempted reads and writes out to disk or tape, as well as the retrying of transmissions down network lines.

Responsiveness
Responsiveness is the sense of urgency that everyone involved with high availability needs to exhibit. This includes having well-trained suppliers and in-house support personnel who can respond to problems quickly and efficiently. It also pertains to how quickly the automated recovery of resources, such as disks or servers, can be enacted.

Robustness
The final characteristic of high availability is robustness, which describes the overall design of the availability process. A robust process will be able to withstand a variety of forces -- both internal and external -- that could easily disrupt and undermine availability in a weaker environment. Robustness puts a high premium on documentation and training to withstand technical changes as they relate to platforms, products, services, and customers; personnel changes as they relate to turnover, expansion, and rotation; and business changes as they relate to new directions, acquisitions, and mergers.

Understanding and applying these seven characteristics of high availability can help transform the continuous uptime of your infrastructure into what may be the most significant R of all: a reality.

