Get real with high availability
Published: 06 Jan 2003 13:59 GMT
Too many IT organisations focus on providing "high availability" according to the IT department's definition. Instead, it's the end user who should determine whether the system is really "available." Here are some straightforward suggestions on how to define realistic availability requirements -- and avoid the consequences of unacceptable availability levels.
The first step in planning availability is to discover your users' true requirements, both for availability and for IT services in general. Consult as many users as possible, and at a minimum all users of critical applications. Most users will initially say that the system must be available all the time. You need to explain that the cost of providing availability climbs as the availability target rises, and that these costs will be passed on to users, either directly or indirectly.
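When explaining this trade-off to users, it can help to translate an availability percentage into the amount of downtime it actually permits. The following is a minimal sketch, not a costing model: the function name `allowed_downtime_hours` and the assumption of round-the-clock scheduled hours (8,760 hours in a non-leap year) are illustrative choices, not something taken from any particular tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes 24x7 operation and ignores leap years


def allowed_downtime_hours(availability_percent: float,
                           scheduled_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of outage permitted at a given availability target.

    availability_percent: target such as 99.9 (hypothetical example value).
    scheduled_hours: hours the system is supposed to be available in the period.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return scheduled_hours * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Each extra "nine" cuts the permitted downtime by a factor of ten.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% available -> up to {hours:.2f} hours of downtime per year")
```

Showing users how quickly the permitted downtime shrinks with each step up in the target makes it easier to discuss why each step costs more to deliver.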
The service level agreement
These consultations with users form the basis of a service level agreement between the provider of IT services and the users. You can choose to limit yourself to a simple agreement that covers just system availability, or you can expand the agreement to include response time, help desk availability, new feature request turnaround time, and many other performance and quality issues. If you're starting from scratch, I recommend including just the system availability portion. Then, as the system becomes more stable and your IT organisation matures, you can expand on that agreement. This approach has many benefits:
- The users don't expect too much too soon. The final judges of the IT organisation's performance are the users, so it's crucial to manage their expectations.
- It buys the IT organisation time to improve its services. This is an opportunity for the IT organisation to stay one step ahead of user requirements. It gives the organisation a better feel for the resource demands associated with meeting availability requirements, and it allows for better planning.
- It allows for a less demanding agreement. Since users know that the agreement will be improved later, they're more willing to settle for a realistic short-term target.
Never commit to something you know you can't achieve. Agree on a target that you can achieve in the short term, and establish a timetable for achieving higher system availability in the future. Pilot the system availability target internally within the IT organisation or with one small department. Once you've demonstrated that you can meet your target, roll out the new service level standards throughout the rest of the organisation.
Helping users identify their availability requirements
Ask users the following questions to help identify their availability requirements:
What are your scheduled operations? What times of the day and days of the week do you expect to be using the system or application? The answers to these questions help you identify times when your system or application must be available. Normally, the responses coincide with users' regular working hours. For example, users may primarily work with an application from 8:00 A.M. to 5:00 P.M. Monday to Friday. However, some users want to be able to access the system for overtime work. Depending on the number of users who access the system during off hours, you can choose to include those times in your normal operating hours. Alternatively, you can set up a procedure for users to request off-hours system availability at least three days in advance.
When external users or customers access a system, its operating hours often extend well beyond normal business hours. This is especially true of online banking, Internet services, e-commerce systems, and essential utilities such as electricity, water, and communications. Users of these systems usually demand availability 24 hours a day, 7 days a week, or as close to it as possible.
How often can you tolerate system outages during the times that you're using the system or application? Your goal is to understand the impact on users if the system becomes unavailable when it's scheduled to be available. For example, a user may say that he can afford only two outages a month. This answer also tells you whether you can ever schedule an outage during times when the system is committed to being available. You may want to do so for maintenance, upgrades, or other housekeeping purposes. For instance, a system that should be online 24 hours a day, 7 days a week may still require scheduled downtime at midnight to perform full backups.
How long can an outage last if one does occur? This question helps identify how long the user is willing to wait for the system to be restored during an outage, or to what extent outages can be tolerated without severely impacting the business. For example, a user may say that no outage may last more than three hours. Often, a user can tolerate longer outages if they're scheduled.
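The answers to these three questions can be recorded in a simple structure and turned into a rough availability figure for discussion. The sketch below is only an illustration under stated assumptions: the class `AvailabilityRequirement` and its field names are hypothetical, a month is averaged to 52/12 weeks, and the worst case assumes every permitted outage runs to its maximum length. The sample values reuse the examples above (8:00 A.M. to 5:00 P.M. Monday to Friday, two outages a month, three hours per outage).

```python
from dataclasses import dataclass

WEEKS_PER_MONTH = 52 / 12  # averaging assumption, roughly 4.33 weeks


@dataclass
class AvailabilityRequirement:
    """One user group's answers to the three questions (hypothetical structure)."""
    hours_per_day: float        # scheduled operating hours per working day
    days_per_week: int          # scheduled operating days per week
    max_outages_per_month: int  # how often outages can be tolerated
    max_outage_hours: float     # how long a single outage may last

    def scheduled_hours_per_month(self) -> float:
        return self.hours_per_day * self.days_per_week * WEEKS_PER_MONTH

    def worst_case_downtime_hours(self) -> float:
        # Assumes every tolerated outage lasts the full permitted duration.
        return self.max_outages_per_month * self.max_outage_hours

    def implied_availability_percent(self) -> float:
        scheduled = self.scheduled_hours_per_month()
        return 100 * (scheduled - self.worst_case_downtime_hours()) / scheduled


# Example values taken from the text: 9 hours a day, 5 days a week,
# at most two outages a month, each lasting at most three hours.
office = AvailabilityRequirement(hours_per_day=9, days_per_week=5,
                                 max_outages_per_month=2, max_outage_hours=3)

if __name__ == "__main__":
    print(f"Scheduled hours per month: {office.scheduled_hours_per_month():.0f}")
    print(f"Worst-case downtime: {office.worst_case_downtime_hours():.0f} hours")
    print(f"Implied availability: {office.implied_availability_percent():.1f}%")
```

A figure derived this way is a starting point for the service level agreement rather than a commitment; it simply shows users what their stated tolerances add up to.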












