Availability management is the process of identifying and measuring the levels of IT capacity and resources that are actually available, so that the figures can be used in service level reviews with clients. Before a service begins, the quantity and nature of those resources must be defined within a Service Level Agreement (SLA).
The SLA covers many facets of the proposed service. First, it must state exactly what is included in the agreed-upon service. Is it simple hard drive access? Internet usage? Video conference capability? The cost of the service usually relates directly to this: you pay for what you get. And if you don’t get it, what are the repercussions for the service provider?
Capacity is an important consideration, as there are times when large numbers of reports might be generated, or when an unusually large number of users will need the system. Exactly how much processing power or bandwidth is being allotted? And for that matter, when are these resources being allotted? Are they only available during certain hours? Is there a lag in the response time for a given function?
What happens if something goes wrong? If it’s something minor, who can a user call? Incidents are always a possibility, so the SLA must outline the expected response and resolution times. It should also estimate the number or frequency of these occurrences.
If a problem of a larger scale occurs, there must also be a contingency plan. Where is the contingency site? What documentation would be necessary for restoration, and where is it located? Will any third parties need to be involved in the process?
Most of the time, availability is calculated using a model that combines the availability ratio (the proportion of agreed service time during which the service was actually up) with tools like fault tree analysis. The calculation also usually takes the following into account:
- Serviceability – Where a service is performed by a third-party organization, serviceability is the availability that the third party is expected to deliver for its piece of the system.
- Recoverability – The expected time to repair a system and restore it to a functional state after a failure.
- Security – The ability of the system to withstand attempts at intrusion from outside sources or from internal sources that do not have proper access rights.
- Resilience – The ability of an IT network to avoid or withstand failure.
- Reliability – The period of time for which a service is expected to be performed without failure, under normal circumstances.
- Maintainability – The ease of maintenance for a given component, whether said maintenance is in response to a failure, or to prevent failure.
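The availability ratio mentioned above can be illustrated with a small sketch. The text does not give a formula, so the two commonly used forms below are assumptions: one based on agreed service time and recorded downtime, and one based on mean time between failures (a reliability figure) and mean time to repair (a recoverability figure). The function names and sample numbers are hypothetical.

```python
def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Fraction of the agreed service time during which the service was up.

    Assumed formula: (agreed time - downtime) / agreed time.
    """
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie between 0 and the agreed time")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected availability from reliability (MTBF) and recoverability (MTTR).

    Assumed formula: MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical month: 30 days x 24 hours agreed, 3.6 hours of outage.
    measured = availability_ratio(agreed_service_hours=720, downtime_hours=3.6)
    # Hypothetical component: fails every 999 hours on average, 1 hour to repair.
    expected = availability_from_mtbf(mtbf_hours=999, mttr_hours=1)
    print(f"measured availability: {measured:.3%}")
    print(f"expected availability: {expected:.3%}")
```

The first function suits reporting against an SLA after the fact, while the second is closer to a planning estimate, since it works from expected failure and repair times rather than recorded outages.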
These availability management criteria, and the targets set for them, help organizations sustain the availability of their IT services without incurring unjustifiably high costs. The resulting availability plan is then put into use and monitored to ensure that both availability and maintenance obligations are being met. In this manner, both the supplier and the consumer have a much better picture of what they’re getting themselves into before making the deal.
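As a rough illustration of monitoring against an agreed target, the sketch below converts an availability target into a downtime allowance and checks recorded outages against it. The target, the reporting period, and the outage figures are hypothetical, and the simple "sum of outages versus allowance" rule is an assumption rather than a method prescribed by any particular SLA.

```python
def allowed_downtime_hours(target: float, agreed_service_hours: float) -> float:
    """Downtime allowance implied by an availability target (e.g. 0.999)."""
    if not 0 < target <= 1:
        raise ValueError("target must be in the range (0, 1]")
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    return (1 - target) * agreed_service_hours


def meets_target(outage_hours: list[float], target: float,
                 agreed_service_hours: float) -> bool:
    """True if the total recorded outage time stays within the allowance."""
    return sum(outage_hours) <= allowed_downtime_hours(target, agreed_service_hours)


if __name__ == "__main__":
    # Hypothetical SLA: 99.9% availability over a 720-hour month.
    budget = allowed_downtime_hours(target=0.999, agreed_service_hours=720)
    print(f"downtime allowance: {budget * 60:.1f} minutes")
    # Hypothetical outage log for the month, in hours.
    print("target met:", meets_target([0.25, 0.30], 0.999, 720))
```

Expressing the target as a time allowance makes the obligation concrete for both parties: a supplier can see how much repair time it has, and a consumer can see how much disruption it has agreed to accept.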
