Cloud service-level agreements – Musato Technologies
Cloud service-level agreements for businesses

Cloud service-level agreements are meant to keep customers and providers happy, but writing one for the cloud means setting boundaries that account for more players and more possibilities.

Dodge the Traps in Creating an SLA

Cloud computing users think service-level agreements are important for the cloud, but they usually don’t have a handle on how to enforce them. Without proper consideration and tools, even a good service-level agreement (SLA) could fail if you don’t know it’s being violated or why.

One of the challenges with a cloud computing SLA is that the experience delivered by a cloud application is the sum of the performance of three or more entities. Figuring out which one is causing a problem can be difficult, so the first task in creating an SLA decision framework for the cloud is to develop a simple entity map that shows who provides each portion of a cloud service and where their portion transitions into another's area.
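
Such an entity map can be sketched as a small data structure. This is a minimal illustration, assuming three responsibility zones; the zone, provider, and handoff names here are hypothetical examples, not taken from the article.

```python
# A minimal entity map: each zone of a cloud service, who is
# responsible for it, and where it hands off to the next zone.
ENTITY_MAP = [
    {"zone": "user",    "provider": "Acme Corp IT",   "handoff": "corporate Internet uplink"},
    {"zone": "network", "provider": "ISP / Internet", "handoff": "cloud provider ingress"},
    {"zone": "cloud",   "provider": "Cloud vendor",   "handoff": None},  # terminal zone
]

def responsible_party(zone_name):
    """Return who owns a given zone of the service."""
    for entry in ENTITY_MAP:
        if entry["zone"] == zone_name:
            return entry["provider"]
    raise KeyError(zone_name)

print(responsible_party("network"))  # ISP / Internet
```

Keeping the map this explicit makes it easy to answer the first SLA question, "whose zone is the problem in?", before any finger-pointing starts.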

A typical cloud application starts with user-owned facilities such as a mobile device or an entire company network. From this user-supplied piece, the cloud application connects through a wide area network, usually the Internet,
to the cloud provider’s infrastructure.

Some users employ a virtual private network (VPN) for cloud access from fixed sites, and others may have more than one cloud provider, so it’s possible that there will be more than the three standard responsibility zones in your own
cloud.

Cloud applications generate workflows that move across these zones. You'll want to understand exactly how that movement takes place for each type of cloud application you run. It should be possible for you to identify, based on the name of an application, how its workflow crosses the zones to fulfill users' needs. That workflow is the basis for your SLA decisions.
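
The application-to-workflow mapping described above can be kept as a simple lookup table. The application names and zone sequences below are illustrative assumptions; yours would come from your own entity map.

```python
# Map each application name to the ordered list of zones its
# workflow crosses, from the user outward.
WORKFLOWS = {
    "crm-web":   ["user", "network", "cloud"],
    "backup":    ["user", "vpn", "cloud"],
    "analytics": ["user", "network", "cloud-a", "cloud-b"],  # multi-cloud case
}

def zones_for(app):
    """Look up the workflow (zone sequence) for a named application."""
    return WORKFLOWS[app]

def boundaries_for(app):
    """Return the zone boundaries the workflow crosses, as pairs."""
    zones = zones_for(app)
    return list(zip(zones, zones[1:]))

print(boundaries_for("backup"))  # [('user', 'vpn'), ('vpn', 'cloud')]
```

The boundary pairs are exactly the points where you'll later place monitoring, so deriving them mechanically from the workflow keeps the map and the monitoring in sync.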

To establish good SLA management and policy decisions, you need to measure the behavior of each of the players in your cloud. You should always start with measuring response time and then measure conditions at the boundary points of different zones.

End-to-end response time measurement is best done at the user connection point so you can read the full response time. In some cases, this means building response time monitoring into the application itself; however, the device’s
TCP/IP software often provides some of that data through a management interface.
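
A user-side measurement like this can be sketched as a generic timer around the application call. `timed_call` and the stub workload are hypothetical names for illustration; in practice you would wrap the real request (for example, an HTTP fetch of the application's endpoint) rather than the stub.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Invoke fn and return (result, elapsed_ms), timed at the caller,
    i.e. as close to the user connection point as this process can get."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stub standing in for a real application request.
def fake_request():
    time.sleep(0.01)  # pretend 10 ms of network + server time
    return "ok"

result, ms = timed_call(fake_request)
print(f"{result} in {ms:.1f} ms")
```

Measuring at the caller captures the full round trip, which is the number the SLA actually governs, rather than any single zone's contribution.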

For zone-boundary monitoring, some form of traffic or protocol monitoring is hard to beat. These tools put probes, software tools, or hardware elements in the network at various places, and they allow a central management console
to view the traffic flow using deep packet inspection to sort out applications.

One big mistake users make at this point is to focus on monitoring without knowing what’s good or bad. A network management system (NMS) may collect data in a repository naturally (OpenNMS does this, for example).

This data collection allows you to run queries to analyze performance and conditions over time, set baselines for normal behavior, and define thresholds for what you'd consider SLA violations.
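
The baseline-and-threshold step can be sketched as follows. The three-sigma rule here is an illustrative policy choice on my part, not something mandated by any SLA standard; your contract's thresholds should drive the actual numbers.

```python
import statistics

def baseline_and_threshold(samples_ms, sigmas=3.0):
    """Return (baseline_mean, threshold) from historical response-time
    samples, with threshold = mean + sigmas * standard deviation."""
    mean = statistics.fmean(samples_ms)
    stdev = statistics.pstdev(samples_ms)
    return mean, mean + sigmas * stdev

def violations(samples_ms, threshold):
    """Samples exceeding the threshold count as candidate SLA violations."""
    return [s for s in samples_ms if s > threshold]

history = [110, 120, 115, 118, 112, 119]       # past response times, ms
mean, limit = baseline_and_threshold(history)
print(violations(history + [400], limit))       # [400]
```

The point is that "violation" is defined relative to a measured baseline, not a guess, which is what the repository of historical data buys you.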

If your management system doesn’t provide a repository, you’ll want to add network analytics tools to gather and correlate management data and set your performance baselines.

Network analytics can be a strong foundation for decisions concerning cloud SLAs. Make sure the tool can add cloud performance data obtained from the cloud management system APIs to network data obtained from your own NMS.

If you have a VPN or a hybrid cloud with a large data center component, it's smart to look first at tools from your primary network vendor. These are helpful in maintaining your own IT and network infrastructure performance and will also support cloud SLA decisions.

It all comes down to how SLA errors are detected. A good system has three inputs to the detection process: subjective user reports of poor performance, an end-to-end response-time problem for one or more applications, and a report of a specific problem at a zone boundary.

In all cases, you should first assess the impact of the problem and then target possible contributors to it. Your workflow-zone map will let you see whether there’s a general problem with several applications at a zone boundary or with only one application.

In the former case you are probably experiencing a network or cloud infrastructure problem; in the latter, a cloud application problem. For the first case, use your monitoring tools to examine all the zone boundaries in the affected workflows to see where the problem is occurring.
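
That localization logic can be sketched directly from the workflow-zone map. The application and boundary names below are hypothetical; the rule encoded is the one just described: several applications failing at the same boundary points to shared infrastructure, a single application points to the application itself.

```python
def localize(problem_reports):
    """problem_reports: list of (app, boundary) tuples.
    Returns a dict mapping each boundary to a diagnosis."""
    by_boundary = {}
    for app, boundary in problem_reports:
        by_boundary.setdefault(boundary, set()).add(app)
    return {
        boundary: ("infrastructure problem" if len(apps) > 1
                   else "application problem")
        for boundary, apps in by_boundary.items()
    }

reports = [
    ("crm-web",   "network/cloud"),
    ("backup",    "network/cloud"),   # two apps at the same boundary
    ("analytics", "user/network"),    # only one app here
]
print(localize(reports))
```

This is deliberately crude; real tooling would also weigh severity and time correlation, but the zone map is what makes even this first cut possible.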

That problem usually manifests as a longer delay between two zone boundary points or a loss of packets at a boundary. Your traffic probes will most likely identify either of these faults.

If there’s a problem, the remediation should be treated as a small project, with a project manager and a fixed set of tasks that are usually called the escalation procedure.

Some users even employ simple software project management or fault-tracking tools to track cloud SLA issues from detection to resolution. Tools intended for software projects can sometimes serve here, and some network analytics tools include at least an option for fault tracking.

Taking an organized approach to a cloud computing SLA and the decisions that come out of its enforcement is critical if the SLA is to be successful. If you start your deliberations with plans to support SLA decisions, you’ll have a better experience overall. Contact Musato Technologies to learn more about our ICT solutions and services.
