Tuesday, July 14, 2009

Network Troubleshooting: IP SLA + WMI = Better Web Services

Why is the network so slow? I’m sure you’ve never heard this complaint before :) Diagnosing the problem isn’t always easy with so many possible culprits. You can start by running down the network troubleshooting checklist:
The DNS service?
The web server?
The WAN link?

IP SLA and WMI data are critical to diagnosing potential network problems. On most Cisco devices, IP SLA can give you performance information for the connectivity layer of a net-centric service such as a web application or VoIP. In Microsoft environments, WMI can do the same for the application/server/desktop layer. Combining WMI with IP SLA covers both layers and gives you an end-to-end view of your web application or other net-centric service, so you can troubleshoot issues more efficiently.

Using IP SLA to Assess the User Experience
IP SLA (Internet Protocol Service Level Agreements) is embedded in Cisco IOS (Internetwork Operating System) on most Cisco routers and switches. IP SLA operations can measure delay (round trip time), jitter, packet loss, connectivity, voice quality scores, and many other key metrics for monitoring and troubleshooting network elements.

Additionally, threshold levels can be set for most metrics. When a metric crosses a threshold level, IP SLA sends an SNMP trap to the specified IP addresses.

You can configure an IP SLA HTTP operation to monitor the overall user experience for the “connectivity layer” of a web application (or any other net-centered application such as email, VoIP or videoconferencing). This operation uses a synthetic web transaction to measure the total round trip time (RTT) to perform a DNS query, establish a TCP connection to the HTTP service, and retrieve the web site’s home page. By configuring the HTTP operation on the LAN switch closest to users, the total RTT (or latency) is an accurate measure of the users’ experience (as opposed to measuring RTT from a central network management server).
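To make the three phases concrete, here is a minimal Python sketch of the same kind of synthetic web transaction, run from an ordinary host rather than a switch. It is not how IP SLA is implemented; the function names (timed_ms, fetch_home_page, http_probe) and the default port and timeout are assumptions chosen for illustration.

```python
import socket
import time


def timed_ms(func, *args):
    """Run func(*args) and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0


def fetch_home_page(sock, host):
    """Send a minimal HTTP/1.0 GET for "/" and read until the peer closes."""
    request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    sock.sendall(request)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read means the server closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def http_probe(host, port=80, timeout=5.0):
    """Time the DNS query, TCP connect and page retrieval separately."""
    address, dns_ms = timed_ms(socket.gethostbyname, host)
    sock, tcp_ms = timed_ms(socket.create_connection, (address, port), timeout)
    with sock:
        body, http_ms = timed_ms(fetch_home_page, sock, host)
    return {
        "dns_ms": dns_ms,
        "tcp_ms": tcp_ms,
        "http_ms": http_ms,
        "total_ms": dns_ms + tcp_ms + http_ms,  # total RTT for the transaction
        "bytes": len(body),
    }
```

Calling http_probe with the hostname of your own web server returns the per-phase timings, so a slow DNS lookup can be told apart from a slow page retrieval. As with the IP SLA operation, the numbers only reflect the users' experience if the probe runs close to the users.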

Next, configure an IP SLA ICMP Echo operation to monitor RTT between the switch on the user LAN and the switch to which the web server is connected. This way, if the HTTP operation indicates the web transaction is slow or unresponsive, you can check the WAN RTT between the switches to see whether the problem is related to the WAN link or something on the web server.
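The comparison described above can be sketched as a small Python function. The function name, the use of None for a timed-out operation, and both threshold values are hypothetical; pick limits that match your own baseline measurements.

```python
def locate_slowdown(http_rtt_ms, wan_rtt_ms,
                    http_limit_ms=2000.0, wan_limit_ms=150.0):
    """Suggest where to look first, given the two IP SLA style readings.

    http_rtt_ms: total RTT of the HTTP operation, or None if it timed out.
    wan_rtt_ms:  RTT of the ICMP Echo operation, or None if it timed out.
    The default limits are illustrative assumptions, not recommendations.
    """
    http_slow = http_rtt_ms is None or http_rtt_ms > http_limit_ms
    if not http_slow:
        return "ok"
    wan_slow = wan_rtt_ms is None or wan_rtt_ms > wan_limit_ms
    # Slow web transaction over a slow WAN points at the link;
    # slow web transaction over a healthy WAN points at the server side.
    return "wan" if wan_slow else "server"
```

For example, locate_slowdown(5000, 40) returns "server", because the web transaction is slow while the WAN round trip is well under its limit.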

Watching the Applications and Servers: Adding WMI
WMI (Windows Management Instrumentation) is an instrumentation tool similar to IP SLA that Microsoft has created for its products. WMI provides thousands of performance metrics for applications such as MS Exchange and MS SQL Server, as well as for server hardware and operating system components.

Microsoft has a built-in performance administration tool for monitoring WMI data for applications and servers. Using the tool, you can view each server’s CPU utilization, physical memory and free disk space. Each of these subsystems is critical to the server’s performance regardless of the application running. Too little memory, too few CPU cycles and low disk space are common causes of slowdowns on a server. You can log into each server to view the individual performance counters, or you can use network management software to simplify the process by collecting any of the thousands of available WMI counters across multiple servers.
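As a rough illustration of checking those three subsystems, the Python sketch below lists WQL queries against standard WMI classes (Win32_Processor, Win32_OperatingSystem, Win32_LogicalDisk) and a function that flags shortfalls in values collected with them. How the queries are executed is left out, since that depends on your tooling; the function name and all threshold defaults are assumptions.

```python
# WQL queries for the three subsystems. FreePhysicalMemory and
# TotalVisibleMemorySize are reported in kilobytes; FreeSpace and Size
# are reported in bytes. DriveType = 3 selects local fixed disks.
WQL_QUERIES = {
    "cpu": "SELECT LoadPercentage FROM Win32_Processor",
    "memory": "SELECT FreePhysicalMemory, TotalVisibleMemorySize "
              "FROM Win32_OperatingSystem",
    "disk": "SELECT DeviceID, FreeSpace, Size "
            "FROM Win32_LogicalDisk WHERE DriveType = 3",
}


def check_server(cpu_load_pct, free_mem_kb, total_mem_kb,
                 free_disk_bytes, disk_size_bytes,
                 cpu_limit_pct=90.0, min_free_mem_pct=10.0,
                 min_free_disk_pct=10.0):
    """Return the subsystems that look constrained.

    The default limits are hypothetical; tune them to your servers.
    """
    warnings = []
    if cpu_load_pct > cpu_limit_pct:
        warnings.append("cpu")
    if 100.0 * free_mem_kb / total_mem_kb < min_free_mem_pct:
        warnings.append("memory")
    if 100.0 * free_disk_bytes / disk_size_bytes < min_free_disk_pct:
        warnings.append("disk")
    return warnings
```

Running the same check across every server is the part that network management software automates; the sketch only shows the per-server logic.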

Getting the End-To-End View
An end-to-end view of the network helps you troubleshoot problems much faster and head off the common complaints. To get one, consider network management software such as dopplerVUE, which integrates fault and performance data from a variety of sources, including SNMP, syslog, WMI and IP SLA, so you can combine metrics from both layers of a web service in a single end-to-end dashboard. Using dopplerVUE’s drag-and-drop interface, you can quickly create an integrated view of both layers of the service without having to shift between tools or viewers (screenshot below).
