The Network Zone: September 2009

Tuesday, September 29, 2009

3 Key Steps to Actively Monitoring HSRP…

I recently discussed how to build a resilient network using HSRP/VRRP. As a follow-up, here are a few key steps for actively monitoring HSRP.

With HSRP on our network, end users enjoy a good deal of network reliability. For me as the network engineer, this means that when a link fails, end users rarely notice it. The backup link simply handles the load and business continues as usual. Just the way I want it. My monitoring system does alert me to the link-down condition. Even so, I treat these situations as a higher priority, because the surviving link has become a single point of failure.

Here are a few tips for actively managing your HSRP implementation:

1. Map out each pair – when a primary route goes out, know which path has been designated as the alternate. If you have many HSRP routes, you can combine them onto a single map. A pre-created map makes it easy to find the paired item.

2. Create custom alerts for HSRP interfaces that indicate whether the path is the primary or the secondary HSRP link. HSRP interfaces are critical, so they need to be treated differently from a switch port that connects a user workstation.

3. After service has been restored, review the interface load of the secondary link and evaluate how well it handled the traffic. Use this information to ensure your backup pipes have adequate capacity. This will improve your disaster recovery planning for any future events.
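Steps 1 and 2 can be tied together in a monitoring script. The pair map lets an alert name both the failed path and the path that is now carrying the traffic. Below is a minimal sketch, assuming you keep the pairs in a small table yourself. The device names, interface names, and severity labels are hypothetical placeholders and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HsrpLink:
    """One interface in an HSRP pair (all names here are hypothetical)."""
    device: str
    interface: str
    role: str  # "primary" or "secondary"


# Step 1: map out each pair so the alternate path is known in advance.
# Keyed by HSRP group number; each value is a (primary, secondary) tuple.
HSRP_PAIRS = {
    1: (HsrpLink("core-rtr-a", "Gi0/1", "primary"),
        HsrpLink("core-rtr-b", "Gi0/1", "secondary")),
}


def find_partner(device: str, interface: str) -> Optional[Tuple[int, HsrpLink, HsrpLink]]:
    """Return (group, failed_link, partner_link), or None if not an HSRP link."""
    for group, (primary, secondary) in HSRP_PAIRS.items():
        for link, partner in ((primary, secondary), (secondary, primary)):
            if (link.device, link.interface) == (device, interface):
                return group, link, partner
    return None


def build_alert(device: str, interface: str) -> dict:
    """Step 2: a custom alert that says which HSRP path failed and which remains."""
    match = find_partner(device, interface)
    if match is None:
        # Ordinary port (e.g. a user workstation): keep the normal severity.
        return {"severity": "minor", "message": f"{device} {interface} down"}
    group, failed, partner = match
    return {
        "severity": "critical",
        "message": (
            f"HSRP group {group}: {failed.role} link {failed.device} {failed.interface} "
            f"down; traffic now relies on {partner.role} link "
            f"{partner.device} {partner.interface} (single point of failure)"
        ),
    }
```

The design choice is to escalate any HSRP interface to "critical" no matter which side fails. Once either link is down, the pair has lost its redundancy.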

Here are some monitoring screenshots that show my HSRP map and an active alarm.

Figure 1. HSRP MAP (Primary is solid line, Secondary is dashed line)

Figure 2. HSRP Active Alarm – (identifies HSRP link route impacted)

Wednesday, September 23, 2009

Looking to Reduce IT Costs…Optimize Network Traffic

In these days of pinching pennies and saving dimes, the best way to help your organization is to find ways to reduce costs. Do you know which network resources are costing you the most? Answering this question can lead to optimizing network traffic and cutting costs.

For instance, LAN traffic within your office is usually fairly inexpensive. Once the packets hit the WAN, however, the price tag increases significantly.

You can use NetFlow to identify which network resources are adding the most to your monthly bill. Start by monitoring the circuits at the top of your high-cost list with NetFlow. You might find network efficiencies that lead to savings with a big financial impact.

To find cost savings, I use a network management tool called dopplerVUE. It has a bandwidth locator that finds and sorts the top bandwidth users in a network, and its NetFlow support gives you multiple ways to view traffic in your network.
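At its core, finding the top bandwidth users from flow data is an aggregation: sum the bytes per source and sort. Below is a minimal sketch, assuming a collector has already decoded the exported NetFlow records. The FlowRecord fields are simplified, hypothetical names rather than the NetFlow export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FlowRecord(NamedTuple):
    """Minimal view of one decoded flow; field names are assumptions."""
    src_ip: str
    dst_ip: str
    octets: int


def top_talkers(flows: Iterable[FlowRecord], n: int = 5) -> List[Tuple[str, int]]:
    """Sum bytes per source address and return the n heaviest senders."""
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        totals[flow.src_ip] += flow.octets
    # Sort by byte count, largest first, and keep the top n.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

If you run the same aggregation on each of your expensive WAN circuits separately, the ranking shows where the cost is coming from.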

If you want more details on where all your traffic is going and how to configure NetFlow to give you the answers, check out this recent post.

Friday, September 18, 2009

Pre-Release of dopplerVUE 2.0 Now Available

To piggyback on the post last week about HSRP, I’d like to share that the pre-release of dopplerVUE 2.0 is available for download. dopplerVUE 2.0 has exciting new features including discovering and mapping HSRP primary and secondary links, a new interface with graphics for better visibility, improved SNMP table polling and more! The 30-day trial is free, and who doesn’t like free?

Here is the new feature list:

New interface with new graphics for better readability
Manage ANY metric on ANY device with improved SNMP table polling
Discover and map HSRP primary and secondary links
New WMI wizard
Easily create personal workspaces with the new dVUE templates
Distributed architecture to extend the number of managed devices
Improved database and architecture for increased performance

Plus, for distributed enterprise networks, dopplerVUE connects with NeuralStar to create a two-tiered, centrally monitored, fully replicated enterprise network management system that provides:

Increased fault tolerance
Improved performance through load balancing
Powerful disaster recovery capabilities
Enhanced scalability

If you decide to check it out and have questions, please don’t hesitate to post here in the blog or send me a message! I’m happy to help with network related questions.

Here are a couple of screenshots of the new dopplerVUE interface:

Thursday, September 17, 2009

Three Easy Techniques for Cutting Alarm Clutter

Have you ever missed (or almost missed) a critical network alarm that could have prevented a serious network performance or availability problem because it was hidden among non-essential alarms? Hopefully the answer is no, but the situation highlights a serious problem – “alarm clutter”.

Today’s network devices and servers are capable of providing a dizzying set of alarms on almost anything from packet errors to available memory. That’s a lot of power for troubleshooting and problem solving, but it can also mean that even in a small network of only a few hundred elements you can become overwhelmed by a storm of alarms.

Here are three easy techniques for managing the volume of alarms and their relative severity. Using them in the right circumstances can help you find and fix problems more quickly by spending less time wading through a sea of distractions.

Technique 1: Duration-based alarming
Duration-based alarming is a common technique for reducing the number of alarms from a particular device or server. Instead of reporting every instance of an alarm condition, the system issues an alarm only if the condition persists for a specified period of time.

For example, suppose interface utilization on a router briefly exceeds 90% every few minutes. Normally, this wouldn’t be a concern and an alarm isn’t warranted. In fact, it could mean the router is optimally “sized” for the expected or nominal level of traffic on the interface. On the other hand, if interface utilization stays above 90% for 15 minutes or more, a bottleneck has developed and an alarm should be generated. With duration-based alarming, you are notified only when an actual problem develops, not every time a short, transient condition occurs.
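The rule in this example is to fire only after the value has stayed above 90% for 15 minutes. It fits in a small state machine: remember when the current breach started, and forget it as soon as the value drops back. Below is a minimal sketch, assuming utilization samples arrive with timestamps in seconds. The class and method names are illustrative, not taken from any monitoring tool.

```python
from typing import Optional


class DurationAlarm:
    """Raise an alarm only when a value stays above a threshold for a minimum time."""

    def __init__(self, threshold: float, min_duration_s: float):
        self.threshold = threshold
        self.min_duration_s = min_duration_s
        self._breach_start: Optional[float] = None  # when the current breach began
        self.active = False

    def update(self, timestamp_s: float, value: float) -> bool:
        """Feed one sample; return True only at the moment the alarm first fires."""
        if value <= self.threshold:
            # Condition cleared: reset, so short spikes never accumulate.
            self._breach_start = None
            self.active = False
            return False
        if self._breach_start is None:
            self._breach_start = timestamp_s
        if not self.active and timestamp_s - self._breach_start >= self.min_duration_s:
            self.active = True
            return True
        return False
```

With DurationAlarm(90, 900) and one sample per minute, a single reading at 95% followed by a normal one never fires. Fourteen consecutive high readings still do not fire. The alarm fires once, on the sixteenth consecutive reading above 90%, when the breach reaches 15 minutes (900 seconds).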

Technique 2: Average-value alarming
Average-value alarming offers a similar approach. Instead of creating an alarm every time a measure exceeds a pre-determined threshold, an alarm is issued only if the average value of the measure over time exceeds the threshold.

It’s not uncommon, for example, to see processor utilization periodically “spike” to 100% for a few seconds. However, if a processor averages 90% utilization over 20 minutes, that is cause for concern and you would fully expect an alarm.
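One way to implement average-value alarming is a rolling window. Keep the most recent N samples, and alarm only when their mean crosses the threshold. Below is a minimal sketch, assuming one CPU sample per minute, so a 20-sample window covers the 20 minutes in the example. The class name is illustrative.

```python
from collections import deque


class AverageAlarm:
    """Alarm when the mean of the most recent samples exceeds a threshold."""

    def __init__(self, threshold: float, window_size: int):
        self.threshold = threshold
        # Oldest samples fall off automatically once the window is full.
        self.samples = deque(maxlen=window_size)

    def update(self, value: float) -> bool:
        """Add one sample and report whether the windowed average is too high."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet for a meaningful average
        return sum(self.samples) / len(self.samples) > self.threshold
```

Occasional 100% spikes barely move a 20-sample average. A sustained load near 95% pushes the average over 90% once the window fills.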

Technique 3: Severity-level alarming
Rather than setting just one alarm threshold, try setting multiple threshold values that represent increasing levels of severity.

Disk space used, for example, increases gradually to the point where applications can no longer function. Obviously, you want an alarm when disk usage reaches 90%. But wouldn’t it be helpful to know when it reaches 70% and then 80%, so you have time to “clean up” the disk before applications suffer? You could configure a minor alarm when disk usage reaches 70%, a major alarm at 80%, and finally a critical alarm at 90%.
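The tiered thresholds in this example map naturally onto an ordered table that is checked from most to least severe. Below is a minimal sketch, assuming disk usage is already available as a percentage of capacity used. The table and function names are illustrative.

```python
from typing import Optional

# Thresholds from the example above, ordered from most to least severe.
DISK_SEVERITIES = [(90.0, "critical"), (80.0, "major"), (70.0, "minor")]


def disk_severity(used_percent: float) -> Optional[str]:
    """Return the highest severity whose threshold has been reached, else None."""
    for threshold, severity in DISK_SEVERITIES:
        if used_percent >= threshold:
            return severity
    return None  # below every threshold: no alarm
```

Checking the most severe threshold first means a disk at 95% reports a single critical alarm instead of three overlapping ones.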

These are just three of the most useful ways to reduce alarm clutter to focus on actionable alarms. Using them will help you identify significant network issues earlier, before users are impacted.

Friday, September 11, 2009

Building a resilient network using HSRP/VRRP

Losing network connectivity to a group of key servers can lead to a really bad day. You can improve your network routing resiliency by adding hot-standby routers and HSRP (Hot Standby Router Protocol). For non-Cisco devices, you can use a similar method such as VRRP (Virtual Router Redundancy Protocol). HSRP offers a straightforward approach: any given switch can reach two physical routers. It’s a great way to improve the reliability of your key equipment. Here are some quick basics on how it works:

1. Two routers share a virtual IP address that is used as the gateway IP.
2. A primary and a secondary router are designated, each with a priority number. The primary router’s priority number is higher than the secondary router’s.
3. The primary router sends Hello packets at a regular interval. If the secondary stops receiving them for longer than the hold time, the secondary becomes the primary. Very little packet loss occurs during a failover, and most TCP transmissions complete seamlessly because TCP retransmits lost segments. You control how quickly failover happens by adjusting the Hello interval and hold time (by default 3 and 10 seconds on Cisco IOS).
4. Once back online, the original primary router sends Hello packets that include its priority number. If preemption is enabled (the standby preempt command on Cisco IOS), the router with the highest priority number becomes the primary again. Without preemption, the current primary keeps its role.
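The election steps above can be modeled in a few lines. A peer counts as down once no Hello has arrived within the hold time, and the highest-priority live router wins unless preemption is off. With the default 3-second Hello and 10-second hold timers, the secondary takes over roughly 10 seconds after the last Hello from the primary. Below is a simplified, hypothetical model for reasoning about timers and preemption, not an implementation of the HSRP protocol. It ignores real HSRP details such as the highest-IP tie-breaker and the coup/resign messages.

```python
from dataclasses import dataclass
from typing import List, Optional

HOLD_S = 10  # Cisco IOS default hold time (seconds); Hellos default to every 3 s


@dataclass
class Router:
    name: str
    priority: int
    last_hello_s: float = 0.0  # time of the last Hello heard from this router


def is_alive(router: Router, now_s: float) -> bool:
    """A peer is considered down once no Hello has arrived within the hold time."""
    return now_s - router.last_hello_s < HOLD_S


def elect_primary(routers: List[Router], current: Optional[Router],
                  now_s: float, preempt: bool) -> Optional[Router]:
    """Pick the router that should own the virtual IP (simplified model)."""
    alive = [r for r in routers if is_alive(r, now_s)]
    if not alive:
        return None
    if current is not None and current in alive and not preempt:
        return current  # without preemption the existing primary keeps its role
    return max(alive, key=lambda r: r.priority)
```

Walking through a failure: rtr-a (priority 110) starts as primary. When its Hellos stop at t=0 while rtr-b (priority 100) keeps sending, rtr-b takes over at t=11. When rtr-a comes back, it reclaims the role only if preempt is True.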

This process can be used in groups, with multiple routers offering to become the primary router in the event of a failure. Some devices even support using the same interface for multiple groups. This can help when you cross-connect multiple departments, and it minimizes the amount of duplicate hardware needed.

The following is a sample set of IOS commands needed to implement HSRP on the primary router.

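Router> enable
Router# configure terminal
Router(config)# interface ethernet 0
Router(config-if)# ip address 172.16.6.5 255.255.255.0
! Virtual IP shared by the HSRP group; hosts use it as their default gateway
Router(config-if)# standby 1 ip 172.16.6.100
! A priority above the default of 100 makes this router the primary
Router(config-if)# standby 1 priority 110
! Reclaim the primary role after recovering from a failure
Router(config-if)# standby 1 preempt
Router(config-if)# end
Router# show standby
Router# show standby ethernet 0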

For more information, review the Cisco documentation.

Thursday, September 10, 2009

Looking Forward to the Modern Day Marine Show

I’ve attended a lot of events lately, and it’s not over yet! I’ll be attending the Modern Day Marine show, held September 29th – October 1st in Quantico, Virginia. The show focuses on systems and technology, but also caters to equipment and services, making it attractive for all areas of interest. It looks like there are a good number of vendors exhibiting (there are three large tents and an outside exhibitor area), and a few events outside of the expo such as a large Robotics Rodeo and a grand banquet and reception.

This should be a great opportunity to hear more about the U.S. Marine Corps’ wants and needs in the network management space. It will also be a great venue to get feedback on the upcoming dopplerVUE 2.0 and NeuralStar 9.5 releases. Keep your eyes peeled for my post-show report in early October. Stay tuned for more information on the new release later this week!

Think you might be interested in attending Modern Day Marine? Check it out here. There will also be two additional Marine Military Expos, Marine South and Marine West, in 2010.

Thursday, September 3, 2009

A Tip for Managing Wired Networks

Earlier in the week, I posted three tips for managing wireless networks. To round out the week, I thought I would also share a tip for managing wired networks. Here is the answer to a question I often get from customers.

How can I tell when a Cisco device configuration has been altered or accessed?
You can be notified of any configuration changes or attempts by enabling the CiscoConfigManEvent trap. This feature sends you a trap whenever a user exits the configuration session. Simply point the trap to your network management system to see when somebody accesses a Cisco configuration session.

For dopplerVUE Users
You can forward traps as email for instant, 24x7 notification of changes on network devices. To read more about this trap and how to configure it, please refer to the following Cisco article.
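Forwarding a trap as email comes down to two steps: recognize the configuration trap, then turn it into a message. Below is a minimal sketch for readers building this outside a management tool. It assumes a trap receiver (not shown) has already decoded each trap into a dict of OID strings to values. The OID used for the configuration trap is the one I understand the CISCO-CONFIG-MAN-MIB assigns to ciscoConfigManEvent, so please verify it against your MIB files. On the router side, these traps are typically enabled with snmp-server enable traps config plus an snmp-server host entry pointing at your receiver.

```python
from email.message import EmailMessage
from typing import Dict, Optional

SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0"            # snmpTrapOID.0 (SNMPv2-MIB)
CONFIG_MAN_EVENT_OID = "1.3.6.1.4.1.9.9.43.2.0.1"   # ciscoConfigManEvent (assumed; verify)


def trap_to_email(agent: str, varbinds: Dict[str, str],
                  sender: str, recipient: str) -> Optional[EmailMessage]:
    """Return an email for a configuration trap, or None for any other trap."""
    if varbinds.get(SNMP_TRAP_OID) != CONFIG_MAN_EVENT_OID:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Configuration session on {agent}"
    msg["From"] = sender
    msg["To"] = recipient
    # Include every varbind so the recipient sees who/what/where from the trap.
    body = "\n".join(f"{oid} = {value}" for oid, value in sorted(varbinds.items()))
    msg.set_content(f"{agent} reported a configuration event.\n\n{body}")
    return msg
```

To actually send the message, pass it to smtplib.SMTP(your_mail_host).send_message(msg). The mail host is whatever relay your site uses.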

Tuesday, September 1, 2009

3 Tips for Managing Wireless Networks

The convenience of wireless networks can’t be overstated. In today’s environment, mobile computing is an expectation. So when users run into connectivity and bandwidth issues, you can be sure that complaints are soon to follow. Here are three tips for managing wireless networks to help you avoid some potential headaches.

1. Improve wireless connectivity to the access point
If users frequently lose their wireless signal, it’s going to be a frustrating situation. When wireless coverage drops, check for an increase in environmental disruptions. Look for Bluetooth headsets, wireless video cameras and microwave ovens. These devices can operate in the same 2.4 GHz band as many wireless networks, and they are often the culprits when interference appears.

If users still cannot connect reliably (or they were never able to connect at all), you may have a coverage problem, which is frequently fairly easy to resolve. Use a laptop and a wireless signal strength meter to map coverage holes in your wireless system. Some free tools, such as NetStumbler, provide detailed graphs of signal strength and noise level. In addition, periodically perform a laptop survey to find rogue wireless routers, so you can limit their use and ensure optimal network performance. Simply use the same laptop and wireless utility to scan for wireless networks from various points within your buildings.
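Once a survey has been exported, both checks are simple filters. A coverage hole is a location where no authorized access point is heard strongly enough, and a rogue is any BSSID that is not on your authorized list. Below is a minimal sketch, assuming each scan result has been exported as (location, SSID, BSSID, RSSI in dBm). The -70 dBm cutoff is a common rule of thumb, not a value from this post, so tune it for your environment.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class SurveyReading(NamedTuple):
    """One scan result from a walk-through survey (fields are hypothetical)."""
    location: str
    ssid: str
    bssid: str      # access point MAC address
    rssi_dbm: int   # received signal strength


def coverage_holes(readings: Iterable[SurveyReading], authorized: Set[str],
                   min_rssi_dbm: int = -70) -> List[str]:
    """Locations where no authorized AP is heard at or above min_rssi_dbm."""
    readings = list(readings)
    best: Dict[str, int] = {}
    for r in readings:
        if r.bssid in authorized:
            best[r.location] = max(best.get(r.location, -200), r.rssi_dbm)
    locations = {r.location for r in readings}
    # A location heard only through unauthorized APs counts as a hole too.
    return sorted(loc for loc in locations if best.get(loc, -200) < min_rssi_dbm)


def rogue_aps(readings: Iterable[SurveyReading], authorized: Set[str]) -> List[str]:
    """BSSIDs seen during the survey that are not on the authorized list."""
    return sorted({r.bssid for r in readings if r.bssid not in authorized})
```

Keying rogues on BSSID rather than SSID catches a rogue router that reuses your corporate network name.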

2. Track the availability of the wireless access point
This is the intersection between wired and wireless network management. At a minimum, periodically ping all wireless access points to check their up/down status, making sure they are working and can connect to the wired network. Some network management tools, like dopplerVUE, let you do this on a regular schedule. For SNMP-enabled wireless devices such as the Cisco Aironet 1200 series, you can also monitor CPU load and other metrics. These tell you when the device is overloaded, slowing down or dropping all traffic on the access point.
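A scheduled availability check only needs to remember each access point’s last state and report changes, so a steady outage does not flood you with repeat alerts. Below is a minimal sketch of that idea. The default probe shells out to the operating system’s ping command with Linux (iputils) flags; that is an assumption, and other platforms use different flags. The probe is passed in as a parameter, so you can swap in another check.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def system_ping(host: str) -> bool:
    """Send one ICMP echo using the OS ping command (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def poll_access_points(
    hosts: Iterable[str],
    last_state: Dict[str, bool],
    probe: Callable[[str], bool] = system_ping,
) -> List[str]:
    """Probe each access point once and report only changes in up/down status."""
    events = []
    for host in hosts:
        up = probe(host)
        previous = last_state.get(host)
        if previous is None:
            if not up:
                events.append(f"{host} is DOWN")  # unreachable on the very first poll
        elif previous != up:
            events.append(f"{host} is now {'UP' if up else 'DOWN'}")
        last_state[host] = up
    return events
```

To run it on a schedule, call poll_access_points in a loop with time.sleep between passes, reusing the same last_state dict each time.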

3. Assure sufficient bandwidth to the access point
Once you know users have a decent connection, managing bandwidth may become an issue. Many wireless systems can handle 20 or more users, so monitoring each individual user is likely to be of limited value. All users funnel through the same fixed wired connection. Monitoring bandwidth where your access point connects to the network therefore gives a good summary of the traffic volume. This will help you understand when to upgrade a system, and it will alert you before users call to complain about connectivity issues.
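Bandwidth at the access point’s wired uplink is usually derived from two readings of the switch port’s octet counter. Convert the difference to bits per second, then divide by the link speed. Below is a minimal sketch, assuming you poll the 64-bit IF-MIB counter ifHCInOctets (or ifHCOutOctets) for the uplink port. The link speed is assumed to come from ifSpeed, or from ifHighSpeed multiplied by 1,000,000. The function name is illustrative.

```python
COUNTER64_MODULUS = 2**64  # ifHCInOctets / ifHCOutOctets are 64-bit counters


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, if_speed_bps: int) -> float:
    """Percent of link capacity used between two counter samples."""
    # Modular subtraction handles a single counter wrap between samples.
    delta = (curr_octets - prev_octets) % COUNTER64_MODULUS
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / if_speed_bps
```

For example, 37,500,000 octets over a 5-minute (300 s) interval is 1 Mb/s, which is 1% of a 100 Mb/s uplink. The same calculation works for reviewing the load on a secondary HSRP link after a failover.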

Tip for dopplerVUE users:
For a better understanding of when your network is busy, include wireless access points in your managed network inventory. dopplerVUE will monitor them for their up and down status and core system performance such as CPU load. Use the bandwidth usage finder to watch the real time traffic flow, comparing performance to the automatic benchmarks calculated by dopplerVUE. You will always know the availability status of your wireless access points and how much traffic is generated on each.