Monday, July 13, 2009

Heading Off Trouble with Exchange Servers

I recently discussed the frequency of email failures in a June post. As a follow-up I wanted to provide some practical tips on managing Microsoft Exchange Servers to ensure the highest possible service levels for your users and head off problems before they become critical.

Microsoft's Windows Management Instrumentation (WMI) performance counters provide a simple and effective method for monitoring Exchange servers. If your network management solution supports WMI, you can use it to manage Exchange servers with little extra effort.

Monitoring Queue Size
With over one thousand WMI performance counters available for an Exchange server, you can get very sophisticated in managing your devices and processes. For most people, however, the following counters for the Information Store service can provide a good indication of overall Exchange performance.
- MSExchangeISMailbox:SendQSize
- MSExchangeISMailbox:ReceiveQSize
- MSExchangeISPublic:SendQSize
- MSExchangeISPublic:ReceiveQSize
These counters reflect the message queue sizes for each instance of the public or mailbox stores. Although brief spikes are not uncommon, all of these counters should be close to zero during normal operations. Queue sizes that do not return to nearly zero within 10 to 15 minutes indicate a potential issue with message routing or service processing. Larger environments, however, may see queue sizes of 5 to 10 while still exhibiting acceptable performance.

Another counter to consider is the MTA Work Queue Length (MSExchangeMTA:WorkQueueLength), which shows the number of queued messages being sent to or received from email servers other than Exchange Server 2003. A queue size that consistently exceeds 10 or 20 messages may indicate a problem with the MTA service.
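
The "returns to nearly zero within 10 to 15 minutes" rule can be sketched as a small check over polled counter values. This is only an illustration: the function name, the (timestamp, size) sample format, and the near_zero default are assumptions, and how the samples are collected is left to your monitoring tool.

```python
from datetime import datetime, timedelta

def queue_stuck(samples, near_zero=1, window=timedelta(minutes=15)):
    """Return True if the queue has stayed above `near_zero` for at least `window`.

    `samples` is a chronological list of (timestamp, queue_size) tuples,
    for example polled values of MSExchangeISMailbox:SendQSize.
    `near_zero` is an assumed cutoff; larger environments might use 10.
    """
    elevated_since = None
    for ts, size in samples:
        if size <= near_zero:
            elevated_since = None       # queue drained, so reset the clock
        elif elevated_since is None:
            elevated_since = ts         # first sample of a new elevated run
    if elevated_since is None:
        return False
    # Compare the length of the current elevated run against the window.
    return samples[-1][0] - elevated_since >= window
```

A brief spike that drains again is ignored, while a queue that stays elevated through the end of the sample window is flagged.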

Monitoring Email Delivery
Additionally, six more performance counters related to email delivery can provide a more rounded view of Exchange server performance. The counter values are unique to each environment, but monitoring them over time provides a baseline for a server’s steady state performance.
- MSExchangeISMailbox:AvgDeliveryTime(s)
- MSExchangeISMailbox:MsgsSentPerMin
- MSExchangeISMailbox:MsgsDeliveredPerMin
- MSExchangeISPublic:AvgDeliveryTime(s)
- MSExchangeISPublic:MsgsSentPerMin
- MSExchangeISPublic:MsgsDeliveredPerMin

Average delivery time values should be in the range of 600 to 900 milliseconds. Values greater than 1500 milliseconds indicate a performance problem. While the number of messages sent and delivered per minute is mostly informational in nature, it provides a good indication of general performance.
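
As a rough sketch, the delivery-time thresholds and the idea of a per-server baseline might look like the following. The function names are hypothetical, the "elevated" band between 900 and 1500 milliseconds is an assumption (the guidance above only names the normal range and the problem threshold), and the three-standard-deviation test is one common convention rather than anything Exchange-specific.

```python
from statistics import mean, stdev

def classify_delivery_time(ms):
    """Classify an average delivery time given in milliseconds."""
    if ms > 1500:
        return "problem"    # above 1500 ms indicates a performance problem
    if ms > 900:
        return "elevated"   # assumed middle band, worth watching
    return "normal"         # at or below the 600-900 ms normal range

def baseline(history):
    """Return (mean, standard deviation) of historical samples."""
    return mean(history), stdev(history)

def deviates(value, history, sigmas=3.0):
    """True if `value` is more than `sigmas` standard deviations from the mean."""
    mu, sd = baseline(history)
    return abs(value - mu) > sigmas * sd
```

The same deviates check can be applied to messages sent or delivered per minute, since those values only become meaningful relative to a server's own history.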

Monitoring Server Performance
To effectively monitor an Exchange server, it is important to monitor the underlying server resources as well. Again, there are thousands of available performance counters, but the following counters offer a good overview of server performance and resources without swamping you in data.
- Processor:%ProcessorTime. Processor or CPU utilization, on average, should be less than 70%. Utilization greater than 85-90% for more than 30 minutes, or 90-100% for more than 10 minutes, indicates an overloaded server.
- PagingFile:%Usage. Paging file usage should be less than 75%. Usage of 85-90%, for any period of time, is cause for concern.
- Memory:AvailableMBytes. Physical memory values below 20MB indicate insufficient RAM.
- LogicalDisk:%DiskTime. The amount of time a disk spends reading and writing data should be in the neighborhood of 60-70%, although brief spikes are not unusual.
- LogicalDisk:%FreeSpace. Exchange uses a lot of disk space, so overall free space should be monitored closely. The Windows and Exchange volume should have at least 256MB of free space; the Exchange database volume should have at least 1GB of free space; and the transaction log volume should have at least 100MB of free space.
- NetworkInterface:CurrentBandwidth. Acceptable interface utilization will depend on the type and size of the network, but generally speaking average bandwidth use should stay at or below 50-60% of maximum capacity.
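
Two of the rules above lend themselves to a short sketch: the sustained CPU thresholds, and network utilization as a share of capacity. The function names and the fixed polling interval are assumptions. The network helper assumes you also collect the Network Interface object's Bytes Total/sec counter, since Current Bandwidth reports the interface's capacity in bits per second rather than its load.

```python
def sustained_above(values, threshold, minutes, interval_minutes=1):
    """True if consecutive samples stay above `threshold` for more than `minutes`.

    `values` are readings taken every `interval_minutes`; each reading is
    treated as covering one full interval.
    """
    needed = minutes / interval_minutes
    run = 0
    for v in values:
        run = run + 1 if v > threshold else 0   # extend or reset the run
        if run > needed:
            return True
    return False

def cpu_overloaded(cpu_percent, interval_minutes=1):
    """Apply the two rules: above 85% for over 30 min, or above 90% for over 10 min."""
    return (sustained_above(cpu_percent, 85, 30, interval_minutes)
            or sustained_above(cpu_percent, 90, 10, interval_minutes))

def network_utilization_percent(bytes_per_sec, bandwidth_bits_per_sec):
    """Convert throughput in bytes/sec into a percentage of link capacity."""
    return 100.0 * bytes_per_sec * 8 / bandwidth_bits_per_sec
```

For example, 6,250,000 bytes per second on a 100 Mbit/s interface works out to 50% utilization, which sits at the low end of the 50-60% guideline.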

If your network management solution doesn't support WMI, or you are looking for a proven solution, consider dopplerVUE. It provides powerful network management capabilities in an easy-to-use software package.
