New Best Practice for RPC Timeouts in Exchange

Tuesday, June 26, 2012

Exchange 2010 and 2007 use RPC (Remote Procedure Calls) for all client and RPC proxy calls. For example, email clients (Outlook, Outlook Anywhere (OA), and ActiveSync) use RPC for MAPI connectivity.

The default keep-alive time for RPC connections uses the IIS idle connection timeout, which is 15 minutes. This usually doesn't cause a problem on LAN or WAN connections, but the routers and switches that connect Internet clients to internal Exchange servers often have more aggressive timeouts. Typically these network devices have a 5-minute timeout, which causes problems for external clients, particularly Outlook Anywhere, iPhone, and iPad clients, because the device drops the idle connection long before the next keep-alive is sent. Symptoms include messages stuck in the Outbox, poor email performance on the remote clients, and high CPU utilization on the Exchange Client Access Servers (CAS).
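The mismatch described above comes down to one comparison: a keep-alive only helps if it is sent before the strictest device on the path gives up on the idle connection. Here is a minimal sketch of that check in Python, using the figures from this post. The function name and the list of device timeouts are hypothetical and only serve the illustration.

```python
def survives_idle_period(keepalive_seconds, device_idle_timeouts_seconds):
    """Return True if a keep-alive is sent before the strictest device
    on the path would drop the idle connection."""
    return keepalive_seconds < min(device_idle_timeouts_seconds)


# Figures from the post: 15-minute default keep-alive, 5-minute edge device timeout.
DEFAULT_KEEPALIVE = 15 * 60
EDGE_DEVICE_TIMEOUT = 5 * 60

print(survives_idle_period(DEFAULT_KEEPALIVE, [EDGE_DEVICE_TIMEOUT]))  # False
print(survives_idle_period(120, [EDGE_DEVICE_TIMEOUT]))                # True
```

With the default of 900 seconds the connection is already gone when the keep-alive would be sent. Any keep-alive shorter than the 5-minute device timeout passes the check.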

The new best practice is to reduce the RPC keep-alive timeout on the Client Access Server from 15 minutes to 2 minutes. Since RPC is a function of Windows, not Exchange, this value is set under the Windows NT registry key. The value is located here:

HKLM\Software\Policies\Microsoft\Windows NT\RPC\MinimumConnectionTimeout

Normally the MinimumConnectionTimeout DWORD value does not exist, which means RPC uses the default value of 900 seconds (15 minutes). To adjust it, create or modify the MinimumConnectionTimeout value and set it to decimal 120 (120 seconds, or 2 minutes). IIS must be restarted on the CAS for the change to take effect.

The following command will create the appropriate value:

reg add "HKLM\Software\Policies\Microsoft\Windows NT\RPC" /v "MinimumConnectionTimeout" /t REG_DWORD /d 120
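To confirm what a given CAS will actually use, the value can be read back. The following is a rough Python sketch, not an official tool. It assumes Python is available on the server and uses the standard-library winreg module. The function names are hypothetical, and the 900-second fallback is simply the default stated in this post.

```python
import sys

RPC_POLICY_KEY = r"Software\Policies\Microsoft\Windows NT\RPC"
VALUE_NAME = "MinimumConnectionTimeout"
DEFAULT_TIMEOUT_SECONDS = 900  # default cited in this post when the value is absent


def read_configured_timeout():
    """Return the configured DWORD in seconds, or None if the value is
    missing (or if this is not running on Windows)."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RPC_POLICY_KEY) as key:
            value, value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None  # key or value does not exist
    return value if value_type == winreg.REG_DWORD else None


def effective_timeout(configured):
    """Apply the fallback described in the post: no value means 900 seconds."""
    return DEFAULT_TIMEOUT_SECONDS if configured is None else configured


if __name__ == "__main__":
    print("Effective RPC timeout (seconds):", effective_timeout(read_configured_timeout()))
```

Running this on each load-balanced CAS is a quick way to check that they all report the same number. Remember that the new value only applies after IIS has been restarted.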

The Outlook and ActiveSync clients honor this new timeout when they connect to the CAS, so both client and server now send a keep-alive packet after two minutes of inactivity, which keeps both of the required TCP connections open.

A colleague of mine works for a large global company that was affected by this. They have several thousand iPads connecting to nine load-balanced CAS servers, and all of the CAS servers were peaking at 100% CPU utilization. Since they implemented this change, the average load on the CAS has dropped to 20-30% and iPad performance is much improved.

This is my new best practice and I make this change on every Exchange CAS deployment. For more information about RPC over HTTP, see Configuring Computers for RPC over HTTP on TechNet.

26 comments:

David, June 26, 2012 at 9:34 AM
Great info Jeff. I will share this with our team and see if we can implement it in our environment soon.
Scott Ladewig, June 26, 2012 at 12:15 PM
After the registry value is changed, is a server restart or service restart required?
Jeff Guillet - @expta, June 26, 2012 at 12:31 PM
IIS must be reset for the change to take effect.
TCT, June 26, 2012 at 8:43 PM
How does this work in a load-balanced configuration? Most of the hardware guides (Alteon, F5) use a high persistence timeout for RPC.

Also, is this specific for internal Exchange traffic or external traffic?
TCT, June 26, 2012 at 8:46 PM
How does this work in a load-balanced configuration? Most of the hardware vendors (F5, Radware, Citrix) use a high RPC persistence timeout. I've seen anywhere from 1 hour to 3 hours.

Also, is this specific to internal traffic or external traffic?
DGoossens, June 27, 2012 at 4:43 AM
Tnx Jeff!
Will keep this in mind on our next deployments!
I've seen the problem before, so this will be a big help!
Jeff Guillet - @expta, June 27, 2012 at 8:55 AM
Hi TCT,

The RPC timeout value is configured on the CAS servers and the clients that connect to them will use that value. It doesn't matter if they are load balanced, but you must configure the same timeout value on all load balanced CAS servers.

If your network devices allow a longer timeout value it doesn't matter. A two minute timeout value is still a safe value to use. The trouble usually lies in border or edge network devices that have more aggressive (shorter) timeout limits.

This is a Windows RPC timeout value and is not Exchange-specific. It affects both internal and external RPC timeouts. That said, I don't know of any other software or technology that uses long (15 minute) RPC timeouts.
Maarten, July 18, 2012 at 9:55 AM
Hi Jeff,

Thnx for this great post. It solved our connection problem between Outlook (2010) and Exchange (2007). The problem we are still facing is that the Outlook connection to the Domain Controller to get the GAL (Outlook in online mode) still times out. When someone is away for 20 minutes or more and he/she tries to send an e-mail by selecting the “To” field, the error still pops up saying that Outlook is unable to connect to the Exchange Server (Domain Controller name in popup box). Do you know of a smart solution to also keep the connection between the Outlook client and the domain controller alive so it can get GAL information?

The timeout value on the firewall is 20 minutes, so it seems the RPC connection between Outlook and the Domain Controller is not refreshed within 20 minutes when the Windows 7 system is idle.

The keep-alive registry setting does not seem to work in Windows 7 the way it did in XP and 2003.

Thank you.

Maarten
The Netherlands.
Maarten, July 18, 2012 at 10:03 AM
Hi Jeff,
We are migrating to a VMware View (VDI) environment, so cached mode is not an option for us anymore. With cached mode on, everything works perfectly…
Maarten, July 18, 2012 at 11:09 AM
The exact error is: The connection to Microsoft Exchange is unavailable. Outlook must be online or connected to complete this action.

The message in the balloon is: Outlook is trying to retrieve data from the Microsoft Exchange server with the name from the domain controller in it.
Anonymous, September 12, 2012 at 9:24 AM
Very interesting read, but it starts with the preface that this is a new best practice ... can you possibly provide the TechNet or some other official Microsoft article listing this best practice?

I ask because it seems timeout values vary dramatically by load balancer vendor, giving me the impression they are "winging it". You grab Netscaler, they leave the default 2-minute timeout ... which lines up with what you have. You look at an F5 or Barracuda ... they will put it at 2 hours. Some other vendors even recommend a full day.

The only reference I've been able to find on MS's own site is to set any load balancer in the path to at least 45 minutes for EWS: http://technet.microsoft.com/en-us/library/ff625248.aspx Now EWS isn't RPC ... but it uses IIS, so by your own information it should be at 15 minutes by default.

Could you by chance link this best practice?

Gary, January 8, 2013 at 12:33 PM
What about multi-role Exchange 2010 servers (CAS-HUB-Mailbox)? Would that reg key negatively affect the other roles?
Gary, January 9, 2013 at 11:53 AM
THANK YOU JEFF - I assume I have to reboot after adding the key (sometimes we get lucky and don't have to!)?
Anonymous, January 30, 2013 at 2:08 AM
Hi Jeff - I'm really surprised that an RPC parameter could affect an ActiveSync connection (iPad, iPhone...), which for me is a pure HTTP connection and not RPC over HTTP, but maybe I'm wrong?
Unknown, February 8, 2013 at 1:37 PM
I made this change to all 4 of our CAS/HUB servers and after restarting IIS it worked for 3 hours and then we started seeing the 1040 events come back again in 15 minute increments.

This was the case for all 4 servers. Any ideas of why this is happening?

Anonymous, February 24, 2013 at 12:22 PM
Thanks for this article. We put the 2-minute timeout in place on our CAS servers, which improved our situation a bit, but we are still struggling with long-running established TCP connections to the CAS servers on RPC ports. Our internal firewall has a 2-hour timeout, and the F5 an even longer one (Firewall -> F5 -> CAS). But our CAS servers still keep long-running TCP connections that are already closed on the clients, and the connection count doesn't even decrease significantly over the weekend. Our F5 shows balanced connections, but on the CAS we see big differences between servers. Any idea?
Anonymous, March 18, 2013 at 10:13 AM
Thanks for the great documentation of this issue.
Applied to a number of my clients, all seems well so far!