The default keep alive time for RPC connections uses the IIS idle connection timeout, which is 15 minutes. This usually doesn't cause a problem on LAN or WAN connections, but routers and switches that are used to connect Internet clients to internal Exchange servers often have more aggressive timeouts. Typically these network devices have a 5-minute timeout, which causes problems for external clients, particularly Outlook Anywhere, iPhone, and iPad clients. Symptoms include messages stuck in the Outbox, poor email performance on the remote clients, and high CPU utilization on the Exchange Client Access Servers (CAS).
The new best practice is to adjust the RPC keep alive timeout value on the Client Access Server from 15 minutes to 2 minutes. Since RPC is a function of Windows, not Exchange, this value is adjusted under the Windows NT registry key. The value is located here:
HKLM\Software\Policies\Microsoft\Windows NT\RPC\MinimumConnectionTimeout
Normally the MinimumConnectionTimeout DWORD value does not exist, which means RPC uses the default value of 900 seconds (15 minutes). To adjust it, create or modify the MinimumConnectionTimeout value and set it to decimal 120 (seconds, or 2 minutes). IIS must be restarted on the CAS for the change to take effect.
The following command will create the appropriate values:
reg add "HKLM\Software\Policies\Microsoft\Windows NT\RPC" /v "MinimumConnectionTimeout" /t REG_DWORD /d 120
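If you need to push the same value to several CAS servers from a script, the command can be wrapped in a small helper. The following is only a minimal Python sketch, not part of the original guidance: the function names `build_reg_add_args` and `apply_timeout` are made up for illustration, the `/f` switch (overwrite without prompting) is my addition, and it assumes the script runs on the CAS itself from an elevated prompt.

```python
import subprocess
import sys
from typing import List

RPC_POLICY_KEY = r"HKLM\Software\Policies\Microsoft\Windows NT\RPC"
VALUE_NAME = "MinimumConnectionTimeout"


def build_reg_add_args(timeout_seconds: int) -> List[str]:
    """Return the reg.exe argument list that sets the RPC timeout (in seconds)."""
    if not isinstance(timeout_seconds, int) or timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    return [
        "reg", "add", RPC_POLICY_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(timeout_seconds),
        "/f",  # assumption: overwrite an existing value without prompting
    ]


def apply_timeout(timeout_seconds: int = 120) -> None:
    """Write the registry value, then restart IIS (Windows only, elevated)."""
    subprocess.run(build_reg_add_args(timeout_seconds), check=True)
    subprocess.run(["iisreset"], check=True)


# Only touch the registry when run directly on a Windows machine.
if __name__ == "__main__" and sys.platform == "win32":
    apply_timeout(120)
```

Keeping the argument list in its own function makes it easy to review exactly what will be written before anything is changed on a production server.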
The Outlook and ActiveSync clients honor this new timeout during the connection to the CAS, so both client and server now send a Keep-Alive packet after two minutes of inactivity, effectively maintaining both TCP connections needed.
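The reasoning behind the two-minute value can be sketched as a simple comparison: a connection stays up only if a keep-alive is sent before the shortest idle timeout along the path expires. The Python below is an illustrative model, not a measurement. It assumes every device resets its idle timer on any packet and it ignores latency; the function name `idle_connection_survives` is hypothetical, and the numbers are the ones quoted in this post.

```python
def idle_connection_survives(keepalive_seconds, device_idle_timeouts):
    """True if a keep-alive goes out before the shortest idle timeout expires."""
    return keepalive_seconds < min(device_idle_timeouts)


DEFAULT_RPC_TIMEOUT = 900  # 15 minutes, the default when the value is absent
TUNED_RPC_TIMEOUT = 120    # 2 minutes, the value recommended in this post
EDGE_DEVICE_TIMEOUT = 300  # 5 minutes, typical for edge devices per this post

for label, keepalive in (("default", DEFAULT_RPC_TIMEOUT),
                         ("tuned", TUNED_RPC_TIMEOUT)):
    survives = idle_connection_survives(keepalive, [EDGE_DEVICE_TIMEOUT])
    print(f"{label} ({keepalive} s): {'kept alive' if survives else 'dropped'}")
```

With the default 900 seconds the edge device gives up long before the first keep-alive is sent; with 120 seconds the keep-alive always arrives first.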
A colleague of mine works for a large global company that was affected by this. They have several thousand iPads connecting to nine load balanced CAS servers, and all the CAS were peaking at 100% CPU utilization. Once they implemented this change, the average load on the CAS dropped to 20-30% and iPad performance improved considerably.
This is my new best practice and I make this change on every Exchange CAS deployment. For more information about RPC over HTTP see Configuring Computers for RPC over HTTP on TechNet.
Great info Jeff. I will share this with our team and see if we can implement it in our environment soon.
After the registry value is changed, is a server restart or service restart required?
IIS must be reset for the change to go into effect.
How does this work in a load balanced configuration? Most of the hardware guides (Alteon, F5) use a high persistence timeout for RPC.
Also, is this specific for internal Exchange traffic or external traffic?
How does this work in a load balanced configuration? Most of the hardware vendors (F5, Radware, Citrix) use a high RPC persistence timeout. I've seen anywhere from 1 hour to 3 hours.
Also, is this specific to internal traffic or external traffic?
Tnx Jeff!
Will keep this in mind on our next deployments!
I've seen the problem before, so this will be a big help!
Hi TCT,
The RPC timeout value is configured on the CAS servers and the clients that connect to them will use that value. It doesn't matter if they are load balanced, but you must configure the same timeout value on all load balanced CAS servers.
If your network devices allow a longer timeout value it doesn't matter. A two minute timeout value is still a safe value to use. The trouble usually lies in border or edge network devices that have more aggressive (shorter) timeout limits.
This is a Windows RPC timeout value and is not Exchange-specific. It affects both internal and external RPC timeouts. That said, I don't know of any other software or technology that uses long (15 minute) RPC timeouts.
Hi Jeff,
Thanks for this great post. It solved our connection problem between Outlook (2010) and Exchange (2007). The problem we are still facing is that the Outlook connection to the domain controller to get the GAL (Outlook in online mode) still times out. When someone is away for 20 minutes or more and tries to send an e-mail by selecting the "To" field, the error still pops up that Outlook is unable to connect to the Exchange server (domain controller name in the popup box). Do you know of a smart solution to also keep the connection between the Outlook client and the domain controller alive to get GAL information?
The timeout value on the firewall is 20 minutes, so it seems the RPC connection between Outlook and the domain controller is not refreshed within 20 minutes when the Windows 7 system is idle.
The keep alive registry settings do not seem to work in Windows 7 anymore like they did in XP and 2003.
Thank you.
Maarten
The Netherlands.
Hi Maarten,
I haven't heard of that problem before, but most customers use Outlook in cached mode which means they're using the OAB. I recommend cached mode for lots of other reasons, too. Would that be a solution for you?
Jeff
Hi Jeff,
We are migrating to a VMware View (VDI) environment, so cached mode is not an option for us anymore. With cached mode on, everything works perfectly…
Sounds like a networking problem, perhaps in VMware. What exactly is the error you get when you try to access the GAL using the To: button?
Well, first it loads for a couple of seconds and then the "unable to connect to exchange server" popup appears with the name of the domain controller in the balloon.
After clicking the error message away and hitting the "To" field again, everything works. It is related to the timeout window on the firewall (timed it with a stopwatch :) ).
Seems like the connection to the domain controller for GAL only gets active when hitting the TO button and of course when idle for 20+ minutes the session is closed by firewall. E-mail messages are still appearing in the inbox so the connection from client to exchange seems ok.
The exact error is: The connection to Microsoft Exchange is unavailable. Outlook must be online or connected to complete this action.
The message in the balloon is: "Outlook is trying to retrieve data from the Microsoft Exchange server", with the name of the domain controller in it.
Maarten, I assume this is a load balanced CAS array? If so, check your timeout settings on the load balancer. It sounds like it's timing out there.
See a similar problem on the F5 forums: https://devcentral.f5.com/Community/GroupDetails/tabid/1082223/asg/52/aft/1174067/showtab/groupforums/Default.aspx
Hi Jeff,
No load balancers are active in this setup. We managed to solve the problem by raising the time-out for the high RPC ports from 20 minutes to 2.5 hours on the firewalls. The Windows OS refreshes the connections every 2 hours by default, so the time-out limit will never be reached.
We solved the time-out from client to Exchange server with your RPC registry settings. Again, thanks.
Cheers,
Maarten
Glad you got it nailed down. I'm sure this will help others.
Very interesting read, but it starts with the preface that this is a new best practice ... can you possibly provide the TechNet or some other official Microsoft article listing this best practice?
I ask because it seems timeout values vary dramatically by load balancer vendor, giving me the impression they are "winging it". You grab a NetScaler, they leave the default 2-minute timeout ... which lines up with what you have. You look at an F5 or Barracuda ... they will put it at 2 hours. Some other vendors even recommend a full day.
The only reference I've been able to find on MS's own site is to set any load balancer in the path to at least 45 minutes for EWS: http://technet.microsoft.com/en-us/library/ff625248.aspx Now EWS isn't RPC ... but it uses IIS, so by your own information it should be at 15 minutes by default.
Could you by chance link this best practice?
Hi Justin,
Best practices are created based on real-world experience in multiple deployments where a solution or design is found to be favorable. The guidance above has been used many times to resolve aggressive network timeout issues with great success.
Scott Schnoll mentioned this change in his TechEd 2012 session, "Microsoft Exchange Server 2010 SP2 Tips & Tricks". See http://video.ch9.ms/teched/2012/na/EXL305.pptx
Thanks for the TechED link! I had saved this page for future reference and didn't realize you responded and just wanted to make sure you know I appreciate the followup.
What about multi-role Exchange 2010 servers (CAS-HUB-Mailbox)? Would that reg key negatively affect the other roles?
This only affects the CAS role. It has no effect on other roles.
THANK YOU JEFF - I assume I have to reboot after adding the key (sometimes we get lucky and don't have to!)?
Hi Jeff - I'm really surprised that an RPC parameter could affect an ActiveSync connection (iPad, iPhone...), which to me is a pure HTTP connection and not RPC over HTTP, but maybe I'm wrong?
I made this change to all 4 of our CAS/HUB servers and after restarting IIS it worked for 3 hours and then we started seeing the 1040 events come back again in 15 minute increments.
This was the case for all 4 servers. Any ideas of why this is happening?
Thanks for this article. We put the 2-minute timeout in place on our CAS servers, which improved our situation a bit, but we are still struggling with long-running established TCP connections to the CAS servers on RPC ports. Our internal firewall has a 2-hour timeout and the F5 an even longer one (Firewall -> F5 -> CAS). Even so, our CAS servers keep long-running TCP connections that are already closed on the clients, and the connection count doesn't even decrease significantly over the weekend. Our F5 shows balanced connections, but on the CAS we see big differences between servers. Any idea?
Thanks for the great documentation of this issue.
Applied to a number of my clients, all seems well so far!