New Best Practice for RPC Timeouts in Exchange

Tuesday, June 26, 2012
Exchange 2010 and 2007 use RPC (Remote Procedure Calls) for all client and RPC proxy calls.  For example, email clients (Outlook, Outlook Anywhere (OA), and ActiveSync) use RPC for MAPI connectivity. 

The default keep-alive time for RPC connections uses the IIS idle connection timeout, which is 15 minutes.  This usually doesn't cause a problem on LAN or WAN connections, but routers and switches that connect Internet clients to internal Exchange servers often have more aggressive timeouts.  Typically these network devices have a 5-minute timeout, which causes problems for external clients, particularly Outlook Anywhere, iPhone, and iPad clients.  Symptoms include messages stuck in the Outbox, poor email performance on the remote clients, and high CPU utilization on the Exchange Client Access Servers (CAS).



The new best practice is to adjust the RPC keep alive timeout value on the Client Access Server from 15 minutes to 2 minutes.  Since RPC is a function of Windows, not Exchange, this value is adjusted under the Windows NT registry key.  The value is located here:

HKLM\Software\Policies\Microsoft\Windows NT\RPC\MinimumConnectionTimeout

Normally the MinimumConnectionTimeout DWORD value does not exist, which means RPC uses the default value of 900 seconds (15 minutes).  To adjust it, create or modify the MinimumConnectionTimeout value and set it to decimal 120 (seconds, or 2 minutes).  IIS must be restarted on the CAS for the change to take effect.


The following command will create the appropriate value:

reg add "HKLM\Software\Policies\Microsoft\Windows NT\RPC" /v MinimumConnectionTimeout /t REG_DWORD /d 120
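
When the same value has to go onto several CAS servers, one option is to generate a .reg file once and import it on each server. The Python sketch below is only an illustration, not part of the procedure above. The helper name render_reg_file is made up for this example; the key path and value name are the ones given in this post. It relies on the .reg text format writing a DWORD as eight hexadecimal digits (decimal 120 is hex 78).

```python
# Sketch only: renders .reg file text for the RPC keep-alive setting.
# render_reg_file is a hypothetical helper, not a Microsoft tool.

KEY_PATH = r"HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\Windows NT\RPC"
VALUE_NAME = "MinimumConnectionTimeout"


def render_reg_file(timeout_seconds: int = 120) -> str:
    """Return .reg file text that sets MinimumConnectionTimeout (in seconds)."""
    if not 0 < timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("timeout must be a positive value that fits in a DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files store DWORD data as eight hexadecimal digits
        f'"{VALUE_NAME}"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(render_reg_file(120))
```

Saved to a file, the output should be loadable with reg import on each CAS. As with the reg add command, IIS still has to be restarted afterwards.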

The Outlook and ActiveSync clients honor this new timeout during the connection to the CAS, so both client and server now send a Keep-Alive packet after two minutes of inactivity, effectively maintaining both TCP connections needed.
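
The reasoning comes down to simple arithmetic: an idle connection survives an intermediate device only if keep-alive traffic arrives more often than that device's idle timeout. The Python sketch below is a deliberately simplified model under that assumption (it ignores packet loss and device-specific behavior). The function name is made up for illustration, and the numbers are the ones quoted in this post.

```python
def survives_idle(keepalive_seconds: int, device_timeout_seconds: int) -> bool:
    """True if a keep-alive is sent before the device drops the idle connection."""
    return keepalive_seconds < device_timeout_seconds


DEVICE_TIMEOUT = 5 * 60   # typical edge-device idle timeout mentioned above
DEFAULT_KEEPALIVE = 900   # RPC default when the registry value is absent
TUNED_KEEPALIVE = 120     # value recommended in this post

print(survives_idle(DEFAULT_KEEPALIVE, DEVICE_TIMEOUT))  # False
print(survives_idle(TUNED_KEEPALIVE, DEVICE_TIMEOUT))    # True
```

With the default of 900 seconds the device gives up at 300 seconds, long before the first keep-alive. At 120 seconds the keep-alive arrives with time to spare.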

A colleague of mine works for a large global company that was affected by this.  They have several thousand iPads connecting to nine load-balanced CAS servers, and all the CAS were peaking at 100% CPU utilization.  Once they implemented this change, the average load on the CAS dropped to 20-30% and iPad performance improved noticeably.

This is my new best practice and I make this change on every Exchange CAS deployment.  For more information about RPC over HTTP see Configuring Computers for RPC over HTTP on TechNet.

26 comments:

  1. Great info Jeff. I will share this with our team and see if we can implement it in our environment soon.

  2. After the registry value is changed, is a server restart or service restart required?

  3. IIS must be reset for the change to go into effect.

  4. How does this work in a load balanced configuration? Most of the hardware guides (Alteon, F5) use a high persistence timeout for RPC.

    Also, is this specific for internal Exchange traffic or external traffic?

  5. How does this work in a load balanced configuration? Most of the hardware vendors (F5, Radware, Citrix) use a high RPC persistence timeout. I've seen anywhere from 1 hour to 3 hours.

    Also, is this specific to internal traffic or external traffic?

  6. Tnx Jeff!
    Will keep this in mind on our next deployments!
    I've seen the problem before, so this will be a big help!

  7. Hi TCT,

    The RPC timeout value is configured on the CAS servers and the clients that connect to them will use that value. It doesn't matter if they are load balanced, but you must configure the same timeout value on all load balanced CAS servers.

    If your network devices allow a longer timeout value it doesn't matter. A two minute timeout value is still a safe value to use. The trouble usually lies in border or edge network devices that have more aggressive (shorter) timeout limits.

    This is a Windows RPC timeout value and is not Exchange-specific. It affects both internal and external RPC timeouts. That said, I don't know of any other software or technology that uses long (15 minute) RPC timeouts.

  8. Hi Jeff,

    Thnx for this great post. It solved our connection problem between Outlook (2010) and Exchange (2007). The problem we are still facing is that the Outlook connection to the Domain Controller to get the GAL (Outlook in online mode) still times out. When someone is away for 20 minutes or more and then tries to send an e-mail by selecting the “To” field, the error still pops up that Outlook is unable to connect to the Exchange Server (Domain Controller name in popup box). Do you know of a smart solution to also keep the connection between the Outlook client and the domain controller alive to get GAL information?

    The timeout value on the firewall is 20 minutes, so it seems the RPC connection between Outlook and the Domain Controller is not refreshed within 20 minutes when the Windows 7 system is idle.

    The keep-alive registry settings do not seem to work in Windows 7 anymore like they did in XP and 2003.

    Thank you.

    Maarten
    The Netherlands.

    Replies
    1. Hi Maarten,

      I haven't heard of that problem before, but most customers use Outlook in cached mode which means they're using the OAB. I recommend cached mode for lots of other reasons, too. Would that be a solution for you?

      Jeff

  9. Hi Jeff,
    We are migrating to a VMware View (VDI) environment, so cached mode is not an option for us anymore. With cached mode on, everything works perfectly…

    Replies
    1. Sounds like a networking problem, perhaps in VMware. What exactly is the error you get when you try to access the GAL using the To: button?

    2. Well, first it loads for a couple of seconds and then the “unable to connect to exchange server” popup appears with the name of the domain controller in the balloon.

      After dismissing the error message and clicking the “To” field again, everything works. It is related to the timeout window on the firewall (timed it with a stopwatch :) ).

      It seems the connection to the domain controller for the GAL only becomes active when hitting the To button, and of course when idle for 20+ minutes the session is closed by the firewall. E-mail messages still appear in the inbox, so the connection from client to Exchange seems OK.

  10. The exact error is: The connection to Microsoft Exchange is unavailable. Outlook must be online or connected to complete this action.

    The message in the balloon is: Outlook is trying to retrieve data from the Microsoft Exchange server with the name from the domain controller in it.

    Replies
    1. Maarten, I assume this is a load balanced CAS array? If so, check your timeout settings on the load balancer. It sounds like it's timing out there.

      See a similar problem on the F5 forums: https://devcentral.f5.com/Community/GroupDetails/tabid/1082223/asg/52/aft/1174067/showtab/groupforums/Default.aspx

    2. Hi Jeff,

      No load balancers are active in this setup. We managed to solve the problem by raising the timeout for the high RPC ports from 20 minutes to 2.5 hours on the firewalls. The Windows OS refreshes the connections every 2 hours by default, so the timeout limit will never be reached.

      The timeout from client to Exchange server we solved with your RPC registry settings. Again, thnx.

      Cheers,
      Maarten

    3. Glad you got it nailed down. I'm sure this will help others.

  11. Very interesting read, but it starts with the preface that this is a new best practice ... can you possibly provide the TechNet or some other official Microsoft article listing this best practice?

    I ask because it seems timeout values vary dramatically by load balancer vendor, giving me the impression they are "winging it". With Netscaler, they leave the default 2-minute timeout, which lines up with what you have. With an F5 or Barracuda, they put it at 2 hours. Some other vendors even recommend a full day.

    The only reference I've been able to find on MS's own site is to set any load balancer in the path to at least 45 minutes for EWS: http://technet.microsoft.com/en-us/library/ff625248.aspx Now EWS isn't RPC, but it uses IIS, so by your own information it should be at 15 minutes by default.

    Could you by chance link this best practice?

    Replies
    1. Hi Justin,

      Best practices are created based on real-world experience in multiple deployments where a solution or design is found to be favorable. The guidance above has been used many times to resolve aggressive network timeout issues with great success.

      Scott Schnoll mentioned this change in his TechEd 2012 session, "Microsoft Exchange Server 2010 SP2 Tips & Tricks". See http://video.ch9.ms/teched/2012/na/EXL305.pptx


    2. Thanks for the TechED link! I had saved this page for future reference and didn't realize you responded and just wanted to make sure you know I appreciate the followup.

  12. What about multi-role Exchange 2010 servers (CAS-HUB-Mailbox)? Would that reg key negatively affect the other roles?

    Replies
    1. This only affects the CAS role. It has no effect on other roles.

  13. THANK YOU JEFF - I assume I have to reboot after adding the key (sometimes we get lucky and don't have to!)?

  14. Hi Jeff - I'm really surprised that an RPC parameter could affect an ActiveSync connection (iPad, iPhone...), which for me is a pure HTTP connection and not RPC over HTTP, but maybe I'm wrong?

  15. I made this change to all 4 of our CAS/HUB servers and after restarting IIS it worked for 3 hours and then we started seeing the 1040 events come back again in 15 minute increments.

    This was the case for all 4 servers. Any ideas of why this is happening?

  16. Thanks for this article. We put the 2-minute timeout in place on the CAS servers, which improved our situation a bit, but we are still struggling with long-running established TCP connections to the CAS servers on RPC ports. Our internal firewall has a 2-hour timeout, and the F5 even longer (Firewall -> F5 -> CAS). Still, our CAS servers keep long-running TCP connections that are already closed on the clients, and the connections don't even decrease significantly over the weekend. Our F5 shows balanced connections, but on the CAS we see big differences between servers. Any idea?

  17. Thanks for the great documentation of this issue.
    Applied to a number of my clients, all seems well so far!

