HTTP randomly blocked?



  • We have two firewalls running RC2 in a dual-WAN configuration, each on different hardware. One of them has been running fine for a while
    now. The other has an intermittent problem: all of the users behind it will randomly get "Page cannot be displayed" or "server
    not responding" while browsing. It happens at random, lasts only a minute or so, and then everything starts
    working again. This happens maybe half a dozen or more times per day.
    Here is a little more info on the box:
    Foxconn motherboard (latest BIOS)
    Intel 10/100 NICs, plus a Netgear wireless NIC bridged to the LAN

    We did try disabling hardware checksumming in the advanced
    config, and we also tried swapping out the NICs. While this is happening,
    the connections are actually still up, not only on the status screen
    but also functionally: we can be remoted in through one WAN link or the
    other, watching the problem happen, with no break in our remote
    session. We can also ping external hosts fine. In other words, it seems to
    affect only HTTP.
    Any and all help or ideas are appreciated!



  • Might be states getting closed too quickly. Under Advanced, change Firewall Optimization to conservative and see if that changes anything.



  • @cmb:

    That seems to have made it worse.  Trying it at "Aggressive" now to see what happens.



  • Conservative definitely wouldn't make it worse if it were states timing out too quickly. Aggressive would make it worse if that were the case.
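
    The reasoning above can be made concrete with a toy model. This is a minimal sketch, not how pf is implemented: it treats each Firewall Optimization setting as a single idle timeout, and the timeout values and the `state_survives` helper are hypothetical (real pf keeps separate timeouts per protocol and TCP state). The point it shows is that a longer timeout can only keep a state alive longer, so if premature expiry were the cause, Conservative could not make things worse, while Aggressive could.

```python
from dataclasses import dataclass

# HYPOTHETICAL idle timeouts in seconds, one per Firewall Optimization
# setting. These are NOT pf's real values; only their ordering matters here:
# aggressive < normal < conservative.
HYPOTHETICAL_IDLE_TIMEOUT = {
    "aggressive": 60,
    "normal": 300,
    "conservative": 900,
}


@dataclass
class State:
    last_seen: float  # time (seconds) of the last packet that matched this state


def state_survives(state: State, now: float, optimization: str) -> bool:
    """True if the state has not yet idled out at time `now`."""
    idle = now - state.last_seen
    return idle <= HYPOTHETICAL_IDLE_TIMEOUT[optimization]


# A connection that goes quiet for two minutes before its next packet.
conn = State(last_seen=0.0)
for mode in ("aggressive", "normal", "conservative"):
    print(mode, state_survives(conn, now=120.0, optimization=mode))
```

    Under these made-up numbers the two-minute lull kills the state only under "aggressive"; any idle period that survives Aggressive also survives Conservative.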

    I think it's time to evaluate some packet captures and see what's really happening on the wire.



  • @cmb:

    Aggressive made no difference. We haven't had a chance to do packet captures yet, but we did have somebody unplug one line for a while, and as soon as they did, everything worked fine. They're going to switch which line is plugged in tomorrow; if it still works well, then we'll know it's a load-balancing issue. (Speaking of which, I forgot to mention that we have "sticky" connections enabled.)
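
    For reference, "sticky" means that once a client has been sent out one WAN, later connections from that same client keep using that WAN instead of rotating to the next one. A minimal sketch, assuming a simple round-robin pool over the two interfaces in this thread's load-balancing pool (wan and opt1); the class name and client addresses are hypothetical, and this is not pfSense's own code:

```python
from itertools import cycle
from typing import Dict, Iterable


class StickyRoundRobin:
    """Hypothetical round-robin gateway picker with optional per-source stickiness."""

    def __init__(self, gateways: Iterable[str], sticky: bool = True) -> None:
        self._rotation = cycle(list(gateways))  # endless wan, opt1, wan, ...
        self.sticky = sticky
        self._bindings: Dict[str, str] = {}  # source address -> gateway it was given

    def pick(self, src_ip: str) -> str:
        """Return the gateway for a new connection from src_ip."""
        if self.sticky and src_ip in self._bindings:
            return self._bindings[src_ip]  # reuse the earlier choice
        gateway = next(self._rotation)
        if self.sticky:
            self._bindings[src_ip] = gateway
        return gateway


pool = StickyRoundRobin(["wan", "opt1"], sticky=True)
print([pool.pick("192.168.1.10") for _ in range(3)])  # ['wan', 'wan', 'wan']
print(pool.pick("192.168.1.11"))  # opt1
```

    With `sticky=False`, the same three calls for 192.168.1.10 would alternate wan, opt1, wan, so one client's connections would leave through different WAN addresses.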



  • Almost sounds like you are running out of states. Try increasing the default 10K state limit to something higher, depending on RAM. Budget roughly 2K of RAM per state.
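
    To put the "roughly 2K per state" rule of thumb into numbers, here is a back-of-the-envelope sketch. The function name is hypothetical, and 2K is read as 2048 bytes:

```python
def state_table_memory_mib(max_states: int, bytes_per_state: int = 2048) -> float:
    """Rough memory needed for a full state table, in MiB.

    bytes_per_state defaults to 2048, reading the "roughly 2K per state"
    rule of thumb as 2 KiB; it is an approximation, not a measured figure.
    """
    return max_states * bytes_per_state / (1024 * 1024)


for limit in (10_000, 20_000):
    print(f"{limit} states -> ~{state_table_memory_mib(limit):.1f} MiB")
```

    By that estimate a full table needs about 19.5 MiB at the default 10K limit and about 39 MiB at 20K, so doubling the limit is cheap on most boxes.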



  • @sullrich:

    I did consider that (forgot to mention it, sorry!). They've never gone over a couple hundred states; it's a very small office.
    However, we did go ahead with the line swap mentioned above, and both lines work fine on their own; it's only in load-balancing/failover mode that we get the issue. (Is there a way to move an entire thread to a different forum? :)
    We checked the build dates on this unit and on the one that works and discovered they are slightly different, even though both are "RC2". So next we're going to try the new RC3, and also duplicate the CD from the working unit, to see if either of those works better.
    Stay tuned…



  • So we swapped in the CD from an RC1 unit that's been working fine for months, and it's still happening!
    I would say it's a hardware problem, were it not for the fact that each connection works flawlessly on its own. The issue occurs only in LB mode.
    Interestingly, when I compared the config.xml files, I noticed that a monitorip was set on the load-balancing entry of the unit with problems, while that setting was blank (i.e. "<monitorip/>") on the good one. Would this make any difference? (Can't test it until later.)



  • Yes it could.  You need a working monitorip.



  • @sullrich:

    Even outside of the per-connection monitor IP?
    I don't know how it ended up this way (we'll be manually editing config.xml tonight to test), but here are the settings from the good(!) unit's load-balancing entry:

    <lbpool>
      <type>gateway</type>
      <behaviour>balance</behaviour>
      <monitorip/>
      <name>W1LBW2</name>
      <desc>Normal round robin</desc>
      <port/>
      <servers>wan|4.2.2.5</servers>
      <servers>opt1|4.2.2.6</servers>
    </lbpool>

    And here's the bad one:

    <lbpool>
      <type>gateway</type>
      <behaviour>balance</behaviour>
      <monitorip>4.2.2.5</monitorip>
      <name>W1LBW2</name>
      <desc>Normal round robin</desc>
      <port/>
      <servers>wan|4.2.2.5</servers>
      <servers>opt1|4.2.2.6</servers>
    </lbpool>

    Will post back with tweak results tomorrow…
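
    One way to check other units for the same discrepancy is to parse each <lbpool> entry and report the pool-wide <monitorip> alongside the per-connection monitor IPs carried in the "iface|ip" <servers> entries. A minimal sketch using Python's standard xml.etree.ElementTree; the helper names are hypothetical, and the sample XML is the pool quoted above, reconstructed on the assumption that the blank fields were self-closing tags (<monitorip/>, <port/>):

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# The good unit's pool, with empty fields written as self-closing tags.
GOOD = """<lbpool>
  <type>gateway</type>
  <behaviour>balance</behaviour>
  <monitorip/>
  <name>W1LBW2</name>
  <desc>Normal round robin</desc>
  <port/>
  <servers>wan|4.2.2.5</servers>
  <servers>opt1|4.2.2.6</servers>
</lbpool>"""

# The bad unit differs only in its pool-wide monitor IP.
BAD = GOOD.replace("<monitorip/>", "<monitorip>4.2.2.5</monitorip>")


def pool_level_monitorip(xml_text: str) -> Optional[str]:
    """Return the pool-wide <monitorip> value, or None if blank or absent."""
    root = ET.fromstring(xml_text)
    value = (root.findtext("monitorip") or "").strip()
    return value or None


def per_server_monitors(xml_text: str) -> Dict[str, str]:
    """Map each interface to its monitor IP from the 'iface|ip' <servers> entries."""
    root = ET.fromstring(xml_text)
    monitors: Dict[str, str] = {}
    for server in root.findall("servers"):
        iface, ip = (server.text or "").split("|", 1)
        monitors[iface] = ip
    return monitors


for label, text in (("good", GOOD), ("bad", BAD)):
    print(label, pool_level_monitorip(text), per_server_monitors(text))
```

    Run against the two entries, it reports no pool-wide monitor IP for the good unit and 4.2.2.5 for the bad one, with identical per-connection monitors on both.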



  • Removing that extraneous monitor IP from the LB config seems to have fixed it. I also bumped the state limit up to 20K, since my feeble attempt at a stress test managed to occupy just over 1,000 states (approx. 20 simultaneous browser page loads). Will post back again if any more weirdness happens…
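
    Those stress-test figures give a rough sense of headroom: about 1,000 states for 20 page loads is roughly 50 states per load. A back-of-the-envelope sketch of how many such loads fit under each limit (the helper names are hypothetical, and "just over 1000" is rounded down to 1,000):

```python
def states_per_page_load(states_observed: int, page_loads: int) -> float:
    """Average number of states one page load opened during the test."""
    return states_observed / page_loads


def max_simultaneous_page_loads(state_limit: int, per_load: float) -> int:
    """How many such page loads would fit under a given state limit."""
    return int(state_limit // per_load)


per_load = states_per_page_load(1000, 20)  # "just over 1000" rounded down to 1000
print(per_load)  # 50.0
for limit in (10_000, 20_000):
    print(limit, max_simultaneous_page_loads(limit, per_load))  # 200, then 400
```

    By this estimate the old 10K limit allowed around 200 simultaneous page loads, and 20K allows around 400.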

