Slow internal LAN web traffic with pfSense



  • Hi Group,

    I deployed a pfSense installation for a 50-user setup.  We run an internal LAN web server hosting an Apple WebObjects application.  When we moved away from a Juniper Netscreen and set up pfSense (P4 3 GHz, 1 GB RAM, three 3Com 3C905B NICs), we had good user performance from LAN to WAN and vice versa, but the WebObjects application was slow and other LAN users constantly hit timeout errors.  This web server is on the same subnet as the LAN clients, so the traffic should not be going through the router.  We then upgraded the pfSense server to a dual quad-core Xeon 2.5 GHz machine with a 500 GB HD, 8 GB of RAM, and three Intel server NICs.  The same problem occurs.  The problem goes away when we revert to the Juniper device with no other network changes made.  The LAN is simple: just the router and several daisy-chained managed Cisco switches - that's it.  I checked autonegotiate and MTU settings, with no effect.  I restarted the switches, PCs, servers, etc., also with no effect.  I am at a total loss as to what the problem can be, since the router should have no effect on intra-LAN traffic, but the comparison with the Juniper device shows it clearly is affecting something.  If anyone has any thoughts they would be greatly appreciated.

    best,
    Mike



  • Can you give a bit more information on what is enabled on pfSense?
    Is it the DHCP server for the clients?
    Is DNS forwarder enabled?
    If so, is there an entry in Services, DNS Forwarder, Domain Overrides to point DNS queries for your internal names to your internal DNS Server?
    e.g. Domain: mycompany.local IP: 192.168.1.10 Description: Forward DNS requests for internal names to internal DNS server at 192.168.1.10
    I am just trying to think of things that might slow down stuff, e.g. if DNS requests are first going out to internet land and failing, and then maybe the clients have a 2nd local DNS server address which they try next.
    Also, is pfSense running Squid proxy server? is Squid in transparent mode? other packages?
    As long as the clients get a local LAN IP address back from their DNS requests, then a proxy server on pfSense shouldn't come into play.



  • Try also to capture packets on the LAN interface to check whether inter-LAN traffic for this web server is reaching the firewall.

    If so, and you have no idea what is causing this, enable the "bypass firewall rules for traffic on the same interface" option under System -> Advanced.
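
    The capture suggestion above can be sketched with tcpdump from the pfSense shell (a sketch, not a verified recipe: em2 as the LAN interface name and 10.50.100.8 as the web server address are placeholders that must be adjusted to the actual setup; the webGUI's Diagnostics -> Packet Capture page can do roughly the same):

    ```shell
    # Capture traffic on the LAN interface that involves the internal web
    # server. If client<->server packets show up here, they are reaching
    # the firewall even though both hosts sit on the same subnet.
    # em2 and 10.50.100.8 are placeholders for this example.
    tcpdump -ni em2 -c 200 -w /tmp/lan-capture.pcap host 10.50.100.8

    # Review the saved capture afterwards:
    tcpdump -nr /tmp/lan-capture.pcap | head -n 20
    ```

    If the capture shows nothing but broadcasts and traffic addressed to the firewall itself, the slowness is unlikely to come from pfSense handling the LAN traffic.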

    Att,
    Marcello Coutinho



  • Sure, here is some more info.

    We have an Active Directory server on the same LAN subnet (10.50.100.x) that acts as DHCP, ADC, and DNS.  Per the pfSense documentation we have both internal and external A records; the internal A records are hosted on that server, which forwards to external DNS.  The pfSense device is not used for any DNS services; it just has its DNS set to an external DNS server with no forwarding configured.  I did not do this intentionally for any specific reason, I just figured there was no need since it was not servicing DNS.  Interestingly enough, requests by IP address (bypassing DNS) still show slowness for internal web traffic.  We use no add-on packages in pfSense, just the standard core.  We are running the latest 2.0 release, updated yesterday.  Initially we were running 1.6 in test mode and did a major release upgrade to 2.0 along with the hardware upgrade.  We had a Barracuda web filter on the network but removed it because we suspected it might be an issue; no improvement was seen.

    I am going to do a packet capture and test the traffic bypass when the next maintenance window opens.  Is it typical for pfSense to screen intra-subnet traffic, and for what purpose?  Any other suggestions to check?  Is the adapter MTU or the type of adapter a real concern?  I have heard to stay away from 3Com, which motivated the shift to Intel adapters.  Would you suspect the hardware is an issue, or does it seem appropriately sized for the installation?  What about the Cisco managed switches - is there a way in managed switches to hard-bypass the router for traffic to the internal web server?  What about QoS?  I never enabled it in pfSense or the switches; could this be a help or a hindrance?  Thank you for the replies.  Any suggestions of things to check during the maintenance window (ideas way out there are also good…) are greatly appreciated.



  • Just some random thoughts - maybe the Juniper system used to serve DNS? If so, then maybe some clients still think that the Juniper IP is a DNS server and now the pfSense is not providing DNS. Maybe the DHCP on your ADC was setup to give the Juniper IP as an alternate DNS server, and is still doing that? Anywhere else in DHCP or client settings that might still have a reference to something that the Juniper did?

    pfSense should only process packets that are directed at it on the LAN, it shouldn't be looking at other packets that fly past. In fact, with the managed switches, ethernet packets that don't have the pfSense LAN MAC address or broadcast address won't even be put onto the port the pfSense is connected to, so it can't see them. A packet capture inline with the pfSense LAN cable should just see broadcast packets and genuine IP packets with a pfSense LAN IP source or destination.



  • @mklopfer:

    Interestingly enough, IP requests (bypassing DNS) still show slowness for internal web traffic.

    On the web client, issue a traceroute (tracert on Windows) to the web server's IP address. The output should list all the intermediate systems between the client and server. Are there any surprises?

    On the web client, issue a ping with a count of (say) more than 20 to the web server's IP address. Do you get consistently prompt responses? If so, your problem is more likely a characteristic of the web conversations than of the network. Does ping report any losses? If so, you probably have a network problem such as misbehaving hardware or overload.
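
    Concretely, the two checks above might look like this on a unix-like client (10.50.100.8 as the server address is an assumption for illustration; on Windows the equivalents are tracert and ping -n 20):

    ```shell
    # One hop expected: client and server are on the same subnet, so no
    # router should appear between them.
    traceroute 10.50.100.8

    # 20 pings; watch for packet loss or erratic round-trip times.
    ping -c 20 10.50.100.8
    ```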



  • @phil.davis:

    Just some random thoughts - maybe the Juniper system used to serve DNS? If so, then maybe some clients still think that the Juniper IP is a DNS server and now the pfSense is not providing DNS. Maybe the DHCP on your ADC was setup to give the Juniper IP as an alternate DNS server, and is still doing that? Anywhere else in DHCP or client settings that might still have a reference to something that the Juniper did?

    pfSense should only process packets that are directed at it on the LAN, it shouldn't be looking at other packets that fly past. In fact, with the managed switches, ethernet packets that don't have the pfSense LAN MAC address or broadcast address won't even be put onto the port the pfSense is connected to, so it can't see them. A packet capture inline with the pfSense LAN cable should just see broadcast packets and genuine IP packets with a pfSense LAN IP source or destination.

    That makes total sense about the DNS, but we double- and triple-checked: the Juniper had no DNS functionality, and the pfSense is mirrored for this.  If I do a packet capture, should I do it from the pfSense itself or from the test client machine for the web application (a LAN machine experiencing slowness) using something like Ethereal?



  • @wallabybob:

    @mklopfer:

    Interestingly enough, IP requests (bypassing DNS) still show slowness for internal web traffic.

    On the web client issue a traceroute (tracert on Windows) to the web server's IP address. The output should list all the intermediate systems between the client and server. Are there any surprises?

    On the web client issue a ping with a count of (say) more than 20 to the web server's IP address. Do you get consistently prompt responses? If so, your problem is more likely a characteristic of the web conversations than a characteristic of the network. Does ping report any losses? If so, you probably have a network problem such as misbehaving hardware or overload.

    Hi, no, the traceroute shows a direct path without involvement of the router.  No packet loss is reported for any pings across a number of runs on different machines.  Traffic appears to go directly to the webserver without any intermediate hops.  The interesting thing is that the problem is sporadic: some sessions are fast while others are slow, so some users complain while others say everything is working perfectly.  The affected users seem to change on a daily basis, which makes me think the problem is associated with sessions rather than users.  Yes, this would be symptomatic of a web application problem, but when we go back to the old system there are no problems for days, yet within an hour of reverting to pfSense users are complaining of timeouts for the web server on the same subnet.


  • LAYER 8 Global Moderator

    If the client is on the same LAN as the webserver, pfSense would have nothing to do with the traffic.  Unless you are routing traffic to different networks via interfaces on the pfSense - as already stated, it would never even see the packets, other than broadcast.

    Can you draw out your network to show connectivity between the client and this webserver?

    In a typical setup with only 1 LAN segment, you would have a switch.  And from this switch you would have a connection to your webserver, your client, and then one to pfSense, which is the gateway off this segment.

    When the client is talking to the webserver, the switch passes traffic between the ports - pfSense has nothing to do with this communication at all.  Nothing!

    So unless you're either routing traffic between segments using pfSense, or bridging traffic between interfaces with the webserver connected to one interface and the client on the other, pfSense is not involved in the communication at all.



  • Since the basic network seems to check out OK, I would suspect something about the web sessions.

    How is the configuration different when you exchange pfSense and Juniper? For example, do they have different IP addresses and is there something on the web server that is still accessing the Juniper IP address?

    Have you looked through the web server logs for unexpected timeout reports?

    Do you see similar behaviour with (say) FTP sessions?



  • If you have a busy (connections, not traffic) internet connection, it may be that exhaustion of a pfSense resource (states, for example) is affecting the web sessions (DNS lookups failing, for example).

    Has the Juniper been tweaked in any way for your environment? Such tweaks might give some clues about specifics of your environment that could relate to the web server performance.



  • I had a chance yesterday to bring up the system and test the suggestions mentioned so far in the user environment.  I implemented the bypass-firewall-rules-on-same-subnet option, checked DNS, and performed packet captures, as well as logging everything I could see was pertinent to look at later.  The problems still occurred and we were forced to revert to the Netscreen system; they immediately resolve when it is reverted.  I mirrored the Netscreen's DNS settings and confirmed its configuration - it is plain Jane, no forwarders or anything; it is not used as a DNS server, just a firewall.  I double-checked this.  On the pfSense router, the state table and resources are no more than 10% used; there is no taxing of system resources when the problems occur.  The webserver that performs badly in the pfSense environment serves both internal and external users.  Because there are only a few external users and the problem is persistent but intermittent in intensity, it is hard to judge the performance difference for internal versus external users.  As expected, I saw no internal traffic in the packet capture run on the pfSense box on the LAN side - just cross-traffic between the interfaces.

    For those who asked, this is the configuration of the network, since we simplified it in the process of trying to resolve the aforementioned problems:

    The trust port of the pfSense is connected to a set of Cisco Catalyst switches at the bottom of the gigabit backbone, which the floor users are plugged directly into.  All pfSense NICs are Intel Pro/1000 MTs, and the LAN adapter MTU was changed from 1500 to 1492 to address potential packet fragmentation causing the problem.  The port is gigabit full-duplex/autonegotiate, which is mirrored on the pfSense box.  The trust network is 10.50.100.1/24.  The webserver with the WebObjects application resides on this network at 10.50.100.8.  This server has a virtual network (for the virtual servers on it) that is 10.50.150.x/24; it leaves via a second port on the server and goes into a ZyXel switch used only for this 'internal' network.  A backup of one of the virtual machines for this application resides on this network with no other connections.  There is no direct link between this 10.50.150.x network and the router - just the server NICs.  The DMZ port of the pfSense is plugged into a second ZyXel switch, which is set to autonegotiate at 100 full-duplex.  Some of the servers are connected to the DMZ.  The untrust (WAN) port of the pfSense is connected to a Fatpipe WARP WAN load balancer over a 'transfer' network, 10.51.200.x/24.  The Fatpipe WARP maps the IPs from 10.51.200.x to the corresponding IPs (linearly) for each of the external network IP bands we have (e.g. 10.51.200.5 -> 66.192.146.5 and 11.22.33.5).  We have a TW Telecom fiber and a TW Telecom T1 (Versapack) supplying the two WANs.  The Fatpipe WARP is set with no firewall capability internal to it.

    One thing that I did see that was a little odd was the following dump of the states:

    udp 224.0.0.1:626 <- 10.50.100.8:626 NO_TRAFFIC:SINGLE  
    tcp 10.50.100.8:64000 <- 10.51.200.8:64000 <- 74.109.251.106:55817 ESTABLISHED:ESTABLISHED  
    tcp 74.109.251.106:55817 -> 10.50.100.8:64000 ESTABLISHED:ESTABLISHED  
    tcp 10.50.100.8:64000 <- 10.51.200.8:64000 <- 74.109.251.106:55821 ESTABLISHED:ESTABLISHED  
    tcp 74.109.251.106:55821 -> 10.50.100.8:64000 ESTABLISHED:ESTABLISHED  
    tcp 10.50.100.8:443 <- 10.51.200.8:443 <- 74.78.171.115:53729 FIN_WAIT_2:ESTABLISHED  
    tcp 74.78.171.115:53729 -> 10.50.100.8:443 ESTABLISHED:FIN_WAIT_2  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49365 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49365 -> 10.51.200.8:49365 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49411 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49411 -> 10.51.200.8:49411 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49417 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49417 -> 10.51.200.8:49417 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49424 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49424 -> 10.51.200.8:49424 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49433 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49433 -> 10.51.200.8:49433 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49438 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49438 -> 10.51.200.8:49438 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49504 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49504 -> 10.51.200.8:49504 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49531 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49531 -> 10.51.200.8:49531 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49545 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49545 -> 10.51.200.8:49545 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49597 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49597 -> 10.51.200.8:49597 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49605 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49605 -> 10.51.200.8:49605 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49624 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49624 -> 10.51.200.8:49624 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2004 <- 10.50.100.8:49671 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49671 -> 10.51.200.8:49671 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49693 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49693 -> 10.51.200.8:49693 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49704 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49704 -> 10.51.200.8:49704 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2006 <- 10.50.100.8:49733 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49733 -> 10.51.200.8:49733 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49744 CLOSED:SYN_SENT  
    tcp 10.50.100.8:49744 -> 10.51.200.8:49744 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
    tcp 66.192.146.8:2005 <- 10.50.100.8:49749 CLOSED:SYN_SENT

    The "SYN_SENT:CLOSED" entries look like the device might be failing to send out, or is in a loopback.  The line "tcp 10.50.100.8:49671 -> 10.51.200.8:49671 -> 66.192.146.8:2004 SYN_SENT:CLOSED" looks like a failing attempt by the server to talk out over its external IP on the one WAN connection, via the 'transfer' network.  I do not know why the server would be trying to talk to itself over its WAN external IP, and I suspect this has something to do with NAT reflection.
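
    To gauge how widespread these entries are, a saved state dump can be tallied by its state pair (a sketch: the sample lines below stand in for a real dump; on the firewall the output of pfctl -ss can be piped straight into the same awk):

    ```shell
    # Count pf state-table entries by their state pair; an unusual number
    # of SYN_SENT:CLOSED pairs points at connections that never complete.
    # Three sample lines stand in for a real `pfctl -ss` dump.
    printf '%s\n' \
      'tcp 66.192.146.8:2004 <- 10.50.100.8:49411 CLOSED:SYN_SENT' \
      'tcp 10.50.100.8:49411 -> 10.51.200.8:49411 -> 66.192.146.8:2004 SYN_SENT:CLOSED' \
      'tcp 74.109.251.106:55817 -> 10.50.100.8:64000 ESTABLISHED:ESTABLISHED' \
      > /tmp/states.txt

    # The state pair is the last whitespace-separated field on each line.
    awk '{print $NF}' /tmp/states.txt | sort | uniq -c | sort -rn
    ```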

    Other reports note similar concerns:   http://forum.pfsense.org/index.php?topic=21779.0 and http://forum.pfsense.org/index.php?topic=11554.0

    Is it possible that there is a problem with the external transfer of data for this server, and that the resulting degraded performance for external access causes degraded performance for internal users?  Again, I just have reports from internal users; external users may also have problems, there are just fewer of them to get a good read.  When I was configuring pfSense initially, I tried to avoid reconfiguring the internal DNS server by experimenting with NAT reflection.  I eventually gave up and reconfigured the DNS, but some elements of NAT reflection may still have propagated into some of the rules before I dropped this configuration.  The Juniper Netscreen did automatic NAT reflection by default.  Since I see the "SYN_SENT:CLOSED" error in other failed NAT reflection reports, and my packets look like they are trying to loop back, could NAT reflection and its associated errors be a potential cause of the web performance problems?  If so, which settings should I change to remove any element of NAT reflection from the pfSense configuration, globally and for all implemented rules?


    Here is a syslog excerpt:

    Mar 29 08:54:44 dnsmasq[61848]: read /etc/hosts - 2 addresses
    Mar 29 08:54:44 dnsmasq[61848]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:54:44 dnsmasq[61848]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:54:44 dnsmasq[61848]: using nameserver 10.50.100.11#53
    Mar 29 08:54:44 dnsmasq[61848]: using nameserver 4.2.2.2#53
    Mar 29 08:54:44 dnsmasq[61848]: using nameserver 216.136.95.2#53
    Mar 29 08:54:44 dnsmasq[61848]: reading /etc/resolv.conf
    Mar 29 08:54:44 dnsmasq[61848]: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
    Mar 29 08:54:44 dnsmasq[61848]: started, version 2.55 cachesize 10000
    Mar 29 08:54:43 dnsmasq[47908]: exiting on receipt of SIGTERM
    Mar 29 08:54:43 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:54:43 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:54:43 dnsmasq[47908]: using nameserver 10.50.100.11#53
    Mar 29 08:54:43 dnsmasq[47908]: using nameserver 4.2.2.2#53
    Mar 29 08:54:43 dnsmasq[47908]: using nameserver 216.136.95.2#53
    Mar 29 08:54:43 dnsmasq[47908]: reading /etc/resolv.conf
    Mar 29 08:54:43 check_reload_status: Syncing firewall
    Mar 29 08:50:18 kernel: em2: promiscuous mode disabled
    Mar 29 08:50:18 kernel: em2: promiscuous mode enabled
    Mar 29 08:44:52 apinger: Starting Alarm Pinger, apinger(22259)
    Mar 29 08:44:52 check_reload_status: Reloading filter
    Mar 29 08:44:51 apinger: Exiting on signal 15.
    Mar 29 08:44:51 php: : rc.newwanip: on (IP address: 10.50.100.1) (interface: lan) (real interface: em2).
    Mar 29 08:44:51 php: : rc.newwanip: Informational is starting em2.
    Mar 29 08:44:46 check_reload_status: rc.newwanip starting em2
    Mar 29 08:44:46 php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (10.50.100.1)
    Mar 29 08:44:43 php: /interfaces.php: Creating rrd update script
    Mar 29 08:44:43 apinger: Starting Alarm Pinger, apinger(50269)
    Mar 29 08:44:43 check_reload_status: Reloading filter
    Mar 29 08:44:42 php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (10.50.100.1)
    Mar 29 08:44:42 apinger: Exiting on signal 15.
    Mar 29 08:44:40 dnsmasq[47908]: read /etc/hosts - 2 addresses
    Mar 29 08:44:40 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:44:40 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
    Mar 29 08:44:40 dnsmasq[47908]: using nameserver 216.136.95.2#53
    Mar 29 08:44:40 dnsmasq[47908]: using nameserver 4.2.2.2#53
    Mar 29 08:44:40 dnsmasq[47908]: reading /etc/resolv.conf
    Mar 29 08:44:40 dnsmasq[47908]: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
    Mar 29 08:44:40 dnsmasq[47908]: started, version 2.55 cachesize 10000
    Mar 29 08:44:40 check_reload_status: updating dyndns lan
    Mar 29 08:44:40 kernel: em2: link state changed to UP
    Mar 29 08:44:40 check_reload_status: Linkup starting em2
    Mar 29 08:44:39 dnsmasq[15881]: exiting on receipt of SIGTERM
    Mar 29 08:44:37 kernel: em2: link state changed to DOWN
    Mar 29 08:44:37 check_reload_status: Linkup starting em2
    Mar 29 08:44:34 check_reload_status: Syncing firewall
    Mar 29 08:36:38 php: /pkg_edit.php: The command 'killall iperf' returned exit code '1', the output was 'No matching processes were found'
    Mar 29 08:35:03 php: /index.php: Successful webConfigurator login for user 'AGA' from 10.50.100.16
    Mar 29 08:35:03 php: /index.php: Successful webConfigurator login for user 'AGA' from 10.50.100.16
    Mar 29 07:51:02 printer: error cleared
    Mar 29 07:49:06 printer: offline or intervention needed
    Mar 29 06:56:10 printer: error cleared
    Mar 29 06:55:29 printer: offline or intervention needed
    Mar 29 06:24:39 printer: error cleared
    Mar 29 06:22:21 printer: offline or intervention needed
    Mar 28 22:34:12 dnsmasq[15881]: ignoring nameserver 127.0.0.1 - local interface
    Mar 28 22:34:12 dnsmasq[15881]: ignoring nameserver 127.0.0.1 - local interface
    Mar 28 22:34:12 dnsmasq[15881]: using nameserver 216.136.95.2#53
    Mar 28 22:34:12 dnsmasq[15881]: using nameserver 4.2.2.2#53
    Mar 28 22:34:12 dnsmasq[15881]: reading /etc/resolv.conf
    Mar 28 22:30:21 apinger: Starting Alarm Pinger, apinger(5592)
    Mar 28 22:30:21 check_reload_status: Reloading filter
    Mar 28 22:30:20 apinger: Exiting on signal 15.
    Mar 28 22:30:20 php: : ROUTING: setting default route to 10.51.200.1
    Mar 28 22:30:20 php: : rc.newwanip: on (IP address: 10.51.200.2) (interface: wan) (real interface: em1).
    Mar 28 22:30:20 php: : rc.newwanip: Informational is starting em1.
    Mar 28 22:30:14 check_reload_status: rc.newwanip starting em1
    Mar 28 22:30:14 php: : Hotplug event detected for wan but ignoring since interface is configured with static IP (10.51.200.2)

    Here is the dashboard:
    Name
    Version 2.0.1-RELEASE (i386)
    built on Mon Dec 12 18:24:17 EST 2011

    FreeBSD 8.1-RELEASE-p6

    Unable to check for updates.
    Platform pfSense  
    CPU Type Intel(R) Xeon(TM) CPU 3.00GHz
    Current: 750 MHz, Max: 3000 MHz  
    Uptime  
    Current date/time Thu Mar 29 14:00:49 PDT 2012
    DNS server(s) 127.0.0.1
    10.50.100.11
    4.2.2.2
    216.136.95.2

    Last config change Thu Mar 29 13:58:03 PDT 2012
    State table size  
    Show states  
    MBUF Usage 1282/25600  
    CPU usage      
    Memory usage      
    SWAP usage      
    Disk usage

    Interfaces
      WAN     10.51.200.2   1000baseT <full-duplex>
      LAN     10.50.100.1   1000baseT <full-duplex>
      DMZ     10.50.101.1   100baseTX <full-duplex>

    Gateways
    Name Gateway RTT Loss Status
    FPGW  10.51.200.1  0.365ms  0.0%  Online

    Here is the interface summary:

    Status: Interfaces  
    WAN interface (em1)  
    Status up  
    MAC address 00:1b:21:c7:15:7f  
    IP address 10.51.200.2    
    Subnet mask 255.255.255.0  
    Gateway FPGW 10.51.200.1  
    ISP DNS servers 127.0.0.1
    10.50.100.11
    4.2.2.2
    216.136.95.2

    Media 1000baseT <full-duplex>
    In/out packets 11461325/11458745 (5.43 GB/7.37 GB)  
    In/out packets (pass) 11458745/10676937 (5.43 GB/7.37 GB)  
    In/out packets (block) 2580/0 (140 KB/0 bytes)  
    In/out errors 0/0  
    Collisions 178

    LAN interface (em2)  
    Status up  
    MAC address 00:1b:21:90:37:e3  
    IP address 10.50.100.1    
    Subnet mask 255.255.255.0  
    Media 1000baseT <full-duplex>
    In/out packets 11282101/11278313 (7.75 GB/5.85 GB)  
    In/out packets (pass) 11278313/12018152 (7.74 GB/5.85 GB)  
    In/out packets (block) 3788/0 (1.30 MB/0 bytes)  
    In/out errors 0/0  
    Collisions 0

    DMZ interface (em0)  
    Status up  
    MAC address 00:1b:21:ca:b8:79  
    IP address 10.50.101.1    
    Subnet mask 255.255.255.0  
    Media 100baseTX <full-duplex>
    In/out packets 2107389/2107315 (889.02 MB/846.24 MB)  
    In/out packets (pass) 2107315/2119352 (889.02 MB/846.24 MB)  
    In/out packets (block) 74/0 (3 KB/0 bytes)  
    In/out errors 0/0  
    Collisions 0

    Here are the system tunables:

    Tunable Name Description Value
    debug.pfftpproxy  Disable the pf ftp proxy handler.  default (0)

    vfs.read_max  Increase UFS read-ahead speeds to match current state of hard drives and NCQ. More information here: http://ivoras.sharanet.org/blog/tree/2010-11-19.ufs-read-ahead.html  default (32)

    net.inet.ip.portrange.first  Set the ephemeral port range to be lower.  default (1024)

    net.inet.tcp.blackhole  Drop packets to closed TCP ports without returning a RST  default (2)

    net.inet.udp.blackhole  Do not send ICMP port unreachable messages for closed UDP ports  default (1)

    net.inet.ip.random_id  Randomize the ID field in IP packets (default is 0: sequential IP IDs)  default (1)

    net.inet.tcp.drop_synfin  Drop SYN-FIN packets (breaks RFC1379, but nobody uses it anyway)  default (1)

    net.inet.ip.redirect  Enable sending IPv4 redirects  default (1)

    net.inet6.ip6.redirect  Enable sending IPv6 redirects  default (1)

    net.inet.tcp.syncookies  Generate SYN cookies for outbound SYN-ACK packets  default (1)

    net.inet.tcp.recvspace  Maximum incoming/outgoing TCP datagram size (receive)  default (65228)

    net.inet.tcp.sendspace  Maximum incoming/outgoing TCP datagram size (send)  default (65228)

    net.inet.ip.fastforwarding  IP Fastforwarding  default (0)

    net.inet.tcp.delayed_ack  Do not delay ACK to try and piggyback it onto a data packet  default (0)

    net.inet.udp.maxdgram  Maximum outgoing UDP datagram size  default (57344)

    net.link.bridge.pfil_onlyip  Handling of non-IP packets which are not passed to pfil (see if_bridge(4))  default (0)

    net.link.bridge.pfil_member  Set to 0 to disable filtering on the incoming and outgoing member interfaces.  default (1)

    net.link.bridge.pfil_bridge  Set to 1 to enable filtering on the bridge interface  default (0)

    net.link.tap.user_open  Allow unprivileged access to tap(4) device nodes  default (1)

    kern.randompid  Randomize PID's (see src/sys/kern/kern_fork.c: sysctl_kern_randompid())  default (347)

    net.inet.ip.intr_queue_maxlen  Maximum size of the IP input queue  default (1000)

    hw.syscons.kbd_reboot  Disable CTRL+ALT+Delete reboot from keyboard.  default (0)

    net.inet.tcp.inflight.enable  Enable TCP Inflight mode  default (1)

    net.inet.tcp.log_debug  Enable TCP extended debugging  default (0)

    net.inet.icmp.icmplim  Set ICMP Limits  default (0)

    net.inet.tcp.tso  TCP Offload Engine  default (1)

    kern.ipc.maxsockbuf  Maximum socket buffer size  default (4262144)



  • Bump!

    Just wanted to see if anyone thought this is a logical place to look before I take down the network for more maintenance this week.



  • @mklopfer:

    Bump!

    Just wanted to see if anyone thought this is a logical place to look before I take down the network for more maintenance this week.

    IMO you should try to really isolate the problem first, before trying to fix it.  There are too many variables.  Follow the problem step by step from the clients to the web server; take a look at the logs on the web server, check where it is connecting and how, and which services it is using.  You should also check the switches' logs.  Focus on isolating and understanding the problem before trying anything else.



  • Thanks feadin,

    There is nothing of note in the web application logs, traces from client to server, or the switch logs.  The only thing that looked out of sorts was the state table entry I posted last.  The web server is HTTP/HTTPS only on the client side, and all packets either route on the local LAN or go through the pfSense system and out to the WAN.  No additional information of merit is given.  We have checked DNS, switches, etc. diagnostically, by replacement followed by testing.  Nothing has helped.  We cannot recreate the problems in a test environment, and everything appears to work correctly when several users are on the web system at once for testing.  Once all the users come on, the problems become evident.  My suspicion of the NAT might be a false lead - this is why I am asking the community before I chase it.  Taking down a working routing system for something that doesn't work correctly really ticks off the users and causes substantial downtime, so I have to dry-lab and plan everything before going live with a test.

    @feadin:

    @mklopfer:

    Bump!

    Just wanted to see if anyone thought this is a logical place to look before I take down the network for more maintenance this week.

    IMO you should try to really isolate the problem first, before trying to fix it. There are too many variables. Follow the problem step by step from the clients to the web server; take a look at the logs in the web server, check where is it connecting and how, which services is it using. You should also check the switches logs. Focus on isolating and understanding the problem before trying to do anything else.



  • @mklopfer:

    Thanks feadin,

    There is nothing of note on the web application logs, traces from client to server, or the switch logs.  The only thing that looked out of sorts was the entry I posted last from the state table.  The web server is http/https only for the client side and all packets either route to the local LAN or through the pfSense system and out to the WAN.  No additional information of merit is given.  We have checked DNS, switches, etc. in diagnostics by replacement followed by testing.  Nothing has helped. We can not recreate the problems seen in a test environment, and everything appears to work correctly when several users are on the web system at once for testing.  Once all the users come on then the problems become evident.    My suspicion of the NAT might be a false lead – this is why I am asking the community before I chase it.  Taking down the working routing system to something that doesn't work correctly really ticks off the users and causes substantial downtime, so I have to dry lab and plan everything before going live with a test.

    What kinds of connections and services does the web server use on its server side? Did you check those when the problems started?
    If you cannot reproduce this in a lab, I would start testing right on a client computer when the problem starts: test connectivity between client and web server, then between the web server and every service and/or host it uses, such as databases, DNS, WINS, even broadcasts. Go step by step. There is no point in keeping this at the basic network level only; check the other levels as well, since they are all interdependent. Even if the problem is at the basic network level, checking the other levels lets you isolate it much faster. Once you isolate the problem, the solution will be easy. I see no point in trying possible solutions blindly.
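    The step-by-step connectivity checks suggested above can be scripted so they are easy to rerun from a client the moment the slowdown appears. This is only a sketch; the target hosts and ports below are placeholders, not addresses from this thread.

```python
# Rough connectivity probe for the "go step by step" approach: time a TCP
# connect from this machine to the web server and to each back-end service
# it depends on.  All addresses below are made-up placeholders.
import socket
import time

def tcp_connect_time(host, port, timeout=1.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

# Placeholder targets: web server, internal DNS, database.
targets = [
    ("192.168.1.20", 80),
    ("192.168.1.10", 53),
    ("192.168.1.30", 5432),
]

for host, port in targets:
    ms = tcp_connect_time(host, port)
    status = f"{ms:.1f} ms" if ms is not None else "FAILED"
    print(f"{host}:{port} -> {status}")
```

    Running this from a client during good and bad periods makes it obvious whether the slowness is between client and server or between the server and one of its dependencies.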



  • Thank you everyone for your help. The system has been running for a week with no user-reported problems.  What I did was explicitly go to every 1:1 and port-forward entry and disable NAT reflection, and in the advanced settings I disabled every reference to NAT reflection as well.  All of the SYN:CLOSED entries in the state table disappeared after this.  I then noticed a number of FIN_WAIT_2 entries; to try to resolve those, I followed advice from another thread and set the state timeout to 1 second for the web server entry, which was disastrous for performance, so I had to revert.  Despite a number of FIN_WAIT_2 entries remaining in the state table, everything works fine now.
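    For anyone watching for the same symptom, the state-table pile-up described above (lots of SYN:CLOSED or FIN_WAIT_2 entries) can be tallied mechanically. This sketch assumes `pfctl -ss`-style lines; the exact output format varies between pf versions, so treat it as illustrative only.

```python
# Tally pf state-table entries by their trailing STATE:STATE token, e.g. to
# spot a pile-up of SYN_SENT:CLOSED or FIN_WAIT_2 states.  The sample lines
# mimic `pfctl -ss` output with made-up addresses.
from collections import Counter

def count_states(lines):
    """Count the trailing STATE:STATE token on each pf state line."""
    tally = Counter()
    for line in lines:
        parts = line.split()
        if parts and ":" in parts[-1]:
            tally[parts[-1]] += 1
    return tally

sample = [
    "all tcp 192.168.1.20:80 <- 192.168.1.55:50101  ESTABLISHED:ESTABLISHED",
    "all tcp 203.0.113.5:80 <- 192.168.1.20:44012   SYN_SENT:CLOSED",
    "all tcp 192.168.1.20:80 <- 192.168.1.60:50222  FIN_WAIT_2:FIN_WAIT_2",
    "all tcp 203.0.113.5:80 <- 192.168.1.20:44013   SYN_SENT:CLOSED",
]
print(count_states(sample))  # SYN_SENT:CLOSED appears twice in this sample
```

    On a live box you would feed it the output of `pfctl -ss` instead of the sample list and watch whether one state pair grows without bound.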



  • Good old NAT reflection. This is why split DNS is used: internal IPs are served to the LAN, and external IPs are served to WAN-originating connections.
    That is what it sounded like you were doing, but since you had to turn off NAT reflection, it seems that it was not.
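    The split-DNS idea described above boils down to answering the same name differently depending on where the query comes from. A minimal sketch of that logic, with made-up addresses and subnet:

```python
# Minimal illustration of split DNS: hand the internal address to LAN
# clients and the public address to everyone else.  The subnet, hostname,
# and both IPs are placeholders invented for this example.
import ipaddress

LAN_NET = ipaddress.ip_network("192.168.1.0/24")
RECORDS = {
    "www.example.com": {
        "internal": "192.168.1.20",  # what LAN clients should get
        "external": "203.0.113.5",   # what WAN clients should get
    },
}

def resolve(name, client_ip):
    """Return the view-appropriate A record for `name`."""
    rec = RECORDS[name]
    view = "internal" if ipaddress.ip_address(client_ip) in LAN_NET else "external"
    return rec[view]

print(resolve("www.example.com", "192.168.1.55"))  # internal view
print(resolve("www.example.com", "198.51.100.7"))  # external view
```

    Real resolvers implement this with views (BIND) or domain overrides (the pfSense DNS Forwarder mentioned earlier in the thread); the point is that LAN clients never receive the public IP, so their traffic never needs NAT reflection.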



  • @podilarius:

    Good old NAT reflection. This is why split DNS is used: internal IPs are served to the LAN, and external IPs are served to WAN-originating connections.
    That is what it sounded like you were doing, but since you had to turn off NAT reflection, it seems that it was not.

    This is the strange thing: DNS on the inside resolved correctly for the web server while we were still having problems, so there must have been something hardcoded somewhere that caused the problem. Potentially this was in the pfSense box itself; it would just keep trying, and failing, to NAT the data, causing timeouts. When the capability was disabled, with no other network changes, everything worked well.
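    The "DNS resolved correctly on the inside" claim is easy to verify from any client. A small sanity check along those lines; the internal hostname and address in the comment are placeholders:

```python
# Confirm that a hostname resolves (via the OS resolver) to the expected
# internal address from the machine this runs on.
import socket

def resolves_to(name, expected_ip):
    """True if `name` resolves to `expected_ip` on this machine."""
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(name, None)}
    except socket.gaierror:
        return False
    return expected_ip in addrs

# On a LAN client the web server name should map to the internal,
# same-subnet address, e.g.:
#   resolves_to("webserver.mycompany.local", "192.168.1.20")
print(resolves_to("localhost", "127.0.0.1"))
```

    If this returns the internal address on every client but connections still loop through the firewall, the external IP is being introduced somewhere other than DNS, e.g. hardcoded in an application, which matches the suspicion above.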



  • Not sure why. If DNS returned the internal address (assuming they are on the same subnet), then the traffic should never have reached the firewall at all. If you were going from a LAN to a DMZ, for instance, then it would go through the firewall, but NAT reflection would not have much to do there. You could even switch to advanced outbound NAT and not NAT that traffic at all: just pure firewalling and routing.



  • @podilarius:

    Not sure why. If DNS returned the internal address (assuming they are on the same subnet), then the traffic should never have reached the firewall at all. If you were going from a LAN to a DMZ, for instance, then it would go through the firewall, but NAT reflection would not have much to do there. You could even switch to advanced outbound NAT and not NAT that traffic at all: just pure firewalling and routing.

    What seemed to be happening was that the web server was spending time trying to maintain dropped connections to the outside at the expense of inside connections, which should never touch the firewall.  All internal machines used an internal DNS server that returned the web server's IP on the same subnet.  It looks like the symptoms we were seeing were indirectly related to the NAT reflection issue.  For some reason there were tons of connections between the server and itself trying to loop back over an external address; my best guess is that something somewhere was hardcoded to talk over that IP.  But if that were the case, removing NAT reflection would not resolve the issue, since it would still try to talk out and back and be blocked.  I'm still at a loss as to the exact mechanism of the problem, but any speculation to help others in the future is welcome.



  • @mklopfer:

    What seemed to be happening was that the web server was spending time trying to maintain dropped connections to the outside at the expense of inside connections, which should never touch the firewall.  All internal machines used an internal DNS server that returned the web server's IP on the same subnet.  It looks like the symptoms we were seeing were indirectly related to the NAT reflection issue.  For some reason there were tons of connections between the server and itself trying to loop back over an external address; my best guess is that something somewhere was hardcoded to talk over that IP.  But if that were the case, removing NAT reflection would not resolve the issue, since it would still try to talk out and back and be blocked.  I'm still at a loss as to the exact mechanism of the problem, but any speculation to help others in the future is welcome.

    My guess would be that the HTML/PHP/ASP is telling the client to go to http://<external-ip>/internalpage.html instead of the relative ./internalpage.html, so you were essentially being redirected to the external IP instead of using the internal IP from DNS. This happens sometimes when your web page needs to load data from another page. This is generally the wrong way to set up a website, IMO.
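    The hardcoded-absolute-URL problem guessed at above can be checked by scanning the served HTML for links that embed the public address. A sketch using only the standard library; the external IP is a placeholder:

```python
# Scan HTML for href/src/action attributes that hardcode the public IP
# instead of using relative paths.  The IP and sample page are made up.
from html.parser import HTMLParser

EXTERNAL_IP = "203.0.113.5"  # placeholder public address

class AbsoluteLinkFinder(HTMLParser):
    """Collect (tag, url) pairs whose URL embeds the external IP."""
    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for attr, value in attrs:
            if attr in ("href", "src", "action") and value and EXTERNAL_IP in value:
                self.hits.append((tag, value))

page = """
<a href="./internalpage.html">good relative link</a>
<img src="http://203.0.113.5/logo.png">
<form action="http://203.0.113.5/login">...</form>
"""

finder = AbsoluteLinkFinder()
finder.feed(page)
print(finder.hits)  # the <img> and <form> references are the offenders
```

    Any hit found this way is a URL that forces LAN clients back out through the public address, exactly the pattern that made NAT reflection matter in the first place.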

