    Slow internal LAN web traffic with PFSense

    mklopfer:

      I had a chance yesterday to bring up the system and test the suggestions mentioned so far in the user environment.  I implemented the bypass-firewall-rules-on-same-subnet option, checked DNS, and performed packet captures, logging everything that looked pertinent for later review.  The problems still occurred and we were forced to revert to the Netscreen system; they resolve immediately once the Netscreen is back in place.  I mirrored the Netscreen's DNS settings and confirmed its configuration: it is plain Jane, no forwarders or anything, and it is not used as a DNS server, just a firewall.  I double-checked this.  On the pfSense router, the state table and resources are no more than 10% used, so system resources are not being taxed when the problems occur.  The web server that performs badly in the pfSense environment serves both internal and external users.  Because there are only a few external users and the problem is persistent but intermittent in intensity, it is hard to judge the performance difference for internal versus external users.  As expected, the packet capture run on the pfSense box's LAN side showed no internal traffic, just cross-traffic between the interfaces.
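
      (For reference, the LAN-side capture was along these lines; a sketch only, where em2 is the LAN NIC per the syslog below and the host/port filter is illustrative:)

        # capture web traffic to/from the web server on the LAN NIC, no name resolution
        tcpdump -ni em2 -s 0 -w /tmp/lan-side.pcap 'host 10.50.100.8 and (port 80 or port 443)'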

      For those who asked, this is the configuration of the network, as we have simplified it in the process of trying to resolve the problems described above:

      The trust port of the pfSense box is connected to a set of Cisco Catalyst switches at the bottom of the gigabit backbone, which the floor users are plugged into directly.  All pfSense NICs are Intel Pro/1000 MT cards, and the LAN adapter MTU was changed from 1500 to 1492 to rule out packet fragmentation as the cause of the problem.  The switch port is gigabit full duplex/autonegotiate, which is mirrored on the pfSense box.  The trust network is 10.50.100.0/24, with the pfSense LAN interface at 10.50.100.1.  The web server with the WebObjects application resides on this network at 10.50.100.8.  This server also has a virtual network (for the virtual servers on it) of 10.50.150.x/24; it leaves the server through a second port and goes into a ZyXEL switch used only for this 'internal' network.  A backup of one of the virtual machines for this application resides on that network with no other connections.  There is no direct link between the 10.50.150.x network and the router, just the server NICs.  The DMZ port of the pfSense box is plugged into a second ZyXEL switch, set to autonegotiate at 100 Mbit full duplex.  Some of the servers are connected to the DMZ.  The untrust (WAN) port of the pfSense box is connected to a Fatpipe WARP WAN load balancer over a 'transfer' network, 10.51.200.x/24.  The Fatpipe WARP maps the IPs from 10.51.200.x linearly to the corresponding IPs in each of our external network ranges (e.g., 10.51.200.5 maps to 66.192.146.5 and 11.22.33.5).  We have a TW Telecom fiber line and a TW Telecom T1 (Versapack) supplying the two WANs.  The Fatpipe WARP is set with no internal firewall capability.
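
      (The MTU change itself is a single ifconfig call on FreeBSD; shown here as a sketch with em2, the LAN NIC. On pfSense the value is normally set on the interface configuration page so that it persists across reboots:)

        # drop the LAN MTU from 1500 to 1492 to test the fragmentation theory
        ifconfig em2 mtu 1492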

      One thing that did look a little odd was the following dump of the states:

      udp 224.0.0.1:626 <- 10.50.100.8:626 NO_TRAFFIC:SINGLE  
      tcp 10.50.100.8:64000 <- 10.51.200.8:64000 <- 74.109.251.106:55817 ESTABLISHED:ESTABLISHED  
      tcp 74.109.251.106:55817 -> 10.50.100.8:64000 ESTABLISHED:ESTABLISHED  
      tcp 10.50.100.8:64000 <- 10.51.200.8:64000 <- 74.109.251.106:55821 ESTABLISHED:ESTABLISHED  
      tcp 74.109.251.106:55821 -> 10.50.100.8:64000 ESTABLISHED:ESTABLISHED  
      tcp 10.50.100.8:443 <- 10.51.200.8:443 <- 74.78.171.115:53729 FIN_WAIT_2:ESTABLISHED  
      tcp 74.78.171.115:53729 -> 10.50.100.8:443 ESTABLISHED:FIN_WAIT_2  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49365 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49365 -> 10.51.200.8:49365 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49411 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49411 -> 10.51.200.8:49411 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49417 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49417 -> 10.51.200.8:49417 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49424 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49424 -> 10.51.200.8:49424 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49433 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49433 -> 10.51.200.8:49433 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49438 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49438 -> 10.51.200.8:49438 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49504 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49504 -> 10.51.200.8:49504 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49531 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49531 -> 10.51.200.8:49531 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49545 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49545 -> 10.51.200.8:49545 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49597 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49597 -> 10.51.200.8:49597 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49605 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49605 -> 10.51.200.8:49605 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49624 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49624 -> 10.51.200.8:49624 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2004 <- 10.50.100.8:49671 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49671 -> 10.51.200.8:49671 -> 66.192.146.8:2004 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49693 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49693 -> 10.51.200.8:49693 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49704 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49704 -> 10.51.200.8:49704 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2006 <- 10.50.100.8:49733 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49733 -> 10.51.200.8:49733 -> 66.192.146.8:2006 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49744 CLOSED:SYN_SENT  
      tcp 10.50.100.8:49744 -> 10.51.200.8:49744 -> 66.192.146.8:2005 SYN_SENT:CLOSED  
      tcp 66.192.146.8:2005 <- 10.50.100.8:49749 CLOSED:SYN_SENT

      The "SYN_SENT:CLOSED" looks like the device might be failing to send out or in a loopback. The line "tcp 10.50.100.8:49671 -> 10.51.200.8:49671 -> 66.192.146.8:2004 SYN_SENT:CLOSED  " looks like an attempt for the server to talk out over its external IP on the one WAN connection over the 'transfer' network that is failing.  I do not know why the device itself would be trying to talk to itself over it's WAN external IP, and I suspect this has something to do with NAT reflection.

      Other reports note similar concerns: http://forum.pfsense.org/index.php?topic=21779.0 and http://forum.pfsense.org/index.php?topic=11554.0

      Is it possible that there is a problem with the external transfer of data for this server, and that the resulting drop in performance for external access causes degraded performance for internal users?  Again, I only have reports from internal users; external users may also have problems, but there are too few of them to get a good read.  When I was configuring pfSense initially, I tried to avoid reconfiguring the internal DNS server by experimenting with NAT reflection.  I eventually gave up and reconfigured the DNS, but some elements of NAT reflection may still have propagated into some of the rules before I dropped that configuration.  The Juniper Netscreen did automatic NAT reflection by default.  Since "SYN_SENT:CLOSED" shows up as an issue in other failed NAT reflection reports, and my packets look like they are trying to loop back, could NAT reflection and its associated errors be a potential cause of the web performance problems?  If so, which settings should I change to remove any element of NAT reflection from the pfSense configuration, globally and for all implemented rules?
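
      (One way to check whether any per-rule reflection settings survived is to search the stored configuration; /conf/config.xml is the standard pfSense config path, though the natreflection tag name is my assumption about how per-rule reflection is recorded:)

        # look for leftover NAT reflection settings in the rules
        grep -i natreflection /conf/config.xml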


      Here is a syslog excerpt:

      Mar 29 08:54:44 dnsmasq[61848]: read /etc/hosts - 2 addresses
      Mar 29 08:54:44 dnsmasq[61848]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:54:44 dnsmasq[61848]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:54:44 dnsmasq[61848]: using nameserver 10.50.100.11#53
      Mar 29 08:54:44 dnsmasq[61848]: using nameserver 4.2.2.2#53
      Mar 29 08:54:44 dnsmasq[61848]: using nameserver 216.136.95.2#53
      Mar 29 08:54:44 dnsmasq[61848]: reading /etc/resolv.conf
      Mar 29 08:54:44 dnsmasq[61848]: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
      Mar 29 08:54:44 dnsmasq[61848]: started, version 2.55 cachesize 10000
      Mar 29 08:54:43 dnsmasq[47908]: exiting on receipt of SIGTERM
      Mar 29 08:54:43 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:54:43 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:54:43 dnsmasq[47908]: using nameserver 10.50.100.11#53
      Mar 29 08:54:43 dnsmasq[47908]: using nameserver 4.2.2.2#53
      Mar 29 08:54:43 dnsmasq[47908]: using nameserver 216.136.95.2#53
      Mar 29 08:54:43 dnsmasq[47908]: reading /etc/resolv.conf
      Mar 29 08:54:43 check_reload_status: Syncing firewall
      Mar 29 08:50:18 kernel: em2: promiscuous mode disabled
      Mar 29 08:50:18 kernel: em2: promiscuous mode enabled
      Mar 29 08:44:52 apinger: Starting Alarm Pinger, apinger(22259)
      Mar 29 08:44:52 check_reload_status: Reloading filter
      Mar 29 08:44:51 apinger: Exiting on signal 15.
      Mar 29 08:44:51 php: : rc.newwanip: on (IP address: 10.50.100.1) (interface: lan) (real interface: em2).
      Mar 29 08:44:51 php: : rc.newwanip: Informational is starting em2.
      Mar 29 08:44:46 check_reload_status: rc.newwanip starting em2
      Mar 29 08:44:46 php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (10.50.100.1)
      Mar 29 08:44:43 php: /interfaces.php: Creating rrd update script
      Mar 29 08:44:43 apinger: Starting Alarm Pinger, apinger(50269)
      Mar 29 08:44:43 check_reload_status: Reloading filter
      Mar 29 08:44:42 php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (10.50.100.1)
      Mar 29 08:44:42 apinger: Exiting on signal 15.
      Mar 29 08:44:40 dnsmasq[47908]: read /etc/hosts - 2 addresses
      Mar 29 08:44:40 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:44:40 dnsmasq[47908]: ignoring nameserver 127.0.0.1 - local interface
      Mar 29 08:44:40 dnsmasq[47908]: using nameserver 216.136.95.2#53
      Mar 29 08:44:40 dnsmasq[47908]: using nameserver 4.2.2.2#53
      Mar 29 08:44:40 dnsmasq[47908]: reading /etc/resolv.conf
      Mar 29 08:44:40 dnsmasq[47908]: compile time options: IPv6 GNU-getopt no-DBus I18N DHCP TFTP
      Mar 29 08:44:40 dnsmasq[47908]: started, version 2.55 cachesize 10000
      Mar 29 08:44:40 check_reload_status: updating dyndns lan
      Mar 29 08:44:40 kernel: em2: link state changed to UP
      Mar 29 08:44:40 check_reload_status: Linkup starting em2
      Mar 29 08:44:39 dnsmasq[15881]: exiting on receipt of SIGTERM
      Mar 29 08:44:37 kernel: em2: link state changed to DOWN
      Mar 29 08:44:37 check_reload_status: Linkup starting em2
      Mar 29 08:44:34 check_reload_status: Syncing firewall
      Mar 29 08:36:38 php: /pkg_edit.php: The command 'killall iperf' returned exit code '1', the output was 'No matching processes were found'
      Mar 29 08:35:03 php: /index.php: Successful webConfigurator login for user 'AGA' from 10.50.100.16
      Mar 29 08:35:03 php: /index.php: Successful webConfigurator login for user 'AGA' from 10.50.100.16
      Mar 29 07:51:02 printer: error cleared
      Mar 29 07:49:06 printer: offline or intervention needed
      Mar 29 06:56:10 printer: error cleared
      Mar 29 06:55:29 printer: offline or intervention needed
      Mar 29 06:24:39 printer: error cleared
      Mar 29 06:22:21 printer: offline or intervention needed
      Mar 28 22:34:12 dnsmasq[15881]: ignoring nameserver 127.0.0.1 - local interface
      Mar 28 22:34:12 dnsmasq[15881]: ignoring nameserver 127.0.0.1 - local interface
      Mar 28 22:34:12 dnsmasq[15881]: using nameserver 216.136.95.2#53
      Mar 28 22:34:12 dnsmasq[15881]: using nameserver 4.2.2.2#53
      Mar 28 22:34:12 dnsmasq[15881]: reading /etc/resolv.conf
      Mar 28 22:30:21 apinger: Starting Alarm Pinger, apinger(5592)
      Mar 28 22:30:21 check_reload_status: Reloading filter
      Mar 28 22:30:20 apinger: Exiting on signal 15.
      Mar 28 22:30:20 php: : ROUTING: setting default route to 10.51.200.1
      Mar 28 22:30:20 php: : rc.newwanip: on (IP address: 10.51.200.2) (interface: wan) (real interface: em1).
      Mar 28 22:30:20 php: : rc.newwanip: Informational is starting em1.
      Mar 28 22:30:14 check_reload_status: rc.newwanip starting em1
      Mar 28 22:30:14 php: : Hotplug event detected for wan but ignoring since interface is configured with static IP (10.51.200.2)

      Here is the dashboard:
      Name
      Version 2.0.1-RELEASE (i386)
      built on Mon Dec 12 18:24:17 EST 2011

      FreeBSD 8.1-RELEASE-p6

      Unable to check for updates.
      Platform pfSense  
      CPU Type Intel(R) Xeon(TM) CPU 3.00GHz
      Current: 750 MHz, Max: 3000 MHz  
      Uptime  
      Current date/time Thu Mar 29 14:00:49 PDT 2012
      DNS server(s) 127.0.0.1
      10.50.100.11
      4.2.2.2
      216.136.95.2

      Last config change Thu Mar 29 13:58:03 PDT 2012
      State table size  
      MBUF Usage 1282/25600  
      CPU usage      
      Memory usage      
      SWAP usage      
      Disk usage

      Interfaces
        WAN   10.51.200.2   1000baseT <full-duplex>
        LAN   10.50.100.1   1000baseT <full-duplex>
        DMZ   10.50.101.1   100baseTX <full-duplex>

      Gateways
      Name Gateway RTT Loss Status
      FPGW  10.51.200.1  0.365ms  0.0%  Online

      Here is the interface summary:

      Status: Interfaces  
      WAN interface (em1)  
      Status up  
      MAC address 00:1b:21:c7:15:7f  
      IP address 10.51.200.2    
      Subnet mask 255.255.255.0  
      Gateway FPGW 10.51.200.1  
      ISP DNS servers 127.0.0.1
      10.50.100.11
      4.2.2.2
      216.136.95.2

      Media 1000baseT <full-duplex>
      In/out packets 11461325/11458745 (5.43 GB/7.37 GB)  
      In/out packets (pass) 11458745/10676937 (5.43 GB/7.37 GB)  
      In/out packets (block) 2580/0 (140 KB/0 bytes)  
      In/out errors 0/0  
      Collisions 178

      LAN interface (em2)  
      Status up  
      MAC address 00:1b:21:90:37:e3  
      IP address 10.50.100.1    
      Subnet mask 255.255.255.0  
      Media 1000baseT <full-duplex>
      In/out packets 11282101/11278313 (7.75 GB/5.85 GB)  
      In/out packets (pass) 11278313/12018152 (7.74 GB/5.85 GB)  
      In/out packets (block) 3788/0 (1.30 MB/0 bytes)  
      In/out errors 0/0  
      Collisions 0

      DMZ interface (em0)  
      Status up  
      MAC address 00:1b:21:ca:b8:79  
      IP address 10.50.101.1    
      Subnet mask 255.255.255.0  
      Media 100baseTX <full-duplex>
      In/out packets 2107389/2107315 (889.02 MB/846.24 MB)  
      In/out packets (pass) 2107315/2119352 (889.02 MB/846.24 MB)  
      In/out packets (block) 74/0 (3 KB/0 bytes)  
      In/out errors 0/0  
      Collisions 0

      Here are the system tunables:

      Tunable Name Description Value
      debug.pfftpproxy  Disable the pf ftp proxy handler.  default (0)

      vfs.read_max  Increase UFS read-ahead speeds to match current state of hard drives and NCQ. More information here: http://ivoras.sharanet.org/blog/tree/2010-11-19.ufs-read-ahead.html  default (32)

      net.inet.ip.portrange.first  Set the ephemeral port range to be lower.  default (1024)

      net.inet.tcp.blackhole  Drop packets to closed TCP ports without returning a RST  default (2)

      net.inet.udp.blackhole  Do not send ICMP port unreachable messages for closed UDP ports  default (1)

      net.inet.ip.random_id  Randomize the ID field in IP packets (default is 0: sequential IP IDs)  default (1)

      net.inet.tcp.drop_synfin  Drop SYN-FIN packets (breaks RFC1379, but nobody uses it anyway)  default (1)

      net.inet.ip.redirect  Enable sending IPv4 redirects  default (1)

      net.inet6.ip6.redirect  Enable sending IPv6 redirects  default (1)

      net.inet.tcp.syncookies  Generate SYN cookies for outbound SYN-ACK packets  default (1)

      net.inet.tcp.recvspace  Maximum incoming/outgoing TCP datagram size (receive)  default (65228)

      net.inet.tcp.sendspace  Maximum incoming/outgoing TCP datagram size (send)  default (65228)

      net.inet.ip.fastforwarding  IP Fastforwarding  default (0)

      net.inet.tcp.delayed_ack  Do not delay ACK to try and piggyback it onto a data packet  default (0)

      net.inet.udp.maxdgram  Maximum outgoing UDP datagram size  default (57344)

      net.link.bridge.pfil_onlyip  Handling of non-IP packets which are not passed to pfil (see if_bridge(4))  default (0)

      net.link.bridge.pfil_member  Set to 0 to disable filtering on the incoming and outgoing member interfaces.  default (1)

      net.link.bridge.pfil_bridge  Set to 1 to enable filtering on the bridge interface  default (0)

      net.link.tap.user_open  Allow unprivileged access to tap(4) device nodes  default (1)

      kern.randompid  Randomize PID's (see src/sys/kern/kern_fork.c: sysctl_kern_randompid())  default (347)

      net.inet.ip.intr_queue_maxlen  Maximum size of the IP input queue  default (1000)

      hw.syscons.kbd_reboot  Disable CTRL+ALT+Delete reboot from keyboard.  default (0)

      net.inet.tcp.inflight.enable  Enable TCP Inflight mode  default (1)

      net.inet.tcp.log_debug  Enable TCP extended debugging  default (0)

      net.inet.icmp.icmplim  Set ICMP Limits  default (0)

      net.inet.tcp.tso  TCP Offload Engine  default (1)

      kern.ipc.maxsockbuf  Maximum socket buffer size  default (4262144)
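
      (Any of these can be inspected or flipped live from a shell for testing; a sketch using the TSO tunable as the example. Changes made with sysctl this way do not survive a reboot, which is what the tunables page is for:)

        # read the current value, then disable TCP segmentation offload for a test
        sysctl net.inet.tcp.tso
        sysctl net.inet.tcp.tso=0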

    mklopfer:

      Bump!

      Just wanted to see if anyone thought this was a logical place to look before I take down the network for more maintenance this week.

    feadin:

      IMO you should try to really isolate the problem first, before trying to fix it.  There are too many variables.  Follow the problem step by step from the clients to the web server; take a look at the logs on the web server, and check where it is connecting, how, and which services it is using.  You should also check the switches' logs.  Focus on isolating and understanding the problem before trying anything else.

    mklopfer:

      Thanks feadin,

      There is nothing of note in the web application logs, the traces from client to server, or the switch logs.  The only thing that looked out of sorts was the state table entry I posted earlier.  The web server is HTTP/HTTPS only on the client side, and all packets either route on the local LAN or through the pfSense system and out to the WAN.  No additional information of merit is given.  We have checked DNS, switches, etc. by swapping in replacements and retesting; nothing has helped.  We cannot recreate the problems in a test environment, and everything appears to work correctly when several users are on the web system at once for testing.  Once all the users come on, the problems become evident.  My suspicion of the NAT might be a false lead; this is why I am asking the community before I chase it.  Taking down the working routing system for something that doesn't work correctly really ticks off the users and causes substantial downtime, so I have to dry-lab and plan everything before going live with a test.

    feadin:

      What kind of connections and services does the web server use on its server side?  Did you check those when the problems start?
      If you cannot reproduce this in a lab, I would start testing right on a client computer when the problem starts: test connectivity between the client and the web server, then connectivity between the web server and every service and/or host it uses, like databases, DNS, WINS, even broadcasts.  Go step by step.  There is no point in keeping this at the basic network level only; check the other levels as well, since they are all interdependent.  Even if the problem is at a basic network level, checking the other levels lets you isolate it much faster.  After you isolate the problem, the solution will be easy.  I see no point in trying possible solutions blindly.
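
      (A minimal sketch of that kind of pass with FreeBSD-style tools; the client-side commands will differ on Windows clients, and www.example.com stands in for whatever names the server actually depends on:)

        # client -> web server: raw reachability, then a timed HTTP fetch
        ping -c 5 10.50.100.8
        time fetch -o /dev/null http://10.50.100.8/
        # web server -> its dependencies: e.g. confirm the internal DNS server answers
        host www.example.com 10.50.100.11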

    mklopfer:

      Thank you everyone for your help; the system has been running for a week with no user-reported problems.  What I did was explicitly go to every 1:1 and port forward entry and disable NAT reflection, and on the Advanced settings page I disabled every reference to NAT reflection.  All of the SYN_SENT:CLOSED entries in the state table dissipated after this.  I noticed a number of FIN_WAIT_2 entries; to try to resolve this I used advice from another thread and set the packet timeout to 1 second for the web server routing entry.  That was disastrous for performance and I had to revert.  Despite a number of FIN_WAIT_2s in the state table, everything works fine now.
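
      (Verifying the cleanup was just a matter of re-running the state dump; the pattern below matches the two state types discussed in this thread:)

        # SYN_SENT:CLOSED entries should be gone with reflection disabled
        pfctl -ss | egrep 'SYN_SENT:CLOSED|FIN_WAIT_2'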

    podilarius:

      Good old NAT reflection.  This is why split DNS is used: internal IPs are served to the LAN, and external IPs are served to WAN-originating connections.
      That is what it sounded like you were doing, but since turning off NAT reflection fixed it, it seems that it was not.
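
      (Since this box already runs dnsmasq as its DNS forwarder, split DNS can be a one-line host override there; a sketch, with www.example.com standing in for the real site name.  In the pfSense GUI this would be a Host Override entry in the DNS Forwarder settings:)

        # dnsmasq host override: hand LAN clients the inside address for the site
        address=/www.example.com/10.50.100.8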

    mklopfer:

      This is the strange thing: DNS on the inside resolved correctly for the web server while we were still having problems, so there must have been something hardcoded somewhere that caused the trouble.  Potentially this is in the pfSense box itself: it would just keep trying, unsuccessfully, to NAT the data, causing timeouts.  When the capability was disabled, with no other network changes, everything worked well.

    podilarius:

      Not sure why.  If the DNS returned internal addresses (assuming they are on the same subnet), then the traffic should never have reached the firewall at all.  If you were going from the LAN to the DMZ, for instance, it would go through the firewall, but NAT reflection would not have much to do there.  You could even switch to advanced outbound NAT and not NAT that traffic at all: just pure firewalling and routing.

    mklopfer:

      What seemed to be happening was that the web server was spending time trying to maintain dropped connections to the outside at the expense of inside connections, which should never touch the firewall.  All internal machines use an internal DNS server that hands out the web server's address on the same subnet.  It looks like the symptoms we were seeing were indirectly related to the reflective NAT issue.  For some reason there were tons of connections between the server and itself trying to loop back over an external address; my best guess is that something somewhere was hardcoded to talk over that IP.  But if that were the case, removing NAT reflection would not have resolved the issue: the server would still try to talk out and back and be blocked.  I'm still at a loss as to the exact mechanism of the problem, but any speculation that might help others in the future is welcome.

    podilarius:

      My guess would be that the html/php/asp is telling the client to go to http://<external ip>/internalpage.html (or .php/.asp) instead of ./internalpage.html, and as a result you were essentially being redirected to the external IP instead of using the internal IP from DNS.  This happens sometimes when a web page needs to load data from another page.  This is generally the wrong way to set up a website, IMO.
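
      (A quick way to test that guess is to grep the site's document root for absolute URLs; the web root path here is hypothetical:)

        # find pages that hardcode the external address instead of relative links
        grep -rn 'http://66\.192\.146\.8' /path/to/webroot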
