Netgate Discussion Forum

    Connection Drop after 10 Seconds, TCP, HTTP

    NAT
      MasterX-BKC- Banned

      OK, I'll try to make this as simple as can be, and hope someone has an answer for me.

      Synopsis:
      From inside our network, we host a database-driven application that does a lot of cross-referencing across about 7 million database rows. The queries take between 5 and 15 seconds depending on the type of report being generated, so the HTTP request to load the report takes 5-15 seconds to complete.

      The Problem we are having:
      We have tested extensively, and the problem appears to be happening in pfSense itself.

      Any HTTP request that takes longer than 10 seconds to complete will time out for the client if they are on the WAN side of the pfSense box. Anyone on the same network as the servers works fine. The report generated is only ~600 KB when the server sends it.

      How we have narrowed it to the PFSense:
      1.  All clients on the same network as the servers work flawlessly.
      2.  Temporarily gave the servers a WAN IP each and hooked them up outside the firewall via a switch; remote clients can then access the reports fine.
      3.  Raised all the timeouts in Apache, PHP, CentOS, and MySQL, which made zero difference.
      4.  Created a super basic PHP file that simply does sleep($secs); echo "boom"; If $secs is set to 10 or more, the connection from WAN clients hangs until the client times out, but local clients work fine with even a 60 second sleep.
      5.  Added a call to the PHP file to append "boom" to a text file after the sleep, to see if the script runs to its end when WAN clients time out. It does, but the returned data never gets to any WAN client if the time is longer than 10 seconds.
      6.  Installed NGINX to see if it made a difference, but the same issue persists.
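
For anyone who wants to reproduce test 4 without PHP, here is a minimal Python sketch of the same idea: a handler that stays silent for a configurable number of seconds and then answers "boom", plus a client that times the request. The names make_handler, timed_get and run_probe are made up for illustration. run_probe only exercises the loopback interface; to test the firewall itself, the server would have to run behind pfSense and timed_get would be called from the WAN side.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(delay_seconds):
    """Build a handler that stays silent for delay_seconds, then answers 'boom'."""

    class SleepHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)  # no bytes leave the server during this gap
            body = b"boom"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):  # keep the console quiet
            pass

    return SleepHandler


def timed_get(host, port, timeout):
    """Fetch / from host:port and return (body, seconds_elapsed)."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        body = conn.getresponse().read()
    finally:
        conn.close()
    return body, time.monotonic() - start


def run_probe(delay_seconds, timeout=120.0):
    """Start a throwaway server on a free loopback port and time one request."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(delay_seconds))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return timed_get("127.0.0.1", server.server_address[1], timeout)
    finally:
        server.shutdown()
        server.server_close()
```

If the firewall is dropping the idle connection, a WAN-side timed_get against a 10+ second delay should raise a timeout instead of returning b"boom".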

      Specifications:
      PFsense 2.3.2-RELEASE-p1 (amd64) Virtualized.
      No squid, no url filtering, no dansguardian, no HAVP, etc.
      Added Packages: NMAP, CRON, Open-VM-Tools, OpenVPN Client Exporter.

      WAN Settings:
      Static IP, No pppo*, etc.  MTU: 1500, No VLANing.
      We have a /28 and a /29 of IPv4 Addresses.
      We are dual-stacked with IPv6, we have a /64
      Gigabit fiber using a media converter into the WAN port of pfSense. This is at a datacenter, so actual speed to the internet is ~1 Gbps.

      LAN Settings:
      We have not enabled IPv6 internally yet; the local network is IPv4-only for now.
      192.168.x.1/24. Servers are .3, .4, .5, and .6, with .4 being the web server.
      Workstation PC is .205 on the same network.
      No proxies or anything are set up; DNS points to Google's 8.8.8.8.

      Legend for the diagram below:
      WS = Web Server, CentOS 6.8, Apache, PHP, nothing else.  6 Core Xeon, 32 GB RAM
      DB = Database Server, Ubuntu 14.04, MariaDB(MySQL), minimal install, 8 Core Xeon, 48 GB RAM
      PC1 = Client PCs across the internet, tested on win7 x64, chrome and firefox.
      PC2 = Local workstation on same network with the web and DB servers. Windows 7 X64, Chrome, and Firefox.

      Topology:

      
            PC1
             |
      (Cloud/Internet)
             |
          PFSense
             |
       (GBIT Switch)
            /|\
           / | \
         DB WS  PC2
      
      

      What we have tried to remedy the problem inside pfSense:
      1.  Setting the state timeout on the Rules page to 60+ on the rule pertaining to this connection.
      2.  Changed Firewall Optimization to Conservative; also tried High-latency.
      3.  Changed the virtual NIC type from VMXNET3 to E1000; no change.

        Nullity

        I wonder if it's a VM-related problem.

        Please correct any obvious misinformation in my posts.
        -Not a professional; an arrogant ignoramous.

          MasterX-BKC- Banned

          So I did some more experimentation and I think I know exactly what is happening, but I do not know how to fix it.

          I made a PHP file to test with that waits X seconds between counting up each number.

          So when you load the page it sends 1, 2, 3, 4, etc., up to 30, with X seconds between numbers. Buffering is off, so the numbers actually show up one by one.

          With a delay of 1 second, it counts up to 30 just fine; each time it adds a number, I see a packet come from the web server to the client.

          With a delay of up to 8 seconds, it works the same way, essentially fine.

          Once the delay between numbers is 9 seconds or higher it gets flaky: it stops counting somewhere between 4 and 8 and never counts any higher. No packets are coming in from the web server anymore, but I can see the PHP instance on the server is still counting just fine. pfSense has stopped passing the packets outbound.

          It seems that once the time between packets reaches 9-10 seconds, pfSense considers the connection terminated. It gives up waiting and ceases passing those packets.

          Is there a timeout somewhere that says, if 9-10 seconds pass without a packet, terminate this connection or terminate this state?

          The reports the web server generates can take up to 15-20 seconds, so this is where the issue is hurting. Local clients of the server work fine.
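
The counting test can also be watched from the client side without a browser. Below is a minimal Python sketch, assuming a plain HTTP/1.0 request is enough for the test page: it records when each chunk of the response arrives and stops either on a clean close or when nothing arrives for idle_timeout seconds. The function names watch_stream and stalled are hypothetical, not part of any tool mentioned in this thread.

```python
import socket
import time


def watch_stream(host, port, path="/", idle_timeout=30.0):
    """Request path and return a list of (seconds_since_start, chunk) tuples.

    Reading stops when the server closes the connection, or when no bytes
    arrive for idle_timeout seconds; in that case the last entry's chunk is None.
    """
    arrivals = []
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=idle_timeout) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                arrivals.append((time.monotonic() - start, None))  # went silent
                break
            if not chunk:  # orderly close: the stream finished
                break
            arrivals.append((time.monotonic() - start, chunk))
    return arrivals


def stalled(arrivals):
    """True if the stream ended in an idle timeout rather than a clean close."""
    return bool(arrivals) and arrivals[-1][1] is None
```

Run from a WAN-side machine with idle_timeout set a little above the delay between numbers, the timestamps show the last number that made it through and how long the connection then stayed silent.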

            MasterX-BKC- Banned

            OK, so based on my research, I found 2 threads with similar issues.

            https://forum.pfsense.org/index.php?topic=102175.0

            https://forum.pfsense.org/index.php?topic=51423.0

            Basically, I just want to let the outgoing HTTP traffic go out, even if its state has disappeared, expired, etc.

            The web server is 192.168.1.2, and as it is answering requests from external systems, its source port will always be 80.

            What rules do I need to create to accomplish this?

            For the moment I created a rule on LAN: pass, TCP flags: any, State type: none.

            I suspect there is more to it than that. I believe I also need a floating rule, but I'm not sure on the specifics.

              MasterX-BKC- Banned

              Switching the Firewall Optimization to High-latency improves the problem, but it still times out occasionally.

              Is there a way to manually adjust the Firewall Optimization just for outgoing source port 80 connections, to, say, double whatever the High-latency option provides?

                MasterX-BKC- Banned

                I would really hate to have to use one of my support tickets to solve what should be such a simple, rudimentary, though very thinly documented issue.

                  johnpoz LAYER 8 Global Moderator

                  There is no timer that would be for 10 seconds.

                  https://doc.pfsense.org/index.php/Advanced_Setup

                  
                  [2.3.2-RELEASE][root@pfsense.local.lan]/root: pfctl -st    
                  tcp.first                   120s                           
                  tcp.opening                  30s                           
                  tcp.established           86400s                           
                  tcp.closing                 900s                           
                  tcp.finwait                  45s                           
                  tcp.closed                   90s                           
                  tcp.tsdiff                   30s                           
                  udp.first                    60s                           
                  udp.single                   30s                           
                  udp.multiple                 60s                           
                  icmp.first                   20s                           
                  icmp.error                   10s                           
                  other.first                  60s                           
                  other.single                 30s                           
                  other.multiple               60s                           
                  frag                         30s                           
                  interval                     10s                           
                  adaptive.start            58800 states                     
                  adaptive.end             117600 states                     
                  src.track                     0s                           
                  [2.3.2-RELEASE][root@pfsense.local.lan]/root:              
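
A quick way to check that listing for anything at or below 10 seconds is to parse it. The Python sketch below does that on the timer lines copied from the output above; parse_timeouts and at_or_below are hypothetical helper names, and in practice the text would come from running pfctl -st on the firewall in question, whose values may differ.

```python
import re

# Timer lines copied from the pfctl -st output above.
PFCTL_ST = """\
tcp.first                   120s
tcp.opening                  30s
tcp.established           86400s
tcp.closing                 900s
tcp.finwait                  45s
tcp.closed                   90s
tcp.tsdiff                   30s
udp.first                    60s
udp.single                   30s
udp.multiple                 60s
icmp.first                   20s
icmp.error                   10s
other.first                  60s
other.single                 30s
other.multiple               60s
frag                         30s
interval                     10s
adaptive.start            58800 states
adaptive.end             117600 states
src.track                     0s
"""


def parse_timeouts(text):
    """Return {name: seconds} for every line of the form 'name  <n>s'.

    Lines counted in states (adaptive.start, adaptive.end) are skipped.
    """
    timers = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([a-z.]+)\s+(\d+)s\s*$", line)
        if match:
            timers[match.group(1)] = int(match.group(2))
    return timers


def at_or_below(timers, limit):
    """Names of all timers whose value is <= limit seconds, sorted."""
    return sorted(name for name, secs in timers.items() if secs <= limit)


print(at_or_below(parse_timeouts(PFCTL_ST), 10))
# prints ['icmp.error', 'interval', 'src.track']
```

None of the tcp.* timers shows up in that list; the shortest TCP timers in this listing are tcp.opening and tcp.tsdiff at 30 seconds.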
                  
                  

                  An intelligent man is sometimes forced to be drunk to spend time with his fools
                  If you get confused: Listen to the Music Play
                  Please don't Chat/PM me for help, unless mod related
                  SG-4860 24.11 | Lab VMs 2.7.2, 24.11

                    Nullity

                    To be clear, this is a fully open TCP connection that loses state after ~30 seconds?

                    If so, there seems to be a problem. No sane default timeout would ever be that low, so I doubt changing any of them would help.

                    Have you done a packet capture or monitored the states table?

                      MasterX-BKC- Banned

                      I have monitored the state table and I did the packet capture before. Here is how it happens:

                      A client connects to the web server via a browser to request a report.

                      The server answers back and begins generating the report.

                      If it takes longer than ~10 seconds to generate, the server sends the report, but pfSense blocks it from going out because it has closed the state/connection.

                      The client spins forever until it times out, not knowing the report was sent, because pfSense blocked it.

                        Nullity

                        @MasterX-BKC-:

                        For the moment i created a rule on LAN, to pass, tcp flags: any, State type: none.

                        State type: none? You sure you want to do that?

                        I'd be very hesitant to start changing things since, by default, things should be working fine, keeping states for ~24 hours. If you start playing with a bunch of options you may run into many unforeseen problems later.

                          MasterX-BKC- Banned

                          @Nullity:

                          @MasterX-BKC-:

                          For the moment i created a rule on LAN, to pass, tcp flags: any, State type: none.

                          State type: none? You sure you want to do that?

                          I'd be very hesitant to start changing things since, by default, things should be working fine, keeping states for ~24 hours. If you start playing with a bunch of options you may run into many unforeseen problems later.

                          Actually, I finally got it to work, using an unusual combination of settings, strangely enough.

                          On the rule corresponding to the NAT policy for port 80 inbound, I went under Advanced and did the following:
                          State timeout: 60
                          TCP Flags: any
                          State type: sloppy

                          I tried those options individually, and it seems to require them all for some reason. In addition, I also changed the following under
                          System > Advanced > Firewall & NAT:
                          TCP First: 60
                          TCP Opening: 60
                          TCP Established: 60 - Tested again and discovered this one has no effect on the issue; it works great with it set empty again.
                          Other First: 60

                          I doubt all of these need to be set this way, but I'm afraid to touch it as it's now generating the reports flawlessly. To prove it, I even added an extra 30 second delay into the report generator so that reports take nearly 50 seconds to complete.

                          With these settings, even a 50 second report generating delay still works perfectly.

                          I'm sure an admin, or someone else familiar could direct me to the better way to achieve these same results.

                          Interestingly, I first tried just TCP Established: 60, but that wasn't enough to make it work either.

                          UPDATE:  TCP Established seems to not be involved, turning it off didn't break it.

                          My test file is here:  http://pfmon.black-knights.org/test.php
                          Without the options set, it will count to 4-6 and then the connection stops working and hangs. With the settings above, it counts and processes all the way to completion.

                            Nullity

                            @MasterX-BKC-:

                            UPDATE:  TCP Established seems to not be involved, turning it off didn't break it.

                            Turning it off defaults it to 86400 seconds, or smaller/larger depending on the "Firewall Optimization" setting, I think.

                            You can run the "pfctl -st" command to see what it's set to.

                              johnpoz LAYER 8 Global Moderator

                              "someone else familiar could direct me to the better way to achieve these same results….."

                              There should be no reason why you have to edit such settings.  Did you take a look at pftop when your connections were active to see what the timeouts were in real time for your states?

                              Shouldn't that have been first place to look for such an issue?

                                doktornotor Banned

                                Indeed, these hacks digging holes into your setup are just horrible and absolutely should not be required for anything.

                                  MasterX-BKC- Banned

                                  They were not required before, when I was using a Cisco 7507 at the gateway. The issue first came around when I moved this system to where I have pfSense, but it was manageable and only intermittent until the reports grew in size.

                                  doktornotor, the fact that I'm looking for a better way to do this in and of itself shows that I'm aware this is not ideal, so your post was not called for. If you aren't going to contribute, please move along.

                                  @johnpoz:

                                  There should be no reason why you have to edit such settings.  Did you take a look at pftop when your connections were active to see what the timeouts were in real time for your states?

                                  I agree pftop would help narrow down the issue, if it were not for the fact that this network hosts 7 servers and a total of 27 websites. The one server the issue occurs on hosts 8 such sites, all on the same ports using Apache virtual hosts, if you're familiar with them (it's not virtualization related). The number of states at peak times has hit 450,000.

                                  This isn't a small one-off network. This is at a datacenter, with a LOT of traffic, and the server in question is a 12 core (24 thread), 144 GB RAM monster box that handles MySQL for all the other servers as well as internet-based systems using HTTPS APIs.

                                  Not your average John Boy setup hosting a personal web page from his basement on an extra PC.

                                    chpalmer

                                    My test file is here:  http://pfmon.black-knights.org/test.php

                                    I don't suppose you would share your code so I could test here eh?

                                    Curious if you have tried 1:1 NAT instead of port forwarding?

                                    Triggering snowflakes one by one..
                                    Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz on an M400 WG box.

                                      MasterX-BKC- Banned

                                      All that file does is:

                                      $i = 1;
                                      while ($i <= 30) {
                                          echo $i;        // send the current number
                                          $i = $i + 1;
                                          sleep(11);      // then stay silent for 11 seconds
                                      }

                                      It just sends a number every 11 seconds to see if the connection is still alive.

                                      If the browser counts all the way to 30, then the issue is fixed. If it stops for more than 11 seconds, then the connection has died.

                                        johnpoz LAYER 8 Global Moderator

                                        "The number of states at peak times has hit 450,000."

                                        So maybe you're running into state exhaustion and pfSense is killing off the idle ones?
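
To put rough numbers on that: the pf.conf documentation describes adaptive timeouts, where once the state count exceeds adaptive.start, every timeout is scaled linearly by (adaptive.end - states) / (adaptive.end - adaptive.start), reaching zero at adaptive.end. Here is a minimal Python sketch of that formula. The function name scaled_timeout is made up, the integer rounding is an assumption, and the 58800/117600 limits are the ones from the pfctl -st listing earlier in the thread, not necessarily the values on the box with the problem.

```python
def scaled_timeout(base_seconds, states, adaptive_start, adaptive_end):
    """Effective timeout under pf's adaptive scaling (sketch, integer seconds).

    Below adaptive_start the timeout is unchanged; at or above adaptive_end it
    is zero; in between it shrinks linearly with the number of states.
    """
    if states <= adaptive_start:
        return base_seconds
    if states >= adaptive_end:
        return 0
    return base_seconds * (adaptive_end - states) // (adaptive_end - adaptive_start)


# Hypothetical load: 98,000 states against the limits from the listing above.
print(scaled_timeout(30, 98000, 58800, 117600))  # prints 10
```

With those example limits, a 30 second tcp.opening timeout shrinks to 10 seconds at 98,000 states, so comparing the live state count against adaptive.start on the affected firewall would be one way to test this theory.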

                                        "The one server the issue occurs "

                                        So you have other servers serving up stuff behind pfSense and this sort of thing doesn't happen with them?  Why don't you isolate this box, or try to duplicate the problem on a test setup?

                                        Dok is pointing out that what you're doing is not a good idea, and that is very much a valid contribution to the thread. If someone like dok says it's a bad idea, then it's a BAD idea!  And I agree that what you're doing is a hack that should not have to be done. You've got something else going on; what you're doing is hiding the actual problem.

                                          doktornotor Banned

                                          I really hate to state the obvious again, but – have you tried this with a physical machine?

                                            MasterX-BKC- Banned

                                            The issue is solved. If it were a virtualization-related issue, I would not have solved it by changing the timeouts in pfSense.

                                            I think the source of the issue is this:

                                            pfSense terminates sessions that are opening if the machine behind pfSense doesn't respond within 10 seconds, period.

                                            When Apache/PHP is doing a large report processing job, it can take anywhere from 2 seconds for a small report to 15-20 seconds for a large report.

                                            If there were a problem in the virtualization, it would be affecting more than this one program.

                                            This is not your average situation; this is a workload the likes of which you may not have seen before.

                                            I agree this is not an ideal fix, but please, doktornotor, explain why this is a bad idea from a technical standpoint, so maybe I can see the thought process behind this assumption.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.