Netgate Discussion Forum

    New 502 Bad Gateway

    2.4 Development Snapshots
    • jimp Rebel Alliance Developer Netgate

      @john_galt:

      After scrolling through all 6,000 lines of the system.log file, I see several of these lines; they appear to be at 1-minute intervals:

      Oct 12 05:43:10 pfSense kernel: sonewconn: pcb 0xfffff80008c430f0: Listen queue overflow: 193 already in queue awaiting acceptance (708 occurrences)
      

      For that, check the current value of kern.ipc.soacceptqueue (run "sysctl kern.ipc.soacceptqueue"). It's probably at the default of 128. Set it to 1024 or higher (e.g. 4096) by creating a tunable for it under System > Advanced, Tunables tab.

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • john_galt

        jimp

        Indeed it was set to 128. I added the tunable, set it to 4096, and applied it. I'm doing this over an OpenVPN connection. Do I need to reboot it?

        Doug

        • jimp Rebel Alliance Developer Netgate

          No need to reboot, that is one that can be changed at run-time.

          • john_galt

            Okay. Will keep an eye on it and thanks.

            Doug

            • seanr22a

              Now the SG-2440 boxes have stopped working as well :( They lasted 6 days before I got the 502 error. All of them are now back on 2.3.4p1, and XMLRPC Sync is working as well. I will wait for the final release before I give 2.4 another try.

              • MaxPF

                Same thing here this morning. I had to force a reboot. I also increased kern.ipc.soacceptqueue to 4096 as suggested. While I was at it, I updated to the latest build, and it now says 2.4 RELEASE.

                • jimp Rebel Alliance Developer Netgate

                  For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.

                  pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.

                  • BreeOge

                    @jimp:

                    Lots of port 1344, do you have squid+clamav active as well? Can you try shutting that off?

                    Also that netstat -x output is too big to put inline, you should attach that in a .txt file

                    Locked up again, without squid+clamav installed..

                    
                    # /usr/sbin/swapinfo -h
                    Device          1K-blocks     Used    Avail Capacity
                    /dev/label/swap0  33554428       0B      32G     0%
                    #
                    
                    
                    
                    # /usr/bin/top | /usr/bin/head -n7
                    last pid: 44796;  load averages:  0.10,  0.13,  0.12  up 0+14:58:53    12:32:05
                    88 processes:  1 running, 85 sleeping, 2 stopped
                    
                    Mem: 475M Active, 827M Inact, 1187M Wired, 832M Buf, 13G Free
                    Swap: 32G Total, 32G Free
                    
                    
                    
                    # /usr/bin/netstat -Ln
                    Current listen queue sizes (qlen/incqlen/maxqlen)
                    Proto Listen                           Local Address
                    Netgraph sockets
                    Type  Recv-Q Send-Q Node Address   #Hooks
                    ctrl       0      0 [46b]:            0
                    ctrl       0      0 [468]:            0
                    ctrl       0      0 [44b]:            0
                    ctrl       0      0 [443]:            0
                    ctrl       0      0 [3f9]:            0
                    ctrl       0      0 [3b0]:            0
                    ctrl       0      0 [367]:            0
                    ctrl       0      0 [31e]:            0
                    ctrl       0      0 [2d5]:            0
                    ctrl       0      0 [28c]:            0
                    ctrl       0      0 [243]:            0
                    ctrl       0      0 [1fb]:            0
                    ctrl       0      0 [1b0]:            0
                    ctrl       0      0 [11]:             0
                    ctrl       0      0 [5]:              0
                    unix  0/0/5                            /var/run/dpinger_WAN_DHCP~70.178.196.154~70.178.196.1.sock
                    unix  0/0/5                            /var/run/dpinger_Steve_Telephone~192.168.16.2~10.10.10.2.sock
                    unix  0/0/5                            /var/run/dpinger_Raymond_Telephone~192.168.16.2~10.10.12.2.sock
                    unix  0/0/5                            /var/run/dpinger_Kevin_Telephone~192.168.16.2~10.10.11.2.sock
                    unix  0/0/5                            /var/run/dpinger_Cisco_Router~192.168.16.2~192.168.16.201.sock
                    unix  0/0/4                            /var/run/devd.pipe
                    unix  0/0/30                           /var/run/check_reload_status
                    unix  193/0/128                        /var/run/php-fpm.socket
                    unix  0/0/4                            /var/run/devd.seqpacket.pipe
                    
                    

                    Pastebin of the /usr/bin/netstat -xn output:

                    https://pastebin.com/ZFujW9Kp

                    • pyrodex

                      @jimp:

                      For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.

                      pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.

                      I have never run squid+clamav and experienced it with only pfBlocker. I removed it the other day to restore usability and haven't had a lockup since.

                      • BreeOge

                        Here are the logs from another system. I removed Squid+ClamAV on this system as well, and it still locked up.

                        
                        # /usr/sbin/swapinfo -h
                        Device          1K-blocks     Used    Avail Capacity
                        /dev/gptid/d2a5a9dd-7e41-11e7-b   3684016       0B     3.5G     0%
                        
                        
                        
                        # /usr/bin/top | /usr/bin/head -n7
                        last pid: 33447;  load averages:  0.08,  0.10,  0.08  up 0+12:43:35    12:46:18
                        113 processes: 1 running, 110 sleeping, 2 stopped
                        
                        Mem: 689M Active, 1655M Inact, 945M Wired, 650M Buf, 4548M Free
                        Swap: 3598M Total, 3598M Free
                        
                        
                        
                        # /usr/bin/netstat -Ln
                        Current listen queue sizes (qlen/incqlen/maxqlen)
                        Proto Listen                           Local Address
                        Netgraph sockets
                        Type  Recv-Q Send-Q Node Address   #Hooks
                        ctrl       0      0 [ea1]:            0
                        ctrl       0      0 [e92]:            0
                        ctrl       0      0 [e73]:            0
                        ctrl       0      0 [e6b]:            0
                        ctrl       0      0 [e66]:            0
                        ctrl       0      0 [e]:              0
                        ctrl       0      0 [5]:              0
                        unix  0/0/80                           /tmp/mysql.sock
                        unix  0/0/5                            /var/run/dpinger_WAN_DHCP~70.178.22.158~70.178.22.1.sock
                        unix  0/0/4                            /var/run/devd.pipe
                        unix  0/0/30                           /var/run/check_reload_status
                        unix  193/0/128                        /var/run/php-fpm.socket
                        unix  0/0/4                            /var/run/devd.seqpacket.pipe
                        
                        

                        /usr/bin/netstat -xn: see the attached file.

                        netstat-xn.txt

                        • jimp Rebel Alliance Developer Netgate

                          @BreeOge:

                          
                          # /usr/bin/netstat -Ln
                          unix  193/0/128                        /var/run/php-fpm.socket
                          
                          
                          
                          [...]
                          tcp4       0      0 127.0.0.1.8081         192.168.16.73.43834         0      0      0      0  65700  65700      1   2048      0      0 525600 525600    0.00    0.00    0.00    0.00    0.00 3797.36
                          [...]
                          fffff8000d6e7960 stream   1116      0                0 fffff8000d617a50                0                0 /var/run/php-fpm.socket
                          fffff8000d617a50 stream      0      0                0 fffff8000d6e7960                0                0
                          [...]
                          
                          

                          So there are ~190+ things stuck doing a PHP operation, and the same number of stuck connections hitting the dnsbl daemon. The only thing pfBlocker does with lighty is run /usr/local/www/pfblockerng/www/index.php

                          So something in that file is getting stuck and making those pile up.  Probably its file lock operation, maybe something isn't giving up a lock and everything else is stuck waiting.

                          Try editing /usr/local/www/pfblockerng/www/index.php and commenting out or removing the whole "Increment DNSBL Alias Counter" block and see if it makes a difference. Keep a backup so you can put it back later if there is no change.

                          Someone should probably bring this to bbcan's attention in the meantime.
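The reading of the `netstat -Ln` line above (193 connections queued against a limit of 128 on /var/run/php-fpm.socket) can be automated. What follows is only a rough sketch in Python, not a pfSense tool: the function name `full_listen_queues` and the embedded `SAMPLE` text are made up for illustration, and the sample lines are copied from the output posted earlier in this thread. It assumes the `qlen/incqlen/maxqlen` column layout shown there.

```python
import re

# Sample lines copied from the netstat -Ln output posted earlier in this thread.
SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
ctrl       0      0 [5]:              0
unix  0/0/5                            /var/run/dpinger_Cisco_Router~192.168.16.2~192.168.16.201.sock
unix  0/0/4                            /var/run/devd.pipe
unix  0/0/30                           /var/run/check_reload_status
unix  193/0/128                        /var/run/php-fpm.socket
unix  0/0/4                            /var/run/devd.seqpacket.pipe
"""

# Fields: proto, qlen/incqlen/maxqlen, local address.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)\s*$")


def full_listen_queues(netstat_output):
    """Return (proto, address, qlen, maxqlen) for queues at or over their limit."""
    hits = []
    for line in netstat_output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # headers, Netgraph rows, and blank lines do not match
        proto, qlen, _incqlen, maxqlen, addr = match.groups()
        qlen, maxqlen = int(qlen), int(maxqlen)
        # Ignore idle sockets; report only non-empty queues that reached the limit.
        if qlen > 0 and qlen >= maxqlen:
            hits.append((proto, addr, qlen, maxqlen))
    return hits


if __name__ == "__main__":
    for proto, addr, qlen, maxqlen in full_listen_queues(SAMPLE):
        print(f"{proto} {addr}: {qlen} queued, limit {maxqlen}")
```

On a live system the text would come from running `/usr/bin/netstat -Ln` rather than from the embedded sample; how that output is captured is left out here on purpose.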

                          • BreeOge

                            Done, let ya know if it crashes again.

                            Thank you for helping resolve this issue.  Me and many of the people here thank you for your time on this.

                            • MaxPF

                              @jimp:

                              For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.

                              pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.

                              The problem with that, at least in my case, is that the console is unusable either locally or remotely. There is no menu, just a black screen, and no matter what command I try nothing happens. Even CTRL+C just shows ^C.

                              Running 2.4 Release now. See how it goes.

                              • jimp Rebel Alliance Developer Netgate

                                @MaxPF:

                                The problem with that, at least in my case, is that the console is unusable either locally or remotely. There is no menu, just a black screen and no matter what command I try nothing happens.

                                Try Ctrl-Z and then run /bin/tcsh

                                • john_galt

                                  BreeOge,

                                  Could you post the edits you made to the index.php file please? I found it but am unsure what to comment out.

                                  Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.

                                  Doug

                                  • jimp Rebel Alliance Developer Netgate

                                    @john_galt:

                                    Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.

                                    Knowing what we know now, that is not surprising. The kern.ipc.soacceptqueue tunable is only for TCP, and this is a unix socket queue overflowing. There isn't a tunable for that; IIRC it's set by whatever sets up the socket (php-fpm in this case). But increasing that wouldn't solve the problem, only hide it longer.
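To illustrate the point that the queue length is chosen by whatever sets up the socket: a program passes its requested backlog to `listen()` when it creates the listener. The sketch below is Python, not php-fpm's actual code, and the helper name `open_unix_listener` and the socket path are hypothetical. php-fpm exposes the same knob as its `listen.backlog` pool directive; where pfSense sets that value is not shown in this thread, and the kernel may still cap whatever the program asks for.

```python
import os
import socket
import tempfile


def open_unix_listener(path, backlog):
    """Bind a Unix stream socket at `path` and listen with the requested backlog."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    # The backlog is supplied here by the application, once, at setup time.
    srv.listen(backlog)
    return srv


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical path for the demo
    srv = open_unix_listener(path, backlog=128)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)          # sits in the listen queue until accepted
    conn, _ = srv.accept()        # a stuck server would never reach this call
    client.sendall(b"ping")
    print(conn.recv(4))

    for s in (conn, client, srv):
        s.close()
    os.unlink(path)
    os.rmdir(tmpdir)
```

If the accepting side stops calling `accept()`, as the stuck PHP workers effectively do here, connections accumulate until the queue is full no matter how large the backlog is, which is why a bigger number would only delay the failure.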

                                    • BreeOge

                                      @john_galt:

                                      BreeOge,

                                      Could you post the edits you made to the index.php file please? I found it but am unsure what to comment out.

                                      Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.

                                      Doug

                                      The file is at this location

                                      /usr/local/www/pfblockerng/www/index.php

                                      cd /usr/local/www/pfblockerng/www/
                                      
                                      cp index.php index.old    # keep a copy of the original before you remove the section
                                      

                                      Edit index.php with your favorite editor, and remove this section at the bottom.

                                      if (!empty($pfb_query)) {
                                      	// Increment DNSBL Alias Counter
                                      	$dnsbl_info = '/var/db/pfblockerng/dnsbl_info';
                                      	if (($handle = @fopen("{$dnsbl_info}", 'r')) !== FALSE) {
                                      		flock($handle, LOCK_EX);
                                      		$pfb_output = @fopen("{$dnsbl_info}.bk", 'w');
                                      		flock($pfb_output, LOCK_EX);
                                      		// Find line with corresponding DNSBL Aliasname
                                      		while (($line = @fgetcsv($handle)) !== FALSE) {
                                      			if ($line[0] == $pfb_query) {
                                      				$line[3] += 1;
                                      			}
                                      			@fputcsv($pfb_output, $line);
                                      		}
                                      		@fclose($pfb_output);
                                      		@fclose($handle);
                                      		@rename("{$dnsbl_info}.bk", "{$dnsbl_info}");
                                      	}
                                      }
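The block above takes its exclusive locks with blocking `flock()` calls, so if one holder never releases, every later request waits behind it. This matches the pile-up suspected earlier in the thread. As an illustration only, here is a sketch of a bounded wait using a non-blocking lock and a deadline. It is written in Python rather than PHP, it is not pfBlockerNG's code or its eventual fix, and the helper name `lock_with_timeout` and its default timings are assumptions.

```python
import errno
import fcntl
import os
import tempfile
import time


def lock_with_timeout(fileobj, timeout=2.0, poll=0.05):
    """Try to take an exclusive flock; return True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes flock fail immediately instead of blocking.
            fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise  # a real error, not just "someone else holds the lock"
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    # Two separate opens of the same file contend for the lock,
    # much like two requests updating the same counter file.
    first = open(path, "r+")
    second = open(path, "r+")
    print(lock_with_timeout(first, timeout=0.2))    # lock is free
    print(lock_with_timeout(second, timeout=0.2))   # held by `first`, gives up
    fcntl.flock(first, fcntl.LOCK_UN)
    print(lock_with_timeout(second, timeout=0.2))   # free again
    first.close()
    second.close()
    os.unlink(path)
```

A caller that gets `False` back can skip the counter update and return, so a stuck lock costs one missed increment instead of a hung worker.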
                                      
                                      
                                      • john_galt

                                        Thanks BreeOge

                                        Doug

                                        • A Former User

                                          @AhnHEL:

                                          I have one box using the ZFS file structure, the other is using UFS, both using pfBlockerNG.  The ZFS is rock solid, and the UFS one gets the Bad Gateway after some time.  Wondering if that is a possible reason why two similar boxes with similar settings exhibit different behavior using the same snapshot and same packages.

                                          Both running 20171009 Snapshots for 2.4.0

                                          Just a thought

                                          It would seem that ZFS and pfBlockerNG play together more nicely than UFS and pfBlockerNG, at least before the jump to 2.4.0 Release. I reinstalled under ZFS and restored the configuration file I had saved just before the reinstall, and it has been running solid on both of my affected boxes. Previously it would last about 20 minutes before I got the gateway error, and I saw the line "Listen queue overflow: 193 already in queue awaiting acceptance (x occurrences)" in the logs. Now there is not an error in the logs in sight.

                                          • AhnHEL

                                            Just reinstalled myself changing from UFS to ZFS filesystem, using the same 20171009 snapshot.  Wouldn't last ten minutes before, but has been up without error for 24 hours now.

                                            Never used Squid or ClamAV.  Only using pfBlockerNG.

                                            AhnHEL (Angel)

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.