New 502 Bad Gateway
-
After scrolling through all 6,000 lines of the system.log file, I see several of these lines; they appear at one-minute intervals:
Oct 12 05:43:10 pfSense kernel: sonewconn: pcb 0xfffff80008c430f0: Listen queue overflow: 193 already in queue awaiting acceptance (708 occurrences)
For that, check the current value of kern.ipc.soacceptqueue (run "sysctl kern.ipc.soacceptqueue"). It's probably at the default of 128. Set it to 1024 or higher (e.g. 4096) by creating a tunable for it under System > Advanced, on the Tunables tab.
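A minimal sketch of doing the same thing from a shell, using the 4096 example value above (the sysctl can be changed at run-time, but only the tunable entry makes it survive a reboot):

# check the current value (default is 128)
sysctl kern.ipc.soacceptqueue

# raise it immediately at run-time; add the matching tunable under
# System > Advanced, Tunables tab so it persists across reboots
sysctl kern.ipc.soacceptqueue=4096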
-
jimp
Indeed it was set to 128. I added the tunable, set it to 4096, and applied. I'm doing this over an OpenVPN connection. Do I need to reboot it?
Doug
-
No need to reboot, that is one that can be changed at run-time.
-
Okay. Will keep an eye on it and thanks.
-
Now the SG-2440 boxes have stopped working as well :( They lasted 6 days before I got the 502 error. Now all of them are back running 2.3.4p1 and XMLRPC Sync is working as well. I will wait for the final release before I give 2.4 a new try.
-
Same thing here this morning. I had to force a reboot. I also increased sysctl kern.ipc.soacceptqueue to 4096 as suggested. While I was at it, I updated to latest build and it now says 2.4 RELEASE.
-
For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.
pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.
-
Lots of port 1344 there; do you have squid+clamav active as well? Can you try shutting that off?
Also, that netstat -x output is too big to put inline; please attach it as a .txt file instead.
Locked up again, without squid+clamav installed.
# /usr/sbin/swapinfo -h
Device            1K-blocks     Used    Avail Capacity
/dev/label/swap0   33554428       0B      32G     0%
#

# /usr/bin/top | /usr/bin/head -n7
last pid: 44796;  load averages:  0.10,  0.13,  0.12  up 0+14:58:53  12:32:05
88 processes:  1 running, 85 sleeping, 2 stopped
Mem: 475M Active, 827M Inact, 1187M Wired, 832M Buf, 13G Free
Swap: 32G Total, 32G Free

# /usr/bin/netstat -Ln
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
Netgraph sockets
Type  Recv-Q Send-Q  Node Address  #Hooks
ctrl       0      0  [46b]:             0
ctrl       0      0  [468]:             0
ctrl       0      0  [44b]:             0
ctrl       0      0  [443]:             0
ctrl       0      0  [3f9]:             0
ctrl       0      0  [3b0]:             0
ctrl       0      0  [367]:             0
ctrl       0      0  [31e]:             0
ctrl       0      0  [2d5]:             0
ctrl       0      0  [28c]:             0
ctrl       0      0  [243]:             0
ctrl       0      0  [1fb]:             0
ctrl       0      0  [1b0]:             0
ctrl       0      0  [11]:              0
ctrl       0      0  [5]:               0
unix  0/0/5          /var/run/dpinger_WAN_DHCP~70.178.196.154~70.178.196.1.sock
unix  0/0/5          /var/run/dpinger_Steve_Telephone~192.168.16.2~10.10.10.2.sock
unix  0/0/5          /var/run/dpinger_Raymond_Telephone~192.168.16.2~10.10.12.2.sock
unix  0/0/5          /var/run/dpinger_Kevin_Telephone~192.168.16.2~10.10.11.2.sock
unix  0/0/5          /var/run/dpinger_Cisco_Router~192.168.16.2~192.168.16.201.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/30         /var/run/check_reload_status
unix  193/0/128      /var/run/php-fpm.socket
unix  0/0/4          /var/run/devd.seqpacket.pipe
Pastebin of /usr/bin/netstat -xn output:
https://pastebin.com/ZFujW9Kp
-
For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.
pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.
I have never run squid+clamav and experienced it with only pfBlocker installed. I removed pfBlocker the other day to restore usability and haven't had a lockup since.
-
Here are the logs from another system. I removed Squid+ClamAV on this system as well, and it locked up too.
# /usr/sbin/swapinfo -h
Device                           1K-blocks     Used    Avail Capacity
/dev/gptid/d2a5a9dd-7e41-11e7-b    3684016       0B     3.5G     0%

# /usr/bin/top | /usr/bin/head -n7
last pid: 33447;  load averages:  0.08,  0.10,  0.08  up 0+12:43:35  12:46:18
113 processes:  1 running, 110 sleeping, 2 stopped
Mem: 689M Active, 1655M Inact, 945M Wired, 650M Buf, 4548M Free
Swap: 3598M Total, 3598M Free

# /usr/bin/netstat -Ln
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
Netgraph sockets
Type  Recv-Q Send-Q  Node Address  #Hooks
ctrl       0      0  [ea1]:             0
ctrl       0      0  [e92]:             0
ctrl       0      0  [e73]:             0
ctrl       0      0  [e6b]:             0
ctrl       0      0  [e66]:             0
ctrl       0      0  [e]:               0
ctrl       0      0  [5]:               0
unix  0/0/80         /tmp/mysql.sock
unix  0/0/5          /var/run/dpinger_WAN_DHCP~70.178.22.158~70.178.22.1.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/30         /var/run/check_reload_status
unix  193/0/128      /var/run/php-fpm.socket
unix  0/0/4          /var/run/devd.seqpacket.pipe
/usr/bin/netstat -xn: see the attached file.
-
# /usr/bin/netstat -Ln
unix  193/0/128      /var/run/php-fpm.socket

[...]
tcp4   0   0  127.0.0.1.8081   192.168.16.73.43834   0  0  0  0  65700  65700  1  2048  0  0  525600  525600  0.00  0.00  0.00  0.00  0.00  3797.36
[...]
fffff8000d6e7960 stream   1116      0      0 fffff8000d617a50        0        0 /var/run/php-fpm.socket
fffff8000d617a50 stream      0      0      0 fffff8000d6e7960        0        0
[...]
So there are ~190+ things stuck doing a PHP operation, and the same number of stuck connections hitting the dnsbl daemon. The only thing pfBlocker does with lighty is run /usr/local/www/pfblockerng/www/index.php
So something in that file is getting stuck and making those pile up. Probably its file lock operation; maybe something isn't giving up a lock, and everything else is stuck waiting on it.
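A quick way to see that kind of pile-up on a live box, using only the standard tools already shown in this thread (the socket path and the 8081 port are taken from the output above; adjust them if yours differ):

# listen queue on the php-fpm unix socket (qlen/incqlen/maxqlen)
netstat -Ln | grep php-fpm.socket

# number of connections sitting on the DNSBL web server port
netstat -an | grep -c '127.0.0.1.8081'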
Try editing /usr/local/www/pfblockerng/www/index.php and commenting out or removing the whole "Increment DNSBL Alias Counter" block and see if it makes a difference. Keep a backup so you can put it back later if there is no change.
Someone should probably bring this to bbcan's attention in the meantime.
-
Done, I'll let you know if it crashes again.
Thank you for helping resolve this issue; many of us here appreciate your time on this.
-
For anyone still seeing the problem after updating to 2.4.0-RELEASE, please gather the info I asked for in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 before rebooting the firewall and also supply a full list of installed packages that are running.
pfBlocker is mentioned a lot, but at least in the output shown so far, squid+clamav appeared to be more likely at fault.
The problem with that, at least in my case, is that the console is unusable either locally or remotely. There is no menu, just a black screen, and no matter what command I try, nothing happens. Even CTRL+C just shows ^C.
Running 2.4-RELEASE now. We'll see how it goes.
-
The problem with that, at least in my case, is that the console is unusable either locally or remotely. There is no menu, just a black screen, and no matter what command I try, nothing happens.
Try Ctrl-Z and then run /bin/tcsh
-
BreeOge,
Could you post the edits you made to the index.php file please? I found it but am unsure what to comment out.
Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.
Doug
-
Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.
Knowing what we know now, that is not surprising. The kern.ipc.soacceptqueue tunable is only for TCP, and this is a unix socket queue overflowing. There isn't a tunable for that; IIRC it's set by whatever sets up the socket (php-fpm in this case). But increasing it wouldn't solve the problem, only hide it longer.
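For reference, the backlog on that unix socket is whatever php-fpm requests via its listen.backlog pool directive (that directive is standard php-fpm; the config path below is an assumption for pfSense and may differ on your build). The 128 in the 193/0/128 output above is that per-socket limit:

# show the listen settings php-fpm uses for /var/run/php-fpm.socket
# (path assumed; adjust if your php-fpm config lives elsewhere)
grep -n 'listen' /usr/local/lib/php-fpm.conf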
-
BreeOge,
Could you post the edits you made to the index.php file please? I found it but am unsure what to comment out.
Also, the change I made to the tunable kern.ipc.soacceptqueue did not stop the crash.
Doug
The file is at this location
/usr/local/www/pfblockerng/www/index.php
cd /usr/local/www/pfblockerng/www/
cp index.php index.old   (do this first so you have a copy of the original before you remove the section)
Edit index.php with your favorite editor and remove this section at the bottom:
if (!empty($pfb_query)) {

	// Increment DNSBL Alias Counter
	$dnsbl_info = '/var/db/pfblockerng/dnsbl_info';
	if (($handle = @fopen("{$dnsbl_info}", 'r')) !== FALSE) {
		flock($handle, LOCK_EX);
		$pfb_output = @fopen("{$dnsbl_info}.bk", 'w');
		flock($pfb_output, LOCK_EX);

		// Find line with corresponding DNSBL Aliasname
		while (($line = @fgetcsv($handle)) !== FALSE) {
			if ($line[0] == $pfb_query) {
				$line[3] += 1;
			}
			@fputcsv($pfb_output, $line);
		}
		@fclose($pfb_output);
		@fclose($handle);
		@rename("{$dnsbl_info}.bk", "{$dnsbl_info}");
	}
}
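If you need to put it back later, as suggested above, just restore the backup copy over the edited file:

cd /usr/local/www/pfblockerng/www/
cp index.old index.php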
-
Thanks BreeOge
-
I have one box using the ZFS file structure and the other using UFS, both running pfBlockerNG. The ZFS box is rock solid, and the UFS one gets the Bad Gateway after some time. I'm wondering if that is a possible reason why two similar boxes with similar settings exhibit different behavior using the same snapshot and same packages.
Both running 20171009 Snapshots for 2.4.0
Just a thought
It would seem ZFS and pfBlockerNG play together more nicely than with the UFS filesystem. Before the jump to the 2.4.0 release, I reinstalled under ZFS and uploaded my configuration file from just before I performed the reinstall, and it's been running solid on both of my boxes that were affected. Normally it would last 20 minutes before I got the gateway error, but now there isn't an error in the logs in sight. Previously I saw the line that stated "Listen queue overflow: 193 already in queue awaiting acceptance (x occurrences)".
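For anyone comparing their own boxes, a quick way to confirm which filesystem a box is actually running on (standard FreeBSD commands, nothing pfSense-specific):

# the root mount line shows ufs or zfs in parentheses
mount | grep ' on / '

# on a ZFS install this lists the pool; on UFS it reports no pools
zpool list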
-
Just reinstalled myself, changing from UFS to the ZFS filesystem, using the same 20171009 snapshot. It wouldn't last ten minutes before, but it has been up without error for 24 hours now.
Never used Squid or ClamAV. Only using pfBlockerNG.