502 Bad Gateway



  • I really appreciate the help, doktornotor, and I understand where you're coming from and why you're suggesting these solutions. I guess I have not explained it well enough, though. I understand that it's being queued with ingress connections, but my question is why it only started happening when I updated to the newest 2.4.0 version on the 3rd, and not before. System load has not changed; if anything, it is currently low. The problem only showed up on 2.4.1, and on 2.4.0 after Tuesday's update. Before that I never had an issue with overloaded ingress connections, even with pfBlocker installed. I have run the system under a heavier load than it has now and never received these errors. I have tried the settings you suggested; mine are currently:

    kern.ipc.somaxconn=4096

    and

    backlog=4096

    and I still have the issue.

    However, when I removed pfBlocker the issue stopped. But it's never been an issue before, and like I said, the load is actually low right now. It only raised its head after the 2.4.0 update on the 3rd, or on the 2.4.1 updates. Before then, even on 2.4.0, I had no issues, even under high load. That is why I posted in the 2.4 Development Snapshots forum: it only showed up after an update, so something changed to cause the problem. All I did was update to the newest version; no load change or anything else. I updated, then the issue showed up. Again, I do appreciate your help, but I guess I will have to wait until more people are having the issue. I know they will: I have two boxes set up, and one has only one or two users at any given time, so a super low load, and even it started after the update.

    Thank you for your assistance. Sorry for taking up your time.

    BreeOge



  • I'm seeing the same thing: 502 errors. I'm running three sites, and all of them worked perfectly until 2.4.0-RC changed from FreeBSD 11.0 to FreeBSD 11.1 (a few days ago).
    The strange thing is that only one site has problems: the one with the most users/traffic.

    This morning I removed pfBlocker on the problem site, and so far everything is good. It's too much of a coincidence that the problem started when the move was made from FreeBSD 11 to 11.1 a couple of days ago.

    I had to remove Squid on all three sites (XMLRPC Sync stopped working, giving a constant flow of errors), and now pfBlocker is removed on one site since the introduction of FreeBSD 11.1.



  • Yeah, it happens more on the one with higher traffic, but if you leave it, even the low-traffic ones will eventually get it as well; it just takes longer. It's like it's not releasing something and keeps building until it locks up.

    I removed all my pfBlocker installs too. An interesting side note as well:

    I have three systems. Two of them have pfBlocker, Squid, and SquidGuard installed; those are the ones getting the 502. I have another one set up with only Squid, with pfBlocker and SquidGuard disabled, and it has yet to get a 502 Bad Gateway. It seems to be something with 11.1 and either pfBlocker or SquidGuard, or all of them working together: if you have Squid and SquidGuard and no pfBlocker it does not happen, and the same goes for Squid and pfBlocker with no SquidGuard. It's only when all three are present.

    Seanr22a, do you have those three installed on all your systems, or a mix?



  • @BreeOge:

    Seanr22a, do you have those three installed on all your systems, or a mix?

    I have two SG-2440 and one SG-8860. They all have Snort, Squid and Pfblocker. The configuration is identical at these sites so I use XMLRPC Sync in all three packages.
    At the two sites still running pfBlocker, I have moved the IP blocklists to Snort and am only using DNSBL in the pfBlocker package. I will wait a couple of days, and if it's still fine I will install pfBlocker using only DNSBL at the main site as well and give it a try. My problem with Squid is the XMLRPC Sync, but all this started to happen with FreeBSD 11.1.

    To me it looks like it's running out of resources of some kind. I have plenty of memory and disk space available when it happens, so it's something else.



  • I have plenty of memory and disk space available when it happens, so it's something else.

    Same here.

    The release notes for 11.1 are here https://www.freebsd.org/releases/11.1R/relnotes.html#kernel

    Lots of changes to the kernel. I suspect one of these is the cause, since it crashes the kernel, but I could be wrong on that. I also have no idea what specifically would cause it; it's just something to look at.

    I know it looks like the system is being overloaded, as doktornotor said, but it's hard to believe that a system that never overloaded before the update now overloads constantly with the same load or less. It does not make sense.



  • Hmm, I'm not running into any issues on my test 2.4.1 box, but to be fair, I'm not running Snort or pfBNG. I hope this doesn't turn out to be a nasty bug in the kernel that could delay 2.4.x getting out the door. It seems like every time 2.4 is about to drop, something else comes up. Been 542 days since 2.3-RELEASE :)



  • @luckman212:

    Hmm, I'm not running into any issues on my test 2.4.1 box, but to be fair, I'm not running Snort or pfBNG. I hope this doesn't turn out to be a nasty bug in the kernel that could delay 2.4.x getting out the door. It seems like every time 2.4 is about to drop, something else comes up. Been 542 days since 2.3-RELEASE :)

    Yeah, it is for sure related to the 11.1 kernel, pfBlockerNG, or SquidGuard. Remove one or the other (pfBlockerNG or SquidGuard) and the issue does not exist.

    I did a test: one system had been working for over 24 hours with no 502 since I removed pfBlockerNG. I put it back in, and in less than 2 hours it crashed.

    I have one system with pfBlockerNG installed and SquidGuard disabled: no issues at all. I enabled SquidGuard, and it crashed in less than 2 hours.

    So I would suspect the issue lies between pfBlockerNG, SquidGuard, and the kernel, but I am no expert.



  • Started having the same issue since the 2.4RC switch from 11.0 to 11.1. I am running pfBlocker.



  • @MaxPF:

    Started having the same issue since the 2.4RC switch from 11.0 to 11.1. I am running pfBlocker.

    Do you have SquidGuard also?



  • @BreeOge:

    @MaxPF:

    Started having the same issue since the 2.4RC switch from 11.0 to 11.1. I am running pfBlocker.

    Do you have SquidGuard also?

    No, I don't have squidguard. It hasn't happened yet with 2.4.0.r.20171005.0827.



  • My system with a SUPER low load finally got the 502 Bad Gateway. So that rules out anything to do with SquidGuard; it is an issue with pfBlocker and the 11.1 kernel.



  • @BreeOge:

    My system with a SUPER low load finally got the 502 Bad Gateway. So that rules out anything to do with SquidGuard; it is an issue with pfBlocker and the 11.1 kernel.

    I got my 502 problems sorted out, but not in the way I wanted…

    FW1: SG-2440, 4 GB memory, 20 GB disk, running 2.4.0.r.20171006.2203 - Snort, pfBlocker, Squid
    FW2: SG-2440, 4 GB memory, 20 GB disk, running 2.4.0.r.20171006.2203 - Snort, pfBlocker, Squid
    FW3: SG-8860, 8 GB memory, 40 GB disk, running 2.3.4-RELEASE-p1 - Snort, pfBlocker, Squid

    The above setup works perfectly, with no 502 errors. I want to run the SG-8860 on 2.4 as well, but it simply doesn't work: I get 502 errors, and after a while it stops responding completely. I don't get any 502 errors on the SG-2440s running 2.4, and they have been up since October 6. Note that they run the same packages.

    The SG-8860 has an 8-core CPU and more memory and disk, but it can't run 2.4 with these packages. This firewall has a lot more users/traffic during the week, but I did a test this weekend when the office was closed, so there were no users or traffic and it just sat idle. After 3-4 hours I got the 502 error, and 1-2 hours later it stopped responding completely.

    Now I have reinstalled it with 2.3.4-RELEASE-p1, and everything has been fine for two days :)

    The only bad thing is that I use XMLRPC Sync for all packages between these firewalls (all run an identical setup); one of the SG-2440s is the master and the others are slaves. That was working with all firewalls on 2.4, but now there is a mix of 2.3.4 and 2.4, and XMLRPC Sync doesn't seem to be able to handle that.


    Communications error occurred

    Exception calling XMLRPC method merge_installedpackages_section #1 : Unknown method @ 2017-10-10 08:54:31
    Exception calling XMLRPC method merge_installedpackages_section #1 : Unknown method @ 2017-10-10 08:54:32
    Exception calling XMLRPC method exec_php #3 : Incorrect parameters passed to method: Signature permits 2 parameters but the request had 1 @ 2017-10-10 08:54:33
    Exception calling XMLRPC method exec_php #3 : Incorrect parameters passed to method: Signature permits 2 parameters but the request had 1 @ 2017-10-10 08:54:34

    [EDIT]
    I forgot to mention that I get the same sync error whether it's Snort, pfBlocker, or Squid. They all give the same error message, so I don't think it has to do with the packages.



  • It's something with pfBlocker and the new 2.4.0 and 2.4.1; 2.3 works fine. I can uninstall pfBlockerNG and my systems stay up for days, weeks, even longer. If I put pfBlockerNG back in, it locks up in less than 12 hours, sometimes sooner. The 502 is just a symptom of the issue, but it's the symptom you notice first. nginx is not directly related to the issue; it's the system locking up internally and not executing commands, which causes everything to go nuts.



  • The strange thing is that I run pfBlocker on the SG-2440 boxes without issues. They have been up since I installed the 1006.2203 release, 4 days now. The same install on the SG-8860 locks up in a couple of hours. ???



  • Just happened again. Here are my symptoms:

    • Web GUI inaccessible
    • ssh and local console do not show the menu and do not accept any input
    • external OpenVPN clients cannot connect (I have an inbound rule that uses a country alias created in pfBlocker)

    Everything else keeps working: internet access, outbound VPN tunnels, port forwards from outside to internal servers, DNS Resolver, firewall rules, VLAN routing, etc.

    I just updated to the latest build and I hope this gets fixed in the final release.



  • I was seeing the same thing. I upgraded to the October 9th snapshot, ran "pkg upgrade -f" at the shell prompt, and all is well again.

    Statement retracted.

    It took a day, but yes, I eventually got the bad gateway again.



  • @seanr22a:

    The strange thing is that I run pfBlocker on the SG-2440 boxes without issues. They have been up since I installed the 1006.2203 release, 4 days now. The same install on the SG-8860 locks up in a couple of hours. ???

    On the SG-2440 boxes, did you do a fresh install or an update, and vice versa for the SG-8860?

    @AhnHEL:

    I was seeing the same thing. I upgraded to the October 9th snapshot, ran "pkg upgrade -f" at the shell prompt, and all is well again.

    I am on 20171009.1758 and still having issues. How fast it happens depends on your load: I have one that can run for 4 days before it locks, but another that locks in less than 12 hours. I have also run "pkg upgrade -f" and "pkg update -f" on all systems to rule out a bad file. It still persists.

    I hate to say it, but I am glad others are having issues. I knew I wasn't crazy, as was suggested. ::)



  • Hi, besides using pfBlocker, do (some of) you guys also use the XMLRPC feature?
    Can you check that there are 4 active php-fpm processes 'running'?

    I found that if I sync pfSense settings several times within a few seconds, the sync process sometimes bails out as soon as it needs to 'print' some output. Adding ignore_user_abort and setting no time limit should allow it to always finish and remove the lock file it placed.

    Maybe some of you can try whether this patch solves or reduces some of these issues: https://github.com/pfsense/pfsense/pull/3848
    To apply it, either add the 2 lines manually to the changed file /usr/local/www/xmlrpc.php or apply it through the patches package.
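    A minimal sketch of the php-fpm check, assuming a POSIX `sh` on the firewall. `count_php_fpm` is a hypothetical helper name; it simply counts every `ps` line that mentions php-fpm (master and workers alike) and does not look at process state:

```sh
#!/bin/sh
# Hypothetical helper: count php-fpm processes in `ps` output read from stdin.
# The [p] bracket keeps the grep process itself out of the count when the
# input is piped straight from ps, because grep's own command line contains
# "[p]hp-fpm" rather than "php-fpm".
count_php_fpm() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
    # the printed count without reporting failure to the caller.
    grep -c '[p]hp-fpm' || true
}

# Run the real check only on the FreeBSD-based firewall itself.
if [ "$(uname -s)" = "FreeBSD" ]; then
    echo "php-fpm processes: $(/bin/ps uxaww | count_php_fpm)"
fi
```

    Comparing the count while the GUI still works and again once it returns 502 may show whether php-fpm workers are disappearing or piling up.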



  • @PiBa:

    Hi, besides using pfBlocker, do (some of) you guys also use the XMLRPC feature?
    Can you check that there are 4 active php-fpm processes 'running'?

    I found that if I sync pfSense settings several times within a few seconds, the sync process sometimes bails out as soon as it needs to 'print' some output. Adding ignore_user_abort and setting no time limit should allow it to always finish and remove the lock file it placed.

    Maybe some of you can try whether this patch solves or reduces some of these issues: https://github.com/pfsense/pfsense/pull/3848
    To apply it, either add the 2 lines manually to the changed file /usr/local/www/xmlrpc.php or apply it through the patches package.

    I don't use the XMLRPC feature on any of my boxes. I don't know if the other guys do; they may.



  • Having the same issue here. I've even tried running "pkg upgrade -f" in the shell, but nothing. I can sign in as a regular user and execute commands at user level with no problem, but as admin I get stuck with the command screen unable to load, the same problem everyone else has. Shutdown won't even start, and I have to force a shutdown in order to restart. I'm on the latest build on my home box, and I have another box with this issue running the weekend release. I removed Squid/SquidGuard; I'm also not using the XMLRPC feature, but I am using pfBlockerNG.



  • @BreeOge:

    On the SG-2440 boxes, did you do a fresh install or an update, and vice versa for the SG-8860?

    All three came from 2.3.4-RELEASE-p1. I uninstalled all packages on all three, upgraded to 2.4, and installed the packages again. I only have 502 problems with the SG-8860. Now that the 8860 is downgraded to 2.3.4-RELEASE-p1, I have problems with XMLRPC Sync because of the mixed 2.4 and 2.3.4 setup.


  • Banned




  • Rebel Alliance Developer Netgate

    The bug report and the thread here are a bit unclear and contradictory now.

    Is this happening on current 2.4.0 snapshots or only 2.4.1?

    What error messages are in the system log, if any, when it happens?

    Does it only happen with pfBlocker installed, or is there some other way to trigger it? Captive Portal, perhaps? Does it require using a specific pfBlocker feature like DNSBL?

    If it's the max connections error, increasing kern.ipc.soacceptqueue should fix it (somaxconn no longer exists; it has a convenience alias, but using it is quirky).
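    One way to check whether a listen queue really is full, as a sketch: assuming FreeBSD's `netstat -Ln` prints each listening socket's queue as `qlen/incqlen/maxqlen` in the second column, the hypothetical `full_queues` filter prints only sockets whose queued-connection count has reached its limit. The guarded part also shows the current `kern.ipc.soacceptqueue` value; raising it (for example `sysctl kern.ipc.soacceptqueue=4096`) is the change suggested above.

```sh
#!/bin/sh
# Hypothetical filter: read `netstat -Ln` output on stdin and print every
# socket whose current queue length (qlen) has reached its limit (maxqlen).
# Assumes the Listen column is formatted as qlen/incqlen/maxqlen; header
# lines do not match the pattern and are skipped.
full_queues() {
    awk '
        $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
            split($2, q, "/")
            # q[1] = qlen, q[2] = incqlen, q[3] = maxqlen
            if (q[3] + 0 > 0 && q[1] + 0 >= q[3] + 0)
                print $3, $2
        }'
}

# Only meaningful on the FreeBSD-based firewall itself.
if [ "$(uname -s)" = "FreeBSD" ]; then
    echo "kern.ipc.soacceptqueue = $(sysctl -n kern.ipc.soacceptqueue)"
    /usr/bin/netstat -Ln | full_queues
fi
```

    If this prints no sockets while the GUI is returning 502, the queue limit itself is probably not what is blocking nginx/php-fpm at that moment.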



  • @jimp:

    The bug report and the thread here are a bit unclear and contradictory now.

    Is this happening on current 2.4.0 snapshots or only 2.4.1?

    What error messages are in the system log, if any, when it happens?

    Does it only happen with pfBlocker installed, or is there some other way to trigger it? Captive Portal, perhaps? Does it require using a specific pfBlocker feature like DNSBL?

    If it's the max connections error, increasing kern.ipc.soacceptqueue should fix it (somaxconn no longer exists; it has a convenience alias, but using it is quirky).

    It's happening on 2.4.0 after the Oct 3 update, or around then. It happened on 2.4.1 before that: I was on 2.4.1 when it started happening, so I reloaded to 2.4.0, and the issue was resolved until after the Oct 3 update; then it showed its head there as well. Others on here are still on 2.4.1, and it is still present there too. It affects both 2.4.0 and 2.4.1. From what I can tell, it only happens if pfBlockerNG is installed. The 502 Bad Gateway is a result of processes not finishing. It happens on my systems both with and without Captive Portal, so CP has nothing to do with it. As for pfBlocker features, I have tested a bit, but nothing conclusive.

    I have kern.ipc.soacceptqueue set to 4096 and it still happens. It also happens on systems with few connections: I have one system I use just for home, with a super low connection rate, and it happens there as well. It takes longer, but it will eventually get the error.

    I have searched the system logs, and they show nothing about what's causing the error itself; they just start filling up with the connection errors. When you SSH in, it will not load the menu; you have to press CTRL-Z to get a prompt. Once you have a prompt, you can run nano or vi, but that is about the extent of it. It will not show the directory you are in, and if you try to restart PHP it locks up on restart and you have to press CTRL-Z again. I have found that you can surf the net just fine while it's happening, although SOME pages will fail, but not all. In general it's still usable, but it really affects the web GUI and any CP, because it locks up the internal system. LAN and WAN still seem to operate just fine for the most part.

    If you need any more information, please let me know. I have one system that locks up in less than 12 hours after I enable pfBlocker, one that takes up to 4 days, and another that takes 2 to 3 days.

    Thank you
    BreeOge


  • Rebel Alliance Developer Netgate

    I need specifics. Exact error messages, a list of features enabled in pfBlocker, etc. It's nice to know that it appears isolated there, so the impact isn't too large, but there isn't enough to go by here yet.



  • Hi,

    I'm having the same problem. I'd be glad to help, but I'm not sure where to look. The only error I get is when trying to connect to the GUI: the web browser appears to time out and shows the message "Bad Gateway nginx". If I try to use the serial console port, it is also unresponsive. My only recourse is to power cycle the device.

    I have looked at the logs, but to be honest I wouldn't know what to look for. Like I said, I'd be glad to help if given some direction as to what to provide.

    Doug


  • Banned

    @jimp:

    Is this happening on current 2.4.0 snapshots or only 2.4.1?

    You know it'd be a whole lot easier to trace if you stopped the ridiculous "top secret" kernel commits. Annoying, stupid, plus no credits for screaming about open-source solutions and playing retarded games like this.



  • @jimp:

    I need specifics. Exact error messages, a list of features enabled in pfBlocker, etc. It's nice to know that it appears isolated there, so the impact isn't too large, but there isn't enough to go by here yet.

    I'll get that info for you, sir.



  • @doktornotor:

    @jimp:

    Is this happening on current 2.4.0 snapshots or only 2.4.1?

    You know it'd be a whole lot easier to trace if you stopped the ridiculous "top secret" kernel commits. Annoying, stupid, plus no credits for screaming about open-source solutions and playing retarded games like this.

    Well, you know, your comments about it being overloaded were annoying and stupid, because that was the first thing I tried before I even posted about the issue. But I didn't say anything negative to you about it; I thought maybe I had missed something. I hadn't, but I didn't slam you for the suggestion, because all we want is to find the fix. You want to say it's got nothing to do with the kernel, yet it only happened when the kernel was updated. So it's not as ridiculous as you say. It may not be the kernel itself, it's probably something else, but that is as close as we have to go on, since it affects multiple things. You concentrate on one thing and refuse to read what everyone else says it's also doing. It's probably a fix that can be done in pfBlockerNG to conform to 11.1. We are trying to get to the solution; what are you doing? Complaining and giving us old threads that we have already tried.

    Definition of insanity = doing the same thing over and over again and expecting different results.

    We appreciate you trying to help, but an I'm-better-than-you attitude is not helping anyone or this problem. So in the future, do not call anything ridiculous if you're not assisting in TRACING the issue. I would be happy to do any tracing to find the issue, as I and others have stated. There is nothing in the system log and nothing in the PHP log. If you have any suggestions for other logs that could help track down the issue, please post them and I will try them in a heartbeat. But don't call anyone or anything ridiculous; that is no different from saying you are a know-it-all. All the fixes you suggested were kernel suggestions. I thought you knew what you were talking about, so that's why we say kernel.

    Thank you. Now let's please resolve this issue without belittling people.



  • @BreeOge:

    @jimp:

    I need specifics. Exact error messages, a list of features enabled in pfBlocker, etc. It's nice to know that it appears isolated there, so the impact isn't too large, but there isn't enough to go by here yet.

    I'll get that info for you, sir.

    My config for pfBlocker:

    General: default settings
    DNSBL: default settings
    I have 2 DNSBL feeds, mostly for ads and malware
    I have an IPv4 list blocking malware, torrents, and SIP attacks
    No Reputation
    No GeoIP locations selected

    The only error message we get is 502 Bad Gateway, but we cannot use SSH without closing out the process with CTRL-Z, and commands such as find, etc. do not work; they just hang.

    Thank you

    Can someone else who is also having this problem post their pfBlocker settings? Maybe we can find a common element.



  • I will when I get home.

    Doug


  • Banned

    @BreeOge:

    It's probably a fix that can be done in pfBlockerNG to conform to 11.1. We are trying to get to the solution; what are you doing? Complaining and giving us old threads that we have already tried.

    No, it cannot. pfBlockerNG is merely using the pf firewall, that's it. It's using lighttpd as a 1x1 px webserver, not even nginx. It's using Unbound as resolver to redirect the requests to that webserver. It's doing absolutely nothing that should cause any box to hang and become unresponsive.

    @BreeOge:

    Thank you. Now let's please resolve this issue without belittling people.

    Sorry to have upset you. The only cases of the 502 Bad Gateway that I (and pretty much anyone else) have seen were caused simply by not enough processes to serve the nginx/php-fpm requests (and/or by exhausting the connection limit altogether). Now I'll need to get a crystal ball for cases when someone goes on a secret commit spree that eventually makes a giant kaboom with pretty much every core package out there.

    The devs can perhaps assist with tracing; meanwhile, I'm simply once again annoyed by the course taken here. We had this some ~3 years ago, when the repos were taken offline altogether, accompanied by some giant noise about trademark violations. It's getting old. Waste of time. Annoying. Disrespectful to people who've been contributing to pfSense, even after that CLA/copyright assignment/re-licensing nonsense, etc. People wanting to build this thing for development purposes get absolutely zero assistance whatsoever and are being deliberately sabotaged by the so-called build scripts.

    Open-source touted all over the website, and all you get is this. Getting on a tipping point again here.


  • Rebel Alliance Developer Netgate

    @BreeOge:

    DNSBL: default settings
    I have 2 DNSBL feeds, mostly for ads and malware

    Approximately how many clients do you have on your local network that hit DNSBL? Any idea how busy it usually is?

    @BreeOge:

    The only error message we get is 502 Bad Gateway, but we cannot use SSH without closing out the process with CTRL-Z, and commands such as find, etc. do not work; they just hang.

    Most likely you just need to start a proper shell, try running /bin/tcsh



  • @jimp:

    @BreeOge:

    DNSBL: default settings
    I have 2 DNSBL feeds, mostly for ads and malware

    Approximately how many clients do you have on your local network that hit DNSBL? Any idea how busy it usually is?

    @BreeOge:

    The only error message we get is 502 Bad Gateway, but we cannot use SSH without closing out the process with CTRL-Z, and commands such as find, etc. do not work; they just hang.

    Most likely you just need to start a proper shell, try running /bin/tcsh

    I have one site that currently has:

    LAN 192.168.1.175 192.168.1.199 = 5 users
    LAN 192.168.1.5 192.168.1.174 = 19 users

    This site will lock up in less than 12 hours.

    Another site

    LAN 192.168.1.41 192.168.1.50 = 3 users
    LAN 192.168.1.5 192.168.1.40 = 4 users

    It will take up to 24 hours.

    and another

    Interface Pool Start Pool End # of leases in use
    LAN 192.168.16.50 192.168.16.75 = 5 users

    This one takes 24 to 48 hours. However, since the last update (2.4.0.r.20171009.1758), it has been happening more frequently, around every 12 hours.

    I currently have pfBlockerNG uninstalled on the top two because they are used for apartments, and I didn't want to bother the tenants, since the CP becomes unresponsive when it locks up. However, for testing purposes I reinstall it and let it run with any changes. I can usually get a crash within 6 to 12 hours, or sometimes less.

    Just a note: the number of users has not changed from before to after this issue started. The load has remained the same.

    Thank you, I will try /bin/tcsh and see if that gives me a good shell again. I will test that now on the top two and report back after it crashes.

    Thank you. If there are any logs you would like to see, please let me know and I will post as much as I can.



  • @doktornotor:

    The devs can perhaps assist with tracing; meanwhile, I'm simply once again annoyed by the course taken here. We had this some ~3 years ago, when the repos were taken offline altogether, accompanied by some giant noise about trademark violations. It's getting old. Waste of time. Annoying. Disrespectful to people who've been contributing to pfSense, even after that CLA/copyright assignment/re-licensing nonsense, etc. People wanting to build this thing for development purposes get absolutely zero assistance whatsoever and are being deliberately sabotaged by the so-called build scripts (https://forum.pfsense.org/index.php?topic=109089.0).

    Open-source touted all over the website, and all you get is this. Getting on a tipping point again here.

    I never knew about that; I don't have any bad intentions here. Sorry if I came off that way, I was not aware.


  • Rebel Alliance Developer Netgate

    If it is related to memory or a connection or network queue, then in particular the output of these could be helpful:

    /usr/bin/netstat -Ln
    /usr/bin/netstat -xn
    /usr/sbin/swapinfo -h
    /usr/bin/top | /usr/bin/head -n7
    /bin/ps uxawwd
    /usr/bin/sockstat
    
    

    Attach the output in a text file as it will be too large to put inline on a forum post.
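    A minimal sketch for gathering that output into one attachable text file, assuming a POSIX `sh`. The function name `collect_diag` and the `/root/diag-<timestamp>.txt` path are hypothetical choices; the commands are the ones listed above. If the box is already in the hung state described in this thread, some of these commands may block as well, so capturing a baseline run before the lockup could also help.

```sh
#!/bin/sh
# Hypothetical helper: run each command string and append its output,
# preceded by a header line, to a single text file.
# Usage: collect_diag OUTFILE "cmd 1" "cmd 2" ...
collect_diag() {
    outfile=$1
    shift
    : > "$outfile"                          # start with an empty file
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$outfile"
        sh -c "$cmd" >> "$outfile" 2>&1     # keep stderr so failures show up
        printf '\n' >> "$outfile"
    done
}

# Run the real commands only on the FreeBSD-based firewall itself.
if [ "$(uname -s)" = "FreeBSD" ]; then
    f="/root/diag-$(date +%Y%m%d-%H%M%S).txt"
    collect_diag "$f" \
        "/usr/bin/netstat -Ln" \
        "/usr/bin/netstat -xn" \
        "/usr/sbin/swapinfo -h" \
        "/usr/bin/top | /usr/bin/head -n7" \
        "/bin/ps uxawwd" \
        "/usr/bin/sockstat"
    echo "Diagnostics written to $f"
fi
```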



  • I got home and my pfSense GUI and serial console were unresponsive. As stated earlier, from a client point of view everything worked: wired and wireless connections, inbound and outbound.

    My setup is almost the same as BreeOge's, with even fewer users: only me, with a lot of gadgets. When I left for work at 6am I had just restarted and everything worked. When I got home at 4pm it was in the reported condition.

    2.4.0-RC (amd64)
    built on Mon Oct 09 17:58:12 CDT 2017
    FreeBSD 11.1-RELEASE-p1

    What else can I provide?

    Edit: I also have OpenVPN with one user.

    Doug



  • I have one box using the ZFS filesystem and the other using UFS, both running pfBlockerNG. The ZFS one is rock solid, and the UFS one gets the Bad Gateway after some time. I'm wondering if that could be why two similar boxes with similar settings behave differently on the same snapshot with the same packages.

    Both are running 20171009 snapshots of 2.4.0.

    Just a thought



  • AhnHEL

    I don't know, but I had planned to do a fresh install of my UFS pfSense to ZFS this weekend and restore the same config.

    Doug


 

© Copyright 2002 - 2018 Rubicon Communications, LLC