502 Bad Gateway (nginx) after Update to 2.3



  • Hello!

    First of all, the new design since version 2.3 looks great!

    But since I updated to this version, I get the error "502 Bad Gateway (nginx)" in the browser when I try to connect to the firewall's GUI. The error appears after 3-5 days. A restart solves the problem, but that is not the best way to handle it!

    Can anybody help me fix this problem?

    Thanks



  • Same here.

    And now, after the upgrade, I get some strange errors on the web interface (see attachment).
    I already tried to reinstall the system packages with Diagnostics > Backup & Restore > Reinstall Packages, but that doesn't seem to work either.




  • Also experiencing the 'Bad Gateway' issue since the update.

    Occurring here almost daily and having to repeatedly reboot the system to clear it is not an acceptable option.



  • There are several threads about error 502/504. For example, https://forum.pfsense.org/index.php?topic=110116.0



  • You don't have to reboot, option 16 at the console will bring it back.

    Not that that's acceptable, we're working on fixing any issues there. The one most people seem to be encountering is:
    https://redmine.pfsense.org/issues/6177



  • Thank you for this info.

    Is it possible to force option 16 over SSH?

    When I log in to my pfSense over SSH, I just see the console prompt without any options to choose from (1, 2, 3, …).




  • @billyjp:

    Thank you for this info.

    Is it possible to force option 16 over SSH?

    When I log in to my pfSense over SSH, I just see the console prompt without any options to choose from (1, 2, 3, …).

    service nginx restart
    


  • I had the same problem on a clean install of 2.3. Restarting PHP-FPM and webConfigurator didn't help; I had to reboot.



  • @billyjp:

    Is it possible to force option 16 over SSH?

    /etc/rc.php-fpm_restart

    @Jailer:

    service nginx restart

    might give you: "Cannot 'restart' nginx. Set nginx_enable to YES in /etc/rc.conf or use 'onerestart' instead of 'restart'."

    And "rc.restart_webgui" does the same as option 11, "Restart webConfigurator", from the console.



  • The service commands don't do anything here; don't run them. If you don't get the console menu, you're not logged in as root. You'll need the sudo package; then run sudo /etc/rc.initial



  • You learn something new every day.
    Thanks!



  • Disabling the dashboard auto-update check at /system_update_settings.php seems to mitigate the issue for now.

    Edit: I increasingly suspect that the issue is somehow IPv6-related. Still investigating.



  • Thanks for this tip.

    Can anyone confirm this? I don't want to change too much on my main firewall, but the nginx error now comes every 3 days  :-[



  • Just want to chime in since I feel the 2.3.1 release is imminent

    I got this 502 Bad Gateway nginx error for the first time today. I had never seen it before. I am actually running a 2.3.2 snapshot according to the dashboard, based on 2.3.1.a.20160516.0651. I have been tracking the dev branch. I'm not sure how I landed on 2.3.2, but I assume that whatever fix went in for this should be in there, so I don't believe this is fixed yet.

    I do have the auto-update check enabled on my dashboard.

    I did leave the dashboard page "up" in a browser window overnight so I guess it was sitting there for a long time refreshing every so often.

    I do have the IPsec dashboard widget turned on, with 2 tunnels that are both "UP"

    This is on an SG2440

    My logs were filled to the brim with the error below:

    nginx: 2016/05/18 07:54:28 [error] 50536#0: *6944 connect() to unix:/var/run/php-fpm.socket failed (61: Connection refused) while connecting to upstream, client: 2604:2000:xxxx:xx::116e, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket:", host: "r1.xxx.xxx:8888"
    
    

    I run my webconfigurator on HTTP port 8888

    Not sure if that gives any clues

    Screenshot below
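
When the log fills up with lines like the one quoted above, it can help to count how many of them are failed connections to the PHP-FPM socket. The following is a minimal Python sketch, not a pfSense tool: the function name `count_upstream_failures` and the regular expression are assumptions made for illustration, and the sample line is the one from the post.

```python
import re
from collections import Counter

# Matches nginx error-log text reporting a failed connect() to a unix-socket upstream.
UPSTREAM_FAIL = re.compile(
    r"connect\(\) to (?P<upstream>unix:\S+) failed "
    r"\((?P<errno>\d+): (?P<reason>[^)]+)\)"
)


def count_upstream_failures(lines):
    """Return a Counter keyed by (upstream, reason) for failed upstream connects."""
    counts = Counter()
    for line in lines:
        match = UPSTREAM_FAIL.search(line)
        if match:
            counts[(match.group("upstream"), match.group("reason"))] += 1
    return counts


sample = [
    "nginx: 2016/05/18 07:54:28 [error] 50536#0: *6944 connect() to "
    "unix:/var/run/php-fpm.socket failed (61: Connection refused) while "
    "connecting to upstream, client: 2604:2000:xxxx:xx::116e, server: , "
    'request: "GET / HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket:", '
    'host: "r1.xxx.xxx:8888"',
    "nginx: an unrelated line that should be ignored",
]

failures = count_upstream_failures(sample)
for (upstream, reason), n in failures.items():
    print(f"{n} x {upstream}: {reason}")
```

To use it on a real log, pass an open file object instead of `sample`; where the log file lives on a given install is not stated in this thread.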



  • After hitting this issue multiple times a day, I dug and dug and found what fixed it for me. This will apply to you if, like me, you have been leaving the main firewall page open with the widgets. The error appears to be caused by a widget. In my case, once I closed the OpenVPN status widget, it stopped killing PHP, which apparently takes down the whole GUI until you reboot. This may or may not apply to you, but it's easy to test and not a terrible workaround for now.



  • I think the IPsec widget also causes this. Not sure why that is. Interesting info: not really a fix, but it's a workaround. Sadly, at least for me, those 2 widgets are among the most useful ones to see at a glance.



  • Make sure to retry all the widgets after upgrading to 2.3.1-RELEASE.
    Then this can move forward if there are more issues reported.
    The problems are related to the widgets doing updates every 10 seconds or so, each of which asks the nginx server to do some work. nginx hands the PHP requests to back-end PHP processes. If those requests get delayed (or hang) for too long, all the PHP processes end up busy and nginx has to give up.
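
The mechanism described in the post above (periodic widget requests, a fixed set of back-end PHP processes, and requests that hang) can be illustrated with a small simulation. This is a deliberately simplified Python sketch: the worker count, the 10-second interval, and the service times are hypothetical numbers, and real nginx/PHP-FPM behaviour (queues, timeouts) is not modelled.

```python
def simulate(workers, interval, service_time, requests):
    """Simulate a fixed pool of back-end workers.

    One request arrives every `interval` seconds and occupies a worker for
    `service_time` seconds. A request that finds no free worker is counted
    as rejected (the stand-in here for nginx giving up with a 502).
    Returns (served, rejected).
    """
    busy_until = [0.0] * workers  # time at which each worker becomes free
    served = rejected = 0
    for i in range(requests):
        now = i * interval
        # Pick the first worker that is free at the arrival time, if any.
        free = next((w for w, t in enumerate(busy_until) if t <= now), None)
        if free is None:
            rejected += 1
        else:
            busy_until[free] = now + service_time
            served += 1
    return served, rejected


# Hypothetical numbers: 4 workers, one widget refresh every 10 seconds.
healthy = simulate(workers=4, interval=10, service_time=2, requests=100)
hung = simulate(workers=4, interval=10, service_time=3600, requests=100)
print("healthy back end (served, rejected):", healthy)
print("hung back end    (served, rejected):", hung)
```

With fast responses every request finds a free worker. Once each request hangs for an hour, the first four requests occupy every worker and all later ones are turned away, which matches the pattern of a GUI that works for a while and then answers only with errors.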



  • Got the 502 Bad Gateway error with 2.3.1  :( :( :(



  • Has anyone been able to figure out what is happening with this? I get this every couple of days, and the only resolution is to take the 20-minute walk to physically restart the firewall.
    I can log in using SSH, but the majority of the commands either give an error on execution or don't work. When I log in I don't get a menu, so I start the menu with /etc/rc.initial
    After that, if I use options 11 or 16, I get errors:
    Restarting webConfigurator…Error: cannot open /var/etc/nginx-webConfigurator.conf in system_generate_nginx_config().

    Fatal error: Call to undefined function pfSense_interface_listget() in /etc/inc/interfaces.inc on line 80
    PHP ERROR: Type: 1, File: /etc/inc/interfaces.inc, Line: 80, Message: Call to undefined function pfSense_interface_listget()
    Fatal error: Call to undefined function gettext() in /etc/inc/rrd.inc on line 60
    PHP ERROR: Type: 1, File: /etc/inc/rrd.inc, Line: 60, Message: Call to undefined function gettext()

    Killing php-fpm
    pkill: signalling pid 737: Operation not permitted
    /etc/rc.php-fpm_restart: cannot create /tmp/php_errors.txt: Permission denied

    Found XMLRPC lock. Removing.
    rm: /tmp/xmlrpc.lock: Operation not permitted

    Starting php-fpm
    [ERROR] unable to bind listening socket for address '/var/run/php-fpm.socket': Address already in use (48)
    [ERROR] FPM initialization failed

    If I use option 5 to try to restart the system I get the prompt asking me to continue but the system does not reboot.

    I have also tried /etc/rc.initial.reboot
    Again I get the prompt asking to proceed but the system does not reboot.

    This is costing me too much time.



  • Just upgrade to 2.3.1_1 which was released today.



  • I had the same issue even with 2.3.1-1. I don't have the OpenVPN widget open. I shut down the pfBlockerNG and Snort widgets to see if they are the culprit. I am left with System Information and Interfaces.



  • Since upgrading from 2.2 to 2.3.1 I've been getting the "502 Bad Gateway" error at least once a week, which I correct by restarting PHP-FPM. This is getting very annoying, especially because it causes some of my sessions to drop. Does anyone know of a permanent fix?

    2.3.1-RELEASE-p1
    IPsec
    OpenVPN
    pfBlockerNG




  • Same here. Will try tomorrow without the IPsec widget for reference.



  • I'm also seeing the 502 Bad Gateway error. I'm running 2.3.1-RELEASE-p1 (i386 nanobsd) as a direct update from v 2.2.6, hardware is a Soekris net6501. Installed packages are Network UPS Tools v2.3.0 and openvpn-client-export v1.3.8.

    When it hangs up I log in via SSH and choose the 16) Restart PHP-FPM item from the text interface. As per others on this and other threads I have removed the IPsec widget from the dashboard to see if that helps.



  • I have received the 502 Bad Gateway error after upgrading to 2.3.1 Release.
    I have the IPSec widget open. I will have to restart the firewall after working hours today, and disable the widget and see if that solves anything on our end.



  • I am still getting this error even after upgrading to 2.3.1_1
    It has happened a couple of times now.  I have also noticed that when it happens and I SSH in to the CLI I have to run sudo rc.initial to get the menu up.  (If I don't run sudo none of the commands will work).  I can then reset PHP and get access.



  • @gordc:

    I am still getting this error even after upgrading to 2.3.1_1
    It has happened a couple of times now.  I have also noticed that when it happens and I SSH in to the CLI I have to run sudo rc.initial to get the menu up.  (If I don't run sudo none of the commands will work).  I can then reset PHP and get access.

    If you SSH and login as an ordinary user (not root), then the menu is not displayed - that is normal. As you say, you have to sudo (to become root) and run the rc.initial script (the menu).



  • OK. But what about the 502 error? It was my understanding that 2.3.1_1 was supposed to fix that problem.



  • @gordc:

    OK. But what about the 502 error? It was my understanding that 2.3.1_1 was supposed to fix that problem.

    I believe there are still some cases where the IPsec widget makes back-end requests that hang (or take a long time) and keep all the PHP processes busy.
    If you have the IPsec widget enabled on the dashboard, then remove it. Report back if that stops the problem.



  • @hekmel:

    I have received the 502 Bad Gateway error after upgrading to 2.3.1 Release.
    I have the IPSec widget open. I will have to restart the firewall after working hours today, and disable the widget and see if that solves anything on our end.

    After the restart I disabled the IPsec widget, and the error has stayed away. Let's hope it continues to stay away.



  • Getting 500 error here
    https://help.comodo.com



  • I haven't seen this error before today - here's the background.  My old NetGate Alix box died and I replaced it with a new box and installed 2.3.1-RELEASE-P5 with the WAN port connected to my office LAN, installed AutoConfigBackup and pulled the old config file off the server.  I set up the new interfaces and had no problems at all - there are no other packages installed, no VPN etc - it's a basic, single WAN firewall with a few custom rules and two separate LANs - I've been running on a 10 year old Alix so nothing fancy at all.  Everything went really smoothly - until I took it home and installed it.

    For some reason (probably a different MAC address) the firewall is not pulling a DHCP address from the Cox cable modem. I was able to log in just fast enough to see that once, but otherwise (I'd guess 95% of the time) I get the 502 Bad Gateway (nginx) error message when I try to access the GUI via the LAN with the cable modem connected. The error goes away if I reboot with the WAN disconnected, and I can access the LAN interface if I disconnect the cable modem, so I wonder if the problem is related to something in the firewall seeing the WAN port "up" but not actually passing any data.



  • @edmund Perhaps the cable modem is giving a (private) IP address/CIDR that matches/overlaps with the LAN subnet?

    Although I realize that if you are using an old config from the Alix that was working, that should not have been the case.



  • @phil.davis:

    @edmund Perhaps the cable modem is giving a (private) IP address/CIDR that matches/overlaps with the LAN subnet?

    Although I realize that if you are using an old config from the Alix that was working, that should not have been the case.

    My experience with cable modems has been that a DHCP request appears to cause them to serve the assigned IP address if the requesting device has a MAC address recognized by the modem.  That was the way that it appeared to be working previously, with pfSense displaying the actual cable company IP address in the WAN status.

    I suspect that this is just a configuration issue - what I found interesting here is that I'm getting the 502 bad gateway error (to be expected since the WAN was not serving an address) and it's causing me to be locked out of pfSense until I disconnect the WAN.

    I think that my next step is to return the new box to the factory configuration and set it up again from scratch to ensure that there are no Alix-specific switches in effect.



  • I did a factory default reset and started the setup again.  Something seems to be very wrong with pfSense - I'm seeing an average CPU utilization of 25% with no traffic on a 4 core box with two cores running at 100% - see the attached picture.  The WAN gateway appears to be dropping up to 80% of the packets - yet switching from the pfSense box to a Linksys router gives me about 70M/10M on a speedtest - so it's not the modem or connection that's causing the problem.

    With this new setup I've completely disabled IPv6 (at least as far as I can tell) and the 502/504 Bad Gateway messages have stopped, although pfSense still shows the gateway as down on the widget.  Also unbound crashes a lot - you can see each CPU running its own copy of unbound - is that normal?

    After four hours with no progress I think it's probably time to wipe the disk and start again from scratch.




  • I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems.  The rest of my issues here probably stem from my futile attempts to "fix" the hardware problem with changes to the software settings.

    The lesson is - just because it's got four pairs doesn't make it a CAT6 cable.



  • Experiencing the same problem as the OP.

    I'm in the process of setting up a brand-new pfSense firewall. I have two IBM x3550 servers in an HA configuration. New install using 2.3.2. All my interfaces, except the SYNC interface, are VLAN interfaces.

    Almost immediately I started encountering the "502 Bad Gateway (nginx)" error in my web browser. The pattern I've seen is that it's always preceded by changes to interfaces, and before it locks up fully with the 502, I consistently get a crash report with numerous errors like "PHP Fatal error:  Call to undefined function pfSense_interface_listget() in /etc/inc/interfaces.inc on line 80".

    The PHP error mostly happens on the backup node after I make changes to the primary node and it syncs to the backup. The primary node produces the PHP errors less often, and locks up with the 502 error very rarely.

    I'm curious: is this problem recognized and fixed in version 2.3.3?

    Thanks!



  • I think I am also hitting this issue. Running 2.3.2; now getting 502. Restarting PHP-FPM did not resolve this. Will reboot tonight, disable the IPsec widget and report back.

    I have a hanging check_reload_status process:

    
      PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
      293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
    
    

    Update
    After forcefully terminating check_reload_status I could salvage the web GUI, though many services seem to be in a broken state.

    Update 2
    Rebooted the firewall. One of our VDSL modems died, which caused a lot of resyncs. Swapped it for a good one. Might be related?

    @edmund:

    I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems.
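
The `top` excerpt in the post above shows check_reload_status pinned at 100% WCPU. A quick way to pick such a process out of a longer listing is sketched below in Python. It assumes the column layout shown in that excerpt (a header row containing PID, WCPU and COMMAND); the function name `busy_processes` and the 90% threshold are illustrative choices only.

```python
def busy_processes(top_output, threshold=90.0):
    """Return (pid, command, wcpu) for rows whose WCPU is at or above threshold.

    Expects a header row followed by process rows, as in the excerpt above.
    """
    lines = [line for line in top_output.splitlines() if line.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    wcpu_col = header.index("WCPU")
    cmd_col = header.index("COMMAND")
    hits = []
    for line in lines[1:]:
        # Split into at most len(header) fields so a command with spaces stays whole.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip rows that do not match the header layout
        wcpu = float(fields[wcpu_col].rstrip("%"))
        if wcpu >= threshold:
            hits.append((int(fields[pid_col]), fields[cmd_col], wcpu))
    return hits


excerpt = """
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
"""

busy = busy_processes(excerpt)
for pid, command, wcpu in busy:
    print(f"pid {pid}: {command} at {wcpu:.2f}% WCPU")
```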



  • Adding to the pain.  Tonight my cable modem went up and down a few times, and pfSense went nutty :) I have gateway monitoring disabled on all interfaces. In the logs I see:

    • The link goes down

    • check_reload_status kicks off

    • Reloading filter

    • The link comes back up

    • xinet Starting reconfiguration

    • rc.newwanip

    Then an error:

    
    Oct  5 01:45:22 pfSense php-cgi: rc.banner: PHP ERROR: Type: 1, File: /etc/inc/interfaces.inc, Line: 80, Message: Call to undefined function pfSense_interface_listget()
    
    

    Next

    • The cable modem sets a default IP, 192.168.100.20, which kicked off check_reload_status: Reloading filter

    • xinet Starting reconfiguration

    • rc.newwanip, which in turn kicks off

    • Dynamic DNS and OpenVPN, which errored out because the WAN did not have a public IP yet.

    • WAN receives a public IP and the services start. If I wait, the GUI comes back; restarting the web and php-fpm services via SSH menu options 11 and 16 also brings it back.

    The "Starting reconfiguration process" happened 16 times between 01:44:58 and 01:58:20, and then the WAN received a public IP.  During the cycle, check_reload_status was at 100% CPU on a PC Engines APU2 quad-core; I could not get in via the GUI, only via SSH.  The above error happened 16 times, once per cycle.

    I did not see a way to delay the "Starting reconfiguration" process, unless I missed an option. I'm looking at the source to see what I can find.

    Comments?

    Tony
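
The question in the post above, whether repeated reconfiguration runs could be delayed while the link is flapping, is essentially a question about debouncing. The Python sketch below only illustrates that general idea. It is not an existing pfSense option or code path, and the timestamps and the 30-second quiet period are made-up values.

```python
def debounce(event_times, quiet):
    """Return the times at which a debounced action would fire.

    The action fires `quiet` seconds after an event, unless another event
    arrives before then, in which case the timer is restarted.
    """
    fires = []
    pending = None  # time at which the currently armed timer would fire
    for t in sorted(event_times):
        if pending is not None and t >= pending:
            fires.append(pending)  # the timer expired before this event
        pending = t + quiet  # arm or re-arm the timer
    if pending is not None:
        fires.append(pending)  # the last timer eventually expires
    return fires


# Hypothetical link up/down timestamps in seconds; not taken from the logs above.
flaps = [0, 5, 12, 20, 300, 304]
runs = debounce(flaps, quiet=30)
print(f"{len(flaps)} link events -> {len(runs)} reconfiguration runs at {runs}")
```

In this toy example six link events collapse into two runs, one after each burst. The trade-off is that every change is applied at least `quiet` seconds late.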



  • Came here to cry about the same problem.  2.3.2_1.  I have the OpenVPN widget on my dash but not the IPsec widget.  Is there something I can patch manually to stop this for now?