502 Bad Gateway (nginx) after Update to 2.3

phil.davis

I am still getting this error even after upgrading to 2.3.1_1
It has happened a couple of times now. I have also noticed that when it happens and I SSH in to the CLI I have to run sudo rc.initial to get the menu up. (If I don't run sudo none of the commands will work). I can then reset PHP and get access.

If you SSH and login as an ordinary user (not root), then the menu is not displayed - that is normal. As you say, you have to sudo (to become root) and run the rc.initial script (the menu).

gordc

OK. But how about the 502 error. It was my understanding that 2.3.1_1 was supposed to fix that problem

phil.davis

@gordc:

OK. But how about the 502 error. It was my understanding that 2.3.1_1 was supposed to fix that problem

I believe there are still possibly some cases where the IPsec widget is doing back-end requests, those hang (or take a long time) and make all the PHP processes busy.
If you have the IPsec widget enabled on the dashboard, then remove it. Report back if that stops the problem.
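To illustrate the mechanism described above, here is a minimal toy model, not the actual behavior of nginx or php-fpm. It assumes a fixed pool of PHP workers with no request queue, and the worker count and request durations are hypothetical. Once hung widget requests occupy every worker, an ordinary page request has nowhere to go and is reported as a 502.

```python
def simulate_pool(requests, workers):
    """Toy model: each request is (arrival_time, duration) in seconds.

    A request that arrives while every worker is busy is marked "502";
    otherwise it takes a free worker and is marked "ok".
    Requests must be sorted by arrival time.
    """
    finish_times = []  # finish time of each request currently holding a worker
    results = []
    for arrival, duration in requests:
        # Free the workers whose request finished before this arrival.
        finish_times = [t for t in finish_times if t > arrival]
        if len(finish_times) < workers:
            finish_times.append(arrival + duration)
            results.append("ok")
        else:
            results.append("502")
    return results


# Two hypothetical widget requests hang for 300 s each, then a normal page load arrives.
hung = simulate_pool([(0, 300), (1, 300), (2, 0.1)], workers=2)
# The same traffic when the widget requests return quickly.
healthy = simulate_pool([(0, 0.5), (1, 0.5), (2, 0.1)], workers=2)
print(hung, healthy)
```

In this model, removing the slow requests (the equivalent of removing the widget) is what frees the workers; restarting the pool only helps until the next hung request arrives.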

hekmel

@hekmel:

I have received the 502 Bad Gateway error after upgrading to 2.3.1 Release.
I have the IPSec widget open. I will have to restart the firewall after working hours today, and disable the widget and see if that solves anything on our end.

After the restart I disabled the IPsec widget and the error has stayed away. Let's hope it continues to stay away.

aGeekhere

Getting 500 error here
https://help.comodo.com

edmund

I haven't seen this error before today - here's the background. My old NetGate Alix box died and I replaced it with a new box and installed 2.3.1-RELEASE-P5 with the WAN port connected to my office LAN, installed AutoConfigBackup and pulled the old config file off the server. I setup the new interfaces and had no problems at all - there are no other packages installed, no VPN etc - it's a basic, single WAN firewall with a few custom rules and two separate LANs - I've been running on a 10 year old Alix so nothing fancy at all. Everything went really smoothly - until I took it home and installed it.

For some reason (probably a different MAC address) the firewall is not pulling a DHCP address from the COX cable modem - I was able to log in just fast enough to see that once, but otherwise - I'd guess 95% of the time - I get the 502 Bad Gateway (nginx) error message when I try to access the GUI via the LAN with the cable modem connected. The error goes away if I reboot with the WAN disconnected, and I can access the LAN interface if I disconnect the cable modem, so I wonder if the problem is related to something in the firewall seeing the WAN port "up" but not actually passing any data.

phil.davis

@edmund Perhaps the cable modem is giving a (private) IP address/CIDR that matches/overlaps with the LAN subnet?

Although I realize that if you are using an old config from the Alix that was working, that should not have been the case.

edmund

@phil.davis:

@edmund Perhaps the cable modem is giving a (private) IP address/CIDR that matches/overlaps with the LAN subnet?

Although I realize that if you are using an old config from the Alix that was working, that should not have been the case.

My experience with cable modems has been that a DHCP request appears to cause them to serve the assigned IP address if the requesting device has a MAC address recognized by the modem. That was the way that it appeared to be working previously, with pfSense displaying the actual cable company IP address in the WAN status.

I suspect that this is just a configuration issue - what I found interesting here is that I'm getting the 502 bad gateway error (to be expected since the WAN was not serving an address) and it's causing me to be locked out of pfSense until I disconnect the WAN.

I think that my next step is to return the new box to the factory configuration and set it up again from scratch to ensure that there are no Alix specific switches in effect.

edmund

I did a factory default reset and started the setup again. Something seems to be very wrong with pfSense - I'm seeing an average CPU utilization of 25% with no traffic on a 4 core box with two cores running at 100% - see the attached picture. The WAN gateway appears to be dropping up to 80% of the packets - yet switching from the pfSense box to a Linksys router gives me about 70M/10M on a speedtest - it's not the modem or connection that's causing the problem.

With this new setup I've completely disabled IPv6 (at least as far as I can tell) and the 502/504 Bad Gateway messages have stopped, although pfSense still shows the gateway as down on the widget. Also unbound crashes a lot - you can see each CPU running its own copy of unbound - is that normal?

After four hours with no progress I think it's probably time to wipe the disk and start again from scratch.

[Attached image: Capture.PNG]

edmund

I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems. The rest of my issues here probably stem from my futile attempts to "fix" the hardware problem with changes to the software settings.

The lesson is - just because it's got four pairs doesn't make it a CAT6 cable.

marklar

Experiencing the same problem as the OP.

I'm in the process of setting up a brand-new pfSense firewall. I have two IBM x3550 servers in an HA configuration. New install using 2.3.2. All my interfaces, except the SYNC interface, are VLAN interfaces.

Almost immediately I started encountering the "502 Bad Gateway (nginx)" error in my web browser. The pattern I've seen is that it's always preceded by changes to interfaces, and before it locks up fully with the 502, I consistently get a crash report with numerous errors like "PHP Fatal error: Call to undefined function pfSense_interface_listget() in /etc/inc/interfaces.inc on line 80".

The PHP error mostly happens on the backup node after I make changes to the primary node and it syncs to the backup. The primary node produces the PHP errors less often, and locks up with the 502 error very rarely.

I'm curious: is this problem recognized and fixed in version 2.3.3?

Thanks!

helge000

I am also hitting this issue I think. Running 2.3.2; now getting 502. Restarting PHP-FPM did not resolve this. Will reboot tonight, disable IPSec widget and report back.

I have a hanging check_reload_status process:


  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
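For anyone who wants to spot this state automatically, here is a minimal sketch that scans captured `top` output for processes at or above a CPU threshold. The function name and the 90% threshold are hypothetical choices, and it assumes the column layout shown above (a header row containing PID, WCPU and COMMAND).

```python
SAMPLE_TOP = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
"""


def hot_processes(top_output, threshold=90.0):
    """Return (pid, command, wcpu) for each process at or above threshold percent."""
    lines = [ln.strip() for ln in top_output.splitlines() if ln.strip()]
    # Locate the header row so columns are found by name, not by fixed position.
    header_idx = next(
        (i for i, ln in enumerate(lines) if "WCPU" in ln.split()), None
    )
    if header_idx is None:
        return []
    columns = lines[header_idx].split()
    pid_col = columns.index("PID")
    wcpu_col = columns.index("WCPU")
    cmd_col = columns.index("COMMAND")

    hot = []
    for ln in lines[header_idx + 1:]:
        # Limit the split so a command containing spaces stays in one field.
        fields = ln.split(None, len(columns) - 1)
        if len(fields) < len(columns):
            continue  # not a process row
        try:
            cpu = float(fields[wcpu_col].rstrip("%"))
        except ValueError:
            continue  # WCPU column did not hold a number
        if cpu >= threshold:
            hot.append((int(fields[pid_col]), fields[cmd_col], cpu))
    return hot


print(hot_processes(SAMPLE_TOP))
```

This only reads text that has already been captured; how the output is collected (for example over SSH) is left out on purpose.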

Update
After forcefully terminating check_reload_status I could salvage the web GUI, though many services seem to be in a broken state

Update 2
Rebooted the firewall. One of our VDSL modems died, which caused a lot of resyncs. Swapped it for a good one. Might be related?

@edmund:

I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems.

tonymorella

Adding to the pain. Tonight my cable modem went up and down a few times, and pfSense went nutty :) I have gateway monitoring disabled on all interfaces. In the logs I see:

The link goes down
check_reload_status kicks off
Reloading filter
The link comes back up
xinet Starting reconfiguration
rc.newwanip

Then an error:


Oct  5 01:45:22 pfSense php-cgi: rc.banner: PHP ERROR: Type: 1, File: /etc/inc/interfaces.inc, Line: 80, Message: Call to undefined function pfSense_interface_listget()
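To count how often this error recurs across a log, a small parser like the following may help. It is a sketch that assumes every line follows the format of the single log line quoted above; the regular expression and the function name are my own and are not taken from pfSense.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted above; other formats are ignored.
PHP_ERROR = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<proc>[^:]+):\s+(?P<script>[^:]+):\s+"
    r"PHP ERROR:\s+Type:\s+(?P<type>\d+),\s+"
    r"File:\s+(?P<file>[^,]+),\s+Line:\s+(?P<line>\d+),\s+"
    r"Message:\s+(?P<message>.*)$"
)

SAMPLE_LINE = (
    "Oct  5 01:45:22 pfSense php-cgi: rc.banner: PHP ERROR: Type: 1, "
    "File: /etc/inc/interfaces.inc, Line: 80, "
    "Message: Call to undefined function pfSense_interface_listget()"
)


def count_php_errors(lines):
    """Count PHP errors keyed by (file, line number, message)."""
    counts = Counter()
    for raw in lines:
        match = PHP_ERROR.match(raw.strip())
        if match:
            key = (match["file"], int(match["line"]), match["message"])
            counts[key] += 1
    return counts


for key, n in count_php_errors([SAMPLE_LINE, "unrelated line", SAMPLE_LINE]).items():
    print(n, key)
```

Grouping by file, line and message makes it easy to see whether one error repeats on every reconfiguration cycle or whether several different errors are involved.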

The cable modem hands out a default IP, 192.168.100.20, which kicked off check_reload_status (Reloading filter), then "xinet Starting reconfiguration", then rc.newwanip, which in turn kicked off Dynamic DNS and OpenVPN. Those errored out because the WAN did not have a public IP yet.
Once the WAN receives a public IP, the services start. If I wait, the GUI comes back; restarting the web and php-fpm services via SSH menu options 11 and 16 also brings it back.

The "Starting reconfiguration process" happened 16 times between 01:44:58 and 01:58:20, then the WAN received a public IP. During the cycle check_reload_status was at 100% CPU on a PC Engines APU2 quad-core, and I could not get in via the GUI, only SSH. The above error happened 16 times, once per cycle.

Unless I missed an option, I did not see a way to delay the "Starting reconfiguration" process. I am looking at the source to see what I can find.

Comments?

Tony

luckman212

Came here to cry about the same problem. 2.3.2_1. I have the OpenVPN widget on my dash but not the IPSEC widget. Is there something I can patch manually to stop this for now?

edmund

My solution has been to disable auto-negotiation on the WAN interface and fix the pfSense WAN interface at 100baseTX full-duplex - this has completely solved the issues for my home setup since my cable connection is only 65/10 Mb on a good day.

It's my suspicion that the issues are caused by auto-negotiate failing in some subtle way - however I'm running pfSense on a Chinese made board so I can't be certain that the actual NIC is really made by Intel as it claims. I'm open-minded about this - I don't see these problems on another pfSense box at work in an identical WAN configuration on an SG-4860.

luckman212

Hmm, I don't know about that - I'm having the issue on an Intel NIC (SG-2440). I can't set it to 100 Mb full-duplex because the WAN is 300 Mbit cable.

edmund

You could try just setting the link speed instead of auto-negotiating it - it was my suspicion that auto-negotiate was failing.

khaled

Please choose option 16 (Restart php-fpm) from the console menu and try again.

igpit

I just experienced this with 2.3.3-RELEASE-p1 !

Today when I wanted to check the admin web page I got the 502 error.

Running option 16 from console solved the issue. I thought this was fixed by now?

igpit

It just happened again. "Restart php-fpm" solved it, but there is definitely some bug.