502 Bad Gateway (nginx) after Update to 2.3

edmund

I did a factory default reset and started the setup again. Something seems to be very wrong with pfSense - I'm seeing an average CPU utilization of 25% with no traffic on a 4 core box, with two cores running at 100% - see the attached picture. The WAN gateway appears to be dropping up to 80% of the packets - yet switching from the pfSense box to a Linksys router gives me about 70M/10M on a speedtest, so it's not the modem or connection that's causing the problem.

With this new setup I've completely disabled IPv6 (at least as far as I can tell) and the 502/504 Bad Gateway messages have stopped, although pfSense still shows the gateway as down on the widget. Also unbound crashes a lot - you can see each CPU running its own copy of unbound - is that normal?

After four hours with no progress I think it's probably time to wipe the disk and start again from scratch.

[Attached image: Capture.PNG]

edmund

I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems. The rest of my issues here probably stem from my futile attempts to "fix" the hardware problem with changes to the software settings.

The lesson is - just because it's got four pairs doesn't make it a CAT6 cable.

marklar

Experiencing the same problem as the OP.

I'm in the process of setting up a brand-new pfSense firewall. I have two IBM x3550 servers in an HA configuration. New install using 2.3.2. All my interfaces, except the SYNC interface, are VLAN interfaces.

Almost immediately I started encountering the "502 Bad Gateway (nginx)" error in my web browser. The pattern I've seen is that it's always preceded by changes to interfaces, and before it locks up fully with the 502, I consistently get a crash report with numerous errors like "PHP Fatal error: Call to undefined function pfSense_interface_listget() in /etc/inc/interfaces.inc on line 80".

The PHP error mostly happens on the backup node after I make changes to the primary node and it syncs to the backup. The primary node produces the PHP errors less often, and locks up with the 502 error very rarely.

I'm curious: is this problem recognized and fixed in version 2.3.3?

Thanks!

helge000

I think I am also hitting this issue. Running 2.3.2; now getting 502. Restarting PHP-FPM did not resolve this. Will reboot tonight, disable the IPsec widget and report back.

I have a hanging check_reload_status process:


  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
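
For anyone who wants to spot this condition from saved `top` output rather than by eye, here is a minimal sketch. It assumes the column layout shown above (WCPU second to last, COMMAND last, and a command name without spaces); the function name `busy_processes` and the 90% threshold are made up for illustration, and the script is meant to be run on an admin workstation against copied output, not on the firewall itself.

```python
def busy_processes(top_output, threshold=90.0):
    """Return (pid, command, wcpu) for rows whose WCPU is at or above threshold."""
    hits = []
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID; this skips the header and blank lines.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        # Assumption: WCPU is the second-to-last column and COMMAND is the last.
        wcpu = float(fields[-2].rstrip("%"))
        if wcpu >= threshold:
            hits.append((int(fields[0]), fields[-1], wcpu))
    return hits


sample = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
  293 root          1 123   20 31176K 15508K CPU1    1 131:47 100.00% check_reload_status
"""

print(busy_processes(sample))
```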

Update
After forcefully terminating check_reload_status I could salvage the web GUI, though many services seem to be in a broken state.

Update 2
Rebooted the firewall. One of our VDSL modems died, which caused a lot of resyncs. Swapped it for a good one. Might be related?

@edmund:

I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface - after replacing the WAN -> modem cable with a CAT6 cable it's connecting and finding the interface without problems.

tonymorella

Adding to the pain. Tonight my cable modem went up and down a few times, and pfSense went nutty :) I have gateway monitoring disabled on all interfaces. In the logs I see:

The link goes down
check_reload_status kicks off
Reloading filter
The link comes back up
xinet Starting reconfiguration
rc.newwanip

Then an error:


Oct  5 01:45:22 pfSense php-cgi: rc.banner: PHP ERROR: Type: 1, File: /etc/inc/interfaces.inc, Line: 80, Message: Call to undefined function pfSense_interface_listget()
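
To pull these entries out of a copied system log and count them, something like the following sketch could help. It assumes every entry follows exactly the shape of the line above; the pattern and the helper name `parse_php_error` are illustrative, not part of pfSense. For reference, PHP error type 1 is `E_ERROR`, i.e. a fatal error.

```python
import re

# Assumption: entries look like the single line quoted above
# ("<date> <host> <process>: <script>: PHP ERROR: Type: N, File: ..., Line: N, Message: ...").
LOG_PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<process>[^:]+):\s+(?P<script>[^:]+):\s+PHP ERROR:\s+"
    r"Type:\s+(?P<type>\d+),\s+File:\s+(?P<file>[^,]+),\s+"
    r"Line:\s+(?P<line>\d+),\s+Message:\s+(?P<message>.*)$"
)


def parse_php_error(entry):
    """Return the fields of a PHP ERROR log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(entry.strip())
    return match.groupdict() if match else None


entry = (
    "Oct  5 01:45:22 pfSense php-cgi: rc.banner: PHP ERROR: Type: 1, "
    "File: /etc/inc/interfaces.inc, Line: 80, "
    "Message: Call to undefined function pfSense_interface_listget()"
)
fields = parse_php_error(entry)
print(fields["script"], fields["file"], fields["line"])
```

Counting the entries in a saved log is then `sum(1 for l in open(path) if parse_php_error(l))`, where `path` is wherever you copied the log to.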

The cable modem hands out a default IP of 192.168.100.20, which kicked off check_reload_status:
Reloading filter
xinet Starting reconfiguration
rc.newwanip
which in turn kicked off Dynamic DNS and OpenVPN; both errored out because the WAN did not have a public IP yet.
Once the WAN receives a public IP the services start. If I wait, the GUI comes back; otherwise I restart the web and php-fpm services via the SSH menu options 11 and 16.

The "Starting reconfiguration process" happened 16 times between 01:44:58 and 01:58:20, when the WAN finally received a public IP. During that period check_reload_status was at 100% CPU on a PC Engines APU2 quad core, and I could not get in via the GUI, only via SSH. The above error happened 16 times, once per cycle.
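
As a rough check on how tight that loop was, the two timestamps and the count of 16 can be turned into an average cycle time. This is only arithmetic on the numbers quoted above; the helper name `average_cycle_seconds` is made up for the example.

```python
from datetime import datetime


def average_cycle_seconds(start, end, cycles, fmt="%H:%M:%S"):
    """Return (total window in seconds, average seconds per cycle)."""
    window = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total = window.total_seconds()
    return total, total / cycles


total, per_cycle = average_cycle_seconds("01:44:58", "01:58:20", 16)
print(total, per_cycle)  # 802.0 50.125
```

So the reconfiguration fired roughly every 50 seconds for a little over 13 minutes.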

Unless I missed an option, I did not see a way to delay the "Starting reconfiguration" process. I'm looking at the source to see what I can find.

Comments?

Tony

luckman212

Came here to cry about the same problem. 2.3.2_1. I have the OpenVPN widget on my dash but not the IPSEC widget. Is there something I can patch manually to stop this for now?

edmund

My solution has been to disable auto-negotiation on the WAN interface and fix the pfSense WAN interface at 100baseTX full-duplex - this has completely solved the issues for my home setup, since my cable connection is only 65/10 Mb on a good day.

It's my suspicion that the issues are caused by auto-negotiate failing in some subtle way - however I'm running pfSense on a Chinese-made board, so I can't be certain that the actual NIC is really made by Intel as it claims. I'm open-minded about this - I don't see these problems on another pfSense box at work, an SG-4860 in an identical WAN configuration.
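
To check what a NIC actually negotiated, the `media:` line of `ifconfig` output is the place to look. The sketch below parses such a line; the two sample strings are written from memory of the usual FreeBSD format and are an assumption, not output captured from the box discussed here, and `describe_media` is a made-up helper name.

```python
import re


def describe_media(line):
    """Summarise an ifconfig 'media:' line; return None for any other line."""
    text = line.strip()
    if not text.startswith("media:"):
        return None
    # Assumption: the speed appears as e.g. '100baseTX' or '1000baseT'.
    speed = re.search(r"(\d+)base", text)
    return {
        "autoselect": "autoselect" in text,
        "mbit": int(speed.group(1)) if speed else None,
        "full_duplex": "full-duplex" in text,
    }


# Illustrative lines only (assumed format).
print(describe_media("media: Ethernet autoselect (1000baseT <full-duplex>)"))
print(describe_media("media: Ethernet 100baseTX <full-duplex>"))
```

A link that shows `autoselect` but reports a lower speed than both ends support, or no speed at all, would be consistent with the negotiation problem suspected above.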

luckman212

Hmm, I don't know about that - I'm having the issue on an Intel box (SG-2440). I can't set it to 100 Mbit full-duplex because the WAN is 300 Mbit cable.

edmund

You could try just setting the link speed instead of auto-negotiating it - it was my suspicion that auto-negotiate was failing.

khaled

Please choose option 16 (Restart PHP-FPM) from the console menu and try again.

igpit

I just experienced this with 2.3.3-RELEASE-p1 !

Today when I wanted to check the admin web page I got the 502 error.

Running option 16 from console solved the issue. I thought this was fixed by now?

igpit

It just happened again. "Restart php-fpm" solved it, but there is definitely some bug.

weehooey

Have the same issue on 2.3.4.
Restarting PHP-FPM restored the GUI and OpenVPN.
Removed the IPsec widget from the dashboard; hopefully that will help.

AlexMex

Hello,

I'm getting the 502 Bad Gateway too. I have just installed pfSense 2.3.4.
I started getting the issue after setting up four VLANs on my OPT1 interface. Option 16 (Restart PHP-FPM) sometimes works on the first try, but more often only after the second or third attempt.
CPU usage was around 1.5%.

Today I installed Squid and activated transparent proxy mode on my LAN and the four OPT1 VLANs.
After I log in to the webConfigurator, I cannot access any page; I immediately land on the nginx 502 Bad Gateway error. I tried option 16 as before but it does not work anymore. Only a reboot of the pfSense box allows me to log in to the webConfigurator again.

After several reboots I saw that sometimes OPT1 appears down while the VLANs are up, sometimes OPT1 and the VLANs are all down, and sometimes everything is up as expected :-[
Each time, CPU usage rises to 100% and a few seconds later I get the 502 message.

I ran some tests: when I disconnect the cable on OPT1, the issue does not happen.
I then checked the option "Do not forward traffic to Private Address Space (RFC 1918) destinations" and plugged the cable back into OPT1. It looks like I am back to the initial situation now.

It fails less often but is still annoying since I have to reboot the box to continue my configuration.

If you have any suggestions I would really appreciate them. For now, I will stay on pfSense 2.2.

costasppc

I can confirm the issue with 2.3.4-RELEASE (amd64). When the GUI is not accessible, OpenVPN users cannot log in. When I used option 16, the users could log in again.

I have disabled the widgets mentioned before (although I need the OpenVPN widget…) and will see what happens.

Best regards

Kostas

edmund

@edmund:

I believe that the root of all my problems has been an auto-negotiate failure on the WAN interface …

Update: I'm convinced that my problems all stem from using a cheap Chinese "pfSense" system that I purchased on Amazon - it was about half the price of a comparable Netgate unit, but it has continuously generated errors on the WAN interface and has never managed to auto-negotiate the link speed. Recently I had the cable company at the house after complaining that the cable speed (at 10 Mbps) was too low - I'm paying for 150 Mbps. After about an hour of trying everything and failing to fix the problem, I put a switch in between the cable modem and the firewall WAN port - which boosted the speed to about 55 Mbps.

I have replaced the "el cheapo" Chinese box with a Netgate box - auto-negotiation works, the cable modem instantly links at 1 Gbps and I'm now getting 160 Mbps on the cable connection. I'm no longer seeing any errors on the WAN interface.

hdejongh

On Sunday I updated 6 firewalls from 2.3 to 2.4.
All of them are suddenly showing 502 Bad Gateway problems.
Besides that, 2 of them become unresponsive after around 20 hours.
It's hard to access the firewall at that point, but the one time I got lucky I could see that the memory was completely full.
So I doubled the memory, and the same problem still occurs.

The only way to solve it is rebooting the firewall.

By the way, all are VMs.
2 of them were completely new installs with a config restore (I had to go from 32-bit to 64-bit).

My physical pfSense boxes are not affected…

jahonix

Search this forum or redmine.pfsense.org - this has been seen before.
IIRC it may come from a CD/DVD drive attached to your pfSense VM. Get rid of that and it might work. Another problem, which I don't remember off the top of my head, is discussed on Redmine.

rightnow

@hdejongh:

On Sunday I updated 6 firewalls from 2.3 to 2.4.
All of them are suddenly showing 502 Bad Gateway problems.
Besides that, 2 of them become unresponsive after around 20 hours.
It's hard to access the firewall at that point, but the one time I got lucky I could see that the memory was completely full.
So I doubled the memory, and the same problem still occurs.

The only way to solve it is rebooting the firewall.

By the way, all are VMs.
2 of them were completely new installs with a config restore (I had to go from 32-bit to 64-bit).

My physical pfSense boxes are not affected…

Exactly the same problem here! The firewall dies, and I get 502 Bad Gateway.

mikael.andre

Hello,

I experienced the same problem on the 2.4.0 release.
I have no access to my firewall via:

SSH
WebGUI
Console

BUT I can still surf the Internet… Very strange...
This issue occurs after about 20 hours of uptime.
It's hard to identify the root cause.
The only way to resolve this problem is to reboot my hardware appliance.
Once the reboot is done, there are no event logs...

I have the following widgets on my dashboard:

System Information
NTP Status
SMART Status
pfBlockerNG
OpenVPN
Gateways
Interfaces
Interfaces Statistics
Traffic Graphs
Services status
Firewall logs
Thermal sensors

Here's my hardware configuration:

MotherBoard : APU2C4
CPU : AMD Embedded G series GX-412TC, 1 GHz quad Jaguar core with 64 bit and AES-NI support
RAM : 4 GByte DDR3-1333 DRAM
Storage Type : mSATA SSD
Storage Size : 120GB
Ethernet ports : 3 x 1Gbit/s
Wireless card : WLE200NX with two antenna