504 Gateway Time-out (nginx) - 2.3 upgrade on 2x APU1



  • I have quite a strange phenomenon and don't know exactly where to start troubleshooting.

    Updated an APU1 yesterday afternoon from 2.2.6 to 2.3 with Nano install on an SD card.
    Today the unit wasn't accessible via HTTPS (504 Gateway Time-out). It doesn't route either, but it still hands out DHCP leases to clients.

    To get back into production I installed my config.xml on a different APU1, which I had just updated to 2.3 from 2.3RC. This device has an mSATA disk.
    Well, now, after nearly 10 hours of working flawlessly, the second device has gone down as well. I cannot log in via HTTPS or ssh from LAN. I can, however, ssh via an OPT interface, but HTTPS doesn't work there either.

    When I had console access via serial I tried to restart the webConfigurator and rebooted the device. No change.
    Reverting to HTTP didn't help either; I still get redirected to HTTPS, even after a reboot.

    WAN is PPPoE on re0_vlan7, LAN and other OPT interfaces are re1_vlan10 / 20 / 30 / 40 / 50 / 60.
    Installed package was Backup/Restore only.

    Anyone got an idea? I'm a bit clueless.



  • The remaining HTTPS redirect is most likely cached by your browser; they're pesky about that.

    The only thing I'm aware of there is that if the system doesn't have Internet access, update checks can pile up and hang the GUI, resulting in a 504.
    https://redmine.pfsense.org/issues/6177

    Any indication it's having trouble checking for updates?



  • HTTPS redirect: maybe. I'm unsure whether I checked this with a different browser as well.

    The system has Internet access almost all the time. I always listen to streaming radio and it just kept playing.

    The first APU that went down did not have trouble checking for updates (I remember having looked at it). I don't know about the alternative unit, but since it received the same config and was a drop-in replacement I doubt it is different.

    Any idea why I can ssh from OPT1 and not from LAN, both being VLANs on the same trunk? Rules surely don't permit it.



  • Sorry, Chris, from the console I could see that pkg update indeed doesn't work.
    ssh from LAN is affected as well; via OPT1 it's OK. Strange…



  • I think I have the same situation here.

    After a few days pfSense has basically stopped responding to clients (from any interface). I can't access it via HTTPS (nginx error 504). It doesn't respond to OpenVPN clients. It doesn't reply to ping.

    But it's still passing traffic through it. Clients in the LAN can still access Internet. Clients from the Internet can still access every e-mail server and website located in the LAN.

    It is a single-LAN multi-WAN system with PPPoE on all WAN OPT1 OPT2 etc. pfSense v.2.3-RELEASE (i386), upgraded from 2.2.6. The system is a virtual machine on IBM x3650 host server running VMware ESXi v. 4.1.

    EDIT: PPPoE on OPT1 OPT2 etc, but DHCP on WAN.

    EDIT 2: it still replies to ping and responds to IPsec remote peers (which are also pfSense v.2.3).

    EDIT 3: sometimes nginx reports error 502 (Bad Gateway).

    EDIT 4: it still responds to SSH connection. Tested on remote pfSense 2.3 devices that stopped responding.



  • It can be fixed by restarting PHP-FPM (console command #16), but not permanently. After a few hours it stops responding to HTTPS (nginx error 504 or 502) again.

    EDIT: I think I've found the root cause here: https://forum.pfsense.org/index.php?topic=110070.0. It's the IPsec widget.



  • Could be.
    In vanilla mode it runs flawlessly, my restored config has the IPsec widget in dashboard. I'll try to remove that and test again (when I have the time to).



  • I had this problem (not exactly the same, but with the same effect). I tried updating to the 2.3.1 snapshot and the problem persisted. I removed every widget (as I saw that multiple widgets, along with IPsec, were being logged), rebooted, waited a couple of minutes, and slowly re-added the widgets. The IPsec widget did bring back an nginx error, so I removed it again and just waited a couple of hours longer. As far as I can tell my system has been running fine for the past 12+ hours (and it usually breaks even during the day, although it sometimes waits until midnight to break).

    My post is here if you want to see what others and I were getting on this issue.

    https://forum.pfsense.org/index.php?topic=110121.0

    Edit: It seems it came back to haunt me once again. If you haven't done the above yet, also try clearing your browser cache. It would seem the IPsec widget isn't playing well, but I have reduced how often this problem occurs (as far as I'm experiencing so far). It seems to vary in how often it occurs, which widget is involved, and when it brings the webGUI down.


  • Banned

    Also look here; this thread discusses the same problem! https://forum.pfsense.org/index.php?topic=110121.0



  • This was likely fixed in either 2.3.1 or 2.3.1_1 depending on which instance of the issue is responsible.



  • @cmb:

    This was likely fixed in either 2.3.1 or 2.3.1_1 depending on which instance of the issue is responsible.

    Yes, I can confirm that this issue is resolved (at least I haven't noticed the problem arising lately). Latest updates have been doing wonders  :)



  • Hello

    This issue is NOT resolved in version 2.3.1_1. I constantly get 502 Bad Gateway after some time of usage.
    I have removed the IPsec widgets etc. and the problem still persists.
    I have to reboot the pfSense box every day because of this.

    Best regards
    Toby


  • Banned

    There's no need to reboot; simply restart PHP-FPM and the webConfigurator from the shell menu. Plus you are two releases behind.
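
    To know when that restart is actually needed, the GUI can be probed from a LAN client. The following is only a minimal Python sketch, not anything shipped with pfSense: the function names `classify_status` and `probe` are made up for illustration, certificate verification is switched off on the assumption that the webConfigurator uses a self-signed certificate, and the restart itself is still done by hand from the console menu.

```python
import ssl
import urllib.error
import urllib.request

# The two nginx errors reported in this thread.
GATEWAY_ERRORS = {502, 504}


def classify_status(status):
    """Map an HTTP status code to 'gateway_error', 'ok' or 'other_error'."""
    if status in GATEWAY_ERRORS:
        return "gateway_error"
    if status < 400:
        return "ok"
    return "other_error"


def probe(url, timeout=10.0):
    """Fetch url once and classify the result (hypothetical helper)."""
    context = ssl.create_default_context()
    # Assumption: the firewall GUI uses a self-signed certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 502/504) are raised as HTTPError.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return "unreachable"
```

    Calling `probe()` with the URL of your own webConfigurator from cron on a LAN machine and alerting on "gateway_error" would at least tell you how long the GUI stays up between restarts.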



  • Hello

    I'm running 2.3.2-p1, sorry, I did not check. But anyway, the error is still there. I have it on several units.

    /Toby



  • @toby-rdc:

    Hello

    I'm running 2.3.2-p1, sorry, I did not check. But anyway, the error is still there. I have it on several units.

    /Toby

    If you can, please run this command when the issue happens and try to capture the output: ps uxawww, either over ssh or on the local terminal. Install pstree for an even better way to find the issue. One of the devs instructed me to do this in order to see what the cause was.

    pkg install pstree
    rehash
    pstree



  • I have the same issue here with version:

    2.3.2-RELEASE-p1 (amd64)
    built on Tue Sep 27 12:13:07 CDT 2016
    FreeBSD 10.3-RELEASE-p9

    After some hours I get the "504 Gateway Time-out" or Bad Gateway error.



  • You are on a totally different version. Nano i386 vs amd64.
    Did you restart PHP-FPM from console/ssh? What was the result?


  • Banned

    In about 100% of cases this is fixed by Restart PHP-FPM + Restart webConfigurator from the console. (Not that it'd make me love the nginx thing, or the pkg's stupidity of being absolutely unable to work offline.)

    And disable the updates checking on dashboard, plus definitely do NOT add the installed packages widget.



  • I know about the version thing, I only want to say that the issue is still alive :D

    -Restart PHP-FPM works every time, but it's not really practical to do this so often

    But I'll try disabling the update check. The Installed Packages widget is not on the dashboard.

    Thanks :)



  • @crisdavid:

    If you can, please run this command when the issue happens and try to capture the output: ps uxawww, either over ssh or on the local terminal. Install pstree for an even better way to find the issue. One of the devs instructed me to do this in order to see what the cause was.

    pkg install pstree
    rehash
    pstree

    Don't bother installing that. ps forest/tree format under bsd:

    ps auxdww

    Gives more info on all processes.
    (The "w"s widen the output so long lines are not truncated; "d" gives the forest (tree) view.)

    https://www.freebsd.org/cgi/man.cgi?ps(1)
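
    If the captured output is long, counting processes per command makes a pile-up (for example many pkg or php-fpm processes) easier to spot. The following is only a rough Python sketch under some assumptions: it expects the 11-column `ps aux` layout with COMMAND last, it strips the tree-drawing prefix that the "d" flag adds, and the function name `count_commands` and the sample text are invented for illustration, not real output from an affected box.

```python
import os
from collections import Counter


def count_commands(ps_output):
    """Count processes per executable name in `ps auxdww`-style text."""
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        # Drop the forest prefix ("|-- ", "`-- ", "| ") added by the d flag.
        command = fields[10].lstrip(" |`-")
        if not command:
            continue
        # First word is the program; trim "php-fpm:" style titles and paths.
        name = os.path.basename(command.split()[0].rstrip(":"))
        counts[name] += 1
    return counts


# Invented sample, only to show the expected input shape.
SAMPLE_PS = """
USER   PID %CPU %MEM   VSZ  RSS TT STAT STARTED    TIME COMMAND
root     1  0.0  0.1  9140  820  - ILs  10:00AM 0:00.01 /sbin/init --
root   300  0.0  1.2 90000 9000  - Ss   10:01AM 0:01.00 |-- php-fpm: master process (php-fpm)
root   301  0.0  1.0 90000 8000  - I    10:01AM 0:00.50 | |-- php-fpm: pool nginx (php-fpm)
root   302  0.0  1.0 90000 8000  - I    10:01AM 0:00.50 | `-- php-fpm: pool nginx (php-fpm)
root   400  0.0  0.5 20000 4000  - I    10:05AM 0:00.10 `-- /usr/sbin/pkg update
"""

if __name__ == "__main__":
    for name, n in count_commands(SAMPLE_PS).most_common():
        print(n, name)
```

    Save the real output with something like `ps auxdww > ps.txt` while the GUI is hung and feed the file contents to `count_commands()` on another machine.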



  • I tried unchecking the update check and it didn't work. After some days the HTTP request timed out because nginx didn't respond.



  • There was a problem with the IPsec widget a while back. If you have that on the dashboard then try to remove it.



  • bump

    Setup :

    • 2.3.2-RELEASE-p1 (i386)
    • 2013 ALIX Engine
    • 4G NanoBSD
    • bare install, no-addons

    I have been trying to track down issues similar to those mentioned in this topic:

    • nginx timeouts ( 504, 502 … ) or slowness

    • nginx timeouts in syslog

    • traffic still going through, only the UI is impacted

    My appliance is coupled with a 100 Mbps DSL line (which is well below the maximum throughput it can handle), and I started noticing that the nginx issues were happening at high throughput (>= 90 Mbps).

    I disabled traffic shaping at first (as it can be quite CPU intensive), but it didn't change the behaviour.

    The moment the throughput drops to values like 20 Mbps, the UI becomes responsive again (no slowness, no timeouts).

    What can I do to solve these problems ?  :-\



  • @craymore:

    bump
    ..Alix..
    What can I do to solve these problems ?

    Bump in place ?

    First off: Install a fast/excellent CF like a Transcend Industrial grade.
    Then start with a 2.3 from scratch instead of upgrading from a 2.2.
    Set the CF to read/write all the time.
    2.3 works fine with an Alix.



  • @hda:

    First off: Install a fast/excellent CF like a Transcend Industrial grade.
    Then start with a 2.3 from scratch instead of upgrading from a 2.2.
    Set the CF to read/write all the time.
    2.3 works fine with an Alix.

    • SLC SD card w/ high i/o installed
    • last "from scratch" install was done on the initial 2.3 release ( and it's irrelevant since the NanoBSD upgrade procedure is equivalent to an initial install w/ config copy only )
    • RW mode enabled ( it's now a NanoBSD default setting )

  • Banned

    @craymore:

    I started noticing that the nginx issues were happening at high throughput (>= 90 Mbps).
    I disabled traffic shaping at first (as it can be quite CPU intensive), but it didn't change the behaviour.
    The moment the throughput drops to values like 20 Mbps, the UI becomes responsive again (no slowness, no timeouts).

    What can I do to solve these problems ?  :-\

    Get adequate hardware to handle the load. These things are simply EOL. Having issues even when updating due to lack of RAM, especially if you hit some bigger package update (python, perl comes to mind.)



  • @craymore:

    The moment the throughput drops to values like 20 Mbps, the UI becomes responsive again (no slowness, no timeouts).
    What can I do to solve these problems ?  :-\

    So if the load is too high, which it is, and you want to keep the Alix, you could restrict your 100 Mbps line to 50 or 25 Mbps with a managed switch.



  • @doktornotor:

    These things are simply EOL. Having issues even when updating due to lack of RAM, especially if you hit some bigger package update (python, perl comes to mind.)

    EOL ?!  :o

    These ALIX appliances are still being sold and are rated for 250 Mbps firewall throughput, which, if I'm right, should be more than enough for what I do with it  ???

    EDIT: looking carefully at the specs, I noticed that, while the firewall throughput is rated at 250 Mbps, the port-to-port throughput is rated at 85 Mbps, so that might be a clue indeed (although I don't fully understand what it means compared to the firewall throughput).


  • Banned

    @craymore:

    @doktornotor:

    These things are simply EOL. Having issues even when updating due to lack of RAM, especially if you hit some bigger package update (python, perl comes to mind.)

    EOL ?!  :o

    Yeah, these are certainly EOLed as far as pfSense is concerned (together with the rest of the 32-bit x86 arch). 32-bit x86 (i386) is gone with 2.4. I mean, recycle them to run LEDE/OpenWrt or something similar and they can keep running for a lot more years, but pfSense/FreeBSD is getting to be a pain to run on these.



  • @doktornotor:

    @craymore:

    @doktornotor:

    These things are simply EOL. Having issues even when updating due to lack of RAM, especially if you hit some bigger package update (python, perl comes to mind.)

    EOL ?!  :o

    Yeah, these are certainly EOLed as far as pfSense is concerned (together with the rest of the 32-bit x86 arch). 32-bit x86 (i386) is gone with 2.4. I mean, recycle them to run LEDE/OpenWrt or something similar and they can keep running for a lot more years, but pfSense/FreeBSD is getting to be a pain to run on these.

    For details of what will no longer be supported from 2.4 onwards: https://forum.pfsense.org/index.php?topic=121255.0

