Netgate Discussion Forum

    New 502 Bad Gateway

    2.4 Development Snapshots
      pppfsense

      Yes, that was my feeling after seeing pfSense 'try' to reboot after it got to that state.
      The reboot takes several times longer than usual and you can see how it tries to sync vnodes and it simply times out!

      This points to a big and important LOCK somewhere, or simply to hitting the maximum number of processes or running out of memory.

      I am very surprised that this was not caught in testing: Many, many people run pfBlockerNG, Suricata/Snort and Squid. That should be a basic configuration to be tested.
      Yes, it takes traffic and some time to manifest, but any decent QA dept. needs to have, beyond load producing tools, monitoring tools to watch for memory leaks and process status (I did SW QA a few years ago).

      I imagine the pfSense Team does have all that, but the facts are that after a few spotless releases, we come back to insufficient testing for some standard, widely-used, configurations.

      I have customers to support, and when they pay you for their network to be up and for everything to work as promised, the time spent chasing this stuff, in both the releases and the forums, is time I could use for many better things. Instead of getting paid to test, fix, reboot or babysit the firewall, I would prefer that they pay for a solution that somebody else has already babysat and tested properly:

      For not a lot of money ($200 to $800 a year), you can buy a different solution that gives you almost all the features of pfSense (and some much better ones, like reporting, managed IPS, virus, ads/malware blocking):
      Untangle, which I have used longer (since 2010) than pfSense (since 2012), has never, ever given me these problems, actually no issues at all, and their support, while I was a non-paying customer, was very good and really helped me when I had a VLAN question.

      Of course I will continue using pfSense, but probably not for a big enough customer that needs a 'bullet-proof' 24/7/365, no-excuses, solution.

      My peace (and reputation) is worth more than the few hundred dollars I can make by baby-sitting a router…

      @SimonSAU:

      This is more of an info post to help try and sort out the issue.

      I also had the Bad Gateway error after the 2.4.0 and 2.4.1 updates. pfBlockerNG is installed and running GeoIP and DNSBL parts only, with some periodic updates (essentially Pi-Hole). The pfsense system runs in a VM on XenServer (7.1, I believe).

      What I found interesting was that I'm monitoring the firewall with Observium and the graphs are attached. (All of the same unit, same timeline, I just had to take 2 screenshots as the page is long.) Noting the graphs are 1 day / 7 days / 4 weeks / 1 year.

      You can clearly see the 'spike' to crash/reboot time on the graphs, in both the running processes and the memory usage (etc)… the first spike is after the 2.4.0 install, with the 2.4.1 install coming immediately after the 'crash' of the 2.4.0 install. Then over a week running fine on 2.4.1... then processes ramp up again to crash point.

      I could get to the console on the 2.4.1 box today but selecting 'reboot' from the console menu basically just hung the box... after 15mins it needed a 'force reboot' power cycle.

      I'll be keeping a close eye on the firewall's health.. as well as this forum thread.

      Happy to try and help debug this issue. It seems to me that something is 'triggering' the process madness and that doesn't seem to be a change (in my case) as the system ran for over a week without any involvement from me.

        steky9

        @steky9:

        This is still happening to me on 2.4.1 and the latest PfBlocker. Took 8 days from reboot for the 502's to start and all SSH connections to fail, and approx 1 more day after that for all traffic to be dropped. Needed to get it back asap so don't have logs.

        Happened again late last night. This time got the logs requested

        https://pastebin.com/GMZG8B6H

          PiBa

          @steky9:

          Happened again late last night. This time got the logs requested
          https://pastebin.com/GMZG8B6H

          What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

            BBcan177 Moderator

            @PiBa:

            @steky9:

            Happened again late last night. This time got the logs requested
            https://pastebin.com/GMZG8B6H

            What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

            Yes I saw this too on other machines where this is occurring… I wish I could find the trigger for it... Let's see if anyone chimes in that they have the TrafficTotals pkg installed, and maybe try disabling the selected Interfaces in that pkg to see what that does...

            "Experience is something you don't get until just after you need it."

            Website: http://pfBlockerNG.com
            Twitter: @BBcan177  #pfBlockerNG
            Reddit: https://www.reddit.com/r/pfBlockerNG/new/

              steky9

              @PiBa:

              @steky9:

              Happened again late last night. This time got the logs requested
              https://pastebin.com/GMZG8B6H

              What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

              Yes, status_traffic_totals is installed.

                steky9

                @BBcan177:

                @PiBa:

                @steky9:

                Happened again late last night. This time got the logs requested
                https://pastebin.com/GMZG8B6H

                What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

                Yes I saw this too on other machines where this is occurring… I wish I could find the trigger for it... Lets see if anyone chimes in that they have TrafficTotals pkg installed, and maybe try to disable the selected Interfaces in that pkg to see what that does...

                I didn't really pay much/any attention to its output, so I've uninstalled it to see if it makes any difference. Have checked and after uninstall there's no instance of vnstat running.

                  PiBa

                  vnstat as used by TrafficTotals is normally started by a cron job every 5 minutes.. So somehow it doesn't finish within that time and another process is started..
                  I don't think it's the cause of the trouble by itself, but it might help find what is..

                  It could be interesting to know why vnstat is apparently 'hanging'.. perhaps output of truss when starting it manually, or lsof could help find that out.. The output files and results of these commands could help find a reason or direction to dig further, preferably combined with the other commands previously requested..:

                  
                  lsof > /root/lsof_truss.log
                  truss -dfo /root/vnstat_truss.log vnstat -u
                  cat /root/lsof_truss.log | grep vnstat

                  That truss command may hang just like the other vnstat processes though.. Keep the log, then 'killall vnstat' and run the truss command again to a second logfile. Check if it hangs again, and maybe compare the last parts of both vnstat_truss.log files.. or upload em on the forum or perhaps a pm.?.
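The keep, kill and re-run comparison described here could be scripted roughly as follows. This is only a sketch: the second log name `/root/vnstat_truss_2.log` and the helper `show_truss_tails` are made up for illustration, and the truss calls are left as comments because they may hang.

```sh
#!/bin/sh
# Sketch only: print the last lines of two truss logs one after the other,
# so the point where each vnstat run stopped can be compared by eye.
# show_truss_tails is a hypothetical helper, not part of any package.
show_truss_tails() {
    first_log=$1
    second_log=$2
    lines=${3:-20}   # how many trailing lines to show from each log
    for log in "$first_log" "$second_log"; do
        echo "== last $lines lines of $log =="
        tail -n "$lines" "$log"
    done
}

# Intended sequence on the firewall (commented out because truss may hang):
#   truss -dfo /root/vnstat_truss.log vnstat -u
#   killall vnstat
#   truss -dfo /root/vnstat_truss_2.log vnstat -u   # second log name is an assumption
#   show_truss_tails /root/vnstat_truss.log /root/vnstat_truss_2.log
```

If both runs stop at the same system call, that call is probably the most useful thing to post.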

                  lsof might need to be installed.. 'pkg install lsof'
                  Also for those with TrafficTotals installed and active monitoring (and alerting?), please try and gather the info as soon as possible after there is >1 vnstat process running.
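To catch the moment when more than one vnstat process is running, a small watcher along these lines might help. It is a sketch under assumptions: the function names, the output directory and the `--run` switch are invented here, and it relies only on `pgrep`, `ps` and (if installed) `lsof`.

```sh
#!/bin/sh
# Sketch only; all function names here are hypothetical.

# Count running processes whose name is exactly "vnstat".
count_vnstat() {
    pgrep -x vnstat | wc -l | tr -d ' '
}

# More than one instance means an earlier cron-started run has not finished.
should_capture() {
    [ "$1" -gt 1 ]
}

# Save a timestamped process list, plus open files if lsof is available.
capture_snapshot() {
    out_dir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    ps auxww > "$out_dir/ps_$stamp.log"
    if command -v lsof >/dev/null 2>&1; then
        lsof > "$out_dir/lsof_$stamp.log" 2>&1
    fi
}

# Only act when called with --run, so the functions can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    out_dir=${2:-/root}
    count=$(count_vnstat)
    if should_capture "$count"; then
        capture_snapshot "$out_dir"
    fi
fi
```

It could be started from cron every minute or so, for example as `sh vnstat_watch.sh --run /root` (the script name is just an example).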

                  Sorry for asking again for 'more info', but without a reproduction, or this kind of trouble on my own machines, and afaik still unknown root cause it cannot be easily solved.. Just trying to help get to the root cause..  8)

                  p.s. i'm just a pfSense-user (and package developer though usually not of pfB)..

                    gsmornot

                    Back to 502 Bad Gateway roughly every 24 hours. I am on the latest 2.4.2 release. I guess for now DNSBL has to be turned off, since it stops me from reaching any sites not already in cache.

                      akong

                      I also get the 502 Bad Gateway error. I have upgraded to 2.4.1 and the latest pfBlockerNG. What is causing this problem?

                        steky9

                        @steky9:

                        @BBcan177:

                        @PiBa:

                        @steky9:

                        Happened again late last night. This time got the logs requested
                        https://pastebin.com/GMZG8B6H

                        What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

                        Yes I saw this too on other machines where this is occurring… I wish I could find the trigger for it... Lets see if anyone chimes in that they have TrafficTotals pkg installed, and maybe try to disable the selected Interfaces in that pkg to see what that does...

                        I didn't really pay much/any attention to its output, so I've uninstalled it to see if it makes any difference. Have checked and after uninstall there's no instance of vnstat running.

                        Well that didn't fix it. Same thing happened Thursday night, only got to take the logs off it now

                        https://pastebin.com/xeQPS9eq

                          martial

                          Thanks, this is a really helpful topic for this issue.

                            kyvpn

                            @BBcan177:

                            @PiBa:

                            @steky9:

                            Happened again late last night. This time got the logs requested
                            https://pastebin.com/GMZG8B6H

                            What strikes me as odd here (and maybe unrelated to pfBlocker) is the 182 running 'vnstat' processes.. A possible source would be from TrafficTotals package, can you confirm you have got that installed?

                            Yes I saw this too on other machines where this is occurring… I wish I could find the trigger for it... Lets see if anyone chimes in that they have TrafficTotals pkg installed, and maybe try to disable the selected Interfaces in that pkg to see what that does...

                            I had TrafficTotals running and would get the 502 after 3-4 days, removed it based on this post and I'm up 8days22hrs so far.

                            2.4.1, pfBlockerNG 2.1.2_1

                            AsRock J3455B-ITX
                            SanDisk SSD PLUS 120GB (SDSSDA-120G-G26)
                            Intel I340-T4 Gigabit Adapter w/ Silver Heat Sink 49Y4242

                              akong

                              I have removed the Status_Traffic_Totals package, but it still shows the 502 Bad Gateway error after up to 4 days. I don't know what the problem is.

                                steky9

                                If anything, removing TrafficTotals has made things worse rather than better. I only rebooted on Sunday night to get management back, and now 48 hours later I'm getting the 502's again. Maybe it's just a freak occurrence, I don't know, but as it is, something's badly broken.

                                  PiBa

                                  From what I 'think' is happening, TrafficTotals does not cause the problem, but it might be hitting the same symptom as pfBlocker..

                                  For those using TrafficTotals and experiencing the problem, it would be nice to get some information on 'why' the vnstat process hangs. Its actions are mostly unrelated to what pfBlocker does, yet both might have the same root cause for 'hanging'.. So truss, lsof and possibly gdb output would be nice to see, especially as it's an easily separated process, unlike the pfBlocker/lighttpd/php chain that's harder to run separately. It would also be good to see whether vnstat still hangs when gathering stats for fewer and/or different interfaces.. Gather details while there are already hanging vnstat processes, and then, after killing all those processes, check whether it still hangs (without rebooting).

                                  Yes, 'something' is broken, but a +1 alone won't help fix it.. I think it's already high on BBcan177's attention list, and I'm interested in this topic as well, as are probably a few others that passively 'monitor' this thread, but I doubt it can be properly fixed without detailed information and a good understanding of why the problem happens.

                                  For those that have TrafficTotals installed, please gather some information about the running vnstat processes (when more than 1 is running), preferably on pfSense 2.4.2, though I'm not sure whether that would make any difference..

                                    steky9

                                    I'm happy to help if any more command output is required. I haven't rebooted it yet since the 502's started again on Tuesday. Given the day that's in it, I'll leave it be unless it starts dropping traffic.

                                      Corbinm3

                                      We were also having issues with 502 Bad Gateway, 2.4.0 release, PFBlockerNG, Snort, OpenVPN Exporter…I noticed our issues after enabling DNS Resolver. Maybe it is just a coincidence but not even 30 minutes after turning that on and we got 502 errors. Rebooted, disabled PFBlockerNG, same problem the following day. Rebooted again, switched back to DNS forwarder and uninstalled PFBlockerNG, been stable since. Without the DNS resolver enabled we were stable with PFBlockerNG installed for over a week. After enabling the DNS Resolver, we went down quick. I'll keep this forum up to date if we do go down again and I have to rule out the DNS Resolver as the culprit, but for now, that's what it looks like from our end.

                                      Edit: Forgot to post this is on Netgate hardware, can't remember which one though and I'm not in the office.

                                        PiBa

                                        For those willing to give some new code a try i have made a few changes to the 'file locking' code of pfBlockerNG.. :)
                                        Could some of you try if the changes made improve things?

                                        https://github.com/PiBa-NL/FreeBSD-ports/commit/1766713b26c8f388ad6e7909b2e971f7d74cdfea

                                        Changes are as following:

                                        • include globals.inc so the /tmp/ folder is known and used for placing lock files instead of the root /
                                        • don't try to lock a resource handle with try_lock, as a 'resource-(descriptive)-name' is expected
                                        • use 1 lock around the stats file re-writing code, having 2 locks for the same piece of code is not needed.
                                        • remove the force_unlock called on a 'Resource #10' which wasn't used to create a lock anyhow..
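The idea behind the third bullet, one lock held around the whole rewrite of a stats file with the lock living under /tmp, can be illustrated outside pfBlockerNG. The following is a generic shell sketch, not the code from the patch (which is PHP): the lock path, `with_lock`, `bump_counter` and `LOCK_TRIES` are all hypothetical names.

```sh
#!/bin/sh
# Generic illustration only; none of these names come from pfBlockerNG.
LOCK_DIR=${LOCK_DIR:-/tmp/example_stats.lock}   # hypothetical lock location under /tmp
LOCK_TRIES=${LOCK_TRIES:-10}                    # attempts before giving up

# Run "$@" while holding one named lock. mkdir either creates the directory
# or fails if it already exists, so only one caller at a time gets past it.
with_lock() {
    tries=0
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$LOCK_TRIES" ]; then
            echo "could not acquire $LOCK_DIR" >&2
            return 1
        fi
        sleep 1
    done
    status=0
    "$@" || status=$?
    rmdir "$LOCK_DIR"   # release the lock whether or not the command succeeded
    return "$status"
}

# Read-modify-write of a counter file: the kind of update that can corrupt
# the file when two writers interleave without a lock.
bump_counter() {
    file=$1
    current=0
    [ -f "$file" ] && current=$(cat "$file")
    echo $((current + 1)) > "$file"
}
```

Usage would look like `with_lock bump_counter /tmp/example_stats.txt`. Note that a process killed while holding the lock leaves the directory behind, so a real implementation needs some way to clear stale locks.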

                                        It should be possible to apply the patch with systempatches package.
                                        To add a new patch that way press add then fill in:
                                        Description: pfBlocker_dnsbl_statsfile_locking
                                        File: https://github.com/PiBa-NL/FreeBSD-ports/commit/1766713b26c8f388ad6e7909b2e971f7d74cdfea.patch
                                        PathStrip: 4
                                        Base: /
                                        IgnoreWhitespace: Checked
                                        AutoApply: Unchecked

                                        Save, Fetch, Apply

                                        A message should show "Patch applied successfully".

                                        To revert it should be possible to just press 'Revert' which appears after the patch is applied.. If all fails, reinstall pfBlocker package "pkg install -f pfSense-pkg-pfBlockerNG"

                                        Edit :
                                        FYI: pfBlocker 2.1.2_2 includes this patch.

                                          dstroot

                                          Hmm…  test output is not inspiring confidence.  Patch test output:

                                          /usr/bin/patch --directory=/ -f -p4 -i /var/patches/5a17809ef2593.patch --check --reverse --ignore-whitespace
                                          
                                          Hmm...  Looks like a unified diff to me...
                                          The text leading up to this was:
                                          --------------------------
                                          |From 1766713b26c8f388ad6e7909b2e971f7d74cdfea Mon Sep 17 00:00:00 2001
                                          |From: PiBa-NL
                                          |Date: Fri, 24 Nov 2017 01:37:34 +0100
                                          |Subject: [PATCH] pfBlockerNG, implement proper locking of dnsbl_info file to
                                          | avoid possible corruption
                                          |
                                          |---
                                          | .../usr/local/pkg/pfblockerng/pfblockerng.inc      | 30 ++++++++-----------
                                          | .../files/usr/local/www/pfblockerng/www/index.php  | 35 ++++++++++------------
                                          | 2 files changed, 27 insertions(+), 38 deletions(-)
                                          |
                                          |diff --git a/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc b/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc
                                          |index 0fddd745065b..c6379b8dab38 100644
                                          |--- a/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc
                                          |+++ b/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfblockerng.inc
                                          --------------------------
                                          Patching file usr/local/pkg/pfblockerng/pfblockerng.inc using Plan A...
                                          Hunk #1 failed at 2500.
                                          1 out of 1 hunks failed while patching usr/local/pkg/pfblockerng/pfblockerng.inc
                                          Hmm...  The next patch looks like a unified diff to me...
                                          The text leading up to this was:
                                          --------------------------
                                          |diff --git a/net/pfSense-pkg-pfBlockerNG/files/usr/local/www/pfblockerng/www/index.php b/net/pfSense-pkg-pfBlockerNG/files/usr/local/www/pfblockerng/www/index.php
                                          |index 0b864797146e..8992f4a0342f 100644
                                          |--- a/net/pfSense-pkg-pfBlockerNG/files/usr/local/www/pfblockerng/www/index.php
                                          |+++ b/net/pfSense-pkg-pfBlockerNG/files/usr/local/www/pfblockerng/www/index.php
                                          --------------------------
                                          Patching file usr/local/www/pfblockerng/www/index.php using Plan A...
                                          Hunk #1 failed at 28.
                                          Hunk #2 failed at 71.
                                          2 out of 2 hunks failed while patching usr/local/www/pfblockerng/www/index.php
                                          done
                                          
                                            PiBa

                                            dstroot, are you running latest/unmodified pfBlockerNG 2.1.2_1 version? On that version, the patch above should apply cleanly.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.