Netgate Discussion Forum

    New 502 Bad Gateway

    2.4 Development Snapshots
      A Former User

      @AhnHEL:

      I have one box using the ZFS file structure, the other is using UFS, both using pfBlockerNG.  The ZFS is rock solid, and the UFS one gets the Bad Gateway after some time.  Wondering if that is a possible reason why two similar boxes with similar settings exhibit different behavior using the same snapshot and same packages.

      Both running 20171009 Snapshots for 2.4.0

      Just a thought

      It would seem pfBlockerNG plays more nicely with ZFS than with UFS. Before the jump to 2.4.0 Release, I reinstalled under ZFS and restored my configuration file from just before the reinstall, and it has been running solid on both of my affected boxes. Normally it would last 20 minutes before I got the gateway error, with not an error in the logs in sight. Previously I saw the line that stated "Listen queue overflow: 193 already in queue awaiting acceptance (x occurrences)".

        AhnHEL

        Just reinstalled myself changing from UFS to ZFS filesystem, using the same 20171009 snapshot.  Wouldn't last ten minutes before, but has been up without error for 24 hours now.

        Never used Squid or ClamAV.  Only using pfBlockerNG.

        AhnHEL (Angel)

          john_galt

          Thanks for that report AhnHEL. I plan on doing the same thing tomorrow morning.

          Doug

            BreeOge

            So there are ~190+ things stuck doing a PHP operation, and the same number of stuck connections hitting the dnsbl daemon. The only thing pfBlocker does with lighty is run /usr/local/www/pfblockerng/www/index.php

            So something in that file is getting stuck and making those pile up.  Probably its file lock operation, maybe something isn't giving up a lock and everything else is stuck waiting.

            Try editing /usr/local/www/pfblockerng/www/index.php and commenting out or removing the whole "Increment DNSBL Alias Counter" block and see if it makes a difference. Keep a backup so you can put it back later if there is no change.

            So far, I have not had a crash since I removed that section. It has been 21 hours and it is still running strong. Looks like jimp found the issue. Now the question is what it affects and why it is in there.

            If it works well on ZFS and not UFS, this also makes some sense, as the error didn't show up on 2.4.0 until it was updated from FreeBSD 11.0 to 11.1. So something must have changed in how the filesystem works, and UFS doesn't like the file locking that pfBlockerNG uses now.
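
To illustrate the pile-up described above (one request holding a lock while every other request waits on it), here is a minimal sketch of a non-blocking variant. It is not pfBlockerNG's code: the function name is hypothetical, and the CSV layout (alias in column 0, counter in column 3) is an assumption taken from the "Increment DNSBL Alias Counter" block. The idea is that a request which cannot get the lock immediately skips the stats update instead of queueing behind it.

```php
<?php
// Sketch only. try_increment_counter() is a hypothetical helper, and the
// column positions are assumptions based on the block discussed in this thread.
function try_increment_counter(string $path, string $alias): bool
{
    // 'c+' opens for read/write, creates the file if missing, never truncates.
    $handle = @fopen($path, 'c+');
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return at once instead of waiting for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // lock is busy: skip this update rather than block
    }
    try {
        $rows = [];
        while (($line = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if (isset($line[3]) && $line[0] === $alias) {
                $line[3] = (int)$line[3] + 1;
            }
            $rows[] = $line;
        }
        // Rewrite the same file in place so the lock stays on the same file.
        ftruncate($handle, 0);
        rewind($handle);
        foreach ($rows as $row) {
            fputcsv($handle, $row, ',', '"', '\\');
        }
        fflush($handle);
        return true;
    } finally {
        // Runs on every exit path, including exceptions.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

The trade-off is that a skipped update loses one count, which seems acceptable for widget statistics but is a design choice, not something confirmed by the package author.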

              luckman212 LAYER 8

              Clicked on the link to read the latest post and got this…

              joke? Lol. Happy Friday!

                jimp Rebel Alliance Developer Netgate

                @BreeOge:

                So far, I have not had a crash since I removed that section. It has been 21 hours and it is still running strong. Looks like jimp found the issue. Now the question is what it affects and why it is in there.

                If it works well on ZFS and not UFS, this also makes some sense, as the error didn't show up on 2.4.0 until it was updated from FreeBSD 11.0 to 11.1. So something must have changed in how the filesystem works, and UFS doesn't like the file locking that pfBlockerNG uses now.

                That's possible. It looks like it's trying to keep some stats about what was hit, but it's using a plain text file to do it. IMO that should be an SQLite database and not a plain text CSV file, but not having looked at the rest of the related code, I'm not sure what changing that would entail, or whether anything absolutely relies on that being plain text. I've been told that bbcan is aware, though, and he's looking into it. That could also explain 502/PHP issues people have had in the past with the pfBlocker widget.
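
For illustration only, here is a rough sketch of what keeping those counters in SQLite rather than a CSV file could look like. The table and column names and both function names are hypothetical, it assumes PHP's SQLite3 extension is available, and it says nothing about what the rest of the package expects from the plain text file.

```php
<?php
// Sketch only. The schema and helper names are hypothetical, not
// pfBlockerNG's; requires PHP's bundled SQLite3 extension.
function open_stats_db(string $path): SQLite3
{
    $db = new SQLite3($path);
    // Wait up to 2 seconds for a competing writer, then fail instead of hanging.
    $db->busyTimeout(2000);
    $db->exec('CREATE TABLE IF NOT EXISTS dnsbl_hits (
        alias TEXT PRIMARY KEY,
        hits  INTEGER NOT NULL DEFAULT 0
    )');
    return $db;
}

function increment_hit(SQLite3 $db, string $alias): void
{
    // Make sure a row exists for this alias; does nothing if it already does.
    $insert = $db->prepare(
        'INSERT OR IGNORE INTO dnsbl_hits (alias, hits) VALUES (:alias, 0)'
    );
    $insert->bindValue(':alias', $alias, SQLITE3_TEXT);
    $insert->execute();

    // The increment happens inside SQLite, so no read-modify-write in PHP.
    $update = $db->prepare(
        'UPDATE dnsbl_hits SET hits = hits + 1 WHERE alias = :alias'
    );
    $update->bindValue(':alias', $alias, SQLITE3_TEXT);
    $update->execute();
}
```

With this shape the database handles its own locking, and a writer that cannot get in within the busy timeout gets an error back rather than waiting indefinitely.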

                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                Need help fast? Netgate Global Support!

                Do not Chat/PM for help!

                  BreeOge

                  I have a PM going with BBcan; I pointed him to our findings a few minutes ago.

                  I am just glad we seem to be narrowing this down.

                    luckman212 LAYER 8

                    Following this loosely, but I really appreciate your guys' diligence on tracking this down. Happy hunting

                      john_galt

                      I just reinstalled with 2.4.0-R. Went with ZFS this time. Restored config and reset pfBlockerNG.
                      I also set the tunable kern.ipc.soacceptqueue back to its original value of 128. Have fingers
                      crossed.

                      I might add that reinstalling and restoring from a config file was one of the least painful things
                      I've done in a while. Well done pfSense team!!

                      Doug

                        ha11oga11o

                        Hello all,

                        I also get the 502 Bad Gateway error, and I filtered the log on my unit for the relevant data. It might be useful:

                        https://pastebin.com/et5HvbpT

                        Also, my unit needs 2-3 reboots in a row before I can access the GUI. When I refresh the GUI page right after a reboot, it loads forever.

                        Kind regards.

                          BreeOge

                          @ha11oga11o:

                          Hello all,

                          I also get the 502 Bad Gateway error, and I filtered the log on my unit for the relevant data. It might be useful:

                          https://pastebin.com/et5HvbpT

                          Also, my unit needs 2-3 reboots in a row before I can access the GUI. When I refresh the GUI page right after a reboot, it loads forever.

                          Kind regards.

                          You can SSH in, hit Ctrl-Z, and then run /bin/tcsh; this will give you a shell back.

                          Then you can run reboot from the console/SSH.

                          Currently you have 3 choices:

                          1. Remove pfBlockerNG until BBcan177 can get the affected code fixed.  He is working on it, so I wouldn't expect it to take that long.
                          2. Reinstall with the ZFS file system.
                          3. Keep rebooting until the update comes out.

                          You can also remove the affected code reported in this thread. It is a temporary fix and will affect the widget reporting, but it does allow pfBlockerNG to run without issues.

                            BreeOge

                            I want to give a big thank you to everyone on the pfSense team who helped with this issue, and to BBcan177.  We are all very thankful for the time and effort you put in to help us figure this out.

                              john_galt

                              Hear, hear!

                              Doug

                                musicwizard

                                I have the same problem.

                                I run pfBlockerNG and Snort.

                                I did remove the code and it seemed to work, but after 20 minutes or so I got the same problem again. So I'm looking to do a complete reinstall and restore a backup I made before I went to 2.4.0.

                                When I use the old kernel on 2.4.0, all is fine.

                                  jimp Rebel Alliance Developer Netgate

                                  @Music:

                                  I have the same problem.

                                  I run pfBlockerNG and Snort.

                                  I did remove the code and it seemed to work, but after 20 minutes or so I got the same problem again. So I'm looking to do a complete reinstall and restore a backup I made before I went to 2.4.0.

                                  When I use the old kernel on 2.4.0, all is fine.

                                  Did you gather the information requested in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 when it was stopped? You may have been hitting a different issue entirely.

                                    musicwizard

                                    @jimp:

                                    @Music:

                                    I have the same problem.

                                    I run pfBlockerNG and Snort.

                                    I did remove the code and it seemed to work, but after 20 minutes or so I got the same problem again. So I'm looking to do a complete reinstall and restore a backup I made before I went to 2.4.0.

                                    When I use the old kernel on 2.4.0, all is fine.

                                    Did you gather the information requested in https://forum.pfsense.org/index.php?topic=137103.msg753994#msg753994 when it was stopped? You may have been hitting a different issue entirely.

                                    Funny thing is, I did have the information in a txt file and was still copying some of it out of PuTTY when my cat stepped on the power connector of my computer, which is lying on the floor. So I lost that file.  :(

                                    So I just did a clean install on ZFS and restored my config. I only have pfBlockerNG running at the moment, and no problems yet.

                                      belt9

                                      @BreeOge:

                                      Currently you have 3 choices:

                                      1. Remove pfBlockerNG until BBcan177 can get the affected code fixed.  He is working on it, so I wouldn't expect it to take that long.
                                      2. Reinstall with the ZFS file system.
                                      3. Keep rebooting until the update comes out.

                                      Just checking in, as my system was unreachable and it looks like this is the culprit. I updated to 2.4.0-RELEASE from a month-old snapshot last night.

                                      I'm not sure about #2 helping you, as I've got a ZFS raidz2 install and this still happened to me. I've disabled DNSBL until a fix comes out, BBCan177 is top notch. I'm sure the fix will be available shortly if it's at all feasible.

                                        musicwizard

                                        Well, since my clean reinstall of 2.4.0 with ZFS, I'm running both pfBlockerNG and Snort and haven't had a "lockup" of the webGUI etc.

                                        Uptime 15 Hours 14 Minutes 49 Seconds

                                        Although I did get a few updates from BBcan :) because I'm using the test/beta version.

                                        It does seem that 2.4.0 is using more CPU, though.

                                          TheNarc

                                          With respect to the workaround that involves editing /usr/local/www/pfblockerng/www/index.php, I am not a PHP programmer, so I may be speaking out of turn, but I noticed that in the relevant code section two locks are acquired and never released:

                                          
                                          if (!empty($pfb_query)) {
                                              // Increment DNSBL Alias Counter
                                              $dnsbl_info = '/var/db/pfblockerng/dnsbl_info';
                                              if (($handle = @fopen("{$dnsbl_info}", 'r')) !== FALSE) {
                                                  flock($handle, LOCK_EX);
                                                  $pfb_output = @fopen("{$dnsbl_info}.bk", 'w');
                                                  flock($pfb_output, LOCK_EX);

                                                  // Find line with corresponding DNSBL Aliasname
                                                  while (($line = @fgetcsv($handle)) !== FALSE) {
                                                      if ($line[0] == $pfb_query) {
                                                          $line[3] += 1;
                                                      }
                                                      @fputcsv($pfb_output, $line);
                                                  }

                                                  @fclose($pfb_output);
                                                  @fclose($handle);
                                                  @rename("{$dnsbl_info}.bk", "{$dnsbl_info}");
                                              }
                                          }
                                          

                                          Referring to the PHP documentation for the flock function, apparently this used to be okay because the locks were implicitly released when the file handles were closed, but that is no longer so starting from version 5.3.2:  https://secure.php.net/manual/en/function.flock.php

                                          That said, I don't know if the update from 2.3.4 to 2.4.0 happened to jump from a pre-5.3.2 version of PHP to a post-5.3.2 version.  But regardless, I assume that explicitly releasing locks couldn't hurt anything.  Also, the return value of one of the two fopen calls is checked but the other is not.  I'm going to try the following modification on my system, which is still UFS, and will report back:

                                          
                                          if (!empty($pfb_query)) {
                                              // Increment DNSBL Alias Counter
                                              $dnsbl_info = '/var/db/pfblockerng/dnsbl_info';
                                              if (($handle = @fopen("{$dnsbl_info}", 'r')) !== FALSE) {
                                                  if (flock($handle, LOCK_EX)) {
                                                      if (($pfb_output = @fopen("{$dnsbl_info}.bk", 'w')) !== FALSE) {
                                                          if (flock($pfb_output, LOCK_EX)) {
                                                              // Find line with corresponding DNSBL Aliasname
                                                              while (($line = @fgetcsv($handle)) !== FALSE) {
                                                                  if ($line[0] == $pfb_query) {
                                                                      $line[3] += 1;
                                                                  }
                                                                  @fputcsv($pfb_output, $line);
                                                              }
                                                              flock($pfb_output, LOCK_UN);
                                                          }
                                                          @fclose($pfb_output);
                                                      }
                                                      flock($handle, LOCK_UN);
                                                  }
                                                  @fclose($handle);
                                                  @rename("{$dnsbl_info}.bk", "{$dnsbl_info}");
                                              }
                                          }
                                          
                                          

                                          And sorry for the lousy formatting; I tried pasting several times and couldn't manage to get the indentation non-wonky.

                                            TheNarc

                                            It looks like pfSense has been using PHP 5.3.2 since around version 2.1 (the change log for 2.1 https://doc.pfsense.org/index.php/2.1_New_Features_and_Changes just says "PHP to 5.3.x").  So technically as far back as then all locks acquired with PHP's flock should have been manually released.  Nevertheless, maybe some other change in 2.4.0 resulted in the lack of these explicit releases causing trouble where they had not before.  I have two pfSense machines running 2.4.0 with pfBlockerNG and my modification now so I'll just see how they fare.
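
As a possible variation on the explicit-release idea, the sketch below takes the lock on a separate lock file and releases it in a finally block so that no exit path can leave it held. This is an assumption-laden illustration, not a tested fix for pfBlockerNG: the ".lock" suffix and the function name are hypothetical, and the CSV layout is the one from the code quoted in this thread. A separate lock file is used because rename() replaces the data file, so a lock taken on the data file itself stays attached to the old, replaced file.

```php
<?php
// Sketch only. increment_with_lockfile() and the ".lock" suffix are
// hypothetical; column positions follow the snippet quoted in this thread.
function increment_with_lockfile(string $path, string $alias): bool
{
    // The lock file is never renamed, so every writer locks the same file.
    $lock = @fopen("{$path}.lock", 'c');
    if ($lock === false) {
        return false;
    }
    try {
        if (!flock($lock, LOCK_EX)) {
            return false;
        }
        $in = @fopen($path, 'r');
        if ($in === false) {
            return false;
        }
        $out = @fopen("{$path}.bk", 'w');
        if ($out === false) {
            fclose($in);
            return false;
        }
        // Copy every row, bumping the counter on the matching alias.
        while (($line = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
            if (isset($line[3]) && $line[0] === $alias) {
                $line[3] = (int)$line[3] + 1;
            }
            fputcsv($out, $line, ',', '"', '\\');
        }
        fclose($out);
        fclose($in);
        // Only swap the files once the copy has been written completely.
        return rename("{$path}.bk", $path);
    } finally {
        // Runs on every return above, so the lock cannot be left held.
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

Unlocking a handle whose lock attempt failed is harmless, which keeps the finally block simple.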

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.