21.02-p1 really fix the issue on SG-3100?
-
Hey,
I read through the whole article on what causes the problem and applied the 21.02-p1 update to an SG-3100 yesterday. Before the patch I had applied the suggested workaround of limiting the system to 1 CPU in /boot/loader.conf.
After the patch I removed hw.ncpu=1 from loader.conf and rebooted. It didn't take long before the system was in the "bug state" again: not forwarding any traffic, pf stuck again.
So what actually changed in p1?
Was it just a quick workaround that adds hw.ncpu=1 to loader.conf? That would explain why it got stuck again after I removed it.
Currently I still cannot run the SG-3100 without the hw.ncpu=1 limit.
Thanks for any ideas / insights.
-
@solarizde I have a similar problem, though by now I have the workaround down cold. After an upgrade or reboot, no traffic is passed through my 3100 and the webGUI is unavailable. However, I can ssh into the console, restart webConfigurator (option 11), then access the site, rerun the initial setup wizard ("System" -> "Setup Wizard") choosing all of the default options and entering my existing admin password as the "new" password, then on reload the whole thing starts working beautifully. Not sure why this works but it's extremely reliable for me. Thinking about selling my 3100 on craigslist and getting a unifi gw. :(
-
Would both of you mind doing me a favor? I'm chasing a bug in the Snort package that is impacting SG-3100 devices only.
Look in your system log (under STATUS > SYSTEM LOGS) and see if you find any messages logged about a Signal 11 crash from php or php-fpm.
Also, would you mind listing any installed packages you are running?
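For anyone who would rather check an exported log than scroll through the GUI, here is a minimal Python sketch that counts Signal 11 exits per process. The function names (`find_crashes`, `count_by_process`) and the regular expression are only an illustration modeled on the log lines quoted in this thread; it assumes the system log is available as plain text lines and is not an official pfSense tool.

```python
import re
from collections import Counter

# Matches kernel lines of the shape seen in this thread, for example:
#   "... kernel pid 390 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)"
#   "... pfSense kernel: pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)"
CRASH_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<proc>[^)]+)\), jid \d+, uid \d+: "
    r"exited on signal (?P<signal>\d+)"
)


def find_crashes(lines, signal=11, procs=("php", "php-cgi", "php-fpm")):
    """Return (process, pid) pairs for lines reporting the given signal."""
    hits = []
    for line in lines:
        m = CRASH_RE.search(line)
        if m and int(m.group("signal")) == signal and m.group("proc") in procs:
            hits.append((m.group("proc"), int(m.group("pid"))))
    return hits


def count_by_process(lines):
    """Count Signal 11 exits per PHP process name."""
    return Counter(proc for proc, _ in find_crashes(lines))


# Sample lines copied from posts in this thread; the syslogd line is
# included to show that unrelated "signal 15" messages are ignored.
SAMPLE = [
    "Feb 17 20:55:30 kernel pid 390 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
    "Feb 26 02:11:29 kernel pid 54431 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)",
    "Mar 6 11:39:21 pfSense syslogd: exiting on signal 15",
]

print(count_by_process(SAMPLE))
```

If the log lives in a file, passing the open file object to `count_by_process` should work the same way, since the function only iterates over lines.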
Thanks,
Bill
-
stephenw10 said in pfSense Plus and SG-3100:
No, it's a real fix. See: https://reviews.freebsd.org/D28821
-
Here is a screenshot from my dashboard of installed packages. During the upgrade, I removed pfBlockerNG, which is the only one I had installed, and which was disabled in any case.
I do see signal 11 errors in the log:
Feb 17 20:55:30 kernel pid 390 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 21:10:38 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 21:46:12 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 22:50:18 kernel pid 27679 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 23:08:17 kernel pid 30028 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 10:56:00 kernel pid 401 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 11:11:08 kernel pid 401 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 11:22:52 kernel pid 400 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 02:11:29 kernel pid 54431 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:18:04 kernel pid 589 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:18:20 kernel pid 32405 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:27:51 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:33:24 kernel pid 60772 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
-
@bmeeks Do you think this is a php package bug or another kernel lock bug? I don't use snort or suricata but do use pfblockerNG and now that I've seen this issue, I think I'm holding off on 21.02_1 too until it's isolated.
-
@rsherwood_va said in 21.02-p1 really fix the issue on SG-3100?:
Here is a screenshot from my dashboard of installed packages. During the upgrade, I removed pfBlockerNG, which is the only one I had installed, and which was disabled in any case.
I do see signal 11 errors in the log:
Feb 17 20:55:30 kernel pid 390 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 21:10:38 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 21:46:12 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 22:50:18 kernel pid 27679 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 17 23:08:17 kernel pid 30028 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 10:56:00 kernel pid 401 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 11:11:08 kernel pid 401 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 23 11:22:52 kernel pid 400 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 02:11:29 kernel pid 54431 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:18:04 kernel pid 589 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:18:20 kernel pid 32405 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:27:51 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 06:33:24 kernel pid 60772 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Thank you for the info. That confirms my suspicion.
-
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
@bmeeks Do you think this is a php package bug or another kernel lock bug? I don't use snort or suricata but do use pfblockerNG and now that I've seen this issue, I think I'm holding off on 21.02_1 too until it's isolated.
I'm leaning towards it being a PHP bug on 32-bit ARM hardware. Especially since it seems several packages are impacted in similar ways. Whatever this problem is, it's not caused by the latest 21.02_1 update (nor is that update likely to fix it). It looks like something that came in with FreeBSD-12.2/STABLE.
-
I'm also affected by it..
Feb 25 19:52:38 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Removing pfblockerNG-devel completely solved the problem for me.
I don't use Snort/Suricata.
-
Almost the same here, I was using pfBlockerNG (non-dev).
Tested both: CPU restriction on/off.
Disabling pfBlockerNG (it's still installed) solved the problem.
CPU restriction is NOT necessary.
-
@bmeeks said in 21.02-p1 really fix the issue on SG-3100?:
messages logged about a Signal 11 crash
Here we go, not so many but still there.
SG-3100, 21.02-RELEASE-p1
Packages:
Avahi, Cron, iperf, openvpn-client-export, pfBlockerNG-dev, Service_Watchdog
Feb 26 04:31:19 pfSense kernel: pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 07:35:22 pfSense kernel: pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 07:44:38 pfSense kernel: pid 374 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
I am currently testing overnight with 21.02-RELEASE-p1 limited to 1 CPU via loader.conf and pfBlocker enabled. If I enable pfBlocker without the CPU limit in loader.conf, pf stops forwarding traffic within half an hour.
-
@bmeeks Could you please post the FreeBSD PHP bug link and the pfSense tracking bug here for reference so that we can follow?
I think I'm going to hold off until this bug is fixed before upgrading to 21.02_x unless the PHP package can be fixed as part of a package update.
-
@solarizde said in 21.02-p1 really fix the issue on SG-3100?:
@bmeeks said in 21.02-p1 really fix the issue on SG-3100?:
messages logged about a Signal 11 crash
Here we go, not so many but still there.
SG-3100, 21.02-RELEASE-p1
Packages:
Avahi, Cron, iperf, openvpn-client-export, pfBlockerNG-dev, Service_Watchdog
Feb 26 04:31:19 pfSense kernel: pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 07:35:22 pfSense kernel: pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 26 07:44:38 pfSense kernel: pid 374 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Thanks! These reports will, I hope, help make the case the problem is really in the PHP engine and not the packages themselves. Snort, Suricata, Unbound and pfBlockerNG-dev are all triggering Signal 11 crashes. And Snort, Suricata and pfBlockerNG-dev are all doing so in the PHP engine. You really should never be able to crash PHP itself.
-
@solarizde Remove pfblockerNG, not sure if only disabling it would solve this issue.
I opened a TAC ticket with Netgate, #INC-76936, and they said: "dev knows about this already, and there's some work to be done beforehand as well. Mainly, the following is somewhat of a prerequisite: https://redmine.pfsense.org/issues/5413"
So, they are working on it..
They did a great job with 21.02p1 and I know that they will do it again..
-
@mcury said in 21.02-p1 really fix the issue on SG-3100?:
So, they are working on it..
That Redmine issue is related to a DNS service interruption, which is bad too, but not as bad as the Sig11 on pf.
This is the better place:
redmine.pfsense.org/issues/11444
@mcury said in 21.02-p1 really fix the issue on SG-3100?:
Remove pfblockerNG
Sure, this will "fix" the crash, but I want to figure out in which cases it really happens. If it also happens with 1 CPU disabled, it is not so much related to the memory barrier bug.
-
@solarizde Thanks, I thought that one was already closed; they reopened it.
Nice to see that they are taking this seriously :)
-
@solarizde said in 21.02-p1 really fix the issue on SG-3100?:
Sure, this will "fix" the crash, but I want to figure out in which cases it really happens. If it also happens with 1 CPU disabled, it is not so much related to the memory barrier bug.
I'm kind of a noob regarding this technical stuff, cores and such.. but what I understood is that one CPU is trying to read from memory while the other is still writing to it, basically some kind of sync issue between cores. But again, I'm a noob and maybe got it all wrong :)
-
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
PHP bug link
Here is another about the PHP crash/signal 11 on SG-3100.
21.02-p1 fixed a different locking issue in the kernel on SG-3100s.
-
@teamits Perfect. Thank you.
I'm a bit more concerned about some of the other open issues cited here in earlier posts. One bug has been open for 5 years; I hope it's not a dependency.
I was joking the other day with my ex-NSCP coworkers that there's an open Thunderbird (Mozilla) issue in Bugzilla that's been open 20 YEARS and still isn't fixed. https://bugzilla.mozilla.org/show_bug.cgi?id=92165
"[Free software] is only 'free' if your time has no value." - jwz (he was talking about linux)
-
@mcury Basically the 3100 fix was to address a missing "memory barrier" instruction on the armv7 platform.
Since modern CPUs can execute instructions out of order to speed execution, there are times when a process needs to guarantee that all previous instructions are complete (and not still being executed in parallel or out of order). This is usually to prevent a race/deadlock condition.
More info here: https://en.wikipedia.org/wiki/Memory_barrier
-
FYI there are two new redmine bugs to track the behavior being seen. Both are related to the FreeBSD php bug.
https://redmine.pfsense.org/issues/11466 "Snort exit with sig 11 on SG-3100"
https://redmine.pfsense.org/issues/11551 "SG-3100 with pfBlockerNG doesn't pass traffic"
This MAY be the tracking bug for the PHP crash, as it was a recent report against FreeBSD 12.1, but the new pfSense 21.02 uses FreeBSD 12.2. The last comment asks whether it is indeed a continuing issue on 12.2:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244049
-
Some observations during the weekend:
hw.ncpu=unset, all non-default packages disabled = stable, running 16h without problems
hw.ncpu=unset, pfBlocker-dev and avahi enabled = crash after 1-6h, most frequently after a pfBlocker update run
hw.ncpu=1, pfBlocker-dev and avahi enabled = stable now for ~15h
-
@solarizde said in 21.02-p1 really fix the issue on SG-3100?:
Some observations during the weekend:
hw.ncpu=unset, all non-default packages disabled = stable, running 16h without problems
hw.ncpu=unset, pfBlocker-dev and avahi enabled = crash after 1-6h, most frequently after a pfBlocker update run
hw.ncpu=1, pfBlocker-dev and avahi enabled = stable now for ~15h
Identical experience for me on SG-3100: with pfBlocker and two processors enabled, lockup after 6-10 hrs. Altering the config to 1 CPU has now given me 4 days of stable run time.
-
@shadtheman I've also been running since Sunday with 2 CPUs but pfBlocker disabled, no crash.
-
OK, there is definitely still something wrong with PHP. Yesterday I enabled pfBlocker again, and even running with hw.ncpu=1 it crashed again:
Mar 6 11:39:21 pfSense syslogd: exiting on signal 15
Mar 6 16:03:29 pfSense kernel: pid 357 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Mar 7 04:30:00 pfSense syslogd: exiting on signal 15
Mar 7 04:31:18 pfSense kernel: pid 374 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Mar 7 09:19:46 pfSense syslogd: exiting on signal 15
I will now go back to 2 CPUs and disable all packages, leaving my pfSense crippled :(
-
I upgraded my SG-3100 to 21.02_1 and pfB-DEVEL _15 this week and I have ZERO php signal 11 messages in my logs. Everything is running smoothly.
You might try upgrading with no snort, suricata and pfB and then re-add them in a default config one by one, then start layering config changes and watching.
-
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
I upgraded my SG-3100 to 21.02_1 and pfB-DEVEL _15 this week and I have ZERO php signal 11 messages in my logs. Everything is running smoothly.
You might try upgrading with no snort, suricata and pfB and then re-add them in a default config one by one, then start layering config changes and watching.
Did you reboot after installing pfblockerng?
-
@mcury I did but only to check on another bug of unbound not restarting after the update of pfB-devel. I've opened a bug on that issue. Unbound starts properly on boot.
https://redmine.pfsense.org/issues/11632
-
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
unbound not restarting after the update of pfB-devel
The package maintainer has posted this is a pfSense issue. I can’t find it right now but IIRC it was timing in the package installation. That said it may be occasional as I’ve had it not work a couple times and then one yesterday started fine. The post was in one of the early pfBlocker 3.0.0 version posts I think, or around then. Just check and start after update.
-
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
@mcury I did but only to check on another bug of unbound not restarting after the update of pfB-devel. I've opened a bug on that issue. Unbound starts properly on boot.
https://redmine.pfsense.org/issues/11632
Hm, so pfB 3.0.0_15 is working for you.. Are there other users here who are also running pfBlockerNG 3.0.0_15 successfully?
Are you running with default configuration in pfblocker?
-
re: unbound not starting:
https://forum.netgate.com/topic/159094/pfblockerng-v3-0-0_6-update/4
and
https://redmine.pfsense.org/issues/11398
Short answer: check and start it after updating pfBlocker.
@mcury said in 21.02-p1 really fix the issue on SG-3100?:
other users here who are also running pfBlockerNG 3.0.0_15 successfully
We haven't upgraded any SG-3100s but have several in service at our clients so I've been keeping an eye on it. From the various redmine bug reports (at least some linked above) it seems like php-fpm crashes during certain functions (e.g. preg_match) in certain code configurations. My take is it's not a pfBlocker or Snort or Suricata coding issue, it's PHP crashing and that's not going to be very fixable in a package update. Maybe we get lucky and it can be worked around, but it has been a few weeks already. So my personal advice would be for anyone with a 3100 to just be patient and plan to not update for a while, and set System/Update to "previous stable version" if any packages need to be installed or updated, so it doesn't try to install 2.5 packages and dependencies.
-
Upgraded pfBlocker to 2.1.4_25 (just became available) 30 hours ago and have been running happily with both processors enabled since then, fingers crossed.
https://github.com/pfsense/FreeBSD-ports/commit/b336bf5010920047bf4f607e3b2dfe4d56d9d79f#diff-154b33468fc170ed5c2281d7908ea8f9dc318193eea329feaf5a1df09a4d9da4
-
Hi,
I upgraded to 21.02-RELEASE-p1 (arm) a while ago. I tried installing Suricata, Snort and Zeek; they all crash after a while. I also see the PHP bug: it seems to crash every 15 minutes, like clockwork. I do not get lockups on the firewall; it seems to run mostly fine apart from the PHP crashes.
Is there any known issue for this? Or am I replying to an old thread?
-
To clarify the above, I am no longer running Snort, Suricata or Zeek. I am running pfBlocker, however.
I have also noticed PHP feeling rather sluggish, but that could just be me being impatient.
-
@nokkief said in 21.02-p1 really fix the issue on SG-3100?:
Is there any known issue for this? Or am I replying to an old thread?
This is the thread about that. See the Redmine links to the bug reports above, e.g.:
@lohphat said in 21.02-p1 really fix the issue on SG-3100?:
https://redmine.pfsense.org/issues/11466 "Snort exit with sig 11 on SG-3100"
Basically PHP on the 3100 has issues with certain functions that crash PHP. Re: every 15 minutes, do you have something scheduled for every 15 minutes?
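One way to check whether the crashes really line up with a 15-minute schedule is to compute the gaps between consecutive Signal 11 log lines. The sketch below is only an illustration: the helper names (`crash_times`, `gaps_in_minutes`, `count_near`) are made up for this example, the year is an assumption because syslog timestamps do not carry one, and month-name parsing assumes an English/C locale. The sample lines are copied from an earlier post in this thread.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, then a PHP process exiting on signal 11.
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"\((?P<proc>php[^)]*)\), jid \d+, uid \d+: exited on signal 11\b"
)


def crash_times(lines, year):
    """Parse the timestamps of PHP signal 11 lines; `year` must be supplied."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            stamp = f"{year} {m.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return sorted(times)


def gaps_in_minutes(times):
    """Minutes between each crash and the next one."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def count_near(gaps, minutes=15, tolerance=1):
    """How many gaps fall within `tolerance` minutes of the suspected interval."""
    return sum(1 for g in gaps if abs(g - minutes) <= tolerance)


SAMPLE = [
    "Feb 17 20:55:30 kernel pid 390 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
    "Feb 17 21:10:38 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
    "Feb 17 21:46:12 kernel pid 399 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
]

# 2021 is an assumed year for these log lines.
gaps = gaps_in_minutes(crash_times(SAMPLE, year=2021))
print([round(g, 2) for g in gaps], count_near(gaps))
```

If most gaps cluster around one value, that points toward a scheduled job (cron or a package update task) as the trigger rather than random traffic.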
-
@teamits Yeah, but #11444 is about it halting the firewall. I had that before, but not after the latest patch. That one (issue 11444) has also been marked as resolved.
The PHP crashing, however, remains an issue for me. I do not recall anything being scheduled every 15 minutes. pfBlocker updates at intervals of at least 4 hours. It says it is a kernel dump. Could it be WireGuard? I just realized it is running, although I have no clients connected to it right now.
-
@teamits
Oh, I am blind. I now see the other issue mentioned, apologies