Snort won't start after upgrade to 21.02 on SG-3100
-
I am experiencing the same issue.
-
Signal 10 means the executing binary attempted a non-aligned memory access. This issue is unique to the ARM processor used in the SG-3100 appliance. It is actually caused by the ARM compiler choosing to use a speedier, but not always backwards compatible, binary instruction for loading data into a CPU register.
Been through this multiple times. The issue becomes one of finger-pointing between developers of old software such as Snort and the writers of modern compilers such as llvm used to compile ARM machine code on FreeBSD. While the root issue is the incorrect use of unsafe type-casting by the C source code in Snort, the practical case is that every Intel CPU on the planet (all x86 and x86-64 chips) will auto-fixup non-aligned memory access and never skip a beat nor will they abort the executing code. True, there is a slight speed penalty for the auto-fixup, but it's a fair trade-off when compared to just aborting the running program.
The writers of the ARM compiler (and the ARM chip developers) decided to not always implement auto-fixup like Intel chips. Instead, when a piece of old C code attempts to access something in memory not on word-aligned boundaries, the ARM CPU will throw a bus error and abort the running code. While this is the "proper" thing to do, it really throws a monkey wrench in things because the same code works just fine on Intel hardware.
So where does that leave us? Stuck with a situation where certain older code (like Snort and sometimes Suricata and other popular pfSense packages) works just fine on Intel hardware but will randomly crash and burn with Signal 10 errors on the ARM hardware used in some of the Netgate appliances. Because the ARM market is so small, and the amount of work required to fix the legacy code is so large, none of the upstream maintainers (meaning the Snort folks) are really interested in investing the time to fix the old C source code. So that's it. You are basically stuck with Snort and some other packages not working correctly on ARM hardware when compiled with the llvm compiler tools.
There was a workaround in place in the older pfSense packages repository that caused the ARM versions of Snort to be compiled with debugging enabled. This stopped the llvm compiler from choosing that "non-optimal" register load instruction. Perhaps that setting did not get migrated to the new 21.02 package repo compiler?
My advice, for what it's worth: either abandon Snort and Suricata on ARM hardware, or abandon the ARM hardware and run Snort and Suricata on Intel CPU platforms.
-
After the Intel Atom problems I had with my SG-2220, I "upgraded" to the SG-3100. It looks like most of Netgate's low-end line is now ARM processors at this point. While a quick glance at the product specs didn't mention Snort specifically, it did mention IDS/IPS as a supported feature.
I've been fairly unhappy with the SG-3100 as I've had an issue with it randomly rebooting for a while now. (Which seems to be somewhat common) This is really putting the nail in that coffin if ARM can't support software I'd like to have running on my gateway. I'm paying for a Snort subscription which I can't use right now and Snort rules are commonly distributed in various netdefense communities.
This is unfortunate if the ARM architectures can't really be supported then.
-
The ARM architecture has been a bit of a bear due to this llvm compiler thing and non-aligned memory access. That particular bad practice (non-aligned memory accesses) is typical in a lot of older software written in C, but because Intel CPUs always just fixed it for you on the fly, nobody changed the poorly written C source code. That's why we have the problem today on ARM hardware.
I personally have only used Intel hardware, and will always continue to do so for my own use. However, I am maintaining an SG-3100 at our church. But it is only a plain-vanilla firewall. There are no installed packages on it. And in the role of plain firewall, it works great.
I should also add that I do have a Netgate appliance for my own use, but it is the SG-5100 unit which is an Intel platform.
-
Thanks for the info. It's possible this is related to a current issue with the SG-3100. I've created a bug report here to keep track:
https://redmine.pfsense.org/issues/11466
-
@marcos-ng Yes it is
Feb 18 18:04:29 Scimitar kernel: pid 84003 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:04:37 Scimitar kernel: pid 348 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:14:07 Scimitar kernel: pid 34972 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:23:14 Scimitar kernel: pid 357 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:53:01 Scimitar kernel: pid 678 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:55:08 Scimitar kernel: pid 94252 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 18:57:27 Scimitar kernel: pid 357 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 22:30:53 Scimitar kernel: pid 760 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 19 09:41:30 Scimitar kernel: pid 678 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 19 09:52:18 Scimitar kernel: pid 7881 (php), jid 0, uid 0: exited on signal 11 (core dumped)
-
@marcos-ng said in Snort won't start after upgrade to 21.02 on SG-3100:
Thanks for the info. It's possible this is related to a current issue with the SG-3100. I've created a bug report here to keep track:
https://redmine.pfsense.org/issues/11466
No, not likely related. Signal 10 Bus Error aborts are caused by non-word aligned access to memory. A Signal 11 is a conventional memory segmentation fault where a program attempted to access a memory region it was not authorized to access.
I am very familiar with the Snort issue on SG-3100 and SG-1000 hardware as I debugged and identified the cause shortly after the SG-3100 was introduced. As I stated, there previously was a partial workaround in place by compiling the Snort and Suricata packages with debugging enabled to prevent the llvm compiler from using the specific binary instructions that do not perform auto-fixup for non-aligned access. Either that workaround has ceased to function in FreeBSD-12.2, or the package compiler switch was toggled off. I suspect it may be the former, though.
P.S. -- I am the volunteer developer and maintainer for the Snort and Suricata packages on pfSense, so I know them both intimately.
-
Thank you for the input and all of the hard work! :)
It sounds like sig 10 is its own regression then, with sig 11 being the one possibly related to the issue linked in the redmine entry.
-
Just to be clear this issue affects Suricata also?
-
@teamits said in Snort won't start after upgrade to 21.02 on SG-3100:
Just to be clear this issue affects Suricata also?
Yes, it likely can. No direct reports yet, but I would not be surprised. Especially in the case of the PHP code that is causing the Signal 11 segfault in PHP. That code is identical in the two packages. And it's not really the code itself that is the problem. It's the PHP engine underneath that is crashing. The same GUI code is used in all of the pfSense images (CE and pfSense+, and for all hardware platforms such as aarch64, x86-64 and arm). So if the upper-level GUI code was faulty, you would expect it to fail on all the platforms. That's not the case. It is only failing on the ARM 32-bit platform.
-
Yeah, I saw your posts in other threads, after mine. Sure sounds like PHP. Unfortunately that presumably means it's on someone upstream to fix it which means a 21.02p2. Good sleuthing.
-
Just for some analytics on Snort and PHP crashes, I ran a Splunk query against my syslogs from the router going back 30 days to see what was logged:
2021-02-18T14:02:43.000-0500,Feb 18 14:02:43 kernel: pid 46827 (snort) uid 0: exited on signal 10
2021-02-18T14:07:54.000-0500,Feb 18 14:07:54 kernel: pid 56899 (snort) uid 0: exited on signal 10
2021-02-18T14:10:04.000-0500,Feb 18 14:10:04 kernel: pid 21583 (snort) uid 0: exited on signal 10
2021-02-18T14:13:13.000-0500,Feb 18 14:13:13 kernel: pid 78778 (php-cgi) uid 0: exited on signal 11 (core dumped)
2021-02-18T14:18:38.000-0500,Feb 18 14:18:38 kernel: pid 1154 (snort) uid 0: exited on signal 10
2021-02-18T14:23:24.000-0500,Feb 18 14:23:24 kernel: pid 75058 (snort) uid 0: exited on signal 10
2021-02-18T14:26:01.000-0500,Feb 18 14:26:01 kernel: pid 26020 (snort) uid 0: exited on signal 10
2021-02-18T14:26:30.000-0500,Feb 18 14:26:30 kernel: pid 97052 (snort) uid 0: exited on signal 10
2021-02-18T14:29:12.000-0500,Feb 18 14:29:12 kernel: pid 84487 (snort) uid 0: exited on signal 10
2021-02-18T14:53:13.000-0500,Feb 18 14:53:13 kernel: pid 63165 (snort) uid 0: exited on signal 10
2021-02-18T14:55:54.000-0500,Feb 18 14:55:54 kernel: pid 64348 (snort) uid 0: exited on signal 10
2021-02-18T15:04:04.000-0500,Feb 18 15:04:04 kernel: pid 17533 (php-fpm) uid 0: exited on signal 11 (core dumped)
2021-02-18T15:05:49.000-0500,Feb 18 15:05:49 kernel: pid 9318 (snort) uid 0: exited on signal 10
2021-02-18T15:11:43.000-0500,Feb 18 15:11:43 kernel: pid 65338 (snort) uid 0: exited on signal 10
2021-02-18T15:22:04.000-0500,Feb 18 15:22:04 kernel: pid 24027 (snort) uid 0: exited on signal 10
2021-02-18T19:21:13.000-0500,Feb 18 19:21:13 kernel: pid 5625 (snort) uid 0: exited on signal 10
2021-02-20T10:22:02.000-0500,Feb 20 10:22:02 kernel: pid 42369 (php-cgi) uid 0: exited on signal 11 (core dumped)
2021-02-20T10:24:28.000-0500,Feb 20 10:24:28 kernel: pid 74738 (snort) uid 0: exited on signal 10
2021-02-21T16:06:59.000-0500,Feb 21 16:06:59 kernel: pid 30776 (snort) uid 0: exited on signal 10
2021-02-21T16:38:27.000-0500,Feb 21 16:38:27 kernel: pid 75666 (snort) uid 0: exited on signal 10
2021-02-21T19:28:30.000-0500,Feb 21 19:28:30 kernel: pid 67353 (snort) uid 0: exited on signal 10
2021-02-21T19:43:31.000-0500,Feb 21 19:43:31 kernel: pid 86017 (snort) uid 0: exited on signal 10
2021-02-21T19:48:33.000-0500,Feb 21 19:48:33 kernel: pid 81269 (snort) uid 0: exited on signal 10
2021-02-24T01:36:24.000-0500,Feb 24 01:36:24 kernel: pid 81513 (snort) uid 0: exited on signal 11
2021-02-24T01:36:26.000-0500,Feb 24 01:36:26 kernel: pid 62078 (snort) uid 0: exited on signal 11
2021-02-25T22:28:10.000-0500,Feb 25 22:28:10 kernel: pid 78826 (php-fpm) uid 0: exited on signal 11 (core dumped)
2021-02-25T22:29:59.000-0500,Feb 25 22:29:59 kernel: pid 73568 (php-fpm) uid 0: exited on signal 11 (core dumped)
2021-02-25T22:34:50.000-0500,Feb 25 22:34:50 kernel: pid 77758 (php-fpm) uid 0: exited on signal 11 (core dumped)
2021-02-25T22:35:09.000-0500,Feb 25 22:35:09 kernel: pid 28596 (snort) uid 0: exited on signal 11
2021-02-25T22:35:11.000-0500,Feb 25 22:35:11 kernel: pid 28801 (snort) uid 0: exited on signal 11
2021-02-25T23:15:01.000-0500,Feb 25 23:15:01 kernel: pid 60474 (snort) uid 0: exited on signal 10
2021-02-26T07:15:01.000-0500,Feb 26 07:15:01 kernel: pid 32892 (snort) uid 0: exited on signal 10
2021-02-27T09:24:14.000-0500,Feb 27 09:24:14 kernel: pid 45195 (snort) uid 0: exited on signal 10
2021-02-28T10:15:01.000-0500,Feb 28 10:15:01 kernel: pid 53195 (snort) uid 0: exited on signal 10
2021-02-28T21:12:07.000-0500,Feb 28 21:12:07 kernel: pid 21339 (snort) uid 0: exited on signal 10
Feb 18 was the day I upgraded my SG-3100, so, as expected, no crashes were occurring prior to that. (I can go back further--there were no crashes in the last 180 days)
Also, it's not clear to me that the PHP and Snort crashes are related. I have a few PHP crashes and some seem coincident with Snort, but Snort definitely crashes without PHP crashing.
My SG-3100 is removed from service so now I can start Snort and it won't crash until later. Incidentally, the pfBlockerNG update will almost always cause it to crash. This could be because Snort has traffic to scan at this point. (Traffic to the WebConfigurator doesn't count? Perhaps because it is SSL?)
-
Is there a fix for this in the hopper? And when should we expect it? I had to roll back to 2.4.5p1, so I am still waiting for the fix.
-
I'm having the same problem as well, SG-3100 on 21.02-p1. Snort is not visible anymore. I managed to uninstall the package, but when trying to reinstall it, it just hangs on "Please wait while the update system initializes". Is it possible to revert to 2.4.5? And if so, how can I do this? It seems there is no official rollback feature that I can use to perform a rollback.
-
@styxl said in Snort won't start after upgrade to 21.02 on SG-3100:
is there a fix for this in the hopper
I'm not a dev but if the issue is with PHP then it could go all the way back to Zend to find and fix it. And then it would presumably be in a p2 patch. I'm not optimistic it will be "soon."
@rek0n I haven't done this myself but this seems like the right path:
- get a copy of the 2.4.5 installer from Netgate (go.netgate.com)
- install 2.4.5
- set to Previous Stable Version in System/Update (to install 2.4.5 packages not 2.5)
- install desired packages
- restore configuration
- double check Previous Stable Version is still set
-
@teamits It seems that since I do not have an active subscription, I am unable to create a ticket. I tried searching FTP repositories and discovered that there are community releases, but these all seem to be amd64. Since the SG-3100 is an ARM device, I'm wondering where I could obtain the SG-3100 2.4.5 image.
-
There has been no news as of yet. To roll back, you may create an account and request the previous stable version. A support subscription is not required.
-
You can follow the status of the bug here: https://redmine.pfsense.org/issues/11466.
As you can see in the notes, I worked on finding the bug as far as I could go. The issue is within PHP itself, and appears limited to the 32-bit ARM processor in the SG-3100. I say this because the identical PHP code runs without issue on 64-bit ARM hardware such as the SG-1100 and also on all Intel hardware. PHP crashing is why Snort won't start.
-
It is sad we are being forced to choose between security/stability and access. The SG-3100 is a good, viable product and a lot of small to medium enterprises use it. I am still surprised that Netgate has yet to patch this, knowing a lot of their SG-3100 installs are using Snort as an IPS/IDS. Personally I chose to roll back and wait until a patch is published, but one of my peers decided to ditch the SG-3100 and buy an SG-5100. Unfortunately I don't have that kind of money.
-
@styxl said in Snort won't start after upgrade to 21.02 on SG-3100:
Netgate is yet to patch this
But that's the thing, if it's a PHP problem Netgate may have to wait for Zend to fix it? Zend tends to update once a month and the March 4 update just came out. Plus if it's a compilation bug and not a code bug I would think that makes it harder. And if a new PHP is included in a 21.02-p2 then Netgate would presumably need to test all of pfSense before release. (I don't have any inside knowledge of this, I'm just connecting dots.)