Still seeing suricata stop an interface due to .pid error
-
@boobletins said in Still seeing suricata stop an interface due to .pid error:
I started with Snort and switched to Suricata ~1 year ago. Snort has no configured interfaces, but it looks like it was still updating rules (I didn't notice they were still running until you pointed it out). I'll remove the pkg entirely if that might be causing an issue.
I agree it looks like the interfaces are cycling. It looks like both LAN igb0 and WAN em0 cycled around the same time. It looks to me like rc.gateway_alarm detects 22% packet loss and cycles the interfaces assuming something went wrong.
I'll try to investigate what's going on with the interfaces, but in the meantime, would it be an acceptable solution to modify rc.start_packages (or similar) to simply rm *.pid in the appropriate suricata directories? Or would that be dangerous in ways I don't understand?
No, no danger at all in deleting the file. But that is just going to be masking the problem. The file being there is a symptom caused by something crashing Suricata.
I would try changing whatever host you are "pinging" for the gateway monitoring. Maybe that host is tardy responding to ICMP requests or even drops them when it gets busy.
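For anyone who still wants a cleanup step in the meantime, here is a minimal sketch of a more cautious approach than a blanket rm *.pid: only delete the file when the PID recorded in it no longer belongs to a running process. The function names and the idea of passing in the path are my own for illustration, not anything shipped with the Suricata package, and this assumes a POSIX system where signal 0 can be used as an existence check.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_if_stale(pid_file: Path) -> bool:
    """Delete pid_file only when the PID it records is not running.

    Returns True if the file was removed.
    """
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        # Missing file: nothing to do. Unparseable file: leave it alone.
        return False
    if pid <= 0:
        return False  # not a valid single-process PID
    if pid_is_running(pid):
        return False
    pid_file.unlink()
    return True
```

Calling remove_if_stale() with the path of the interface's .pid file before starting Suricata would clear a leftover file but leave a live instance untouched. It still only treats the symptom; whatever crashed Suricata remains.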
-
-
@val said in Still seeing suricata stop an interface due to .pid error:
PM'd you the log file... it's way too big to post here.
Thanks bmeeks.
I looked through your log file. What version of the Snort Rules Snapshot file are you using? You should be using only rules packages for Snort 2.9.x if you are running Snort rules with Suricata. Your file name should be snortrules-snapshot-29120.tar.gz. Do not use the Snort3 rules (that means do not use any Snort rules file with 3 in the name). You should not be seeing those "unknown reference" error messages. The only time I've noticed those is when the user has downloaded the rules meant for use only with the new Snort3 beta package from the Snort team.
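As a rough sketch of that file name check: the pattern below is only inferred from the two 2.9.x names quoted in this thread (a version number starting with 29), so treat it as an assumption and not an official naming rule. The function name is made up for illustration.

```python
import re

# Assumed pattern: "snortrules-snapshot-" + a version starting with 29 + ".tar.gz".
SNORT29_SNAPSHOT = re.compile(r"snortrules-snapshot-29\d+\.tar\.gz")


def is_snort29_snapshot(filename: str) -> bool:
    """Return True if filename looks like a Snort 2.9.x rules snapshot."""
    return SNORT29_SNAPSHOT.fullmatch(filename) is not None


for name in ("snortrules-snapshot-29120.tar.gz", "snortrules-snapshot-29111.tar.gz"):
    print(name, is_snort29_snapshot(name))
```

Any name that does not carry a 29xxx version number fails the check, which covers the Snort3 packages mentioned above.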
-
The version and file name I am using is:
snortrules-snapshot-29111.tar.gz
Thanks for that info. I will change it and see if the process still kills itself.
-
Just want to add I have been having the same issue with the Suricata .pid file becoming stale, and the engine failing to restart because of this after a crash. I am using an SG-3100. I also run pfBlocker. I notice that most of the time, Suricata crashes when I am tweaking settings in here. I have to manually rm the .pid file to start Suricata.
-
@bhjitsense said in Still seeing suricata stop an interface due to .pid error:
Just want to add I have been having the same issue with the Suricata .pid file becoming stale, and the engine failing to restart because of this after a crash. I am using an SG-3100. I also run pfBlocker. I notice that most of the time, Suricata crashes when I am tweaking settings in here. I have to manually rm the .pid file to start Suricata.
The SG-3100 crash is due to a compiler optimization problem for armv6 and armv7 CPUs (like those used in the SG-1000 and SG-3100 appliances). I've been in contact with the pfSense team about this, but so far there is no resolution posted. The only fix for now is to NOT run Suricata on SG-3100 hardware. If you do, it will continue to randomly crash with the Signal 10 Bus Error. The Signal 10 crash leaves the PID file in place, so the next time you attempt to start Suricata it will see the file remaining from the previously crashed instance and complain. The stale PID file is a symptom and not a cause in this case.
You can search Google for what Signal 10 Bus Errors are and what causes them. In this case it is due to the clang/llvm compiler generating machine opcodes that do not support unaligned memory access on arm processors.
-
@bmeeks said in Still seeing suricata stop an interface due to .pid error:
The only fix for now is to NOT run Suricata on SG-3100 hardware.
Hmm, interesting. I checked the router I had posted about a week ago, and Suricata is still running. Is the crash random or only when making changes as @bhjitsense suggested might be the case? (I have made no changes in the past week)
-
@teamits said in Still seeing suricata stop an interface due to .pid error:
@bmeeks said in Still seeing suricata stop an interface due to .pid error:
The only fix for now is to NOT run Suricata on SG-3100 hardware.
Hmm, interesting. I checked the router I had posted about a week ago, and Suricata is still running. Is the crash random or only when making changes as @bhjitsense suggested might be the case? (I have made no changes in the past week)
It will appear to be somewhat random, although in fact each and every time a particular piece of code is hit (the part with the opcodes I mentioned) the error will be thrown. The randomness comes in from the fact that the problem code might be part of an if-then statement in code where one piece of code is executed if a tested condition is true while a different section of code is executed if the condition tests as false. I have no idea where specifically in the Suricata code the problem lies. It very well could exist in several places.
This is not a problem on Intel hardware because all Intel CPUs will automatically fix-up and execute data loads or stores to unaligned addresses. Intel CPUs do this by default with the processor hardware. Since developers mostly target Intel hardware, they have all gotten quite complacent and sloppy with data access via structures and pointer casting in C programming. Intel hardware will cover for their sloppiness, but other hardware (like the armv6 and armv7 CPU) is not as forgiving. Of course part of the fault here also lies with the clang/llvm compiler used to produce the binary code for the arm hardware.
I also want to make clear this is an issue within the actual Suricata binary code and has nothing at all to do with the PHP GUI code of pfSense. Remember, all that the GUI packages for Suricata and Snort do on pfSense is provide a wrapper to let users easily create the text configuration files the underlying binaries need to run.
-
Thanks for the explanation. I meant to ask, is this issue with the latest release version of the Suricata package/binaries or prior versions also?
-
@teamits said in Still seeing suricata stop an interface due to .pid error:
Thanks for the explanation. I meant to ask, is this issue with the latest release version of the Suricata package/binaries or prior versions also?
It can very well exist in any previous version, but appears to have reared its head in the latest binary update. The explanation of what unaligned access is and how it happens in C programming code is a bit long-winded and requires a good bit of hardware understanding (things like memory bus widths, CPU register loads/stores and how memory access works at a hardware level) in order to fully grasp the concept. You can Google "unaligned memory access" and start your research if interested.
It could be that a simple code change that happened in the latest binary (maybe fixing some other bug) caused this issue on arm hardware. Key to understanding the impact of unaligned access issues is also understanding what role compilers play in producing the actual binary instruction codes from the higher-level C programming code. On FreeBSD, the compiler used for this is clang/llvm. Linux traditionally uses gcc. Of course Windows uses Microsoft-supplied compilers. Each compiler will produce slightly different binary code from the exact same C high-level code. On FreeBSD, that compiler produces a poorly chosen set of opcodes for certain memory access operations. It chooses to use a pair of opcodes that the arm CPU cannot fix-up on the fly for unaligned access. There are other opcodes that can perform the same operation and the arm CPU can auto fix-up those memory accesses to prevent the unaligned access. Research LDM/STM and LDR/STR instruction opcodes on armv6 and armv7 microprocessors to see what I mean.
EDIT: up above, when I say "poorly chosen set of opcodes" I mean it chooses speed over compatibility. The opcodes it chooses to use (LDM/STM) do execute ever so slightly faster than their counterparts (LDR/STR), but the latter opcodes support unaligned memory access without crashing with the Signal 10 Bus Error. The former LDM/STM instructions will crash if the programmer of the original Suricata source code inadvertently asked the CPU to load or store a piece of data that causes an unaligned memory access.
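To make "unaligned" a bit more concrete, here is a small illustration using Python's struct module. It is not Suricata code and it cannot crash the way the arm binary does; it only shows where a 32-bit field ends up when it follows a 1-byte field, with and without the padding a C compiler normally inserts. The helper names are made up for this sketch, and it assumes a platform where a 32-bit integer is 4 bytes with 4-byte alignment.

```python
import struct

FIELD = "I"  # 32-bit unsigned integer
FIELD_SIZE = struct.calcsize("=" + FIELD)  # 4 bytes


def value_offset(mode: str) -> int:
    """Offset of a 32-bit field that follows a 1-byte tag.

    mode "@" uses native alignment (padding inserted, like a normal C struct);
    mode "=" uses no padding (like a packed struct or a cast into a byte buffer).
    The field is last and struct adds no trailing padding, so the offset is
    the total size minus the field size.
    """
    return struct.calcsize(mode + "B" + FIELD) - FIELD_SIZE


def is_aligned(offset: int, size: int = FIELD_SIZE) -> bool:
    """An access is aligned when the offset is a multiple of the access size."""
    return offset % size == 0


for label, mode in (("padded", "@"), ("packed", "=")):
    off = value_offset(mode)
    print(f"{label}: value at offset {off}, aligned={is_aligned(off)}")

# Reading the packed field one byte at a time is always safe, whatever the offset.
buf = bytes([0xAA, 0x78, 0x56, 0x34, 0x12])
(value,) = struct.unpack_from("<I", buf, 1)
print(hex(value))
```

In the packed layout the 32-bit value starts at offset 1. A C pointer cast that loads it as one 32-bit word from that address is the kind of access that Intel hardware fixes up silently and that can raise the bus error on arm when the compiler picks an instruction that requires alignment.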
There are also frequent battles of words between compiler developers and other C programmers. The compiler gurus say the C programmers should not be so sloppy and should make sure to avoid unaligned access issues, but the C programmers retaliate with the old "well it works on Intel without issue" or "it works fine on gcc", etc. So that can mean a lot of posturing on each side with nothing useful really happening. There is reluctance for, say, the Suricata team to dive into this because it is only affecting Suricata running on arm hardware where Suricata was compiled by clang/llvm. That's a pretty small footprint of users compared to Linux land. Thus there is no incentive for the Suricata developers to spend hours trying to find where the issue is.
-
!@#%$(*!@#^%
I had a whole post written up describing how to solve this on my igb0 card and the forum thought it was spam and poof. Extremely frustrating.
IPV6 TX Checksums are not disabled via the GUI when they appear to be. Command line ifconfig can fix this issue (until a reboot). I can reliably reproduce this on my igb interface, haven't tested the em yet.
Hope this helps someone? If it lets me post..........
Edit: I give up, PMing you Bill... this is absurd.
-
@boobletins said in Still seeing suricata stop an interface due to .pid error:
!@#%$(*!@#^%
I had a whole post written up describing how to solve this on my igb0 card and the forum thought it was spam and poof. Extremely frustrating.
IPV6 TX Checksums are not disabled via the GUI when they appear to be. Command line ifconfig can fix this issue (until a reboot). I can reliably reproduce this on my igb interface, haven't tested the em yet.
Hope this helps someone? If it lets me post..........
Edit: I give up, PMing you Bill... this is absurd.
Replied to your PM.
-
@boobletins said in Still seeing suricata stop an interface due to .pid error:
!@#%$(*!@#^%
I had a whole post written up describing how to solve this on my igb0 card and the forum thought it was spam and poof. Extremely frustrating.
IPV6 TX Checksums are not disabled via the GUI when they appear to be. Command line ifconfig can fix this issue (until a reboot). I can reliably reproduce this on my igb interface, haven't tested the em yet.
Hope this helps someone? If it lets me post..........
Edit: I give up, PMing you Bill... this is absurd.
I'm not 100% positive this is the cause of the Signal 10 error. That error definitely indicates an unaligned memory access. Now it is possible that the checksum failures may be leading to that "particular section" of bad opcodes I mentioned in my earlier post getting executed and thus throwing the Signal 10 error.
-
@bmeeks said in Still seeing suricata stop an interface due to .pid error:
@val said in Still seeing suricata stop an interface due to .pid error:
PM'd you the log file... it's way too big to post here.
Thanks bmeeks.
I looked through your log file. What version of the Snort Rules Snapshot file are you using? You should be using only rules packages for Snort 2.9.x if you are running Snort rules with Suricata. Your file name should be snortrules-snapshot-29120.tar.gz. Do not use the Snort3 rules (that means do not use any Snort rules file with 3 in the name). You should not be seeing those "unknown reference" error messages. The only time I've noticed those is when the user has downloaded the rules meant for use only with the new Snort3 beta package from the Snort team.
Hi bmeeks
I have since moved away from Suricata and back to Snort for now. My internet connection is through PPPoE, and from my understanding Suricata doesn't play well with PPPoE. I have tried a few different things, all with the same result: Suricata still kills itself and won't start again until I delete the pid file.
Thanks for all the help.