SG-3100 Very Unstable
-
It's not normal but there isn't enough detail to say what might be happening.
Unbound crashes randomly. Clamd crashes about every 8 hours.
What do you mean when you say these services "crash"? Do the services stop? Is there an error in the logs? If so, what errors are in the main system log or other relevant logs?
Running MTR can bring down the webConfigurator.
Running MTR in the GUI from the package? Or from the console? Or from a client behind the firewall? And what does "bring down the webConfigurator" mean? It becomes unreachable? It gives an error? The service stops? Or what? (Also: See above, re: Log messages)
Updating Snort rules makes the system unresponsive.
As in nothing passes through the firewall at all? And it doesn't respond on the console? Or something else?
-
To add to jimp's questions: what version are you running? I have two 3100s in production and neither is showing any issues at all. Unbound has zero problems. Are you having it register DHCP leases? That is going to cause it to restart. Are you running pfBlockerNG? That will also cause it to restart.
Are you doing TLS forwarding with Unbound? I do believe there is a known memory leak that causes it issues every few days.
Mine are both on 2.4.4; the upgrades from 2.4.3p1 went fine.
-
The system log shows signal 10 for clamd. I regularly see clamd stopped in Status > Services. I am running 2.4.4 with Snort, pfBlockerNG, ntopng, and Squid + clamd. Right now I can see LogMeIn clients online but can't connect to them, and OpenVPN is not listening from outside.
-
I am running pfBlockerNG, I am registering DHCP clients in Unbound, and I am using TLS from Unbound -> WAN.
-
I really haven't been able to find anything useful in the logs, but I know that the system keeps filling up its RAM and there is no swap.
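One rough way to tell a leak from normal caching is to record used memory at intervals (by whatever means you prefer) and fit a trend line. This is a minimal sketch, assuming you already have paired samples of elapsed hours and used MB; the function names are hypothetical, and this is plain least squares, not a pfSense tool:

```python
from typing import Optional, Sequence


def leak_rate_mb_per_hour(hours: Sequence[float], used_mb: Sequence[float]) -> float:
    """Least-squares slope of used memory over time, in MB per hour."""
    n = len(hours)
    if n < 2 or n != len(used_mb):
        raise ValueError("need at least two paired (hours, used_mb) samples")
    mean_t = sum(hours) / n
    mean_m = sum(used_mb) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(hours, used_mb))
    var = sum((t - mean_t) ** 2 for t in hours)
    if var == 0:
        raise ValueError("samples must be taken at different times")
    return cov / var


def hours_until_full(
    hours: Sequence[float], used_mb: Sequence[float], total_mb: float
) -> Optional[float]:
    """Extrapolate the trend to estimate hours until used memory reaches total_mb.

    Returns None when there is no upward trend.
    """
    rate = leak_rate_mb_per_hour(hours, used_mb)
    if rate <= 0:
        return None
    return (total_mb - used_mb[-1]) / rate


if __name__ == "__main__":
    # Hypothetical samples: used memory climbing 10 MB per hour.
    t = [0, 1, 2, 3]
    used = [100, 110, 120, 130]
    print(leak_rate_mb_per_hour(t, used))  # 10.0
    print(hours_until_full(t, used, total_mb=200))  # 7.0
```

A steady positive slope while nothing else changes points toward a leak. A straight-line fit is crude, though, and will not describe a leak that grows in bursts.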
-
@0daymaster said in SG-3100 Very Unstable:
I am using TLS from unbound
the system keeps filling up its RAM
There is a KNOWN memory leak:
https://redmine.pfsense.org/issues/9059
And there have been at least a couple of threads about it:
https://forum.netgate.com/topic/137028/unbound-dns-over-tls-memory-leak/2
https://forum.netgate.com/topic/136598/sg3100-needs-to-reboot-every-few-days-after-2-4-4-upgrade/9
-
Got it. Thanks. Now I just have to drive 75 miles and fix it before my boss finds out about this. Lol.
-
It's just a patch; you don't need to be local to it. So this is a work site and you were using TLS forwarding? WTF??
-
Why not use TLS forwarding? I mean, I get that there was a thread about a known issue that I missed, but I've been using TLS forwarding on IPFire for years without incident. What issue do you take with forwarding DNS requests over TLS?
-
I take issue with forwarding DNS anywhere ;) Why would you not just resolve?
Even if you are going to forward, why would you need to hide your DNS and add the latency and overhead of TLS to your queries? For a work connection that makes zero sense. Such a configuration is for someone who wants to hide their DNS queries from the big bad ISP that spies on them ;)
-
@0daymaster said in SG-3100 Very Unstable:
The system log shows signal 10 for clamd. I regularly see clamd stopped in Status > Services. I am running 2.4.4 with Snort, pfBlockerNG, ntopng, and Squid + clamd. Right now I can see LogMeIn clients online but can't connect to them, and OpenVPN is not listening from outside.
Signal 10 (SIGBUS) is a hardware bus error. It is thrown by the ARM processor in the SG-3100 when an unaligned memory access is attempted. The same problem occurred with Snort on the SG-3100. It is caused by the optimizations that the FreeBSD ARM cross-compiler performs by default. The general fix is to compile the affected executable with compiler optimizations turned off; nine times out of ten that will fix the Signal 10 issue. The crash can seem random because everything runs fine until a particular section of "optimized" instruction code is executed, and only then does the Signal 10 error occur. You will need to find a clamd executable compiled with compiler optimizations disabled.
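This is a minimal sketch of the alignment rule being described: an N-byte load or store is naturally aligned when its address is a multiple of N. Whether an unaligned access actually traps depends on the specific instruction and how the CPU is configured, so treat this as an illustration of the rule, not a model of the SG-3100's exact behavior. The function name and example addresses are hypothetical.

```python
def is_aligned(address: int, access_size: int) -> bool:
    """Return True if an access_size-byte access at address is naturally aligned."""
    if access_size <= 0 or access_size & (access_size - 1):
        raise ValueError("access_size must be a positive power of two")
    # For a power-of-two size, alignment means the low bits of the address are zero.
    return (address & (access_size - 1)) == 0


if __name__ == "__main__":
    base = 0x1000
    # (size, offset) pairs; e.g. a 4-byte field packed right after a 1-byte field sits at offset 1.
    for size, offset in [(4, 0), (4, 1), (2, 2), (8, 4)]:
        addr = base + offset
        print(hex(addr), size, is_aligned(addr, size))
```

In the scenario the post describes, an optimized instruction that assumes alignment and is applied to an address like the 0x1001 case is the kind of access that would raise Signal 10.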
-
Thanks, but my boss would never even consider running a binary from anywhere besides the Netgate repos. I will have to look into getting a recompiled clamd binary from Netgate.
-
@0daymaster said in SG-3100 Very Unstable:
Thanks, but my boss would never even consider running a binary from anywhere besides the Netgate repos. I will have to look into getting a recompiled clamd binary from Netgate.
I've not used the Squid package on pfSense. Is clamd bundled with that, or did you load clamd from an independent repository? I'm guessing based on your quote above that clamd is part of the Squid package. If so, the pfSense Team can modify the compiler config file for clamd so that compiler optimizations are disabled when compiling for armv6 hardware. Try working through their Support Team. I will also drop a note to one of the Netgate team members about this issue.
-
We're aware and working on it. New version of clamav for ARM should be up at some point today for testing on 2.4.5. If it's stable there, we'll copy it back to 2.4.4.
-
@jimp said in SG-3100 Very Unstable:
We're aware and working on it. New version of clamav for ARM should be up at some point today for testing on 2.4.5. If it's stable there, we'll copy it back to 2.4.4.
Thanks Jim! Ignore my email note. I had just hit SEND when I got another email notice of your reply here.
-
You should now see squid package version 0.4.44_7, which includes the recompiled clamav package.
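After updating, you can confirm that the installed version is at least 0.4.44_7. The authoritative comparison is pkg's own `pkg version -t`. What follows is a simplified Python sketch that assumes purely numeric dotted components plus an optional `_N` port-revision suffix (no epochs or letters). The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[tuple[int, ...], int]:
    """Split a version like '0.4.44_7' into ((0, 4, 44), 7).

    Simplified: only dotted numeric parts plus an optional '_N' port revision.
    """
    upstream, _, revision = version.partition("_")
    parts = tuple(int(p) for p in upstream.split("."))
    return parts, int(revision) if revision else 0


def at_least(installed: str, required: str) -> bool:
    """True if installed is the same as or newer than required."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    print(at_least("0.4.44_7", "0.4.44_7"))  # True
    print(at_least("0.4.44_6", "0.4.44_7"))  # False
```

Tuple comparison treats 0.4.44 and 0.4.44.0 as different lengths, so this sketch only suits same-shaped versions like the ones in this thread.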
-
Thanks jimp! Just updated.
-
Recently I have also been experiencing issues with my SG-3100. I tried a fresh install and then discovered the Unbound memory leak, which I addressed per Netgate guidance. Despite these efforts, my device continues to restart after a few days of use. Current packages include Snort, pfBlockerNG-devel, ACME, and OpenVPN Client Export. None of the restarts have left any useful logs, which I believe may suggest a hardware issue. I checked the physical layer and also changed DNS servers. Here is a snapshot of my system log. Can anyone explain why e6000sw0port3 keeps going up and down?
Time Process PID Message
Nov 16 03:09:41 check_reload_status Reloading filter
Nov 15 19:09:40 kernel e6000sw0port3: link state changed to UP
Nov 16 03:09:40 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:38 check_reload_status Reloading filter
Nov 15 19:09:37 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:09:37 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:34 check_reload_status Reloading filter
Nov 15 19:09:33 kernel e6000sw0port3: link state changed to UP
Nov 16 03:09:33 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:32 check_reload_status Reloading filter
Nov 15 19:09:31 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:09:31 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:07:41 check_reload_status Reloading filter
Nov 15 19:07:40 kernel e6000sw0port3: link state changed to UP
Nov 16 03:07:40 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:07:38 check_reload_status Reloading filter
Nov 15 19:07:37 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:07:37 check_reload_status Linkup starting e6000sw0port3
-
@m3nt0r123 said in SG-3100 Very Unstable:
can anyone explain why e6000sw0port3 keeps going up/down:
Do you have a DHCP WAN with advanced options enabled, perhaps?
https://redmine.pfsense.org/issues/8507#note-15
-
No. I have not touched the Advanced Configuration checkbox under WAN.
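To quantify the flapping in the e6000sw0port3 log snapshot posted earlier in this thread, here is a minimal sketch. It pulls only the kernel link-state lines and measures how long each DOWN lasted. It assumes the plain "Mon DD HH:MM:SS process message" layout shown in that paste. Because the year is not in the log, it is passed in as a placeholder. The sketch deliberately ignores the check_reload_status lines, whose timestamps in the paste run eight hours ahead of the kernel lines. The function names are hypothetical, and the output only measures the flapping; it does not diagnose the cause.

```python
import re
from datetime import datetime

# Matches lines like: "Nov 15 19:09:40 kernel e6000sw0port3: link state changed to UP"
LINK_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+kernel\s+"
    r"(?P<port>\S+): link state changed to (?P<state>UP|DOWN)$"
)


def link_events(lines, port, year):
    """Return sorted (timestamp, state) pairs for one interface's link changes."""
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m and m.group("port") == port:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("state")))
    events.sort()
    return events


def down_durations(events):
    """Seconds from each DOWN to the next UP."""
    durations = []
    down_at = None
    for ts, state in events:
        if state == "DOWN":
            down_at = ts
        elif down_at is not None:
            durations.append((ts - down_at).total_seconds())
            down_at = None
    return durations


if __name__ == "__main__":
    log = """\
Nov 15 19:09:40 kernel e6000sw0port3: link state changed to UP
Nov 15 19:09:37 kernel e6000sw0port3: link state changed to DOWN
Nov 15 19:09:33 kernel e6000sw0port3: link state changed to UP
Nov 15 19:09:31 kernel e6000sw0port3: link state changed to DOWN
Nov 15 19:07:40 kernel e6000sw0port3: link state changed to UP
Nov 15 19:07:37 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:07:37 check_reload_status Linkup starting e6000sw0port3"""
    # Placeholder year; the syslog lines do not include one.
    events = link_events(log.splitlines(), "e6000sw0port3", year=2000)
    print(sum(1 for _, s in events if s == "DOWN"), "drops")
    print(down_durations(events))  # [3.0, 2.0, 3.0]
```

Outages lasting only a few seconds, repeated minutes apart, look more like a link or negotiation problem on that port than a reboot, but that is a reading of the pattern, not a diagnosis.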