SG-3100 Very Unstable

johnpoz

To add to jimp's questions: what version are you running? I have two 3100s in production and they are not showing any issues at all.. Unbound has zero problems.. Are you having it register DHCP? That is going to cause restarts of it... Are you running pfBlocker? That again will cause it to restart.

Are you doing TLS forwarding with unbound? There is a known memory leak, I do believe, that causes it issues every few days.

Mine are both on 2.4.4. The upgrades from 2.4.3p1 went fine.

0daymaster

The system log shows signal 10 for clamd. I regularly see clamd stopped in Status -> Services. I am running 2.4.4 with Snort, pfBlocker-ng, ntopng and squid + clamd. Right now I can see LogMeIn clients online but can't connect to them. OpenVPN is not listening from outside.

0daymaster

I am running pfBlocker-NG, I am registering DHCP clients in unbound, and I am using TLS from unbound -> WAN.

0daymaster

I really haven't been able to find much that is useful in the logs, but I know that the system keeps filling up its RAM and there is no swap.

johnpoz

@0daymaster said in SG-3100 Very Unstable:

I am using TLS from unbound
the system keeps filling up its RAM

There is a KNOWN memory leak...
https://redmine.pfsense.org/issues/9059

And there have been at least a couple of threads about it..
https://forum.netgate.com/topic/137028/unbound-dns-over-tls-memory-leak/2
https://forum.netgate.com/topic/136598/sg3100-needs-to-reboot-every-few-days-after-2-4-4-upgrade/9

0daymaster

Got it. Thanks. Now I just have to drive 75 miles and fix it before my boss finds out about this. Lol.

johnpoz

It's just a patch.. You don't need to be local to it.. So this is a work site and you were using TLS forwarding -- WTF??

0daymaster

Why not use TLS forwarding? I mean, I get that there was a thread about a known compatibility issue that I missed but I've been using TLS forwarding on ipfire for years without incident. What issue do you take with forwarding DNS requests over TLS?

johnpoz

I take issue with forwarding DNS anywhere ;) Why would you not just resolve??

Why would you need to hide your DNS queries and add the latency and overhead to them, even if you are going to forward? For a work connection it makes zero sense.. Such a configuration is for someone wanting to hide their DNS queries from the big bad ISP they use that spies on them ;)

bmeeks

@0daymaster said in SG-3100 Very Unstable:

The system log shows signal 10 for clamd. I regularly see clamd stopped in Status -> Services. I am running 2.4.4 with Snort, pfBlocker-ng, ntopng and squid + clamd. Right now I can see LogMeIn clients online but can't connect to them. OpenVPN is not listening from outside.

Signal 10 is a hardware bus error (SIGBUS). It is thrown by the ARM processor in the SG-3100 when an unaligned memory access is attempted. This same problem occurred in Snort on the SG-3100. The problem is caused by the optimization performed by default by the cross-compiler for ARM used with FreeBSD. The general fix is to compile the affected executable with compiler optimizations turned off. Nine times out of ten that will fix the Signal 10 issue. The crash can seem random because everything runs fine until a particular section of "optimized" instructions is encountered, and only then can the Signal 10 error occur. You will need to find a clamd executable compiled with compiler optimizations disabled.
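
To make the unaligned-access point concrete, here is a minimal C sketch of one common way this kind of fault can arise. It is an illustration only, not code taken from clamd or Snort, and the function names `read_u32_cast` and `read_u32_safe` are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Risky pattern, shown for contrast only: if p is not suitably aligned
 * for uint32_t, this cast-and-dereference is undefined behavior in C.
 * On a strict-alignment CPU the resulting load can raise SIGBUS
 * (signal 10 on FreeBSD), depending on the instructions the compiler
 * chooses to emit. */
uint32_t read_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* Portable alternative: copy the bytes into an aligned local variable.
 * memcpy has no alignment requirement on its source pointer, so this is
 * well defined for any address inside a valid buffer. */
uint32_t read_u32_safe(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling `read_u32_safe(buf + 1)` on a byte buffer is fine at any offset, while `read_u32_cast(buf + 1)` may work on one build and fault on another. Whether it actually faults depends on the CPU, the OS alignment handling, and which load instructions the optimizer picks, which would be consistent with the seemingly random crashes described above.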

0daymaster

Thanks, but my boss would never even consider running a binary from anywhere besides the Netgate repos. I will have to look into getting a recompiled clamd binary from Netgate.

bmeeks

@0daymaster said in SG-3100 Very Unstable:

Thanks, but my boss would never even consider running a binary from anywhere besides the Netgate repos. I will have to look into getting a recompiled clamd binary from Netgate.

I've not used the Squid package on pfSense. Is clamd bundled with that, or did you load clamd from an independent repository? I'm guessing based on your quote above that clamd is part of the Squid package. If so, the pfSense Team can modify the compiler config file for clamd so that compiler optimizations are disabled when compiling for armv6 hardware. Try working through their Support Team. I will also drop a note to one of the Netgate team members about this issue.

jimp

We're aware and working on it. New version of clamav for ARM should be up at some point today for testing on 2.4.5. If it's stable there, we'll copy it back to 2.4.4.

bmeeks

@jimp said in SG-3100 Very Unstable:

We're aware and working on it. New version of clamav for ARM should be up at some point today for testing on 2.4.5. If it's stable there, we'll copy it back to 2.4.4.

Thanks Jim! Ignore my email note. I had just hit SEND when I got another email notice of your reply here.

jimp

You should now see squid pkg version 0.4.44_7 which includes the recompiled clamav package.

0daymaster

Thanks jimp! Just updated.

m3nt0r123

Recently I have also been experiencing issues with my SG-3100. I tried a fresh install and then discovered the unbound memory leak, which I patched per Netgate guidance. Despite these efforts, my device continues to restart after a few days of use. Current packages include snort, pfBlockerNG-Devel, Acme, and OpenVPN exporter. None of the restarts has left any useful logs, which I believe may suggest a potential hardware issue. I checked the physical layer and also changed DNS servers. Here is a snapshot of my system log - can anyone explain why e6000sw0port3 keeps going up/down:

Time Process PID Message
Nov 16 03:09:41 check_reload_status Reloading filter
Nov 15 19:09:40 kernel e6000sw0port3: link state changed to UP
Nov 16 03:09:40 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:38 check_reload_status Reloading filter
Nov 15 19:09:37 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:09:37 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:34 check_reload_status Reloading filter
Nov 15 19:09:33 kernel e6000sw0port3: link state changed to UP
Nov 16 03:09:33 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:09:32 check_reload_status Reloading filter
Nov 15 19:09:31 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:09:31 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:07:41 check_reload_status Reloading filter
Nov 15 19:07:40 kernel e6000sw0port3: link state changed to UP
Nov 16 03:07:40 check_reload_status Linkup starting e6000sw0port3
Nov 16 03:07:38 check_reload_status Reloading filter
Nov 15 19:07:37 kernel e6000sw0port3: link state changed to DOWN
Nov 16 03:07:37 check_reload_status Linkup starting e6000sw0port3
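
For anyone wanting to quantify the flapping from a saved copy of the system log, a rough C sketch follows. It assumes the log has already been read into a single string, and the helper name `count_occurrences` is hypothetical; it simply counts how often a message such as "e6000sw0port3: link state changed to DOWN" appears.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `log`.
 * An empty needle is treated as "no matches" to avoid looping forever. */
size_t count_occurrences(const char *log, const char *needle)
{
    size_t n = 0;
    size_t len = strlen(needle);

    if (len == 0)
        return 0;

    for (const char *p = strstr(log, needle); p != NULL;
         p = strstr(p + len, needle))
        n++;

    return n;
}
```

Run against the excerpt above with the needles "link state changed to DOWN" and "link state changed to UP", this would report three of each, i.e. three full down/up cycles in about two minutes.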

jimp

@m3nt0r123 said in SG-3100 Very Unstable:

can anyone explain why e6000sw0port3 keeps going up/down:

Do you have a DHCP WAN with advanced options enabled, perhaps?
https://redmine.pfsense.org/issues/8507#note-15

m3nt0r123

@jimp

No. I have not touched the Advanced Configuration checkbox under WAN.

Sparklan

There are definitely issues with the SG-3100s. I have one in my office and one at a customer location. Both have problems with rebooting randomly. I have around 25 Netgate firewalls in the field that I manage now, and these are the only units that randomly reboot. Both units are running Suricata (Balanced) and OpenVPN. That's it. There's nothing unusual in the logs. For instance, mine reset last night a little after 1am. There were no log entries for over an hour before that. It just goes down with no warning and comes back up, almost as if the power was cycled, but it wasn't. It hasn't been a huge inconvenience for me because it doesn't do it that often, but I'm really glad I didn't place a lot of these with customers. With Netgate's "We're not cross shipping you a firewall unless you spend a ridiculous amount of money on a support contract" or "Buy another one" policy, it's not exactly easy to get a replacement either, even if it is approved for an RMA. So far with these I've just been dealing with it, but I will definitely not be buying any more. Currently both of these units are on 2.4.4, but it doesn't really matter what version they are running; they experience the same behavior. Sometimes it's a few days, sometimes a few weeks, but sooner or later they experience this reset.