ADI/NetGate firewalls become unresponsive on 2.3/2.3.1 under high IPsec traffic
-
Hello,
We have two ADI NetGate firewalls (one 8860 and one 2220) which are interconnected via an IPsec tunnel. Both firewalls become unresponsive and require a hard reset (a soft reboot hangs) when pushing close to 100 Mbit/s of IPsec traffic. We run our nightly backups over this VPN tunnel and have done so for many months, but the problem only started after updating to 2.3. There is nothing in the logs. The firewall seems to think everything is fine, but it won't respond to ping and it appears to lose Ethernet interfaces at random. At this point, if you try to reboot from the console, it hangs until power is physically unplugged.
I've had to revert both firewalls to 2.2.6, but I wanted to put this out there because I'm not finding anyone else who is having this problem. If there's anything I can do to help test I will see what I can do.
Has anyone else seen this?
Thanks
-
Are you sure it is not fixed in 2.3.1?
-
Definitely. The 2220 firewall crashed last night while running 2.3.1.
-
When a system sees the 2.3.0 SMP+IPsec issue described here https://redmine.pfsense.org/issues/6296 it will show 100% interrupt CPU on one core. This can be seen in Status > Monitoring.
It can also be worked around by disabling all but one core as described here:
https://forum.pfsense.org/index.php?topic=110710.msg618388#msg618388
If neither of these conditions fits, you are likely seeing something different.
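For anyone who wants to check the "100% interrupt CPU on one core" condition from a shell rather than the GUI, here is a minimal Python sketch. It assumes the box exposes FreeBSD's `kern.cp_times` sysctl, which lists five tick counters per core in the order user, nice, sys, intr, idle. The function names and the 95% threshold are made up for illustration and are not part of pfSense.

```python
# Field order of FreeBSD's kern.cp_times, repeated once per core (assumption:
# the system reports exactly these five states).
STATES = ("user", "nice", "sys", "intr", "idle")
INTR = STATES.index("intr")


def parse_cp_times(text):
    """Split the flat counter list into one 5-tuple of ticks per core."""
    values = [int(v) for v in text.split()]
    n = len(STATES)
    if not values or len(values) % n != 0:
        raise ValueError("expected a non-empty multiple of 5 counters")
    return [tuple(values[i:i + n]) for i in range(0, len(values), n)]


def interrupt_share(before, after):
    """Percent of each core's ticks spent in interrupt context between two samples."""
    shares = []
    for old, new in zip(before, after):
        deltas = [b - a for a, b in zip(old, new)]
        total = sum(deltas)
        shares.append(100.0 * deltas[INTR] / total if total else 0.0)
    return shares


def pinned_cores(shares, threshold=95.0):
    """Indices of cores at or above the (hypothetical) interrupt threshold."""
    return [core for core, pct in enumerate(shares) if pct >= threshold]
```

To use it, take two samples of `sysctl -n kern.cp_times` a few seconds apart, pass each through `parse_cp_times`, and feed both to `interrupt_share`. A single core reported by `pinned_cores` would match the symptom described for the 2.3.0 issue; an empty list suggests something different is going on.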
-
Thanks for the info. Our main office has a lot of users and reducing from 8 cores to one is not an option. I will keep an eye on this issue and we will try to upgrade again at 2.3.2 or when I see on here that the issue is fully resolved.
Thanks!
-
It would be really important to verify, before you reboot, whether a node running 2.3.1 shows the 100% interrupt CPU on one core.
-
The issue in #6296 with SMP and IPsec is 100% confirmed fixed by many different people in many circumstances. Any issues with 2.3.1 would be something different and need troubleshooting. No one else has reported any such issues on 2.3.1 (note that's 2.3.1 from this week, not 2.3_1). I wouldn't advise disabling cores for any reason in 2.3.1 at this point.
Ideally, a dump from status.php would help, if you can get to the GUI at the time. If not that, seeing the output of 'top -SH' might be at least somewhat telling.
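Since the box reportedly hangs with nothing in the logs, one way to have 'top -SH' output on hand afterwards is to append a snapshot to disk every few seconds while the backup runs. Below is a rough Python sketch of that idea. The function name, log path, and interval are hypothetical, and the batch-mode flags in the usage note are an assumption to check against `man top` on the firewall.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def log_snapshots(command, path, interval=10.0, count=None):
    """Append timestamped output of `command` to `path`; return snapshots taken.

    Each snapshot is flushed and fsync'ed so it is more likely to survive a
    hard reset. With count=None the loop runs until interrupted.
    """
    taken = 0
    while count is None or taken < count:
        try:
            result = subprocess.run(
                command, capture_output=True, text=True, timeout=30
            )
            body, status = result.stdout, f"exit={result.returncode}"
        except subprocess.TimeoutExpired:
            # A command that never returns is itself worth recording.
            body, status = "", "timed-out"
        stamp = datetime.now(timezone.utc).isoformat()
        with open(path, "a") as log:
            log.write(f"=== {stamp} {status} ===\n")
            log.write(body)
            log.flush()
            os.fsync(log.fileno())
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
    return taken
```

A call might look like `log_snapshots(["top", "-SHb", "-d", "1"], "/root/top-SH.log")`, assuming `-b` (batch mode) and `-d` (number of displays) behave on this system as they do on stock FreeBSD. The last entries in the file before the hang are the ones worth posting.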
-
cmb, thanks for the info. I will see what I can do, but these are production systems with little tolerance for downtime. If I get a chance, I will try to test again. status.php is pretty cool; I had no idea it existed.
Thanks