2.2.2 freezing randomly
-
Hi guys,
Has anyone been experiencing pfSense 2.2.2 locking up after a little while? I have two separate installs of it freezing up anywhere from a few minutes after boot to 10 hours after boot. I've never had problems like this with pfSense on older versions, but this is my first shot at 2.2.2.
Both installs are running on VMware ESXi 6.0.0, and both were clean installs with a simple bridge configuration set up. I originally thought maybe the host they were on was causing issues, so I moved them to another host and still had the problem. Then I thought maybe it was the shared storage, so I moved them to local host storage - but still no joy.
When the lock up happens, it's completely frozen. The web interface won't respond, there are no ping replies from any of the interfaces, and even opening the console in VMware shows it's unresponsive (this would be similar to a monitor and keyboard connected to a physical host and not being able to get it to respond locally). The only solution is to have VMware reset the virtual machine - then it's happy for a while again.
I checked the logs. There's zero happening when this occurs. The logs have nothing in them for hours before the crash (this is a test environment - not a whole lot is going through the routers at the moment, so I'm not surprised the logs are quiet - though I'd kind of expect maybe an error in the System log that might give a hint - but nothing).
If I ask the VM to restart via Open-VM-Tools, I do get a little bit in the logs for shutdown (surprisingly), but it just hangs after a few steps and sits there. So, it seems at least some of it is still alive.
I wonder if it's the amount of RAM I've given pfSense? I only gave it 1GB and 1 vCPU for testing. I wouldn't expect a lock up if that were the case, though - more RAM might just let pfSense run longer before it locked up, you'd think.
I just figured I'd ask if anyone else is having a similar issue, and if they might have run into a solution?
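Since the logs are empty when the freeze happens, one way to at least timestamp each lock up is to poll the box from another machine. The following is only a minimal sketch under assumptions: it probes the web interface over TCP (port 443 is assumed), and the function names and the address in the usage note are hypothetical.

```python
import socket
import time
from datetime import datetime


def is_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def watch(host: str, port: int = 443, interval: float = 30.0) -> None:
    """Print a timestamped line each time reachability changes. Runs forever."""
    last_state = None
    while True:
        state = is_reachable(host, port)
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            label = "reachable" if state else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {label}")
            last_state = state
        time.sleep(interval)
```

Calling something like `watch("192.168.1.1")` (a placeholder address) from a separate machine would record roughly when the firewall stopped answering, which can then be compared against the last log entries.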
-
Do you have a CARP IP on the bridge or one of its member interfaces? That's the only issue I'm aware of that'll hang a box with a bridge. If not, it's likely because you're creating a layer 2 loop and the resulting never-ending flood of traffic hangs the system as it's consumed with passing that traffic.
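To illustrate what a layer 2 loop means here, the sketch below models segments and bridging firewalls as nodes and reports whether any link closes a cycle. It is a toy model, not something pfSense runs; the function name and the node names (`wan`, `lan`, `fw1`, `fw2`) are hypothetical.

```python
def has_layer2_loop(links):
    """Return True if the given layer 2 links contain a cycle.

    links is an iterable of (a, b) pairs, each meaning that a and b
    are directly connected at layer 2. Uses a simple union-find.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b were already connected, so this link closes a loop.
            return True
        parent[root_a] = root_b
    return False


# Two firewalls that both bridge the same two segments form a loop.
both_bridging = [("wan", "fw1"), ("fw1", "lan"), ("wan", "fw2"), ("fw2", "lan")]

# Disconnecting one side of the second bridge breaks the loop.
one_side_down = [("wan", "fw1"), ("fw1", "lan"), ("wan", "fw2")]
```

In this model, two nodes that each bridge the same pair of segments always form a cycle unless one path is blocked, which is the job spanning tree normally does.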
-
Yes, I do have a CARP IP on the WAN interface.
I did run into the loop issue while setting up the cluster, but I'm working through a solution for that (either figuring out how to start up with the bridge interface down, which is looking like it'll involve some PHP hacking, or using spanning tree to shut down the loop). For now, I've just disconnected one side of the bridge on the VMware host while I work on the solution.
The CARP IP with a bridge causing a lock up is new information, though. Is there a fix in the pipeline for that, do you know?
-
Yes, it may already be fixed: https://redmine.pfsense.org/issues/4607
If you have a test setup, try a recent 2.2.3 snapshot if you can.
Steve
-
Thank you for the bug link and the advice. I am in the process of downloading the upgrade file now. I'll install and report back on the results after it's had a few hours of run time.
Update: Upgrade went smoothly. I did get an error about synchronizing, though, if that's of any use. The message is:
An error code was received while attempting Filter sync with username admin https://192.168.254.2:443 - Code 2: Invalid return payload.
192.168.254.x is my pfSync subnet.
I assume that means that if I make firewall changes on this release, they won't replicate to the other node. This is a development version, though, so maybe something that's still getting tweaks.
Now we wait and see if the problem has gone away. ;)
-
Well, it's been a little over 2 days of up time, and no freezes. Looks like 2.2.3 has this problem licked.
Thanks for the help, guys!
-
Great, thanks for the feedback. :)
That error is probably expected (though I haven't reviewed the code changes there). In general, syncing between different versions of pfSense is not supported, since the format may have changed slightly. It's better to disable syncing when upgrading a CARP pair to avoid that.
Steve
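As a rough illustration of the "same version on both nodes" rule, the sketch below compares the numeric part of two version strings before deciding whether sync should be left on. The helper names are hypothetical and the string format (such as `2.2.2-RELEASE`) is an assumption; matching numbers are also no guarantee, since two snapshots of the same release can still differ.

```python
def release_tuple(version: str) -> tuple:
    """Turn an assumed string like '2.2.3-DEVELOPMENT' into (2, 2, 3)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def safe_to_sync(primary: str, secondary: str) -> bool:
    """Return True only when both nodes report the same numeric release."""
    return release_tuple(primary) == release_tuple(secondary)
```

For example, a pair reporting `2.2.2-RELEASE` and `2.2.3-DEVELOPMENT` would fail this check, which matches the mid-upgrade situation where the sync error first appeared.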
-
Hi, Stephen,
I did upgrade both nodes to the 2.2.3 development snapshot, but the sync error still appears after each reboot. I haven't actually tested to see if it impacts anything, but it does show up once after each restart.
I'm not too worried about it, since it's just a proof-of-concept at this point. I'll wait for the full 2.2.3 to deploy in production. Hopefully 2.2.3 stable isn't too far away. :)