Problems with recently upgraded pfSense gateways
I have recently become responsible for a pair of pfSense gateways. This is my first exposure to the product, although I've been a fairly proficient FreeBSD admin for many years.
They are an HA pair providing internet connectivity for my workplace LAN and WiFi, on both IPv4 and IPv6. We're making use of all of the standard pfSense services, as well as using the Squid package for transparent web proxying. Both gateways are running on identical Dell PowerEdge R210 hardware, each with 4GB RAM, a dual-core Pentium G6950 CPU and a gmirrored pair of SATA disks. CARP is configured such that gw1 is the master gateway and gw2 the backup (gw1 base=5, skew=0; gw2 base=5, skew=100).
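For reference, my understanding is that CARP sends advertisements every base + skew/256 seconds, so the skew only adds a fraction of a second to gw2's interval. Here is a minimal sketch of that arithmetic in Python, assuming that formula applies to this pfSense version (the function name is just for illustration):

```python
def carp_advert_interval(base: int, skew: int) -> float:
    """Seconds between CARP advertisements, assuming base + skew/256."""
    return base + skew / 256


# Base and skew values as configured on the two gateways.
gateways = {"gw1": (5, 0), "gw2": (5, 100)}

for name, (base, skew) in gateways.items():
    print(f"{name}: {carp_advert_interval(base, skew):.3f} s")
```

If that formula is right, gw1 advertises every 5 seconds and gw2 roughly every 5.4 seconds, which is what makes gw1 the preferred master.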
My first task was to upgrade the gateways to the latest version of pfSense, as they were running 2.1.2 and 2.2.6. They're now both on 2.3.2_1, but since the upgrade we've been having a number of problems.
gw2 is crashing numerous times each day, resulting in a mirror rebuild each time. I've tried replacing the RAM, but it hasn't helped. There have been nine crashes over the last couple of days, eight of which show "current process = 12 (irq265: bce1)" (or bce0). The ninth had the current process as ntpd. Is there anything I can do to try to find the root cause of the crashes, or would my best bet be to replace the server hardware entirely as a first step? I have submitted the crash reports to the developers.
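To keep track of which process was current at each crash, something like the following rough Python sketch could tally the "current process" lines across saved crash report text. The function name is hypothetical and the sample strings are made up in the format quoted above (the ntpd PID is a placeholder), so this is not real crash data:

```python
import re
from collections import Counter

# Matches lines such as: current process = 12 (irq265: bce1)
CURRENT_PROCESS = re.compile(r"current process\s*=\s*(\d+)\s*\(([^)]*)\)")


def tally_current_processes(reports):
    """Count how often each process name appears as the current process."""
    counts = Counter()
    for text in reports:
        match = CURRENT_PROCESS.search(text)
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines for illustration only, not real crash data.
sample = [
    "current process = 12 (irq265: bce1)",
    "current process = 12 (irq265: bce1)",
    "current process = 1234 (ntpd)",
]
print(tally_current_processes(sample).most_common())
```

A tally dominated by the bce interrupt threads would point towards the network interfaces or their driver rather than the RAM.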
gw2 stopped responding on the network earlier today, and while it was unresponsive we had major packet loss out to the Internet. gw1 was the master at the time, so it's not clear why this packet loss was occurring. After I rebooted gw2 (whose console wasn't showing any signs of errors or a crash), it started responding on the network again and the packet loss through gw1 stopped - almost (see next point).
gw1, even when seemingly working fine, always shows a small amount of packet loss. A ping to 188.8.131.52 showed 3 lost packets over a 60-second period. This seemed to coincide with peaks of CPU usage and repeated messages in the system log of "kernel: pfsync_undefer_state: unable to find deferred state". Googling this message turned up some forum posts referring to an ongoing problem between pfsync and traffic limiting, and suggesting disabling pfsync as a temporary workaround. I have done this, which stopped the error messages but not the packet loss.
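To put numbers on the above, here is a small Python sketch that works out the loss rate and counts the pfsync message per minute so the two can be compared. It assumes ping's default of one packet per second and that the system log has been exported as plain text with traditional syslog timestamps. The sample log lines, hostname and timestamps are made up for illustration:

```python
from collections import Counter

MESSAGE = "pfsync_undefer_state: unable to find deferred state"


def loss_percent(sent: int, lost: int) -> float:
    """Packet loss as a percentage of packets sent."""
    return 100.0 * lost / sent


def messages_per_minute(lines):
    """Count MESSAGE occurrences keyed by the 'Mmm dd HH:MM' timestamp prefix."""
    counts = Counter()
    for line in lines:
        if MESSAGE in line:
            # The first 12 characters of a syslog line cover month, day, hour and minute.
            counts[line[:12]] += 1
    return counts


# 3 lost packets in 60 seconds, assuming one packet sent per second.
print(f"{loss_percent(60, 3):.1f}% loss")

# Made-up log lines for illustration only.
sample_log = [
    "Nov  3 14:02:17 gw1 kernel: pfsync_undefer_state: unable to find deferred state",
    "Nov  3 14:02:41 gw1 kernel: pfsync_undefer_state: unable to find deferred state",
    "Nov  3 14:03:05 gw1 kernel: pfsync_undefer_state: unable to find deferred state",
]
print(messages_per_minute(sample_log))
```

Under that one-packet-per-second assumption, 3 lost packets out of 60 is 5% loss.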
Can anyone give any advice on the above issues?
My next step will be to try installing pfSense 2.3.2 afresh from a USB stick, as the upgrade process wasn't without issues (host failing to come back up on network after upgrade reboot).