Traffic routed through incorrect WAN upon pfSense boot.
-
After a few weeks of fighting this with various config changes, I am convinced there is a bug in the WAN failover in v2.1.2. (Note that I am NOT referring to CARP; there is no CARP on this network.)
I will try to be clear on our setup and what is happening. We have 2 WAN ports and 3 LAN ports. Focusing on the WAN:
WAN_TWC was initially configured as the only WAN port.
OPT2 was later changed from a LAN port to a WAN port (about 3 weeks ago) and named "WAN_CB" as our second WAN.
WAN_TWC is assigned to gateway "GW_WAN"
WAN_CB is assigned to gateway "GW_CB" (and set as the default).
When the pfSense is at steady state everything works GREAT. I can unplug one WAN port and the failover works flawlessly (well, the CPU rails to 100% for several seconds as the vpn/snort/pfblocker try to sort out the chaos, but it works). I can go back and forth unplugging/plugging in the WANs and the pfSense sorts it out.
Both WAN ports are "active" (have an internet connection) when the pfSense boots. On boot, however, it always routes traffic through GW_WAN rather than the default (and higher-tier) GW_CB. If I do not force it back to GW_CB, the pfSense will stay on GW_WAN all day long. This is bad because the ISP on GW_WAN has a monthly bandwidth cap that I am trying to avoid.
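To make the expected behavior concrete, here is a minimal Python sketch of how I understand tier-based selection in a gateway group: the online gateway with the lowest tier number should win. This is a simplified model, not pfSense's actual code. The tier numbers (GW_CB on Tier 1, GW_WAN on Tier 2) and the names Gateway and pick_gateway are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    tier: int      # assumed convention: 1 is the most preferred tier
    online: bool


def pick_gateway(gateways: List[Gateway]) -> Optional[Gateway]:
    """Return the online gateway with the lowest tier number, or None."""
    candidates = [g for g in gateways if g.online]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.tier)


# Both WANs are up at boot, so GW_CB is the expected choice.
group = [Gateway("GW_WAN", 2, True), Gateway("GW_CB", 1, True)]
print(pick_gateway(group).name)
```

With both gateways online this model selects GW_CB, which is what I see after re-saving the gateway group but not right after boot.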
To force it back to GW_CB I log into the pfSense, go to the Gateway groups section, edit the group, MAKE NO CHANGES AT ALL and just save it. This makes the pfSense switch back over to the correct Gateway where it remains for the rest of the day (barring any WAN_CB issues).
Most of you will never encounter this since you rarely reboot your pfSense, but ours shuts down every night to save power and I boot it every morning. A simple workaround might be a cron job that runs shortly after boot to reinitialize the gateway group, but I have no idea what the syntax or commands for that might be.
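Until a real fix exists, the wrong-WAN condition can at least be detected from a LAN machine after boot. The Python sketch below does not reinitialize the gateway group; it only classifies a public IP address by the ISP range it falls in. The address ranges are documentation placeholders, not our real ones, and the names WAN_NETWORKS, identify_wan and on_expected_wan are hypothetical. How the public IP is obtained (for example from an external "what is my IP" service) is left to the caller.

```python
import ipaddress
from typing import Optional

# Hypothetical placeholder ranges; substitute the real ranges of each ISP.
WAN_NETWORKS = {
    "WAN_CB": ipaddress.ip_network("203.0.113.0/24"),
    "WAN_TWC": ipaddress.ip_network("198.51.100.0/24"),
}


def identify_wan(public_ip: str, networks=WAN_NETWORKS) -> Optional[str]:
    """Return the name of the WAN whose range contains public_ip, or None."""
    addr = ipaddress.ip_address(public_ip)
    for name, net in networks.items():
        if addr in net:
            return name
    return None


def on_expected_wan(public_ip: str, expected: str = "WAN_CB") -> bool:
    """True if traffic appears to leave through the expected WAN."""
    return identify_wan(public_ip) == expected


# Example with a placeholder address from the WAN_TWC range.
print(identify_wan("198.51.100.7"), on_expected_wan("198.51.100.7"))
```

Run shortly after boot, a False result from on_expected_wan would be the cue to re-save the gateway group by hand.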
My theory is that when the pfSense boots, GW_WAN is the first port that is "loaded" by pfSense, followed by OPT1,2,3,4 (watching the FreeBSD boot screen), and for some reason it just stays stuck on this WAN port. Swapping the WAN cables fixes the problem and will likely be my next step, but then the logs make no sense (I want to retain records of our ISP's downtimes; there are many, hence the backup ISP).
EDIT
In the text on the traffic graph I state that I re-saved the "gateway" to force the WAN back; this should read "gateway group".