2.3 - LAGG, VLAN, CARP - no routing after update
We have many pfSense installations and have updated a lot of them to 2.3. On two
servers we ran into really big problems, and on an important one at that :)
The network there:
2x IBM Server x3650 (6x network interfaces: 2x on-board, 4x on network cards) -> LAGG
(failover) -> VLAN -> CARP -> HA sync and config -> HAProxy/load balancing
After updating pfSense from 2.2.6 to 2.3, everything was fine. After rebooting the
master server, failover went to the second one, and everything was still fine. When the
master came back, CARP switched again, but no traffic was routed between the
networks behind the pfSense. Pinging both pfSense boxes and the CARP addresses from all interfaces works. From the pfSense itself I
can ping everything, but not through it.
If I reboot the second one, I get the problem there as well: no network connection through
the pfSense. Sometimes it helps to open some gateway settings (any one) and save them,
but this rarely works.
What I've tested:
- Disabling CARP
- Default pfSense config (then restoring my backup)
- pfSense 2.3.1
- Reinstalling HAProxy (with bugs, see attached)
- HAProxy dev version
- Adding the VLANs directly on one network interface (LAGG disabled, see attached)
- Deleting the LAGG and recreating it (with an error, see attached)
- Deleting the whole network configuration and recreating it (also attached)
On 2.2.6 this config works fine; we have no known problems there.
Another problem on both of these servers:
The GUI is really slow. Sometimes I can make only one change, click Save, and the GUI
waits. And waits. Then I get a Gateway Timeout (see attached). I can resolve it by
restarting PHP-FPM (console option 16) and restarting the webConfigurator (console option 11).
Before I roll back to 2.2.6, I'm asking you for help. Have I missed something?
Thanks for your time and help!
Have you manually changed the MTU on your LAGG interfaces? If so, the settings under Interfaces may have been lost, even though the MTU is shown correctly on the Status > Interfaces page.
No, it's still at 1500.
Remove the 'old' HAProxy files; they are likely messing up the pf/NAT rules as well.
Then reinstall the HAProxy package. That should at least get rid of the HAProxy errors and their possible interference with rule loading. Not sure if that will solve the LAGG interface issues, though.
Thanks for that hint.
The HAProxy error message is gone after deleting everything.
The LAGG problem is still there.
I had to downgrade to 2.2.6, where the problem no longer occurs.
Any other ideas?
My problem is still there. I have now found out that the problem is the slave system!
After exactly five days, the second server does something with CARP and routing fails.
I don't know what happens there, but after rebooting the slave system everything is fine again, until the next five days are up.
The master hardware has been replaced, but the slave's has not. Should I replace it too?
What should I test next?
I'm out of ideas, and it's not nice to get a wake-up call from the company on a Sunday saying the problem is back.