pfSense 2.0 RC3 - CARP does not fail-over on all interfaces
-
I am using the AMD64 builds of pfSense 2.0 RC3. I have the same build running on dedicated hardware and in a VM, with CARP configured on both. The dedicated hardware was originally installed with an AMD64 build of RC2. When I set up CARP, I updated the dedicated hardware to the following build and installed the VM using it: pfSense-2.0-RC3-amd64-20110708-1843.iso. I have updated three times since the install. The current build on both is 2.0-RC3 Built On: Sun Jul 24 04:39:44 EDT 2011, and I have had the same issue on all 4 of the builds I have used with the CARP setup.
- If I reboot the primary pfSense device (dedicated hardware), all interfaces fail-over to the pfSense VM and within 20 seconds my connections are active through the backup pfSense device (VM). When the primary pfSense device is back up, the connections fail-over to it and within 20 seconds my connections are active through the primary pfSense device.
This is a success and indicates that my configuration is correct. However there is more.
- If I physically disconnect the cable for the WAN uplink I see the following on the CARP Status pages on both devices.
Primary: WAN status is init and the status of the other interfaces is backup.
Secondary: WAN status is master and the status of the other interfaces is backup.
Well, the WAN interface does fail-over, but the other interfaces do not, so I get nothing useful.
I get the same if I physically disconnect one of the other interfaces. In this case, that interface cannot communicate with the other pfSense interfaces, but the other interfaces work as expected.
- If I take the WAN interface down via software, with ifconfig em5 down, I get the same result, but there are status differences.
Primary: WAN status is backup and the status of the other interfaces is master.
Secondary: WAN status is master and the status of the other interfaces is backup.
I only tried this with the first two builds, so let me know if it would help to try this with the latest build. Also, I did not try taking down other interfaces, so again let me know if you need this result as well.
The bottom line is if all interfaces go down, then it works, but if not then I get nothing useful.
- This brings up a question. Is pfSense supposed to be able to detect flaky connections or dead routes and fail-over? I have included some examples below.
4a) If the WAN connection on the primary device starts getting errors and/or collisions, will pfSense fail-over until the errors and/or collisions stop?
4b) If the WAN connection on the primary device is switching between up and down states, will pfSense fail-over until the connection becomes more stable?
4c) If the primary device cannot reach the WAN gateway, will pfSense fail-over until it can reach the WAN gateway?
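As a rough illustration of the group fail-over I am expecting, here is a minimal Python sketch of how I understand CARP election. It assumes what the FreeBSD carp(4) manual page describes: the host with the shortest advertisement interval (advbase + advskew/256) becomes master, and with net.inet.carp.preempt enabled a host that loses link on one CARP-enabled interface raises advskew to 240 on all of its CARP interfaces. The class and function names, and the advskew values of 0 and 100, are made up for the example. It ignores timing and advertisement loss, so it shows the expected outcome and is not a diagnosis of my setup.

```python
from dataclasses import dataclass

# Assumption: advskew that carp(4) documents for group fail-over when
# net.inet.carp.preempt is enabled and a CARP-enabled interface loses link.
DEMOTED_ADVSKEW = 240


@dataclass
class Vip:
    name: str          # label only, e.g. "WAN" or "LAN"
    advskew: int       # configured skew; lower wins the election
    advbase: int = 1   # base advertisement interval in seconds
    link_up: bool = True


def advert_interval(vip, host_vips, preempt):
    """Seconds between advertisements this host would send for vip."""
    skew = vip.advskew
    # With preempt on, one dead link demotes every VIP on the same host.
    if preempt and any(not v.link_up for v in host_vips):
        skew = DEMOTED_ADVSKEW
    return vip.advbase + skew / 256


def elect(primary, secondary, preempt):
    """Return {vip name: host expected to be master} for a two-node pair."""
    hosts = {"primary": primary, "secondary": secondary}
    result = {}
    for index, vip in enumerate(primary):
        # A host whose link is down for this VIP cannot advertise at all.
        candidates = [
            (advert_interval(vips[index], vips, preempt), host)
            for host, vips in hosts.items()
            if vips[index].link_up
        ]
        result[vip.name] = min(candidates)[1] if candidates else "none"
    return result


def build_pair(wan_up_on_primary):
    """Hypothetical two-interface pair; skew values are illustrative."""
    primary = [Vip("WAN", advskew=0, link_up=wan_up_on_primary),
               Vip("LAN", advskew=0)]
    secondary = [Vip("WAN", advskew=100), Vip("LAN", advskew=100)]
    return primary, secondary


if __name__ == "__main__":
    primary, secondary = build_pair(wan_up_on_primary=False)
    print("preempt on: ", elect(primary, secondary, preempt=True))
    print("preempt off:", elect(primary, secondary, preempt=False))
```

In this model, pulling the WAN link with preempt off leaves the LAN on the primary while the WAN moves to the secondary, which is the split state I see after ifconfig em5 down. With preempt on, every VIP should move to the secondary.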
-
Can you please try with the latest build as well and let us know.
-
@ermal:
Can you please try with the latest build as well and let us know.
ermal,
I may be able to do this tonight, so it will likely be one of the Jul 25 builds that gets installed. Is this OK or do you need this specific build, pfSense-2.0-RC3-amd64-20110724-1854, installed? pfSense-2.0-RC3-amd64-20110724-1854 is the build that was released after the build I have installed and was the current one as of your response.
While I am not opposed to doing this, can you tell me what has changed in the 2nd Jul 24 release that would make you think that this issue may be resolved? I ask more for my reference as I am not sure I am reading the github commits correctly. I am reading the changes at the URL below and I don't see a change that seems to be related to CARP and there is only one commit for Jul 24 so I am unsure why there are two releases on Jul 24. I am sure my lack of understanding is my ignorance of how to interpret this information, so any explanation would be helpful.
https://github.com/bsdperimeter/pfsense/commits/master
-
There are also the pfsense-tools commits.
There were changes done.
-
@ermal:
Can you please try with the latest build as well and let us know.
ermal,
I apologize, but I was not able to test until today. I updated to 2.0-RC3 (amd64) built on Fri Jul 29 22:14:50 EDT 2011. I am having the same issue. If I disable an interface via ifconfig <emx> down or by disabling the switchport on the switch, the primary firewall switches all interfaces to backup, but the secondary becomes the master only for the interface that was down.
I hope this update helps.
-
OK this one is a little frustrating. I created Bug #1732 and it was rejected with the following statement.
--------Bug #1732--------
Please post on the forum to rule out configuration errors. I have just tested all that in a VM pair this week and it worked as expected for me. To gain a better understanding of how it's supposed to work, refer to the documentation, the book (if you have it), and the forum.
Only open tickets once configuration errors have been ruled out on the forum.
--------Bug #1732--------
I have read all the docs I could find and posted here. When I take the primary firewall off-line (shutdown or reboot), all the CARP IPs become master on the secondary firewall and traffic that goes through the firewall resumes as you would expect, which suggests the basic configuration is correct. I do not see any other settings in the VirtualIPs, CARP or Advanced sections of the interface to tune CARP fail-over.
It would be helpful to know what setting(s) would force all interfaces to switch when one interface fails and the different options for the setting(s).
-
Well, are you sure that your switch is not looping responses back?
Check the switch behavior on multicast traffic; maybe there is something related there!
The interesting parts will be packet traces from master and backup,
also the output of sysctl -a | grep carp
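
If it helps when comparing the two boxes, here is a small Python sketch that reads saved sysctl -a | grep carp output and flags the settings that matter for group fail-over. The OID names are as I recall them from carp(4) on FreeBSD 8.x, so check them against the real output. The function names are made up, and the sample text is invented for the example and is not taken from anyone's firewall.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines (the FreeBSD sysctl format) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not 'name: value'
            values[name.strip()] = value.strip()
    return values


def carp_findings(values):
    """Return human-readable notes about settings that affect fail-over."""
    findings = []
    if values.get("net.inet.carp.allow") != "1":
        findings.append("CARP packets are not accepted (net.inet.carp.allow)")
    if values.get("net.inet.carp.preempt") != "1":
        findings.append("preempt is off, so interfaces will not fail over as a group")
    if values.get("net.inet.carp.suppress_preempt", "0") != "0":
        findings.append("preemption is suppressed (link down or pfsync not in sync)")
    return findings


# Made-up sample, NOT output captured from a real firewall.
SAMPLE = """\
net.inet.carp.allow: 1
net.inet.carp.preempt: 0
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0
"""

if __name__ == "__main__":
    for finding in carp_findings(parse_sysctl(SAMPLE)):
        print(finding)
```

Run it against the output from both the master and the backup. A difference in preempt or a non-zero suppress_preempt on either side would be the first thing to look at.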