Where is the interface order set?
-
Let me put the question differently: Where is the interface order stored?
I have identical hardware. Both systems look identical in ifconfig output. They have:
igb0
igb1
igb2 - Cogent (WAN)
igb3 - Global (WAN)
bge0
bge1
pflog0
pfsync0
enc0
lo0
lagg0 (igb1 bge1) - Crossover
lagg1 (igb0 bge0) - LAN
lagg1_vlan19 - DMZ
So same order, same assignments, on both firewalls. But on the Interfaces / Interface Assignments screen the order is now not the same. The primary has:
Cogent
LAN
Global
DMZ
Crossover
The secondary has:
Cogent
LAN
Global
Crossover
DMZ
This gets CARP way confused on the secondary. It puts two IPs which properly belong to the DMZ, per the configuration on the primary, onto the Crossover interface on the secondary. I've tried deleting the Crossover definition on the secondary and recreating it, but it still ends up in 4th position rather than 5th. So it's not the current order of creation that determines the menu order here. It may well be something we initially did. The goal now is to fix it without the extreme step of tearing everything down for a full reinstallation.
At some level there's a mapping from the hardware (identical on both systems, including in how ifconfig presents it and in how the assignments are defined in pfSense) to the pfSense Interface Assignments menu order, which apparently is also the order that CARP depends on. What do I edit, and where, to fix that mapping so that CARP will work right between these boxes?
-
When you go into Interfaces > Assign, the two boxes should look the same.
In addition, check Status > Interfaces and verify they match there as well. For example, if Global is (opt1, igb3) on the primary, it should be (opt1, igb3) on the secondary, NOT something like (opt2, igb3).
-
Okay, where we messed up was not assigning the interfaces in the identical temporal order. We got the "same hardware, same assignments" part. But the primary has the DMZ as opt2, and the Crossover as opt3, while the secondary has the DMZ as opt3 and the Crossover as opt2. Since we had renamed these interfaces from opt2 and opt3 to "DMZ" and "Crossover," this mismatch was not readily apparent. It does show if you hover over the links to them on the Interface Assignments page.
Where do we find the file or database which has frozen the mappings into this order? We can't just switch cables around, since the DMZ is a VLAN on what is, for both systems, the "lan" interface, which we've named "LAN" and which is cabled identically and works.
-
Can this be fixed by editing /cf/conf/config.xml?
If so, what's the right edit? On the primary we have:
<opt2><if>lagg1_vlan19</if>
<enable></enable><spoofmac></spoofmac>
<ipaddr>172.17.19.3</ipaddr>
<subnet>24</subnet></opt2>
<opt3><if>lagg0</if>
<enable></enable><spoofmac></spoofmac>
<ipaddr>192.168.100.1</ipaddr>
<subnet>24</subnet>
<media>autoselect</media></opt3>
On the secondary:
<opt2><if>lagg0</if></opt2>
<opt3><if>lagg1_vlan19</if>
<enable></enable>
<spoofmac></spoofmac><ipaddr>172.17.19.4</ipaddr>
<subnet>24</subnet></opt3>
This partly reflects that I haven't fully reassigned opt2 on the secondary yet, since I was seeing if deleting and recreating it through the GUI would fix the order. Is it safe just to take the stanzas from the primary and copy them to the secondary with distinct IP assignments?
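The swap visible in the two sets of stanzas above can also be checked mechanically rather than by hovering over links in the GUI. The following is a minimal sketch, not a pfSense tool: it assumes the `<optN>` stanzas sit under an `<interfaces>` element inside a `<pfsense>` root, and the function names `interface_map` and `assignment_mismatches` are made up for illustration. The sample strings contain only the `<if>` values quoted in this thread; in practice you would read each node's /cf/conf/config.xml instead.

```python
import xml.etree.ElementTree as ET

# Trimmed samples built from the stanzas quoted in this thread.
# ASSUMPTION: the <pfsense><interfaces> wrapper is how config.xml nests them.
PRIMARY = """<pfsense><interfaces>
<opt2><if>lagg1_vlan19</if></opt2>
<opt3><if>lagg0</if></opt3>
</interfaces></pfsense>"""

SECONDARY = """<pfsense><interfaces>
<opt2><if>lagg0</if></opt2>
<opt3><if>lagg1_vlan19</if></opt3>
</interfaces></pfsense>"""


def interface_map(config_xml):
    """Map each internal name (wan, lan, opt1, ...) to its <if> port."""
    root = ET.fromstring(config_xml)
    section = root.find("interfaces")
    if section is None:
        return {}
    return {node.tag: node.findtext("if", default="") for node in section}


def assignment_mismatches(primary_xml, secondary_xml):
    """Return {internal name: (port on primary, port on secondary)} where they differ."""
    primary = interface_map(primary_xml)
    secondary = interface_map(secondary_xml)
    names = sorted(set(primary) | set(secondary))
    return {
        name: (primary.get(name), secondary.get(name))
        for name in names
        if primary.get(name) != secondary.get(name)
    }


for name, (on_primary, on_secondary) in assignment_mismatches(PRIMARY, SECONDARY).items():
    print(f"{name}: primary={on_primary} secondary={on_secondary}")
```

An empty result would mean every internal name points at the same port on both nodes, which is the state HA sync expects.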
-
If it were mine I would completely rebuild the secondary:
Disable XMLRPC sync on the primary in System > High Availability
Disable sync in any packages you are using
Shut down the secondary, reinstall it (exactly the same version as the primary), reassign the interfaces in the correct order with the correct interface addresses for that node
Reinstall any packages, especially those that require sync
Place a pass any any rule on the sync interface
Reenable XMLRPC sync on the primary
Reenable any package sync
That's what I would do if it was mine.
I usually follow the HA chapter in the book when I need to rebuild a node. Pretty much works every time. Packages make it more difficult.
https://portal.pfsense.org/docs/book/highavailability/example-redundant-configuration.html
If you want to try a shortcut, take this from the primary:
<opt2><if>lagg1_vlan19</if>
<enable></enable><spoofmac></spoofmac>
<ipaddr>172.17.19.3</ipaddr>
<subnet>24</subnet></opt2>
<opt3><if>lagg0</if>
<enable></enable><spoofmac></spoofmac>
<ipaddr>192.168.100.1</ipaddr>
<subnet>24</subnet>
<media>autoselect</media></opt3>
Make it look correct for the secondary (interface addresses) and replace the opt2/opt3 section on the secondary.
Search the configuration for other relations to opt2 and opt3 and make them correct.
Reboot and hope.
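The "search the configuration for other relations" step can be sketched as a small script. This is only an illustration under stated assumptions: `find_references` is a made-up helper, and the `<virtualip>` and `<filter>` entries in the sample are a hypothetical stand-in for whatever else in config.xml names opt2 or opt3. It walks the XML tree and reports every element whose tag or text mentions one of the internal names, so each hit can be reviewed by hand before rebooting.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <opt2>/<opt3> stanzas come from this thread;
# the <virtualip> and <filter> entries are illustrative placeholders.
SAMPLE = """<pfsense>
<interfaces>
<opt2><if>lagg0</if></opt2>
<opt3><if>lagg1_vlan19</if></opt3>
</interfaces>
<virtualip><vip><mode>carp</mode><interface>opt2</interface></vip></virtualip>
<filter><rule><interface>opt3</interface><type>pass</type></rule></filter>
</pfsense>"""


def find_references(config_xml, names):
    """Return (path, matched text or tag) for every element mentioning a name."""
    # Whole-word match so that "opt2" does not also match "opt20".
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    hits = []

    def walk(node, path):
        here = f"{path}/{node.tag}"
        if node.tag in names:
            hits.append((here, node.tag))
        text = (node.text or "").strip()
        if text and pattern.search(text):
            hits.append((here, text))
        for child in node:
            walk(child, here)

    walk(ET.fromstring(config_xml), "")
    return hits


for path, value in find_references(SAMPLE, ("opt2", "opt3")):
    print(f"{path}: {value}")
```

The script only lists candidates; whether a given reference should be swapped still has to be decided against the primary's configuration.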
You might also try stopping sync, deleting those two interfaces on the secondary, making them again in the correct order, and reenabling sync.
I really like the starting from scratch and doing it right method. It shouldn't even involve any down time.
-
Derelict beat me to it. Editing the XML and restoring might be ok if you are careful, but I second the recommendation to rebuild the secondary from scratch.
-
Thanks for all advice. While I appreciate the "start from scratch" method, I'm tight on time so will try a shortcut first.
Is there anywhere else besides within the XML file that is even aware of opt2, opt3 and so on? Or is the only coordination at issue on the system within that file?
-
Everything is in config.xml
Shortcut if you want. HA needs to be correct to work properly.
-
Reworking that XML file, including the later references to opt2 and opt3, got the interfaces in the right order after a reboot. There do seem to be some references stored elsewhere on the system, though. NRPEv2 wasn't actually available until I saved that page (without changes). And a couple of CARP assignments that should be on the LAN interface are still associating themselves with the CROSSOVER interface on the secondary system. I assume that will be fixable.
One thing I'm seeing seems independent of this: with CARP set to off, many actions will turn it back on even though I never specifically hit the button to do so. That's a bit annoying when adjusting a secondary system. Granted, the screen says "some configuration changes will re-enable." Just noting that some of those changes are seemingly at some distance from anything one might expect to turn CARP back on.
-
Yeah. CARP maintenance mode is your friend there.