Dual PfSense boxes, dual Internet connections, CARP, failover HELP!!



  • Hi everyone,

    First post and pretty much new to PfSense. Also, please excuse my English  :)

    Let me start by explaining my situation: I am trying to set up 2 PfSense boxes with 2 Internet connections, with failover if either a box or an Internet connection dies. Right now everything works well except for one thing: if the main Internet connection dies, nothing happens, so my failover is a… fail! I've searched Google and the forum, and it seems no one has a setup like mine, so let me explain how it looks:

    PfSense1:
    Main Internet connection is provided by a modem/router with a local address of 192.168.255.1
    WAN address: 192.168.255.100
    CARP card address: 10.255.255.1
    LAN address: 192.168.100.1 with a CARP vip of 192.168.100.10

    PfSense2:
    Backup Internet connection is a simple modem with DHCP
    WAN address: DHCP
    CARP card address: 10.255.255.2
    LAN address: 192.168.100.2 with a CARP vip of 192.168.100.10

    So, basically, I know that in a "normal" CARP setup, the WAN interface should have a VIP for CARP to work (not to mention that I should only have 1 Internet connection). The thing is, I want hardware and Internet redundancy at the same time. I want PfSense2 to become the master when my main Internet connection drops, which is not happening right now.

    As I said earlier, I looked around on Google and in this forum. I found that the hooks that make CARP work are in /etc/devd.conf, and more specifically, this section:

    # CARP notify hooks. This will call carpup/carpdown with the
    # interface (carp0, carp1) as the first parameter.
    notify 100 {
        match "system"          "IFNET";
        match "type"            "LINK_UP";
        match "subsystem"           "vip";
        action "/etc/rc.carpmaster $subsystem";
    };
    
    notify 100 {
        match "system"          "IFNET";
        match "type"            "LINK_DOWN";
        match "subsystem"           "vip";
        action "/etc/rc.carpbackup $subsystem";
    };
    

    Some people also suggested adding other "action" triggers that would run a script (.sh file) to bring all the VIPs up or down, depending on the "match" trigger. This is what I did, but it was still not working.

    Then I had the idea to go into the console on pfsense1 and manually shut down all the VIPs (in my situation, I only have vip3), and I found that it did not even trigger CARP to fail over to pfsense2! Then, after a bunch of tests, I found that if I bring the LAN interface down on pfsense1, CARP works and all the traffic goes to pfsense2! If I bring the LAN interface back up, BAM! pfsense1 takes the load back!

    So now, I just need a script that will bring the LAN interface down when I lose my main Internet connection, and that will bring it back up when my main Internet connection is back! So I did the following:

    • I added an "action" line to the CARP stanzas that executes a .sh file
    • The .sh file brings the LAN interface up or down, depending on the "match"
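
    For reference, the idea described in the two points above could be sketched roughly as follows. This is only a sketch under assumptions: the LAN NIC name em1, the function names, and the DRY_RUN switch are made up for illustration, and 192.168.255.1 is the modem/router address from the first post. Pinging that address only proves the link to the modem/router, not the Internet beyond it.

```shell
#!/bin/sh
# Sketch: keep the LAN interface down while the upstream gateway is
# unreachable, and bring it back up once the gateway answers again.
# Assumptions (not from the thread): LAN_IF name, DRY_RUN switch.

MONITOR_IP="${MONITOR_IP:-192.168.255.1}"  # modem/router in front of pfsense1
LAN_IF="${LAN_IF:-em1}"                    # hypothetical LAN NIC name
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the command

# Return 0 if the monitor address answers at least one of three pings.
wan_alive() {
    ping -c 3 "$MONITOR_IP" >/dev/null 2>&1
}

# Map a probe result (0 = reachable, anything else = unreachable)
# to the state the LAN interface should be in.
desired_lan_state() {
    if [ "$1" -eq 0 ]; then
        echo up
    else
        echo down
    fi
}

# Apply the state, or just print the command when DRY_RUN=1.
set_lan_state() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "ifconfig $LAN_IF $1"
    else
        ifconfig "$LAN_IF" "$1"
    fi
}

# One check cycle: probe, decide, act.
check_once() {
    wan_alive
    rc=$?
    set_lan_state "$(desired_lan_state "$rc")"
}
```

    To try it, source the file and call check_once; with DRY_RUN=1 it only prints the ifconfig command it would run. Running it periodically (for example from cron) with DRY_RUN=0 would be one way to use it, at the cost of taking pfsense1's LAN address offline whenever the probe fails.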

    … So that should work, right?... I unplugged the WAN cable on pfsense1 and... well, it doesn't. It seems that the "match" part in devd.conf has something wrong in it, or I don't know... The CARP part itself seems to work, though, because I can manually shut down the LAN interface and it fails over!

    So now I am stuck... if anyone has any idea how to make this work, or if I am really not doing the right thing, please let me know!
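
    One thing that may be worth checking here: devd reports link events with the interface name in the "subsystem" field, so the stanzas quoted earlier, which match on "vip", react to the CARP virtual interfaces and not to the physical WAN port. A hedged sketch of a helper that prints a stanza keyed on a physical interface instead (the helper name devd_stanza, the NIC name em0, and the script path in the usage line are hypothetical):

```shell
#!/bin/sh
# Sketch: print a devd.conf "notify" stanza that fires on a link event
# of one specific interface.
# Usage: devd_stanza <interface> <LINK_UP|LINK_DOWN> <script>
devd_stanza() {
    cat <<EOF
notify 100 {
    match "system"    "IFNET";
    match "subsystem" "$1";
    match "type"      "$2";
    action "$3";
};
EOF
}
```

    For example, devd_stanza em0 LINK_DOWN /usr/local/bin/wan_down.sh prints a stanza that would run the script when em0 loses link. Note that this only catches a pulled cable or a dead port; an outage further upstream leaves the link up and would go unnoticed. Local edits to /etc/devd.conf may also not survive an upgrade.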

    Regards,



  • for CARP and VIPs to work, both boxes must have a WAN IP on the same subnet, and the VIP must be on that same subnet as well.

    you have an internal VIP, but no external one.

    if i were you, i would run a single pfsense w/ 3 NICs

    1 for LAN and 1 for each ISP WAN connection and either setup load-balancing or failover with gateway groups

    http://forum.pfsense.org/index.php?topic=28121.0



  • Hi Matt and thank you for the reply,

    I understand, but is there any way around that? I mean, I would love to be able to use the 2 machines I've got. Also, let's say pfsense1 has a hardware problem: I would love to have another box right next to it, ready to take the load. We can't really afford to have our gateway down!

    Of course, I could just keep pfsense2 running with nothing but the LAN and manually replicate every change I make, so that downtime from a hardware issue would be shorter, but I don't want to go there.

    So if you or anyone else has any other idea to make this work, please let me know, thanks!



  • Off the top of my head:

    Set up a gateway on LAN for your first box which points at the second pfSense system.  Add your WAN gateway & that second gateway to a gateway group as Tier 1 & Tier 2 respectively.  Add that gateway group to your LAN firewall rules.

    I can't remember if gateways & groups automatically sync over pfsync, so you may have to manually create the GW & group on the second box so that the rule doesn't break when you failover.



  • Hey there Jason,

    Sorry for the slow reply, but I have not had much time to spend on this lately.

    Yesterday I did what you said:

    • Added a new gateway on Pfsense1 with my Pfsense2 LAN address (192.168.100.2)
    • Created a new gateway group with Pfsense1's WAN as tier1 (192.168.255.1) and my newly added gateway as tier2 (192.168.100.2)
    • I checked on my Pfsense2 and the configs done were synched already.
    • Also have an allow all any/any in LAN rules, for testing purposes

    Then I unplugged my Pfsense1's WAN and…. no failover :(

    I'll try to specify the gateway group on the any/any LAN rule, see how it goes in a few hours, and let you know the result (edit: well, that did not work either). Other than that… is there anything I missed? =/

    Thanks!



  • Take a screenshot of the rules on your LAN interface.



  • Hi Jason,

    Here:
    http://imgur.com/fadQogt



  • You haven't told your rule to actually use the gateway group.



  • Hi Jason,

    Yes I did but when I tested, it was still not working so I changed it back to "*"

    I changed that for the gateway group again now, tested again and still no failover. Here's how it looks now:
    http://imgur.com/Hqykcmg

    Also, wouldn't it work even if I don't specify the group? I mean, isn't the "*" a catch-all?

    Thanks!



  • What does that failover group look like?  What is the gateway status for each?

    No, if you do not specify a gateway on your rule then it will use the system default.

    Also, try specifying the gateway you setup for your second pfSense box on that rule.  It will either send your traffic that way or you won't have any connectivity.



  • Hi Jason,

    I went in the gateway status menu (I didn't know it existed) and found out that, on my second box, the WAN of my first box was not reachable and therefore, offline on this side. I added a static route on my second box and now both boxes show the gateways online in the gateway status menu. I also sent a ping from my second box to my first box's WAN and it works.

    Now the weird thing is that, maybe 15 minutes later, I went back to the gateway status menu on both boxes, and here's how it looks now:

    http://imgur.com/mcRv2LH

    It is really strange because I did not change anything! I can also still ping my first box's WAN from my second box just fine, but in the gateway group it shows offline?! Also, on my first box, my second box shows as "Gathering data".

    I also tried to unplug the first box's WAN and still no failover… Any idea why it's doing that?

    Thanks again!



  • What have you set up for the failover gateway on each box?  It sounds like you used the WAN IP of the other system.  You should be using the LAN IP if that is the shared network.



  • Hi Jason,

    Not sure what you mean by "You should be using the LAN IP if that is the shared network" (edit: did you mean the CARP LAN VIP (192.168.100.10), instead of the LAN IP of the boxes?), but here's how the gateway groups are configured on each box:

    First box:
    Tier1: Pfsense1's WAN (192.168.255.1)
    Tier2: Pfsense2's LAN (192.168.100.2)
    never: Pfsense1's LAN (192.168.100.1)

    Second box:
    Tier1: Pfsense1's WAN (192.168.255.1)
    Tier2: Pfsense2's LAN (192.168.100.2)
    never: Pfsense1's LAN (192.168.100.1)
    never: Pfsense2's WAN (DHCP)

    That first box's WAN (192.168.255.1) shows offline in the gateway status of the second box. Also, I removed my static route and I can still ping it, but it still shows offline in gateway status.

    Thanks!



  • I'm going to use an example with slightly different IPs to make it clearer.

    Interface IPs
    Box 1

    • WAN - 10.0.0.2
    • LAN - 192.168.1.2

    Box 2

    • WAN - 172.16.0.2
    • LAN - 192.168.1.3

    CARP

    • LAN - 192.168.1.1

    Gateways
    Box 1

    • GW_WAN - WAN - 10.0.0.1
    • GW_PF2 - LAN - 192.168.1.3

    Box 2

    • GW_WAN - WAN - 172.16.0.1
    • GW_PF1 - LAN - 192.168.1.2

    Gateway Groups
    Box 1

    • "Failover"
        - Tier 1 - GW_WAN
        - Tier 2 - GW_PF2

    Box 2

    • "Failover"
        - Tier 1 - GW_WAN
        - Tier 2 - GW_PF1

    Rules
    Apply the gateway group "Failover" to all the LAN rules you want to switch to the other box.
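
    As a rough model of what the "Failover" group above is expected to do: traffic should leave through the lowest-tier gateway that is currently marked online. The following is only an illustrative sketch, not how pfSense implements gateway groups; the function name pick_gateway and the "tier name status" input format are made up for the example.

```shell
#!/bin/sh
# Simplified model of tiered failover: among the gateways reported as
# "online", print the name of the one with the lowest tier number.
# Input: one "tier name status" triple per line on stdin.
# Exit status is 1 when no gateway is online.
pick_gateway() {
    awk '$3 == "online" && (!found || $1 + 0 < best) {
             best = $1 + 0; name = $2; found = 1
         }
         END { if (found) print name; else exit 1 }'
}
```

    With Box 1's group, feeding it "1 GW_WAN online" and "2 GW_PF2 online" selects GW_WAN; once GW_WAN is marked offline, the same input selects GW_PF2. This is why the gateway status page matters: a rule only switches over when the Tier 1 gateway is actually reported as down.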



  • Hi Jason,

    Thanks for clarifying this for me; I had some stuff wrong on the second box. I've got that all fixed up now as you described, but the setup still does not fail over.

    There are 2 things I noticed. I don't know if they will tell you something, but anyway:

    1- I checked the CARP status while the first box's WAN was unplugged, and the first box was still the "master". I guess that makes sense in a way, since the LAN address still works fine. Do I need to add something in the CARP settings so that it also checks the first box's WAN?

    2- When I go to the gateway status on the first box, the "GW_PF2 - LAN - 192.168.1.3" entry (if I take your example) keeps switching between "Online" and "Gathering data". What I mean is that, if I keep refreshing the page, it alternates between the 2 states.
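
    Regarding point 1, the CARP state can also be checked from the console on each box while testing. A small sketch, assuming ifconfig prints a line of the form "carp: MASTER vhid ..." under each interface that carries a CARP address (the function name carp_states is made up for the example):

```shell
#!/bin/sh
# Sketch: read ifconfig output on stdin and print "<interface> <state>"
# for every "carp:" line found.
carp_states() {
    awk '/^[^ \t]/ { sub(/:$/, "", $1); ifname = $1 }   # interface header line
         $1 == "carp:" { print ifname, $2 }'            # e.g. MASTER or BACKUP
}
```

    Running ifconfig | carp_states on both boxes before and after unplugging the WAN cable shows whether the master role moves at all. With a VIP on the LAN only, both boxes still see each other's advertisements there, so it would not be expected to move on a WAN failure alone.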

    Any other ideas?

    Thanks again for your time and your support!

