Netgate Discussion Forum

    Multi-WAN, High Availability, policy routing. Failover breaks connections

    Routing and Multi WAN

      dayer

      Hi. Sorry if this message belongs in the CARP section. I think the situation is related to both CARP and Multi-WAN.
      pfSense 2.3.4-p1

      I have this scenario:

      
                                  |---(WAN1)----------|
                  |---Pfsense1====                    |
                  |      |        |---(WAN2)--|       |
                  |      |                    |       |
      PC --(LAN)--|	(Sync)                 GW2     GW1
                  |      |                    |       |
                  |      |        |---(WAN2)--|       |
                  |---Pfsense2====                    |
                                  |---(WAN1)----------|
      
      
      • The gateway for PC is the VIP in the LAN.

      • The default gateway for Pfsense is GW2

      • The gateway for the traffic from LAN net is GW1, thanks to policy routing:

        • Action: pass

        • Interface: LAN

        • Address Family: IPv4

        • Protocol: Any

        • Source: LAN net

        • Destination: ! LAN net

        • Gateway: GW1

      • NAT, Outbound settings:

      • In WAN1, from LAN net, to any: WAN1 VIP address

      • In WAN2, from LAN net, to any: WAN2 VIP address

      With this scenario all traffic from the PC to the outside goes through GW1 correctly. However, if I'm pinging an Internet address from the PC with Pfsense1 as master and I temporarily disable CARP on Pfsense1, Pfsense2 becomes the new master and the ping breaks. With a TCP connection such as SSH, the result is the same.
      Monitoring the traffic with tcpdump, I've found that Pfsense2 tries to send this traffic through GW2 instead of GW1, while still using the WAN1 VIP address as source.
      If I restart the ping, or close and reopen the SSH connection, the new traffic goes through GW1 as expected. But I can't find the reason why the existing traffic is forwarded through the system default gateway (GW2) instead of following the policy routing (only GW1).

      Thanks

        dayer

        I've also tested:

        • only with a gateway, but without a default gateway for the system (in order to use policy routing)
        /root: netstat -r 
        Routing tables
        
        Internet:
        Destination        Gateway            Flags      Netif Expire
        8.8.4.4            192.168.1.1        UGHS   lagg0_vl
        [...]
        
        • add a floating rule at the top with:

          • action: pass

          • quick: checked

          • interface: select all

          • direction: out

          • address family: IPv4

          • Protocol: Any

          • Source: This Firewall (self)

          • Destination: any

          • Gateway: GW1 - 192.168.1.1

        and a ping from the firewall to the outside finds no route:

        
        /root: ping -S 127.0.0.1 8.8.8.8
        PING 8.8.8.8 (8.8.8.8) from 127.0.0.1: 56 data bytes
        ping: sendto: No route to host
        
        

        However, a ping sourced from the WAN1 IP works:

        
        /root: ping -S 192.168.1.111 8.8.8.8
        PING 8.8.8.8 (8.8.8.8) from 192.168.1.111: 56 data bytes
        64 bytes from 8.8.8.8: icmp_seq=0 ttl=57 time=3.357 ms
        
        

        I think this could be related to another thread and to bug #5476 :(

          dayer

          I've reproduced and simplified the situation with pfSense 2.4.0-RC using three VirtualBox virtual machines, with the host computer doing NAT (see the attached file).
          However, the problem is the same.

          Settings:

          • Pfsense1 as master. Pfsense2 as backup.

          • WAN1 as default gateway (GW1), in the system routing table

          • WAN2 as gateway (GW2) for traffic from the LAN to outside (with policy routing)

          • There is Outbound NAT from LAN on WAN1 (HA IP WAN1) and WAN2 (HA IP WAN2).

          Reproduce:

          • I start a continuous ping from the PC to 8.8.8.8. The traffic flows through GW2. It's OK.

          • Disable CARP in Pfsense1. Now Pfsense2 is the master unit.

          • The states related to this ping are also in Pfsense2.

          • The ping begins to fail. There's no response.

          • With tcpdump in Pfsense2 I see:

            • The packets from PC arrive to LAN interface correctly.

            • The packets try to leave the firewall through WAN1 (why don't they continue through WAN2?), using the HA IP from WAN2 as source (because the NAT information is kept in the states).

          • If:

            • I stop the ping and relaunch it, the pings go OK through WAN2.

            • I enable CARP on Pfsense1 and it becomes the master unit again, the ping packets go through WAN2 again.

          From my point of view, when a pfSense unit becomes the new master while traffic is flowing, it tries to route that traffic only according to the routing table (or static routes) and ignores the policy routing.
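
The hypothesis above can be made concrete with a toy model. This is only an illustration of the suspected behavior, not pf's actual data structures or code: the `State` class, its `route_to` field, and the `egress` function are hypothetical names. It assumes a state normally remembers the interface chosen by policy routing, and that a state which reaches the backup without that information falls back to the system routing table while keeping its NAT address.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical setup mirroring the simplified test: WAN1 carries the system
# default route, WAN2 is selected by the LAN policy-routing rule.
DEFAULT_ROUTE_IFACE = "WAN1"


@dataclass
class State:
    nat_source: str            # translated source address kept in the state
    route_to: Optional[str]    # interface forced by policy routing, if known


def egress(state: State) -> Tuple[str, str]:
    """Return (interface, source address) used for an outgoing packet."""
    # If the policy-routing decision is missing, fall back to the routing table.
    iface = state.route_to or DEFAULT_ROUTE_IFACE
    return iface, state.nat_source


# State as created on the master by the LAN rule with gateway GW2.
on_master = State(nat_source="WAN2 VIP", route_to="WAN2")
# Assumed state on the backup after sync: NAT kept, routing decision lost.
on_backup = State(nat_source="WAN2 VIP", route_to=None)

print(egress(on_master))  # ('WAN2', 'WAN2 VIP')
print(egress(on_backup))  # ('WAN1', 'WAN2 VIP')
```

The second result is the combination seen with tcpdump on Pfsense2: packets leaving through WAN1 with the WAN2 HA address as source. Whether pfSense really loses the routing decision this way is exactly the open question of this thread.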

          Is this the normal behavior in pfSense? Do I need any special rules? Could it be a bug?

          PS: I've fixed the thread subject.
          PS2: I've also posted the first scenario on Reddit and another person there sees the same behaviour.

          pfsense-multi-wan-ha.png

            dayer

            I'm attaching some information that I'm also using to explain the problem in the mailing list thread:

            • Configuration files from a scenario with pfSense 2.4.0-RC on two virtual machines

            • Screenshots for an example

            00-initial-pf1.png
            00-initial-pf2.png
            01-gateways.png
            02-rules.png
            03-outbound-nat.png
            04-ping-pf1-master.png
            05-pf2-new-master.png
            06-incorrect-routing.png
            07-restart-ping.png
            config.zip

              luckman212 LAYER 8

              Interesting problem. Sorry I don't have a solution right now, but I'll be following this thread.

              If, in your "Reproduce" section above, instead of stopping the ping and restarting it in step#6 - you go to Diag>States>Reset States and kill all states, does the ping start succeeding again?

                dayer

                Thank you for your interest :)

                I've followed your suggestion related to step #6:

                • If I kill all states the ping starts succeeding again :)

                • If I kill all states related to the destination IP, the ping starts succeeding again :)

                • If I kill the states related to the destination IP in WAN2 , the ping continues failing :(

                • If I kill the states related to the destination IP in LAN , the ping starts succeeding again :)

                And in the successful cases, if Pfsense1 recovers the master role, the ping keeps succeeding.
                I think the problem is related to how routes are kept in established states on the LAN interface.

                Could it be considered a bug?

                  luckman212 LAYER 8

                  Ok, at least you're a little closer to identifying the culprit. Unfortunately I don't have too much experience with CARP, so I will have to let someone else respond as to whether this is a config issue or a bug, and if it's a bug, whether the bug lies in pfSense or FreeBSD itself. If it's the latter, we'll have to file it upstream.

                    dayer

                    One more thing.
                    I've run the last tests again with SSH instead of ping, but I haven't managed to recover the SSH session.

                      Derelict LAYER 8 Netgate

                      "Could it be considered a bug?"

                      Probably a configuration issue or an issue in your virtual environment. I have done countless tests like you are doing. Up to and including failing over during live video streams, etc and state sync works great.

                      I just pinged through from behind my 2.4.0-RC HA Multi-WAN VM pair last night when I updated it to 2.4.1-DEV. Worked great failing over and back.

                      Chattanooga, Tennessee, USA
                      A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                      DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                      Do Not Chat For Help! NO_WAN_EGRESS(TM)

                        dayer

                        Thank you very much for your tests.

                        Regarding the failover during live video streams, were those using the default gateway or a secondary gateway selected by a firewall rule or a failover gateway group? This difference is important because I've only found the problem when I'm using a non-default gateway while I change the master unit.

                        I've done several tests, with virtual and physical machines, with and without LACP, but the behavior has always been the same.
                        I attached my config in reply #3 a few days ago to clear up any doubt about the settings.

                        Do you know of a more detailed guide for this purpose than Multi-WAN + Configuring pfSense Hardware Redundancy (CARP) from PFSenseDocs or High Availability » Multi-WAN with HA from The pfSense Book?
                        Could you share your test configuration?

                          Derelict LAYER 8 Netgate

                          Policy routing all LAN pings out WAN2. WAN is the default gateway. Pinging 8.8.8.8:

                          States on Primary/MASTER:

                          LAN icmp 172.25.236.227:29353 -> 8.8.8.8:29353 0:0 139 / 139 11 KiB / 11 KiB
                          WAN2 icmp 172.25.227.17:38069 (172.25.236.227:29353) -> 8.8.8.8:38069 0:0 139 / 139 11 KiB / 11 KiB

                          States on Secondary/BACKUP:

                          LAN icmp 172.25.236.227:29353 -> 8.8.8.8:29353 0:0 0 / 0 0 B / 0 B
                          WAN2 icmp 172.25.227.17:38069 (172.25.236.227:29353) -> 8.8.8.8:38069 0:0 0 / 0 0 B / 0 B

                          Persistent CARP maintenance mode on Primary:

                          States on secondary start seeing the traffic.

                          LAN icmp 172.25.236.227:29353 -> 8.8.8.8:29353 0:0 23 / 23 2 KiB / 2 KiB
                          WAN2 icmp 172.25.227.17:38069 (172.25.236.227:29353) -> 8.8.8.8:38069 0:0 23 / 23 2 KiB / 2 KiB

                          Client dropped three pings then continued. That's about right for a failover event.

                          States still exist on primary but are not seeing any traffic:

                          LAN icmp 172.25.236.227:29353 -> 8.8.8.8:29353 0:0 239 / 239 20 KiB / 20 KiB
                          WAN2 icmp 172.25.227.17:38069 (172.25.236.227:29353) -> 8.8.8.8:38069 0:0 239 / 239 20 KiB / 20 KiB

                          Leave Persistent CARP maintenance mode on Primary. States on primary are seeing the traffic again (239 now 262):

                          LAN icmp 172.25.236.227:29353 -> 8.8.8.8:29353 0:0 262 / 262 21 KiB / 21 KiB
                          WAN2 icmp 172.25.227.17:38069 (172.25.236.227:29353) -> 8.8.8.8:38069 0:0 262 / 262 21 KiB / 21 KiB

                          Same results using temporary enabling and disabling of CARP on the primary.

                            Derelict LAYER 8 Netgate

                            And FWIW, here's a TCP session. Policy routed all outbound SSH out WAN2:

                            States on Primary:

                            LAN tcp 172.25.236.227:46380 -> 192.168.223.6:22 ESTABLISHED:ESTABLISHED 75 / 81 7 KiB / 10 KiB
                            WAN2 tcp 172.25.227.17:21325 (172.25.236.227:46380) -> 192.168.223.6:22 ESTABLISHED:ESTABLISHED 75 / 81 7 KiB / 10 KiB

                            States on Secondary:

                            LAN tcp 172.25.236.227:46380 -> 192.168.223.6:22 ESTABLISHED:ESTABLISHED 0 / 0 0 B / 0 B
                            WAN2 tcp 172.25.227.17:21325 (172.25.236.227:46380) -> 192.168.223.6:22 ESTABLISHED:ESTABLISHED 0 / 0 0 B / 0 B

                            I failed back and forth a couple times. At no point did the ssh session drop. Noticed a couple delays in output but TCP did its thing and no data was lost in the session.

                              dayer

                              Thank you Derelict :)

                              I've been reproducing your tests.

                              WAN is the default gateway:

                              Name                Interface   Gateway             Monitor IP  Description
                              WAN1 (default)      WAN1        xxx.xxx.xxx.xxx     8.8.8.8     WAN1 Gateway
                              WAN2                WAN2        192.168.1.1         8.8.4.4     WAN2 Gateway
                              

                              Policy routing sends all LAN traffic to the outside through WAN2:

                              States      Protocol    Source  Port    Destination     Port    Gateway     Queue   Schedule    Description
                              5 /3.80 MiB IPv4*       *       *       internals       *       *           none                Internal traffic to default gateway
                              0 /254 KiB  IPv4*       *       *       *               *       WAN2        none                The rest to WAN2
                              

                              NAT for internal traffic to outside:

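                              Interface   Source      Source Port   Destination   Destination Port   NAT Address         NAT Port   Static Port   Description
                              WAN1        internals   *             *             *                  xxx.xxx.xxx.xxy     *                        NAT from internal networks
                              WAN2        internals   *             *             *                  192.168.1.100       *                        NAT from internal networks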
                              

                              internals is an alias with internal networks.

                              Pinging 208.123.73.69

                              States on Primary/MASTER:

                              LAN     icmp    172.16.103.2:10618 -> 208.123.73.69:10618                           0:0     69 / 69     6 KiB / 6 KiB 	
                              WAN2    icmp    192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632     0:0     69 / 69     6 KiB / 6 KiB
                              

                              States on Secondary/BACKUP:

                              
                              LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618                           0:0     0 / 0 	0 B / 0 B 	
                              WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632     0:0     0 / 0 	0 B / 0 B
                              

                              Persistent CARP maintenance mode on Primary:

                              States on secondary start seeing the traffic but something goes wrong. The number of packets observed matching the state from the destination side is zero.

                              LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618 	                        0:0 	58 / 0 	5 KiB / 0 B 	
                              WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632 	0:0 	58 / 0 	5 KiB / 0 B
                              

                              Client doesn't get ping responses.
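
The symptom above (packets counted from the source side, none from the destination side) can be spotted mechanically in a pasted states table. The following is a minimal sketch, assuming the text layout shown in this thread, where the packet counters appear as `<from source> / <from destination>` before the byte counters. The names `StateCounters`, `parse_state`, and `is_one_way` are hypothetical helpers, not part of pfSense.

```python
import re
from typing import NamedTuple, Optional

# Packet counters appear as "<from source> / <from destination>" and are the
# first "number / number" pair on the line (byte counters carry a unit).
PACKETS = re.compile(r"(\d+)\s*/\s*(\d+)")


class StateCounters(NamedTuple):
    interface: str
    protocol: str
    packets_src: int
    packets_dst: int


def parse_state(line: str) -> Optional[StateCounters]:
    """Parse one line copied from the states table; None if it doesn't fit."""
    fields = line.split()
    match = PACKETS.search(line)
    if len(fields) < 2 or match is None:
        return None
    return StateCounters(fields[0], fields[1],
                         int(match.group(1)), int(match.group(2)))


def is_one_way(state: StateCounters) -> bool:
    """True when the source side sent packets but no reply was counted."""
    return state.packets_src > 0 and state.packets_dst == 0


sample = """
LAN  icmp  172.16.103.2:10618 -> 208.123.73.69:10618  0:0  58 / 0  5 KiB / 0 B
WAN2 icmp  192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632  0:0  58 / 0  5 KiB / 0 B
"""

for raw in sample.strip().splitlines():
    state = parse_state(raw)
    if state is not None and is_one_way(state):
        print(f"{state.interface} {state.protocol}: "
              f"{state.packets_src} packets sent, no replies")
```

A one-way state only shows that replies are missing. It does not say whether the request left through the wrong interface or the reply was dropped upstream, so it needs to be combined with a packet capture on each WAN.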

                              States still exist on primary but are not seeing any traffic (which is reasonable):

                              LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618 	                        0:0 	368 / 368 	30 KiB / 30 KiB 	
                              WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632 	0:0 	368 / 368 	30 KiB / 30 KiB
                              

                              Leave Persistent CARP maintenance mode on Primary. States on primary are seeing the traffic again (368 now 372):

                              LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618 	                        0:0 	372 / 372 	31 KiB / 31 KiB 	
                              WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632 	0:0 	372 / 372 	31 KiB / 31 KiB
                              

                              SSH to aaa.bbb.ccc.ddd (I've replaced the public IP for security reasons):

                              States on Primary:

                              LAN 	tcp 	172.16.103.2:43290 -> aaa.bbb.ccc.ddd:22                        ESTABLISHED:ESTABLISHED 	138 / 126 	12 KiB / 20 KiB 	
                              WAN2 	tcp 	192.168.1.100:54741 (172.16.103.2:43290) -> aaa.bbb.ccc.ddd:22  ESTABLISHED:ESTABLISHED 	138 / 126 	12 KiB / 20 KiB
                              

                              States on Secondary:

                              
                              LAN 	tcp 	172.16.103.2:43290 -> aaa.bbb.ccc.ddd:22                        ESTABLISHED:ESTABLISHED 	0 / 0 	0 B / 0 B 	
                              WAN2 	tcp 	192.168.1.100:54741 (172.16.103.2:43290) -> aaa.bbb.ccc.ddd:22  ESTABLISHED:ESTABLISHED 	0 / 0 	0 B / 0 B
                              

                              Persistent CARP maintenance mode on Primary:

                              States on secondary start seeing the traffic, but something appears wrong:

                              
                              LAN 	tcp 	172.16.103.2:43290 -> aaa.bbb.ccc.ddd:22                        ESTABLISHED:ESTABLISHED 	13 / 3 	2 KiB / 708 B 	
                              WAN2 	tcp 	192.168.1.100:54741 (172.16.103.2:43290) -> aaa.bbb.ccc.ddd:22  ESTABLISHED:ESTABLISHED 	13 / 3 	2 KiB / 708 B
                              

                              With tcpdump on WAN1, I see that the Secondary firewall is routing through WAN1 while using the WAN2 VIP address as the NAT source:

                              14:09:03.973890 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [.], ack 1150910396, win 593, options [nop,nop,TS val 3297622232 ecr 1263162549], length 0
                              14:09:07.569891 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [.], ack 1, win 593, options [nop,nop,TS val 3297625828 ecr 1263163448,nop,nop,sack 1 {4294967113:1}], length 0
                              14:09:08.810847 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], seq 0:52, ack 1, win 593, options [nop,nop,TS val 3297627069 ecr 1263163448], length 52
                              14:09:08.978668 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], seq 52:104, ack 1, win 593, options [nop,nop,TS val 3297627237 ecr 1263163448], length 52
                              14:09:09.035602 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], seq 52:104, ack 1, win 593, options [nop,nop,TS val 3297627294 ecr 1263163448], length 52
                              14:09:09.122214 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], seq 104:156, ack 1, win 593, options [nop,nop,TS val 3297627380 ecr 1263163448], length 52
                              14:09:09.180582 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], seq 0:156, ack 1, win 593, options [nop,nop,TS val 3297627439 ecr 1263163448], length 156
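
The same check can be scripted over a saved capture. This is a small sketch, assuming tcpdump's default one-line text output as pasted above and assuming WAN2 is 192.168.1.0/24 (the prefix length is a guess based on the gateway 192.168.1.1 and the NAT address 192.168.1.100). `source_address` and `foreign_sources` are hypothetical helper names.

```python
import ipaddress
import re
from typing import Iterable, List, Optional

# tcpdump prints "IP <src>.<port> > <dst>.<port>:"; capture only the IPv4 source.
SOURCE = re.compile(r"\bIP (\d+\.\d+\.\d+\.\d+)\.\d+ >")


def source_address(line: str) -> Optional[ipaddress.IPv4Address]:
    """Extract the IPv4 source address from one tcpdump line, if present."""
    match = SOURCE.search(line)
    return ipaddress.IPv4Address(match.group(1)) if match else None


def foreign_sources(lines: Iterable[str], other_wan_net: str) -> List[str]:
    """Return source addresses in the capture that belong to another WAN's subnet."""
    net = ipaddress.ip_network(other_wan_net)
    found = set()
    for line in lines:
        addr = source_address(line)
        if addr is not None and addr in net:
            found.add(str(addr))
    return sorted(found)


# One line from the WAN1 capture above.
capture = [
    "14:09:08.810847 IP 192.168.1.100.54741 > aaa.bbb.ccc.ddd.22: Flags [P.], "
    "seq 0:52, ack 1, win 593, options [nop,nop,TS val 3297627069 ecr 1263163448], length 52",
]

# Assumed WAN2 subnet; adjust to the real prefix.
print(foreign_sources(capture, "192.168.1.0/24"))  # ['192.168.1.100']
```

Any address reported here is a packet that left through WAN1 carrying a WAN2 source address, which the upstream router on WAN1 would not be expected to route back.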
                              

                              States still exist on primary but are not seeing any traffic (which is reasonable):

                              LAN 	tcp 	172.16.103.2:43290 -> aaa.bbb.ccc.ddd:22                        ESTABLISHED:ESTABLISHED 	232 / 220 	17 KiB / 34 KiB 	
                              WAN2 	tcp 	192.168.1.100:54741 (172.16.103.2:43290) -> aaa.bbb.ccc.ddd:22  ESTABLISHED:ESTABLISHED 	232 / 220 	17 KiB / 34 KiB
                              

                              Leave Persistent CARP maintenance mode on Primary. States on primary are seeing the traffic again (232 now 294) and the SSH terminal responds again:

                              
                              LAN 	tcp 	172.16.103.2:43290 -> aaa.bbb.ccc.ddd:22                        ESTABLISHED:ESTABLISHED 	294 / 291 	22 KiB / 80 KiB 	
                              WAN2 	tcp 	192.168.1.100:54741 (172.16.103.2:43290) -> aaa.bbb.ccc.ddd:22  ESTABLISHED:ESTABLISHED 	294 / 291 	22 KiB / 80 KiB
                              

                              I'm going to check our differences.

                                dayer

                                @Derelict, which hypervisor did you test with? VirtualBox? VMware? Could you share your two XML config files?

                                I've tested this situation with VirtualBox (before and after a factory reset, to redo and check the settings) and also with two identical physical machines. No success.
                                We're considering using pfSense, and being sure this functionality works is important to us.

                                  Derelict LAYER 8 Netgate

                                  XenServer. No, I do not have the configurations from those tests any more.

                                  The real key is how you are testing. Note that TRex was generating about 350K states there.

                                  The hypervisor used will be of no consequence to what gets policy routed where.

                                    dayer

                                    Thank you, Derelict.

                                    I can't find out why a state established on pfSense1 isn't routed successfully when pfSense2 is the master.
                                    I'm talking about this example (rules here):

                                    Pinging 208.123.73.69

                                    States on Primary/MASTER:

                                    LAN     icmp    172.16.103.2:10618 -> 208.123.73.69:10618                           0:0     69 / 69     6 KiB / 6 KiB 	
                                    WAN2    icmp    192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632     0:0     69 / 69     6 KiB / 6 KiB
                                    

                                    States on Secondary/BACKUP:

                                    
                                    LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618                           0:0     0 / 0 	0 B / 0 B 	
                                    WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632     0:0     0 / 0 	0 B / 0 B
                                    

                                    Persistent CARP maintenance mode on Primary:

                                    States on secondary start seeing the traffic but something goes wrong. The number of packets observed matching the state from the destination side is zero.

                                    LAN 	icmp 	172.16.103.2:10618 -> 208.123.73.69:10618 	                        0:0 	58 / 0 	5 KiB / 0 B 	
                                    WAN2 	icmp 	192.168.1.100:57632 (172.16.103.2:10618) -> 208.123.73.69:57632 	0:0 	58 / 0 	5 KiB / 0 B
                                    

                                    It's as if policy routing is ignored in this kind of situation and the firewall tries to route the established traffic through the default gateway (while using the NAT of a connection meant to exit through another WAN).
                                    I've also tried with floating rules, without success.
                                    Is there some command or log to check what decisions pfSense takes for each state or packet?

                                      Derelict LAYER 8 Netgate

                                      Except policy routing is not ignored in that scenario.

                                      It looks like it is not policy routing that is your problem, since you see the traffic going out those states.

                                      It looks like your problem is whatever is upstream is refusing to accept the CARP VIP (and MAC address) moving from primary to secondary.

                                      Policy routing only affects outbound traffic. It can't do anything about problems with the reply traffic.

                                        dayer

                                        Thank you Derelict.
                                        I understand your point of view. However, if…

                                        "It looks like your problem is whatever is upstream is refusing to accept the CARP VIP (and MAC address) moving from primary to secondary."

                                        I can't understand why I only see this behavior when the gateway for LAN traffic to the outside is different from the default gateway. If the gateway for LAN traffic to the outside is the same as the default gateway, everything goes well.

                                        That is:

                                        • LAN: 192.168.2.0/24

                                        • WAN1: 192.168.1.0/24

                                        • WAN2: 192.168.56.0/24

                                        Rules for LAN:

                                        States      Protocol    Source  Port    Destination     Port    Gateway     Queue   Schedule    Description     Actions
                                        1 /427 B    IPv4 *      *       *       LAN net         *       *           none
                                        1 /1.04 MiB IPv4 *      *       *       *               *       GW1         none
                                        

                                        Gateways (default gateway = gateway for LAN to outside):

                                        Name            Interface   Gateway         Monitor IP
                                        GW1 (default)   WAN1        192.168.1.1     192.168.1.1
                                        GW2             WAN2        192.168.56.1    192.168.56.1
                                        

                                        I try with SSH and it goes well.

                                        States related to xx.xxx.xxx.xxx in pfsense1 (master):

                                        
                                        LAN     tcp     192.168.2.1:60626 -> xx.xxx.xxx.xxx:22522                           ESTABLISHED:ESTABLISHED     146 / 130   11 KiB / 20 KiB 	
                                        WAN1    tcp     192.168.1.20:62445 (192.168.2.1:60626) -> xx.xxx.xxx.xxx:22522      ESTABLISHED:ESTABLISHED     146 / 130   11 KiB / 20 KiB
                                        
                                        

                                        States related to xx.xxx.xxx.xxx in pfsense2 (backup):

                                        LAN     tcp     192.168.2.1:60626 -> xx.xxx.xxx.xxx:22522                           ESTABLISHED:ESTABLISHED     0 / 0       0 B / 0 B 	
                                        WAN1    tcp     192.168.1.20:62445 (192.168.2.1:60626) -> xx.xxx.xxx.xxx:22522      ESTABLISHED:ESTABLISHED     0 / 0       0 B / 0 B
                                        

                                        Enter Persistent CARP Maintenance Mode

                                        States related to xx.xxx.xxx.xxx in pfsense1 (backup):

                                        LAN     tcp     192.168.2.1:60626 -> xx.xxx.xxx.xxx:22522                           ESTABLISHED:ESTABLISHED     339 / 321   21 KiB / 48 KiB 	
                                        WAN1    tcp     192.168.1.20:62445 (192.168.2.1:60626) -> xx.xxx.xxx.xxx:22522      ESTABLISHED:ESTABLISHED     339 / 321   21 KiB / 48 KiB
                                        

                                        States related to xx.xxx.xxx.xxx in pfsense2 (master):

                                        LAN     tcp     192.168.2.1:60626 -> xx.xxx.xxx.xxx:22522                           ESTABLISHED:ESTABLISHED     111 / 111   6 KiB / 16 KiB 	
                                        WAN1    tcp     192.168.1.20:62445 (192.168.2.1:60626) -> xx.xxx.xxx.xxx:22522      ESTABLISHED:ESTABLISHED     111 / 111   6 KiB / 16 KiB
                                        

                                        But if the default gateway is not the gateway used for LAN-to-outside traffic:

                                        Name            Interface   Gateway         Monitor IP
                                        GW1             WAN1        192.168.1.1     192.168.1.1
                                        GW2 (default)   WAN2        192.168.56.1    192.168.56.1
                                        

                                        Then the behavior is fine until I enter CARP Maintenance Mode (pfsense1 backup, pfsense2 master), and the states related to xx.xxx.xxx.xxx in pfsense2 (master) are:

                                        LAN     tcp     192.168.2.1:60632 -> xx.xxx.xxx.xxx:22522                           ESTABLISHED:ESTABLISHED     31 / 5      5 KiB / 1 KiB 	
                                        WAN1    tcp     192.168.1.20:49862 (192.168.2.1:60632) -> xx.xxx.xxx.xxx:22522      ESTABLISHED:ESTABLISHED     31 / 5      5 KiB / 1 KiB
                                        

                                        and the SSH client appears frozen until I leave Persistent CARP Maintenance Mode and pfsense1 recovers the master role.

                                        • DerelictD
                                          Derelict LAYER 8 Netgate
                                          last edited by

                                          I don't know. It looks like you have something screwed up in your outbound NAT. I built this and it works fine, using multiple versions of pfSense. Countless people are doing exactly the same thing.

                                          I suggest you reinstall and start over as simply as possible, adding nothing but what is necessary to test this concept.

                                          You are chasing a red herring with the "not default gateway" thing. It doesn't exist. It is something else you have done.

                                          Chattanooga, Tennessee, USA
                                          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                                          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                                          Do Not Chat For Help! NO_WAN_EGRESS(TM)

                                          • Z
                                            ZsZs
                                            last edited by

                                            Hi Dayer and Derelict,

                                            Apparently I am in the same situation as Dayer. The network layout is the same.
                                            I am using 2.3.4 and I made a clean install as follows (apologies for the detailed list, but there might be an obvious mistake or a missing part):

                                            • the two VMs running on the same ESXi 6.0U3 host (for testing purpose)
                                            • set up WAN1 and WAN2 in different subnets:
                                                - WAN1 public /26 (default GW)
                                                - WAN2 internal /24 (behind a cable modem)
                                            • set up WAN1 and WAN2 with the following monitoring IP addresses:
                                                - WAN1: 8.8.8.8
                                                - WAN2: 208.67.220.220
                                            • add two DNS servers to each WAN
                                            • configure the DNS Resolver in forwarding mode
                                            • install Open-vm-tools package (no other packages have been installed)
                                            • Set up HA for syncing state and configs
                                            • Set up CARP IPs (WAN1-VIP, WAN2-VIP, LAN-VIP) with appropriate netmask
                                            • change Outbound NAT to Manual
                                                - remove auto-created SYNC interface related outbound NAT rules
                                                - change NAT address to WAN1-VIP on rules with interface WAN1
                                                - change NAT address to WAN2-VIP on rules with interface WAN2
                                            • create WAN1first gateway group with WAN1GW: Tier 1, WAN2GW: Tier 2
                                            • create WAN2first gateway group with WAN1GW: Tier 2, WAN2GW: Tier 1
                                            • create a firewall rule with policy routing for SSH traffic on LAN:
                                            Protocol    Source  Port  Destination  Port  Gateway     Queue
                                            IPv4 TCP    *       *     *            22    WAN2first   none
                                            

                                            I tried the following policy routing scenarios by simply:

                                            • changing the gateway in the above rule
                                            • disabling the above rule
                                            • toggling the default gateway
                                            
                                            defGW   policy route   SSH session after failover
                                            WAN1    disabled       OK
                                            WAN1    GW:WAN1GW      OK
                                            WAN1    GW:WAN2GW      Freezes
                                            WAN1    GW:WAN1first   OK
                                            WAN1    GW:WAN2first   Freezes
                                            
                                            WAN2    disabled       OK
                                            WAN2    GW:WAN1GW      Freezes
                                            WAN2    GW:WAN2GW      OK
                                            WAN2    GW:WAN1first   Freezes
                                            WAN2    GW:WAN2first   OK
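
The results above can be checked mechanically against one hypothesis: the session survives failover only when the WAN preferred by the policy route is the same as the default gateway's WAN. The sketch below is illustrative only. The rows are copied from the table, but `preferred_wan` and `predicted` are hypothetical helper names, and treating the first four characters of the gateway name as the Tier 1 WAN is an assumption based on the naming used in this post.

```python
# Test matrix copied from the table: (default gateway, policy-route gateway, outcome).
# None means the policy-routing rule was disabled.
RESULTS = [
    ("WAN1", None, "OK"),
    ("WAN1", "WAN1GW", "OK"),
    ("WAN1", "WAN2GW", "Freezes"),
    ("WAN1", "WAN1first", "OK"),
    ("WAN1", "WAN2first", "Freezes"),
    ("WAN2", None, "OK"),
    ("WAN2", "WAN1GW", "Freezes"),
    ("WAN2", "WAN2GW", "OK"),
    ("WAN2", "WAN1first", "Freezes"),
    ("WAN2", "WAN2first", "OK"),
]


def preferred_wan(default_gw, policy_gw):
    """WAN the session should use: Tier 1 of the policy gateway, else the default."""
    if policy_gw is None:
        return default_gw
    # Assumption: names follow the post's convention, e.g. "WAN1GW", "WAN2first".
    return policy_gw[:4]


def predicted(default_gw, policy_gw):
    """Hypothesis: the session freezes when the policy WAN differs from the default WAN."""
    return "OK" if preferred_wan(default_gw, policy_gw) == default_gw else "Freezes"


# Rows where the hypothesis disagrees with the observed outcome.
mismatches = [row for row in RESULTS if predicted(row[0], row[1]) != row[2]]
print(mismatches)
```

An empty list means every observed row is consistent with the hypothesis. It says nothing about the cause, only that the symptom tracks the default gateway.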
                                            

                                            I had the same issue: when the policy routing rule points to a gateway (or gateway group) other than the default, the open session freezes after the HA failover to the secondary node.
                                            I can open a new SSH session via the new master, but when I move the VIP back to the primary node, that new session freezes and the previously opened session starts responding again.
                                            I also saw in tcpdump that when the SSH session freezes, the traffic leaves the firewall on the wrong WAN interface (the default one) with the other WAN interface's source IP address.
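
The tcpdump symptom (a packet leaving one WAN with the other WAN's source address) can be expressed as a small check. This is a minimal sketch, not part of either poster's setup: the subnets are hypothetical, borrowed from the gateway addresses earlier in the thread with an assumed /24 prefix, and `leaves_on_wrong_wan` is an illustrative name.

```python
import ipaddress

# Hypothetical WAN subnets; the /24 prefix lengths are assumed, not stated in the thread.
WAN_SUBNETS = {
    "WAN1": ipaddress.ip_network("192.168.1.0/24"),
    "WAN2": ipaddress.ip_network("192.168.56.0/24"),
}


def leaves_on_wrong_wan(egress_if, src_ip):
    """True if the source address belongs to a different WAN than the egress interface."""
    src = ipaddress.ip_address(src_ip)
    # Find which WAN subnet, if any, owns the source address.
    owner = next((name for name, net in WAN_SUBNETS.items() if src in net), None)
    return owner is not None and owner != egress_if


# A packet NATed to the WAN1 address but captured leaving WAN2 matches the symptom.
print(leaves_on_wrong_wan("WAN2", "192.168.1.20"))
# The same packet leaving WAN1 is the expected case.
print(leaves_on_wrong_wan("WAN1", "192.168.1.20"))
```

Feeding it the interface name and source address of each captured packet would flag the ones that bypassed the policy route.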

                                            I appreciate any hints you might have.

                                            Regards,
                                            Zsolt

                                            edit: typos, some clarification added

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.