Netgate Discussion Forum
    CARP VIP periodic packet loss

    • Snoopy

      I'm seeing very strange behaviour in pfSense 1.2.3, and I can't isolate the problem.

      The config:
      ISP switch - pfsense - wan_ip .6 – 192.168.0.1/24 user lan
                                    carp_vip .5 -- DMZ interface for mail server with local ip 10.0.0.5
                                    carp_vip .4 -- DMZ2 interface for vpn with local ip 10.0.1.2

      DMZ and DMZ2 have NAT 1:1 rules.

      It had been working fine for years until yesterday. Since then, pings from outside to VIP .5 are lost periodically: it responds for 87 seconds, then there is no reply for 27 seconds, then it comes back for another 87 seconds, and so on.
      Pings from the internal LAN to the mail server's local IP 10.0.0.5 are not lost. Pings from outside to the .4 VPN box are also fine.
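A cycle this regular is worth confirming from logged data before chasing causes. The following is a minimal sketch, assuming one ping probe per second recorded as `True` (reply) or `False` (no reply); the function names and the synthetic sample data are illustrative and not taken from any real capture.

```python
from itertools import groupby

def run_lengths(samples):
    """Collapse per-probe results into (replied, count) runs."""
    return [(ok, sum(1 for _ in grp)) for ok, grp in groupby(samples)]

def loss_fraction(samples):
    """Fraction of probes that got no reply."""
    return samples.count(False) / len(samples)

# Synthetic data modelled on the description: 87 s of replies, 27 s of silence, twice.
samples = ([True] * 87 + [False] * 27) * 2

print(run_lengths(samples))
print(round(loss_fraction(samples), 3))
```

If the run lengths stay close to constant over many cycles, the outage is more likely tied to some timer or table entry than to random loss, although that alone does not say on whose side it sits.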

      Tried all this, and nothing helped:
      changed physical interface on pfsense for DMZ
      tried laptop instead of real mail server

      Also tried disabling carp_vip .5 and putting a laptop with the same address .5 directly on the ISP switch: no loss!
      Put everything back, then ran a packet capture on the WAN side, but when pings are lost, I don't even see the ICMP request coming from the ISP switch (should I?)

      The ISP says everything's fine on their side. They also claim that the ARP entry for .5 has not expired when packets are lost. But if that is true, then why can't I see the ICMP request? Is the problem on my side or theirs?

      The only thing I changed on pfSense around the time the problem began was one firewall rule on the DMZ interface (I changed the source IP in a traffic block rule).

      • rickbaran

        I know this is not a direct answer, but I would look at upgrading. There were so many fixes in 2.x compared to 1.2.x. I had a whole lot of weird things happening in 1.2.x, and all of my issues went away after the upgrade; most importantly, no new issues appeared.

        • Snoopy

          Yup, probably it's time.

          For the time being, I just moved the mail server to another external IP, and it works fine. I will still experiment some more on the old one and put some bogus server behind it, because I still have a feeling that it's the ISP's fault.

          • cmb

            Sounds a lot like an IP conflict, or potentially a MAC conflict (if you or your ISP are running CARP or VRRP somewhere else on the same broadcast domain with the same VHID).
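One way to look for such a conflict is to capture advertisements (for example with `tcpdump -ni <interface> proto carp`) and check whether more than one source address advertises the same VHID. The sketch below is a rough illustration, assuming tcpdump prints CARP advertisements in the `VRRPv2, Advertisement, vrid N` form; the function names and the sample lines (using documentation addresses) are made up for the example.

```python
import re
from collections import defaultdict

# Matches tcpdump advertisement lines such as:
# "... IP 192.0.2.253 > 224.0.0.18: VRRPv2, Advertisement, vrid 213, prio 0, ..."
ADVERT = re.compile(r"IP (\S+) > 224\.0\.0\.18: .*?vrid (\d+),")

def senders_by_vrid(lines):
    """Map each VHID/vrid to the set of source IPs seen advertising it."""
    seen = defaultdict(set)
    for line in lines:
        m = ADVERT.search(line)
        if m:
            seen[int(m.group(2))].add(m.group(1))
    return dict(seen)

def conflicts(lines):
    """Return only the vrids advertised by more than one source."""
    return {v: ips for v, ips in senders_by_vrid(lines).items() if len(ips) > 1}

# Hypothetical capture lines, not taken from a real network.
capture = [
    "13:41:04.2 IP 192.0.2.253 > 224.0.0.18: VRRPv2, Advertisement, vrid 213, prio 0, authtype none, intvl 1s, length 36",
    "13:41:04.9 IP 192.0.2.17 > 224.0.0.18: VRRPv2, Advertisement, vrid 213, prio 100, authtype none, intvl 1s, length 36",
    "13:41:05.1 IP 192.0.2.9 > 224.0.0.18: VRRPv2, Advertisement, vrid 7, prio 0, authtype none, intvl 1s, length 36",
]
print(conflicts(capture))
```

In steady state only the current master should be advertising a given VHID, so two persistent senders suggest either a foreign device using the same VHID or a pair whose members cannot see each other. A brief overlap during failover is normal and not a conflict.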

            You should definitely upgrade, but it's highly unlikely the described scenario will be any different.

            • Reiner030

              Hi,

              upgrading can't help - I have the same problem with pfSense 2.1-BETA1 … :(

              I have a DMZ setup for 2 buildings... the DMZ area is a public AS.
              The "public" router pair in building 1 has the .1 and works great from all firewalls.
              The "public" router pair in building 2 has the .254 and works poorly...

              I first noticed it when my master router in building 2 crashed and the slave router took over the .254.
              The only machine that got a ping reply from the CARP IP was the slave itself. All other firewalls got no response.

              Now that the master is up again, they get answers, but with loss percentages varying between 18% and 52%.
              The only machine without packet loss is the holder of the .254 (even the slave has losses) :(
              Important: the other direction works completely without packet loss, so it can't be a local networking problem
              (it's a VLAN, and all other normal traffic has no problems either).

              When I found this problem, I debugged with "tcpdump -ni em1 icmp" on the slave holding .254 => no ICMP pings arrived at the machine, but pings to .252 came through 100%.
              So I checked with "arp <ip>" what its MAC is... the correct CARP MAC was shown...
              To rule out a second host answering for this IP, I cleared the ARP cache with "arp -d -a" and pinged again.
              But again the right CARP MAC appeared in the ARP cache... tested several times from different sources.

              Here is a short overview of the actual ping/response from the internal master of building 1 to .254:

              
              [2.1-BETA1][root@fw1-jws1.local]/root(3): ping xx.xx.176.254
              PING xx.xx.176.254 (xx.xx.176.254): 56 data bytes
              64 bytes from xx.xx.176.254: icmp_seq=0 ttl=64 time=2.343 ms
              64 bytes from xx.xx.176.254: icmp_seq=2 ttl=64 time=2.262 ms
              64 bytes from xx.xx.176.254: icmp_seq=3 ttl=64 time=2.167 ms
              64 bytes from xx.xx.176.254: icmp_seq=6 ttl=64 time=2.308 ms
              64 bytes from xx.xx.176.254: icmp_seq=7 ttl=64 time=2.403 ms
              64 bytes from xx.xx.176.254: icmp_seq=9 ttl=64 time=2.502 ms
              64 bytes from xx.xx.176.254: icmp_seq=10 ttl=64 time=7.149 ms
              ^C
              --- xx.xx.176.254 ping statistics ---
              11 packets transmitted, 7 packets received, 36.4% packet loss
              round-trip min/avg/max/stddev = 2.167/3.019/7.149/1.689 ms
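
The gaps in the `icmp_seq` values show exactly which probes went unanswered. A minimal sketch for extracting them from saved ping output follows; the function name is an assumption for illustration, and the sample text only reproduces the reply lines shown above.

```python
import re

def missing_sequences(ping_output):
    """Return icmp_seq values with no reply, between the first and last seq seen."""
    got = {int(s) for s in re.findall(r"icmp_seq=(\d+)", ping_output)}
    if not got:
        return []
    return [n for n in range(min(got), max(got) + 1) if n not in got]

sample = """\
64 bytes from xx.xx.176.254: icmp_seq=0 ttl=64 time=2.343 ms
64 bytes from xx.xx.176.254: icmp_seq=2 ttl=64 time=2.262 ms
64 bytes from xx.xx.176.254: icmp_seq=3 ttl=64 time=2.167 ms
64 bytes from xx.xx.176.254: icmp_seq=6 ttl=64 time=2.308 ms
64 bytes from xx.xx.176.254: icmp_seq=7 ttl=64 time=2.403 ms
64 bytes from xx.xx.176.254: icmp_seq=9 ttl=64 time=2.502 ms
64 bytes from xx.xx.176.254: icmp_seq=10 ttl=64 time=7.149 ms
"""
print(missing_sequences(sample))  # [1, 4, 5, 8]
```

Four missing probes out of 11 matches the 36.4% loss in the summary. Probes lost after the last reply are not detected by this approach, since it only looks between the lowest and highest sequence number seen. The missing numbers can then be compared against the sequence numbers that show up in a capture on the target.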
              
              

              And this is what is seen on the interface of the master in building 2 holding IP .254:

              
              [2.1-BETA1][root@gw1.zws8.local]/root(13): tcpdump -ni em1 icmp | grep -E "seq [0-9]{1,3},"
              tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
              listening on em1, link-type EN10MB (Ethernet), capture size 96 bytes
              13:09:09.795925 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 0, length 64
              13:09:09.795958 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 0, length 64
              13:09:11.820866 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 2, length 64
              13:09:11.820879 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 2, length 64
              13:09:12.830401 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 3, length 64
              13:09:12.830418 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 3, length 64
              13:09:15.859130 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 6, length 64
              13:09:15.859143 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 6, length 64
              13:09:16.868789 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 7, length 64
              13:09:16.868802 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 7, length 64
              13:09:17.264433 IP xx.xx.176.6 > xx.xx.176.254: ICMP echo request, id 45327, seq 355, length 60
              13:09:17.264452 IP xx.xx.176.254 > xx.xx.176.6: ICMP echo reply, id 45327, seq 355, length 60
              13:09:18.274343 IP xx.xx.176.6 > xx.xx.176.254: ICMP echo request, id 45327, seq 611, length 60
              13:09:18.274356 IP xx.xx.176.254 > xx.xx.176.6: ICMP echo reply, id 45327, seq 611, length 60
              13:09:18.888109 IP xx.xx.176.5 > xx.xx.176.254: ICMP echo request, id 4933, seq 9, length 64
              13:09:18.888121 IP xx.xx.176.254 > xx.xx.176.5: ICMP echo reply, id 4933, seq 9, length 64
              13:09:19.284002 IP xx.xx.176.6 > xx.xx.176.254: ICMP echo request, id 45327, seq 867, length 60
              13:09:19.284017 IP xx.xx.176.254 > xx.xx.176.6: ICMP echo reply, id 45327, seq 867, length 60
              ^C714 packets captured
              4302 packets received by filter
              0 packets dropped by kernel
              
              

              And the ARP cache matches again… this is a CARP IP, and the MAC is the right one...

              
              [2.1-BETA1][root@fw1-jws1.local]/root(4): arp xx.xx.176.254
              xx.xx.176.254 (xx.xx.176.254) at 00:00:5e:00:01:d5 on em0_vlan7 expires in 809 seconds [vlan]
              
              
              
              [2.1-BETA1][root@gw1.zws8.local]/root(14): tcpdump -ni em1 proto carp
              tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
              listening on em1, link-type EN10MB (Ethernet), capture size 96 bytes
              ...
              13:41:04.203621 IP xx.xx.176.253 > 224.0.0.18: VRRPv2, Advertisement, vrid 213, prio 0, authtype none, intvl 1s, length 36
              
              

              VHID/vrid 213 => MAC :D5 …
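
The mapping can be checked mechanically: for IPv4, CARP uses the same virtual MAC scheme as VRRP, `00:00:5e:00:01:` followed by the VHID as one hex byte. A small sketch (the function name is just for illustration):

```python
def carp_mac(vhid):
    """Virtual MAC used for an IPv4 CARP/VRRP address with this VHID."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in 1..255")
    return "00:00:5e:00:01:%02x" % vhid

print(carp_mac(213))  # 00:00:5e:00:01:d5
```

Because the MAC depends only on the VHID, any other CARP or VRRP group on the same broadcast domain with the same VHID ends up sharing this MAC.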

              Curious: if I ping it over my other internal transfer net, it works great too (.251 is the virtual IP for gw1-zws8.local on it):

              
              [2.1-BETA1][root@fw1-jws1.local]/root(7): ping 192.168.6.251
              PING 192.168.6.251 (192.168.6.251): 56 data bytes
              64 bytes from 192.168.6.251: icmp_seq=0 ttl=64 time=3.487 ms
              64 bytes from 192.168.6.251: icmp_seq=1 ttl=64 time=2.282 ms
              64 bytes from 192.168.6.251: icmp_seq=2 ttl=64 time=2.066 ms
              64 bytes from 192.168.6.251: icmp_seq=3 ttl=64 time=2.157 ms
              64 bytes from 192.168.6.251: icmp_seq=4 ttl=64 time=2.184 ms
              64 bytes from 192.168.6.251: icmp_seq=5 ttl=64 time=2.125 ms
              64 bytes from 192.168.6.251: icmp_seq=6 ttl=64 time=2.136 ms
              64 bytes from 192.168.6.251: icmp_seq=7 ttl=64 time=2.613 ms
              64 bytes from 192.168.6.251: icmp_seq=8 ttl=64 time=2.410 ms
              64 bytes from 192.168.6.251: icmp_seq=9 ttl=64 time=2.508 ms
              64 bytes from 192.168.6.251: icmp_seq=10 ttl=64 time=2.635 ms
              ^C
              --- 192.168.6.251 ping statistics ---
              11 packets transmitted, 11 packets received, 0.0% packet loss
              round-trip min/avg/max/stddev = 2.066/2.418/3.487/0.389 ms
              
              

              Bests

              Reiner

              • Reiner030

                @Reiner030:

                Hi,

                upgrading can't help - I have the same problem with pfSense 2.1-BETA1 … :(

                I have a DMZ setup for 2 buildings... the DMZ area is a public AS.
                The "public" router pair in building 1 has the .1 and works great from all firewalls.
                The "public" router pair in building 2 has the .254 and works poorly...

                I first noticed it when my master router in building 2 crashed and the slave router took over the .254.
                The only machine that got a ping reply from the CARP IP was the slave itself. All other firewalls got no response.

                Now that the master is up again, they get answers, but with loss percentages varying between 18% and 52%.
                The only machine without packet loss is the holder of the .254 (even the slave has losses) :(
                Important: the other direction works completely without packet loss, so it can't be a local networking problem
                (it's a VLAN, and all other normal traffic has no problems either).

                Sorry, I found the cause of the problem in this post…
                Another admin had moved my test VM to another ESX server, one that wasn't "fixed", several days before this error appeared, so I didn't think of it:
                http://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting#VMware_ESX.2FESXi_Users

                Perhaps this troubleshooting page helps the original poster, too ...

                Bests

                Reiner

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.