CARP issues
-
Update.
I swapped out the TP-Link with a Cisco switch and the same issue persists. The primary unit is the only unit online, and ping -t to 8.8.8.8 from a workstation on the local LAN works for about 450 seconds, then begins to time out. If I go into the CARP settings page and just click "save" and "accept changes" (even without making any changes), it begins to work for another 450 seconds. The switch is a plain vanilla install, and the outside interface is running advanced outbound NAT, which maps to the virtual IP.
As an added test, I simultaneously set up a continuous ping to the outside interface VIP address from the same inside host, and that ping begins to die at the exact same time as the one to 8.8.8.8. In my mind, that would not stand to reason if this were an upstream switch-related issue. I also have a laptop sitting on the outside subnet where this internet-facing VIP resides; the laptop can ping the VIP and sees the VIP's autogenerated MAC address in its own ARP table.
-
You'll have to be more specific, post some screenshots, packet captures, etc.
It does not stand to reason that you are seeing a CARP problem that nobody else is.
Proper functioning of CARP does not require an ARP entry on the local firewall for the CARP VIP.
xn1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=3<RXCSUM,TXCSUM>
    ether f2:92:fa:6a:32:79
    hwaddr f2:92:fa:6a:32:79
    inet6 fe80::f092:faff:fe6a:3279%xn1 prefixlen 64 scopeid 0x6
    inet 172.25.228.18 netmask 0xffffff00 broadcast 172.25.228.255
    inet 172.25.228.140 netmask 0xffffff00 broadcast 172.25.228.255 vhid 228
    inet 172.25.228.65 netmask 0xffffff00 broadcast 172.25.228.255 vhid 228
    inet 172.25.228.66 netmask 0xffffff00 broadcast 172.25.228.255 vhid 228
    inet 172.25.228.67 netmask 0xffffff00 broadcast 172.25.228.255 vhid 228
    inet 172.25.228.17 netmask 0xffffff00 broadcast 172.25.228.255 vhid 228
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet manual
    status: active
    carp: MASTER vhid 228 advbase 1 advskew 0

PING 172.25.228.1 (172.25.228.1) from 172.25.228.17: 56 data bytes
64 bytes from 172.25.228.1: icmp_seq=0 ttl=64 time=1.280 ms
64 bytes from 172.25.228.1: icmp_seq=1 ttl=64 time=0.268 ms
64 bytes from 172.25.228.1: icmp_seq=2 ttl=64 time=0.244 ms
--- 172.25.228.1 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.244/0.597/1.280/0.483 ms

Shell Output - arp -n 172.25.228.17
172.25.228.17 (172.25.228.17) -- no entry
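For anyone checking ARP tables on neighboring devices: the MAC address a CARP VIP answers with is not the NIC's hardware address but a virtual one derived from the VHID, in the form 00:00:5e:00:01:XX. A minimal Python sketch of that derivation follows, using vhid 228 from the ifconfig output above. The function name carp_virtual_mac is only an illustrative helper, not part of pfSense or FreeBSD.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the IPv4 virtual MAC that CARP derives from a VHID.

    Hypothetical helper for illustration only; the last octet of the
    virtual MAC is the VHID written as two hex digits.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be in the range 1..255")
    return f"00:00:5e:00:01:{vhid:02x}"


if __name__ == "__main__":
    # vhid 228 is taken from the ifconfig output in this thread.
    print(carp_virtual_mac(228))
```

If an upstream device or a host on the same segment shows a different MAC for the VIP, or flaps between MACs, that is worth noting in the report.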
-
Hello,
Thanks again for responding. I'm uploading a tcpdump taken off the outside interface.
There are two packet captures. The first was taken while it's working and the second when it stops.
I've also included my config.xml, loader.conf, the output of sysctl, ifconfig, and pciconf, and attached the log file generated around the same time.
Oh, and sorry for not specifying. The packet captures and other files are in the debug.tar file and the log is in the other file.
Thanks again.
-
Please download and post actual pcaps so Wireshark can do the heavy lifting.
Thanks.
Also please describe exactly what is "working" and what isn't. Like what traffic to actually look at.
-
I'm posting a capture from pfSense taken while it's working. Meaning, this is the gateway I am literally connecting to this forum through right now. I've got a continuous ping running to 8.8.8.8 as well. In about 5 minutes I'll be posting the capture taken when it stops working. [0_1532557784353_while-working.cap]
-
-
You might want to set the pcaps to more than 100 frames.
That showed one request and one reply and no CARP.
The best thing to have in your case is probably a transition from working to not working. 10000 frames, 100000 frames. Whatever it takes.
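To line a long capture up with the moment things break, it can help to note exactly when the continuous ping flips from replies to timeouts. The sketch below is only an illustration in Python: it assumes you have already turned your ping log into (timestamp_seconds, ok) pairs yourself, and the name find_outages is made up for this example.

```python
def find_outages(samples):
    """Return (last_ok_time, first_fail_time) for each working -> failing flip.

    samples: list of (timestamp_seconds, ok) tuples in time order, where ok
    is True for a reply and False for a timeout. The input format is an
    assumption made for this sketch.
    """
    transitions = []
    prev_time = None
    prev_ok = None
    for timestamp, ok in samples:
        # Record only the edge where a reply is followed by a timeout.
        if prev_ok and not ok:
            transitions.append((prev_time, timestamp))
        prev_time, prev_ok = timestamp, ok
    return transitions


if __name__ == "__main__":
    # Made-up sample data, not taken from the captures in this thread.
    log = [(0, True), (449, True), (450, False), (451, False), (460, True)]
    print(find_outages(log))
```

The returned time pairs tell you which part of a 10000 or 100000 frame capture to look at first.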
-
-
-
Is there a package you haven't installed?
It also appears you are playing fast and loose with what is and isn't RFC1918.
-
I see nothing in those captures to indicate a problem.
"when-its-broken" When exactly WHAT is broken?
-
I can no longer connect to the internet, and the continuous ping comes to an immediate halt.
-
Cannot connect to the internet from where and continuous pings to what? What does cannot connect mean? DNS resolution? HTTP? HTTPS? What? To where? From where?
Sorry, but you are going to have to be far more specific. From what you have posted it looks like there is no problem on the WAN.
I am fairly sure this is an issue in your virtual environment/switching that will not be solved by changing anything in the firewall settings.
https://www.netgate.com/docs/pfsense/routing/connectivity-troubleshooting.html
https://www.netgate.com/docs/pfsense/highavailability/troubleshooting-high-availability-clusters.html
-
OK so it looks like there is no response to traffic when it is "broken" and the source is the CARP VIP. But the traffic is going out and there is no reply. You will need to investigate upstream and see why that is.
I don't see any ARP requests for the CARP VIP, and certainly none that are going unanswered.
So, it still points to something upstream probably in your layer 2.
-
I've gone through three switches now: one TP-Link and two Cisco Catalyst 2960G, both reset to factory defaults and running a completely vanilla out-of-the-box config.
-
And?
Look at this file you sent: 1532558530465-when-its-broken.pcap
Set a Wireshark filter for
icmp
Start at frame 183 and look through frame 212.
When the traffic is sourced from .172 (the CARP VIP) there is no response but the traffic IS being sent.
When the traffic is sourced from .173 (the interface address) there is a response.
You have to figure out what is going on UPSTREAM of the firewall that causes this to be true.
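The same comparison can be done programmatically once the ICMP frames are exported from the capture. The following is a rough Python sketch, not a pcap parser: it assumes each echo packet has already been reduced to a (src, dst, kind, ident, seq) tuple, and the function name and the 192.0.2.x addresses in the sample are placeholders, not the real addresses from the capture.

```python
from collections import defaultdict


def unanswered_by_source(packets):
    """Count ICMP echo requests and matching replies per source address.

    packets: iterable of (src, dst, kind, ident, seq) tuples, where kind is
    "request" or "reply". This input shape is an assumption for the sketch.
    """
    requests = {}
    for src, dst, kind, ident, seq in packets:
        if kind == "request":
            requests[(src, dst, ident, seq)] = False
        elif kind == "reply":
            # A reply matches a request with source and destination swapped.
            key = (dst, src, ident, seq)
            if key in requests:
                requests[key] = True

    summary = defaultdict(lambda: {"sent": 0, "answered": 0})
    for (src, _dst, _ident, _seq), answered in requests.items():
        summary[src]["sent"] += 1
        summary[src]["answered"] += int(answered)
    return dict(summary)


if __name__ == "__main__":
    # Placeholder addresses standing in for the VIP and interface address.
    sample = [
        ("192.0.2.172", "8.8.8.8", "request", 1, 0),
        ("192.0.2.172", "8.8.8.8", "request", 1, 1),
        ("192.0.2.173", "8.8.8.8", "request", 2, 0),
        ("8.8.8.8", "192.0.2.173", "reply", 2, 0),
    ]
    print(unanswered_by_source(sample))
```

A source with requests sent but none answered, next to another source on the same interface that is answered, is the pattern described above.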
I am going to move this to the Virtualization forum because that is where I'll bet your problem is. Some setting in the hypervisor. Maybe it will only allow one active MAC address on the interface at a time or something.
-
Ok. Thanks for the reply
Does anyone on the virtualization side have any ideas? I've done PCI passthrough via hostdev in the libvirt XML and PCI stubs in GRUB. I'm under the impression that since the OS has no knowledge of the NIC, neither does libvirt, since it's a user-space app. As I posted earlier, FreeBSD sees the actual Intel chipset instead of the standard emulated e1000 chip that QEMU provides to the guest. Also, the MAC addresses that pfSense sees on the NICs are those hardcoded on the hardware. Additionally, the XML config has no entry for these NICs, and CentOS can't even bring them up via ifup because the driver has never bound itself to the card.
Maybe I'm missing something on the hypervisor side here, but I'm under the impression that the standard MAC address anti-spoofing feature shouldn't apply here, since libvirt is unaware of the existence of the card. Or is it?
Thanks