ARP packets not sent from the CARP MAC cause problems with the modem's port-forward ARP cache
It seems that ARP packets which do not use the CARP MAC cause problems, also when trying to work around that.
Here is my situation:
I've got 2 pfSense boxes (2.0.1-RELEASE) which I'm trying to configure for several purposes:
-failover if 1 internet connection fails
-failover if 1 pfSense server fails
-LAN client machines reaching the internet through the CARP-LAN IP as their gateway
-road warriors using OpenVPN, connecting to the external IP of the router(s), which port-forward to the internal CARP-WAN IP
Also, by using VLANs, the webservers that are reachable from the internet through port forwarding are separated from the LAN by firewall rules. Should a webserver become compromised, the other LAN machines would not be accessible.
If all else fails, I want to be able to change the gateway of the client machines and then use the internet without any other reconfiguration.
The internet is connected through 2 modem/router devices. Both have an internet IP and an internal IP (192.168.8.243 and 192.168.8.242), and both have a port-forwarding rule to the CARP-WAN IP on UDP port 96, which is used for OpenVPN traffic.
pfSense also uses these 2 devices as its gateways.
PFsense.3 LAN-ip 192.168.8.3 WAN-ip: 192.168.8.2 [00:0c:29:a1:cc:db] (backup)
PFsense.5 LAN-ip 192.168.8.5 WAN-ip: 192.168.8.4 [00:0c:29:46:26:2d] (master)
CARP-WAN: 192.168.8.6 [00:00:5e:00:01:02]
Yes, I know that having the WAN and LAN interfaces on the same subnet/network is pretty strange, but it seems to best fit some of the current requirements and constraints.
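For reference, a CARP virtual MAC follows the pattern 00:00:5e:00:01:<VHID>, so the address listed above suggests that the WAN CARP group uses VHID 2. The VHID is not stated anywhere in this thread, so that value is an inference from the MAC. A minimal Python sketch of the mapping (the function name is made up for illustration):

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC address used by a CARP group with this VHID."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # The last octet of the virtual MAC is the VHID in hexadecimal.
    return f"00:00:5e:00:01:{vhid:02x}"


# VHID 2 is an assumption inferred from the CARP-WAN MAC in this post.
print(carp_virtual_mac(2))
```

This is the MAC a router such as the Draytek should ideally end up with in its ARP cache for 192.168.8.6, regardless of which pfSense box is currently master.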
It seems to work 'mostly'. If PF.5 is running I can connect with an OpenVPN client and it works 'OK'. After shutting down PF.5 and failing over to PF.3, the OpenVPN clients reconnect and all seems fine.
Now, after PF.5 comes back up and takes back MASTER status on the CARP interfaces, my Draytek modem/router keeps receiving UDP packets from the PF.3 interface with source 192.168.8.6. Because of this it assumes that the PF.3 WAN interface is where packets destined for 192.168.8.6 should be sent. Other connections destined for 192.168.8.6 also go to PF.3 because of the wrong ARP cache entry.
TCPdump from PF.5
22:20:01.679471 00:50:7f:c9:ac:b0 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.8.6 tell 192.168.8.243, length 46
22:20:01.679477 00:0c:29:46:26:2d > 00:50:7f:c9:ac:b0, ethertype ARP (0x0806), length 42: Reply 192.168.8.6 is-at 00:00:5e:00:01:02, length 28
TCPdump from PF.3
22:20:00.688137 00:0c:29:a1:cc:db > 00:50:7f:c9:ac:b0, ethertype IPv4 (0x0800), length 591: 192.168.8.6.96 > 89.98.X.X.1194: UDP, length 549
22:20:00.713112 00:50:7f:c9:ac:b0 > 00:0c:29:a1:cc:db, ethertype IPv4 (0x0800), length 143: 89.98.X.X.1194 > 192.168.8.6.96: UDP, length 101
22:20:01.346365 00:0c:29:a1:cc:db > 00:50:7f:c9:ac:b0, ethertype IPv4 (0x0800), length 271: 192.168.8.6.96 > 89.98.X.X.1194: UDP, length 229
22:20:01.502634 00:50:7f:c9:ac:b0 > 00:0c:29:a1:cc:db, ethertype IPv4 (0x0800), length 191: 89.98.X.X.1194 > 192.168.8.6.96: UDP, length 149
After clearing the ARP cache of the Draytek, it asks again who has 192.168.8.6 and gets a reply, as can be seen above. However, after that the UDP packets from PF.3 still come in and seem to 'overwrite' the ARP cache entry. As a result the OpenVPN client still is, or gets, connected to PF.3.
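To spot this in a longer capture, something like the following Python sketch could list which Ethernet source MACs are sending IPv4 frames with the CARP IP as source. It assumes the line format shown in the PF.3 capture above (tcpdump with link-level headers printed) and only handles lines where the address carries a port suffix, as UDP and TCP lines do. The function name and regular expression are made up for this example.

```python
import re

# Matches IPv4 lines in the format of the captures in this post.
FRAME_RE = re.compile(
    r"^\S+ (?P<eth_src>[0-9a-f:]{17}) > (?P<eth_dst>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length \d+: (?P<src>\S+) > (?P<dst>\S+):"
)


def macs_sending_as(lines, ip, expected_mac):
    """Return Ethernet source MACs, other than expected_mac, seen with source IP ip."""
    unexpected = set()
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # ARP lines and anything else are skipped
        # tcpdump prints "a.b.c.d.port"; drop the port to get the IP.
        src_ip = m.group("src").rsplit(".", 1)[0]
        if src_ip == ip and m.group("eth_src") != expected_mac:
            unexpected.add(m.group("eth_src"))
    return unexpected


capture = [
    "22:20:00.688137 00:0c:29:a1:cc:db > 00:50:7f:c9:ac:b0, ethertype IPv4 (0x0800), length 591: 192.168.8.6.96 > 89.98.X.X.1194: UDP, length 549",
    "22:20:00.713112 00:50:7f:c9:ac:b0 > 00:0c:29:a1:cc:db, ethertype IPv4 (0x0800), length 143: 89.98.X.X.1194 > 192.168.8.6.96: UDP, length 101",
]
print(macs_sending_as(capture, "192.168.8.6", "00:00:5e:00:01:02"))
```

With the two lines from the PF.3 capture, the only MAC reported is the physical one of PF.3, which is the address the Draytek ends up caching.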
The Draytek can be configured, using telnet IP ARP ALLOW, not to accept updates from 'spoofed' packets, but then it also rejects the ARP reply from PF.5, because that reply is not sent from MAC address 00:00:5e:00:01:02.
Over telnet the Draytek has an option "ip arp accept", which controls this setting: "Ethernet source address doesn't match ARP sender address. Accept illegal ARP REPLY packets or not." It is currently configured to accept the reply from PF.5 even though the Ethernet source address does not match.
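The mismatch that this option refers to can be read directly from the ARP reply in the PF.5 capture: the frame leaves from the physical MAC of PF.5, while the ARP payload announces the CARP MAC. A small Python sketch of that comparison, assuming the tcpdump line format shown earlier in this post (the function name and regular expression are made up for illustration, and this is not how the Draytek itself implements the check):

```python
import re

# Matches an ARP reply line in the format of the PF.5 capture.
ARP_REPLY_RE = re.compile(
    r"(?P<eth_src>[0-9a-f:]{17}) > [0-9a-f:]{17}, ethertype ARP \(0x0806\), "
    r"length \d+: Reply (?P<ip>\S+) is-at (?P<sender_mac>[0-9a-f:]{17})"
)


def check_arp_reply(line):
    """Return (ip, macs_match) for an ARP reply line, or None for other lines."""
    m = ARP_REPLY_RE.search(line)
    if m is None:
        return None
    # Compare the Ethernet source address with the MAC announced in the ARP payload.
    return m.group("ip"), m.group("eth_src") == m.group("sender_mac")


reply = (
    "22:20:01.679477 00:0c:29:46:26:2d > 00:50:7f:c9:ac:b0, ethertype ARP (0x0806), "
    "length 42: Reply 192.168.8.6 is-at 00:00:5e:00:01:02, length 28"
)
print(check_arp_reply(reply))
```

For the reply above the two addresses differ, which is why a strict check on the Draytek side would also drop the legitimate answer from the CARP master.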
So I think my main question is: is it possible to have the ARP traffic for a CARP IP also leave with the CARP interface MAC address? I think that would solve this issue.
After some more investigation I can simplify (part of) the problem to this:
The ARP cache of the Draytek router gets 'poisoned' by IP packets that carry a wrong source MAC address, more or less as described here: http://www.keil.com/support/man/docs/rlarm/rlarm_tn_using_udp_arpempty.htm. In short, when an IP packet is received, it is used to fill the ARP cache, even when the source MAC is not the 'true owner' of the IP address.
My simplified situation:
-1 Draytek internet router
-2 pfSense boxes, PF.3 and PF.5, in a failover setup for incoming OpenVPN road-warrior connections.
The Draytek has internal IP 192.168.8.243 [00-50-7f-c9-ac-b0]
PFsense.3 WAN-ip: 192.168.8.2 [00:0c:29:a1:cc:db] (backup)
PFsense.5 WAN-ip: 192.168.8.4 [00:0c:29:46:26:2d] (master)
CARP-WAN: 192.168.8.6 [00:00:5e:00:01:02]
OpenVPN servers on both pfSense boxes are listening on the CARP-WAN IP.
The Draytek uses a port forward to 192.168.8.6:96 to let the external OpenVPN clients connect to the OpenVPN servers on the pfSense boxes.
When the master is down, an OpenVPN connection is established to PF.3. After the master comes back, the OpenVPN client is still trying to communicate with PF.3, and UDP packets from both sides keep flowing. The Draytek looks at these packets with source IP .8.6 and MAC 00:0c:29:a1:cc:db and puts this pair in its ARP cache. Now when another packet from the road warrior arrives, it is forwarded to .8.6, which according to the ARP cache means the MAC of PF.3. However, the normal internet connections, whose outgoing traffic goes through PF.5, also send packets with source .8.6 and MAC 00:0c:29:46:26:2d, so the ARP cache is constantly switching which MAC it uses for packets that are supposed to go to 192.168.8.6.
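The flapping described here can be reproduced with a toy model. The Python sketch below compares a cache that learns from every IP packet (what the Draytek appears to do, judging from the captures) with one that only learns from ARP replies. The class, its flag and the packet sequence are invented for illustration; only the addresses come from this post.

```python
from dataclasses import dataclass, field

CARP_IP = "192.168.8.6"
CARP_MAC = "00:00:5e:00:01:02"
PF3_MAC = "00:0c:29:a1:cc:db"
PF5_MAC = "00:0c:29:46:26:2d"


@dataclass
class ArpCache:
    learn_from_ip: bool            # True: update entries from ordinary IP packets too
    table: dict = field(default_factory=dict)
    changes: int = 0               # how often an existing entry was overwritten

    def _store(self, ip, mac):
        if self.table.get(ip) != mac:
            if ip in self.table:
                self.changes += 1
            self.table[ip] = mac

    def on_arp_reply(self, ip, mac):
        self._store(ip, mac)

    def on_ip_packet(self, src_ip, eth_src):
        if self.learn_from_ip:
            self._store(src_ip, eth_src)


def replay(cache):
    # One correct ARP reply, then interleaved traffic from both boxes,
    # all with the CARP IP as source but each box's physical MAC.
    cache.on_arp_reply(CARP_IP, CARP_MAC)
    for eth_src in [PF3_MAC, PF5_MAC, PF3_MAC, PF5_MAC]:
        cache.on_ip_packet(CARP_IP, eth_src)
    return cache


draytek_like = replay(ArpCache(learn_from_ip=True))
arp_only = replay(ArpCache(learn_from_ip=False))
print(draytek_like.table[CARP_IP], draytek_like.changes)
print(arp_only.table[CARP_IP], arp_only.changes)
```

In this model the learning cache never keeps the CARP MAC once traffic starts and rewrites its entry on every packet from the other box, while the ARP-only cache stays on 00:00:5e:00:01:02.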
My initial conclusion in the first post was wrong: disabling "Accept illegal ARP REPLY" does NOT stop the 'ARP processing' of normal IP packets. If the ARP cache is empty, an ARP request is sent and the proper MAC is returned, but as soon as packets actually start going through the Draytek it changes its ARP cache to contain the wrong MAC. So the failover problems occur when both pfSense boxes are 'running'.
I've seen new OpenVPN connections build up to the backup pfSense box and work for a few minutes, even though the master had already been functioning properly for a few hours. There is a high number of 'lost packets', so the OpenVPN connection is highly unstable, but on reconnect the client still manages to get connected to the backup box PF.3, even while PF.5 is master of all CARP IPs and has been for over an hour.
One thing that could possibly work around this problem is if the pfSense box that is not the current owner of the WAN CARP IP would at least stop sending packets with that IP as source.
Does anyone know of a good workaround for this kind of problem? Or did I miss some setting? What other information is needed to investigate further? Thanks for your time.
Turns out that I had made a NAT rule for outbound traffic so that it leaves from the CARP IP instead of the interface IP. However, the gateway state check then also sends its regular packets with the CARP IP as the source, which is why the Draytek kept updating its ARP cache. After changing this the problem is mostly avoided. Though it is currently not a problem, it is still strange, in my opinion, that the CARP MAC address is not used as the source.
I'd file that as a bug with Draytek, that's not proper behavior. I've never heard of anything else that behaves that way. It's not just CARP that does that, other routing redundancy protocols are no different.