[SOLVED] New install not working… Proxy ARP vs. IP Alias?

  • I am installing a new server. I have applied a merged backup from an existing, working install. The configuration is practically the same, with a few exceptions:

    The new server is using Link Aggregation (wasn't supported in v1 when the original server was built)
    The new server has the virtual IPs defined as IP Alias where the original server had them defined as Proxy ARP.

    This is a moderately complex configuration. It handles two public /29 blocks, so there are two WAN interfaces. It also handles 6 LAN interfaces. Each WAN interface has one public IP defined on it, and the remaining 4 IPs of its block are defined as virtual IPs on that WAN interface.

    What does not work are the cases where there is a NAT rule that sets the "NAT Address" to the IP of an Alias.

    A network trace shows that the UDP traffic is translated and exits correctly. However, the UDP reply is only seen on the WAN interface; it never makes it across to the LAN side. There is no indication in the log that the packets were rejected by a filter or failed to match a filter.

    Could this be related to the difference in the virtual IP type? Is a different configuration needed when changing from Proxy ARP to IP Alias?

  • Well, I flipped them back to Proxy ARP and still nothing going out via the Alias gets back in. The states all say SINGLE:NO_TRAFFIC. I have done a diff on the two config.xml files and there is no appreciable difference that I can see.
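
    For anyone checking for the same symptom, here is a minimal Python sketch that filters saved pfctl -ss output for states stuck in SINGLE:NO_TRAFFIC. The function name, the sample lines, and the addresses are made up for illustration, and the assumption that the state pair is the last column may not hold for every pf version.

```python
def stalled_states(pfctl_ss_output, marker="SINGLE:NO_TRAFFIC"):
    """Return the state lines whose last column is the given state pair."""
    return [
        line.strip()
        for line in pfctl_ss_output.splitlines()
        if line.strip().endswith(marker)
    ]


# Illustrative lines only (documentation addresses), not output from the real boxes.
sample = """\
all udp 192.0.2.10:5060 -> 198.51.100.7:5060       SINGLE:NO_TRAFFIC
all udp 192.0.2.11:5060 -> 198.51.100.7:5060       MULTIPLE:MULTIPLE
all tcp 192.0.2.12:443 <- 198.51.100.9:51512       ESTABLISHED:ESTABLISHED
"""

for line in stalled_states(sample):
    print(line)
```

    A state that stays in SINGLE:NO_TRAFFIC means pf saw the outbound packet but never saw a reply it could match to that state.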

  • OK, I've dumped the rules on both boxes (pfctl -sa) and done a diff on the [TRANSLATION RULES:] and [FILTER RULES:] sections. They match 100%. Yet pf on the new box behaves differently. How is this possible?

    Is there a bug in pf with LAG? Actually it's VLAN (802.1Q) over LACP (802.1AX). The 802.1Q isn't new; the old server is using it. The 802.1AX is new to this server.
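
    The section-by-section comparison of the two pfctl -sa dumps can be sketched in Python roughly as follows. This assumes each dump was saved to text and that section headers are all-caps lines ending in a colon (such as TRANSLATION RULES: and FILTER RULES:). The helper names and the sample rules are hypothetical, not taken from the real boxes.

```python
import difflib


def split_sections(dump):
    """Group the lines of a saved `pfctl -sa` dump under their section headers."""
    sections, current = {}, None
    for line in dump.splitlines():
        stripped = line.strip()
        # Assumed header shape: an all-caps line ending in a colon.
        is_header = (
            stripped.endswith(":")
            and stripped.isupper()
            and stripped[:-1].replace(" ", "").isalpha()
        )
        if is_header:
            current = stripped[:-1]
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return sections


def diff_section(old_dump, new_dump, name):
    """Unified diff of one named section; an empty list means the sections match."""
    old = split_sections(old_dump).get(name, [])
    new = split_sections(new_dump).get(name, [])
    return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


# Made-up dumps for illustration only.
old_box = """TRANSLATION RULES:
nat on em0 inet from 10.0.0.0/24 to any -> 203.0.113.2
FILTER RULES:
pass in quick on em1 inet from 10.0.0.0/24 to any keep state
"""
new_box = old_box.replace("203.0.113.2", "203.0.113.3")

for name in ("TRANSLATION RULES", "FILTER RULES"):
    changes = diff_section(old_box, new_box, name)
    print(name, "differs" if changes else "matches")
```

    Identical rule sections only show that both boxes were told to do the same thing; they say nothing about what the neighbouring devices are actually sending back.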

  • Ok, this is solved. :)

    It turned out to be a Layer 1 problem of all things. I wasn't looking at the network trace closely enough.

    The traffic between the WAN side router and the WAN interfaces was fine. However, upon much, much, much closer inspection I found that the outbound WAN traffic from the virtual IPs had the new server's MAC as the source and the WAN side router's MAC as the destination, whereas the inbound traffic had the WAN side router's MAC as the source but the old server's MAC as the destination.

    The network monitor saw the traffic on the new server's interface because it's on the same VLAN and the interface is in promiscuous mode, but the new server itself never accepted the traffic because it was addressed to the wrong MAC address.

    It's somewhat odd that the edge routers readily picked up the new MAC for the primary interface IP but then held on to the old MAC for all the virtual IPs for dear life. Nevertheless, clearing the ARP cache in the edge routers solved the problem.
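
    The check that finally exposed this can be sketched in Python: look for inbound frames that carry one of our IPs as the destination but somebody else's MAC. This is a minimal sketch that assumes the capture has already been reduced to (source MAC, destination MAC, source IP, destination IP) tuples. The Frame type, the function name, and all MAC and IP values below are hypothetical.

```python
from collections import namedtuple

Frame = namedtuple("Frame", "src_mac dst_mac src_ip dst_ip")


def misdelivered(frames, local_mac, local_ips):
    """Frames addressed to one of our IPs but to a MAC that is not ours."""
    local_mac = local_mac.lower()
    return [
        f for f in frames
        if f.dst_ip in local_ips and f.dst_mac.lower() != local_mac
    ]


# Hypothetical addresses for illustration only.
NEW_SERVER = "02:00:00:00:00:02"
OLD_SERVER = "02:00:00:00:00:01"
ROUTER = "02:00:00:00:00:fe"
OUR_IPS = {"203.0.113.2", "203.0.113.3"}  # primary IP and one virtual IP

capture = [
    # Outbound from the virtual IP: source MAC is the new server, as expected.
    Frame(NEW_SERVER, ROUTER, "203.0.113.3", "198.51.100.7"),
    # Reply to the primary IP: the router already learned the new MAC.
    Frame(ROUTER, NEW_SERVER, "198.51.100.7", "203.0.113.2"),
    # Reply to the virtual IP: the router still has the old server's MAC cached.
    Frame(ROUTER, OLD_SERVER, "198.51.100.7", "203.0.113.3"),
]

for frame in misdelivered(capture, NEW_SERVER, OUR_IPS):
    print(f"{frame.dst_ip} was sent to {frame.dst_mac}, expected {NEW_SERVER}")
```

    Any hit from a check like this points at a stale ARP entry upstream rather than at the firewall rules.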

  • Did you mean a Layer 2 problem? MAC addresses work at the data link layer, not the physical layer.

  • @Metu69salemi:

    Did you mean a Layer 2 problem? MAC addresses work at the data link layer, not the physical layer.

    Yes, that is my mistake: it was the link layer (Layer 2), not the physical layer (Layer 1). Attention to detail like that is very important. I had traffic with the right IP address showing up on the right interface just as expected… but the Ethernet MAC address makes all the difference in the world. Overlook the details and you get burned.

  • Last Thursday I had two devices with a strange ARP table that would not clear at all until they were rebooted. That cost a few hours of troubleshooting. The devices were LAN/LonTalk converters.
