kernel: arpresolve: can't allocate llinfo (on interface with BGP instance)

mohsh86

I've got a virtualised pfSense with the following setup:

LAN/VLANs: each VLAN has its own vmx interface. All LAN VLANs are created as separate port groups in ESXi on the same vSwitch, which has a dedicated physical uplink.
WAN/VLAN: a single VLAN on a separate port group in ESXi, on a different vSwitch (vSwitch_WAN) with its own dedicated physical uplink.
Layer 2 connection to a virtualised cloud router (MCR) that connects to Amazon AWS: the ISP provides an L2 link to the data centre of the cloud-connect provider (Megaport). The corresponding pfSense interface has BGP configured, and it works like a charm. The subnet on this interface is 10.10.20.0/30, with pfSense at 10.10.20.2 and the virtualised router at 10.10.20.1.
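
For readers following along, a quick sanity check of the /30 described above. This is only a sketch using Python's standard ipaddress module; the variable names are made up for illustration. It shows that 10.10.20.1 and 10.10.20.2 are the only two usable host addresses on that link.

```python
import ipaddress

# The point-to-point subnet quoted in the post.
link = ipaddress.ip_network("10.10.20.0/30")

# hosts() skips the network and broadcast addresses.
usable = [str(h) for h in link.hosts()]
broadcast = str(link.broadcast_address)

print(usable)     # the two peer addresses on the link
print(broadcast)  # the subnet broadcast address
```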

The connection drops randomly. Whenever I try pinging the virtual router interface (10.10.20.1) from the console, pfSense outputs:

PING 10.10.20.1 (10.10.20.1): 56 data bytes
ping: sendto: Invalid argument

Log output:

Dec 10 10:24:45 pfSense kernel: arpresolve: can't allocate llinfo for 10.10.20.1 on vmx7

I decided to take a packet capture (attached: 0_1544401910838_packetcapture.zip) to see what makes the connection drop. To my surprise, the virtualised router interface (10.10.20.1) simply sends an ARP request for its peer (10.10.20.2) again; pfSense replies to the ARP but does not respond to anything else after that. The pfSense ARP table contains the correct MAC addresses for both IPs.

Any idea?

Note: the first ARP packet in the capture is packet 2201; after that, pfSense stops responding.
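
To make the ARP exchange described above easier to pick out of raw frames, here is a minimal sketch that decodes an untagged Ethernet/IPv4 ARP frame using only the Python standard library. The function names and the sample MAC address are hypothetical, and the sample frame is built in code rather than taken from the attached capture. Frames carrying an 802.1Q VLAN tag are not handled.

```python
import socket
import struct

ARP_ETHERTYPE = 0x0806

def parse_arp(frame: bytes):
    """Return (operation, sender_ip, target_ip) for an ARP frame, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ARP_ETHERTYPE:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    sender_ip = socket.inet_ntoa(frame[28:32])
    target_ip = socket.inet_ntoa(frame[38:42])
    operation = {1: "request", 2: "reply"}.get(oper, "other")
    return operation, sender_ip, target_ip

def build_arp_request(src_mac: bytes, src_ip: str, dst_ip: str) -> bytes:
    """Build a broadcast ARP request, used here only as test input."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", ARP_ETHERTYPE)
    arp = (struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
           + src_mac + socket.inet_aton(src_ip)
           + b"\x00" * 6 + socket.inet_aton(dst_ip))
    return eth + arp

# Hypothetical MAC; the IPs are the two peers from the post.
sample = build_arp_request(bytes.fromhex("020000000001"), "10.10.20.1", "10.10.20.2")
print(parse_arp(sample))
```

Run over every frame in a capture, a helper like this would list who is asking for whom, which is the pattern described in the post: the router asking for 10.10.20.2 and pfSense answering.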

stephenw10

Where was that packet capture taken? Was it filtered?

From the logs it looks like pfSense is losing the ARP entry for the upstream device, but I don't see it ARPing for that device in the pcap, at least not wherever that was taken. Check the other interfaces. Maybe you have some conflict there.
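
One kind of conflict worth ruling out is another interface whose subnet overlaps the 10.10.20.0/30 link. The sketch below assumes that is the sort of conflict meant; only the vmx7 address comes from the thread, and the other interface names and subnets are hypothetical placeholders to be replaced with the real ones.

```python
import ipaddress
from itertools import combinations

def find_overlaps(interfaces):
    """Return sorted pairs of interface names whose subnets overlap."""
    nets = {name: ipaddress.ip_network(cidr, strict=False)
            for name, cidr in interfaces.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]

example = {
    "vmx7": "10.10.20.2/30",   # from the thread
    "vmx1": "10.10.0.1/16",    # hypothetical LAN that would cover the /30
    "vmx2": "192.168.5.1/24",  # hypothetical, unrelated subnet
}

overlaps = find_overlaps(example)
print(overlaps)
```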

Steve

mohsh86

@stephenw10 The packet capture was taken on the same interface that has the issue. vmx7 is a virtualised interface that lives alone on a separate vSwitch (vSwitch_NOC) in ESXi, with a separate dedicated uplink that goes to the ISP switch providing the L2 connection to the Megaport data centre.

I've tried changing the uplink from a dual-port NIC (Intel igb) to the built-in Broadcom (port 4 of the quad-port NIC on a PowerEdge R720); it didn't make a difference.

I've just noticed a pfSense kernel update. I've done the upgrade and rebooted, and will keep monitoring.

stephenw10

OK, well, if it comes back I'd check the other interfaces to see whether it's ARPing there. It isn't ARPing on vmx7 if you were capturing on the actual interface in question.
Also make sure you have all the hardware offloading options disabled.
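
As a rough way to confirm the offloading state, the options line printed by FreeBSD's ifconfig can be checked for offload capability flags. This is a sketch only: the helper name is hypothetical, the flag list is a common subset rather than an exhaustive one, and the sample line is an illustrative stand-in, not output from the system in this thread.

```python
import re

# Common offload capability names as they appear in ifconfig's options line.
OFFLOAD_FLAGS = {"RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6",
                 "TSO4", "TSO6", "LRO"}

def enabled_offloads(ifconfig_output: str):
    """Return the offload flags present in the options=...<...> line."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if not match:
        return set()
    return OFFLOAD_FLAGS & set(match.group(1).split(","))

# Illustrative sample only; the hex value and flag set are made up.
sample = (
    "vmx7: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>\n"
)

print(sorted(enabled_offloads(sample)))
```

An empty result would suggest the checked offloads are off for that interface; anything listed is still enabled.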

Steve