ARP failure with error - kernel: arpresolve: can't allocate llinfo for <address>
-
Hi
I've encountered what looks to be a bug where pfSense fails to send an ARP request when it should. A transcript showing the problem follows:
(Note that the below is tested with a /31 link, but I have verified the same behaviour with a /30 as well, to discount the 31-bit prefix being an issue)
[2.2.6-RELEASE][admin@firewall]/root: ifconfig vtnet4
vtnet4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6c00b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 52:54:00:af:ae:63
inet6 fe80::5054:ff:feaf:ae63%vtnet4 prefixlen 64 scopeid 0x5
inet 10.198.15.0 netmask 0xfffffffe broadcast 255.255.255.255
inet6 2404:4880:0:800f:: prefixlen 127
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet 10Gbase-T <full-duplex>
status: active
[2.2.6-RELEASE][admin@firewall]/var/log: arp -an | grep 10.198.15
? (10.198.15.0) at 52:54:00:af:ae:63 on vtnet4 permanent [ethernet]
[2.2.6-RELEASE][admin@firewall]/var/log: ping 10.198.15.1
PING 10.198.15.1 (10.198.15.1): 56 data bytes
ping: sendto: Invalid argument
ping: sendto: Invalid argument
ping: sendto: Invalid argument
^C
--- 10.198.15.1 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
[2.2.6-RELEASE][admin@firewall]/var/log: grep vtnet4 /var/log/system.log | tail -5
Mar 8 01:22:26 fw1 kernel: arpresolve: can't allocate llinfo for 10.198.15.1 on vtnet4
Mar 8 01:22:32 fw1 kernel: arpresolve: can't allocate llinfo for 10.198.15.1 on vtnet4
Mar 8 01:23:00 fw1 kernel: arpresolve: can't allocate llinfo for 10.198.15.1 on vtnet4
Mar 8 01:24:00 fw1 kernel: arpresolve: can't allocate llinfo for 10.198.15.1 on vtnet4
Mar 8 01:25:00 fw1 kernel: arpresolve: can't allocate llinfo for 10.198.15.1 on vtnet4
The kernel logs an "arpresolve" error, and no ARP request is sent.
A tcpdump on the vtnet4 interface is silent during this time:
[2.2.6-RELEASE][admin@firewall]/root: tcpdump -i vtnet4 -n
A workaround that fixes the problem - for the next ARP timeout period (20 minutes) - is to bounce the interface:
[2.2.6-RELEASE][admin@firewall]/var/log: ifconfig vtnet4 down
[2.2.6-RELEASE][admin@firewall]/var/log: ifconfig vtnet4 up
tcpdump then shows the correct ARP request/reply:
[2.2.6-RELEASE][admin@firewall]/root: tcpdump -i vtnet4 -n
15:00:20.088441 ARP, Request who-has 10.198.15.1 tell 10.198.15.0, length 28
15:00:20.090577 ARP, Reply 10.198.15.1 is-at 52:54:00:0f:42:c8, length 46
And subsequent ARP table and ping response are as expected:
[2.2.6-RELEASE][admin@firewall]/var/log: ping 10.198.15.1
PING 10.198.15.1 (10.198.15.1): 56 data bytes
64 bytes from 10.198.15.1: icmp_seq=0 ttl=64 time=2.043 ms
64 bytes from 10.198.15.1: icmp_seq=1 ttl=64 time=1.760 ms
^C
--- 10.198.15.1 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.760/1.901/2.043/0.142 ms
[2.2.6-RELEASE][admin@firewall]/var/log: arp -an | grep 198.15
? (10.198.15.1) at 52:54:00:0f:42:c8 on vtnet4 expires in 1194 seconds [ethernet]
? (10.198.15.0) at 52:54:00:af:ae:63 on vtnet4 permanent [ethernet]
Deleting the ARP entry manually reintroduces the problem:
[2.2.6-RELEASE][admin@firewall]/var/log: arp -d 10.198.15.1
10.198.15.1 (10.198.15.1) deleted
[2.2.6-RELEASE][admin@firewall]/var/log: arp -an | grep 10.198.15
? (10.198.15.0) at 52:54:00:af:ae:63 on vtnet4 permanent [ethernet]
[2.2.6-RELEASE][admin@firewall]/var/log: ping 10.198.15.1
PING 10.198.15.1 (10.198.15.1): 56 data bytes
ping: sendto: Invalid argument
ping: sendto: Invalid argument
The problem only seems to occur on a quiet interface. When configured with a continuous ping through the interface, the ARP entry was renewed OK.
I'm running this pfSense version (the latest at this time):
2.2.6-RELEASE (amd64)
built on Mon Dec 21 14:50:08 CST 2015
FreeBSD 10.1-RELEASE-p25
If there's no apparent explanation for this, I'll raise a bug report, but I thought I'd canvass the collective wisdom of the forum first.
Kind regards
Mark
-
I'm having a similar problem. I see that exact same message in my log files, but your ifconfig down/up trick doesn't help me.
kernel: arpresolve: can't allocate llinfo for 192.168.100.1 on em3
I've been fighting this for a while. The system is running on physical hardware:
Supermicro 1U Intel Atom D525 with 4 Intel (em0) interfaces, 4GB RAM, 60GB SSD.
Banking on your ifconfig down and up idea, I wrote a script to auto-detect that error message in the logs and reset the NICs, but it isn't making a difference.
Unfortunately, I'm not able to access the router in person for testing for a few weeks, but I do have ssh access. What else can I try without bricking a distant router?
-
"can't allocate llinfo" almost always means you're trying to ARP something on a network that isn't directly connected. AkkerKid: you almost certainly have a different, unrelated issue; it sounds like you're legitimately in a scenario where you can't ARP your gateway. Start a new thread describing what type of WAN you have and what 192.168.100.1 is, so your issue doesn't get conflated with something unrelated.
wildernessvoice: my first guess would have been something /31-related, since a /31 isn't a very common configuration and something might get confused about whether it still counts as a directly connected network. But given that it happens with a /30 as well, that's ruled out. Where an ifconfig down/up fixes something, it's almost always a NIC driver issue. I haven't heard of anything along those lines with vtnet, but it'd be worth trying 2.3 to see if that's something that's been fixed in FreeBSD 10.3. What hypervisor are you running in?
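For anyone wanting to sanity-check the "directly connected" condition by hand, here is a rough sketch of the mask arithmetic in plain sh. The function names ip_to_int and in_subnet are made up for this example, and it only compares network prefixes; it says nothing about what the kernel's ARP code actually decides.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of pfSense or FreeBSD.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if target address $1 is inside the subnet of
# interface address $2 with prefix length $3.
in_subnet() {
    target=$(ip_to_int "$1")
    local_addr=$(ip_to_int "$2")
    mask=$(( (0xffffffff << (32 - $3)) & 0xffffffff ))
    [ $(( target & mask )) -eq $(( local_addr & mask )) ]
}

# Addresses taken from the transcript above (netmask 0xfffffffe is a /31).
if in_subnet 10.198.15.1 10.198.15.0 31; then
    echo "10.198.15.1 is on-link for 10.198.15.0/31"
fi
```

By this arithmetic 10.198.15.1 is on the same /31 as the vtnet4 address, which fits the observation that the failure is not simply a matter of the peer being off-link.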
-
cmb:
My "unrelated issue" has its own post. Thanks.
https://forum.pfsense.org/index.php?topic=108616.0
wildernessvoice:
Here's a quick and dirty script I made to attempt your temp fix on my problem. Maybe it would be of some use to you. I just had cron run it every five minutes.
#!/bin/sh
# Ping for a few seconds to allow the log to show an error in the current minute.
/sbin/ping -c 5 208.67.222.222
# Now that it's about 5 seconds after the minute, see if our error is in the log, then count the hits.
# %e space-pads the day the way syslog does; %d zero-pads it and would miss days 1-9.
FailCount=$(grep -a "$(date '+%b %e %H:%M')" /var/log/system.log | grep -c "allocate llinfo for")
echo "$FailCount"
if [ "$FailCount" -gt 0 ]; then
echo "Error detected in logs. Attempting Repair."
# Your NIC may vary.
ifconfig em0 down
ifconfig em0 up
else
echo "Log file doesn't show arp errors."
fi
-
Thanks AkkerKid. Your root cause is certainly different; I replied there.
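On the script above, for anyone adapting it: since the kernel message names the interface, the NIC could be read from the log instead of being hard-coded. A minimal sketch, assuming the message format shown earlier in this thread; the function name failing_interfaces is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Reads log lines on stdin and prints each interface named in an
# "arpresolve: can't allocate llinfo for <address> on <interface>" line, once.
failing_interfaces() {
    sed -n "s/.*arpresolve: can't allocate llinfo for [^ ]* on \([^ ]*\).*/\1/p" | sort -u
}

# Possible use on the firewall (left commented out because it bounces interfaces):
# for nic in $(grep -a "allocate llinfo for" /var/log/system.log | failing_interfaces); do
#     ifconfig "$nic" down
#     ifconfig "$nic" up
# done
```

This only changes which interface gets bounced; whether bouncing helps at all still depends on the root cause.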
-
Hi cmb - thanks for your reply. It does appear to be either a NIC driver issue or a problem with the kernel's ARP module. I'm running this as a KVM guest on Debian jessie (Linux 3.16 kernel). I've upgraded to the new pfSense release 2.3 (FreeBSD 10.3), and the issue is still present.
-
I get the same error message (arpresolve llinfo etc.) during an upgrade from 2.2.6 => 2.3 and a 2.3.1 install.
The following commands restore the LAN connection, but upon reboot I lose my LAN connectivity again.
ifconfig bge1 down
ifconfig bge1 up
When connectivity is lost, it affects both the LAN and CARP addresses.
Not sure what to look at here, but I imagine it is a driver issue.
When I reset the 2.3.1 upgrade to factory defaults, the LAN address stayed up (not sure about CARP), so I'm guessing something carried over from 2.2.6 isn't playing well.
Thanks!