ARP problems

rubenc

I'm narrowing down a problem I described in two other threads.

Internet -> (em0) pfsense (bge1) -> lan

Under normal circumstances:

? (10.10.10.2) at 00:1e:c9:3b:fb:2c on bge1 permanent [ethernet]
? (10.10.10.3) at 00:30:48:64:1f:1e on bge1 permanent [ethernet]
pfsense (10.10.10.1) at 00:18:8b:73:c9:7c on bge1 permanent [ethernet]
? (10.10.10.4) at 00:25:90:08:76:1c on bge1 permanent [ethernet]
etc

After 5-6 days, I lose communication between 10.10.10.1 (pfsense) and the rest of the 10.10.10.0/24 subnet. All the servers in that network can still communicate correctly with each other. But on pfsense, the ARP table above turns into:

? (10.10.10.2) at (incomplete) on bge1 expires in 20 seconds [ethernet]
? (10.10.10.3) at (incomplete) on bge1 expires in 20 seconds [ethernet]
pfsense (10.10.10.1) at 00:18:8b:73:c9:7c on bge1 permanent [ethernet]
? (10.10.10.4) at (incomplete) on bge1 expires in 20 seconds [ethernet]
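
To catch this state from a script instead of by eye, something like the following could count the "(incomplete)" entries on the LAN interface. This is only a rough sketch: count_incomplete is a made-up helper name, and it assumes the output format of arp -an matches the lines pasted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfsense): count ARP entries that are
# stuck in the "(incomplete)" state on one interface.
# It reads `arp -an`-style lines on stdin, so it can also be run against
# saved output.
count_incomplete() {
    iface="$1"
    awk -v ifc="$iface" '
        # Expected fields: ? (ip) at <mac or "(incomplete)"> on <iface> ...
        $3 == "at" && $4 == "(incomplete)" && $5 == "on" && $6 == ifc { n++ }
        END { print n + 0 }
    '
}

# Example use on the firewall itself (interface name taken from this thread):
#   arp -an | count_incomplete bge1
```

Run from cron every minute or so, a non-zero count on bge1 would at least give a timestamp for when the LAN side stops resolving.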

ping 10.10.10.4

PING 10.10.10.4 (10.10.10.4): 56 data bytes
ping: sendto: Host is down

ping 10.10.10.2

PING 10.10.10.2 (10.10.10.2): 56 data bytes
ping: sendto: Host is down

Setting the entries statically (arp -s) doesn't work:

arp -s 10.10.10.2 00:1e:c9:3b:fb:2c

ping 10.10.10.2

PING 10.10.10.2 (10.10.10.2): 56 data bytes
^C
--- 10.10.10.2 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

There are absolutely no traces of errors in either dmesg or syslog.

The state of the system is attached as a screenshot. On the LAN interface (the problematic one) there are only input errors, and they keep growing. The situation is the same if I use bge0 instead of bge1 as the LAN interface. The only solution is rebooting pfsense. The rest of the services on pfsense continue working normally, but the box is isolated from the LAN (and, of course, the LAN hosts can't reach pfsense either).

netstat -m

5126/1534/6660 mbufs in use (current/cache/total)
5123/1539/6662/25600 mbuf clusters in use (current/cache/total/max)
5122/766 mbuf+clusters out of packet secondary zone in use (current/cache)
0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
11527K/3877K/15405K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/8/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Any hint, or idea, or help, would be appreciated...

Regards,

Ruben.

[Attachment: chof5.png]

jimp

Looks like some sort of partial connectivity loss on the LAN side. It's almost as if it's receiving traffic but can't get back to things on the LAN (looks like ARP is failing).

I'd check the cables, the switch/switch port, and the NIC (generally in that order, depending on how readily available replacement parts are).

rubenc

Hi,

It's not the NIC or the cables, since it happened with two different NICs, both bge (I prefer keeping the em for WAN). After a lot of searching I found this:

http://lists.freebsd.org/pipermail/freebsd-questions/2007-August/156868.html

I've been waiting for it to happen again (almost 12 days now since the last failure, when I posted this) so I can try what is suggested there.

Regards,

Ruben.

jimp

Ah, yeah, there is someone else on here who needed to set promisc on the NIC to even get DHCP on bge. Rather strange; not sure if it's a specific chipset problem or a driver issue. Looks like it's been around a while.
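
For reference, putting the interface into promiscuous mode by hand would look roughly like this. It's a sketch, not a confirmed fix: set_promisc and the RUN variable are made up for illustration, bge1 is the LAN interface from this thread, and the only real command involved is ifconfig with its promisc flag.

```shell
#!/bin/sh
# Hypothetical wrapper around `ifconfig <iface> promisc`.
# Dry-run by default: with RUN unset, the command is only printed.
# Call it as `RUN= set_promisc bge1` (as root, on the firewall) to apply it.
set_promisc() {
    iface="$1"
    ${RUN-echo} ifconfig "$iface" promisc
}

# Show what would be run for the LAN interface.
set_promisc bge1
```

Running ifconfig bge1 -promisc turns it back off, so it's easy to test whether the flag alone brings ARP back without a reboot.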