ARP problems



  • I'm narrowing down a problem I described in two other threads.

    Internet -> (em0) pfsense (bge1) -> lan

    Under normal circumstances:

    ? (10.10.10.2) at 00:1e:c9:3b:fb:2c on bge1 permanent [ethernet]
    ? (10.10.10.3) at 00:30:48:64:1f:1e on bge1 permanent [ethernet]
    pfsense (10.10.10.1) at 00:18:8b:73:c9:7c on bge1 permanent [ethernet]
    ? (10.10.10.4) at 00:25:90:08:76:1c on bge1 permanent [ethernet]
    etc

    After 5-6 days, I lose communication between 10.10.10.1 (pfsense) and the rest of the 10.10.10.0/24 subnet. All the servers on that network can still communicate correctly with each other. But on pfsense, the ARP table shown above turns into:

    ? (10.10.10.2) at (incomplete) on bge1 expires in 20 seconds [ethernet]
    ? (10.10.10.3) at (incomplete) on bge1 expires in 20 seconds [ethernet]
    pfsense (10.10.10.1) at 00:18:8b:73:c9:7c on bge1 permanent [ethernet]
    ? (10.10.10.4) at (incomplete) on bge1 expires in 20 seconds [ethernet]
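
    To catch this state as soon as it appears rather than days later, the ARP table can be scanned for unresolved entries. The following is only a minimal sketch: the function name incomplete_entries and the regular expression are illustrative assumptions, written against the output format pasted above, not something shipped with pfsense.

```python
import re

# Matches lines in the style of the arp output pasted above, e.g.
#   ? (10.10.10.2) at (incomplete) on bge1 expires in 20 seconds [ethernet]
# The pattern is an assumption based on that sample only.
ARP_LINE = re.compile(
    r"^\s*(?P<host>\S+) \((?P<ip>[0-9.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)"
)


def incomplete_entries(arp_output, iface=None):
    """Return the IPs whose ARP entry is unresolved.

    If iface is given, only entries on that interface are considered.
    """
    stuck = []
    for line in arp_output.splitlines():
        m = ARP_LINE.match(line)
        if m is None:
            continue  # skip anything that does not look like an ARP entry
        if iface is not None and m.group("iface") != iface:
            continue
        if m.group("mac") == "(incomplete)":
            stuck.append(m.group("ip"))
    return stuck


if __name__ == "__main__":
    # Sample taken from the failed state shown in this post.
    sample = """\
? (10.10.10.2) at (incomplete) on bge1 expires in 20 seconds [ethernet]
? (10.10.10.3) at (incomplete) on bge1 expires in 20 seconds [ethernet]
pfsense (10.10.10.1) at 00:18:8b:73:c9:7c on bge1 permanent [ethernet]
? (10.10.10.4) at (incomplete) on bge1 expires in 20 seconds [ethernet]
"""
    print(incomplete_entries(sample, iface="bge1"))
```

    On a live box the text would presumably come from running arp -an and capturing its output; a periodic job could then log a timestamp the first time the list is non-empty for bge1.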

    ping 10.10.10.4

    PING 10.10.10.4 (10.10.10.4): 56 data bytes
    ping: sendto: Host is down

    ping 10.10.10.2

    PING 10.10.10.2 (10.10.10.2): 56 data bytes
    ping: sendto: Host is down

    Setting the entries statically (arp -s) doesn't work:

    arp -s 10.10.10.2 00:1e:c9:3b:fb:2c

    ping 10.10.10.2

    PING 10.10.10.2 (10.10.10.2): 56 data bytes
    ^C
    --- 10.10.10.2 ping statistics ---
    5 packets transmitted, 0 packets received, 100.0% packet loss

    There are absolutely no traces of errors in either dmesg or syslog.

    The state of the system is attached as a screenshot. On the LAN interface (the problematic one) there are only input errors, and they keep growing. The same thing happens if I use bge0 instead of bge1 as the LAN interface. The only solution is rebooting pfsense. The rest of the services on pfsense continue working normally, but the box becomes isolated from the LAN (and, needless to say, the LAN hosts can't reach pfsense).

    netstat -m

    5126/1534/6660 mbufs in use (current/cache/total)
    5123/1539/6662/25600 mbuf clusters in use (current/cache/total/max)
    5122/766 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/104/104/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
    11527K/3877K/15405K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/8/6656 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    0 calls to protocol drain routines

    Any hint, or idea, or help, would be appreciated...

    Regards,

    Ruben.



  • Rebel Alliance Developer Netgate

    Looks like some sort of partial connectivity loss on the LAN side there. It's almost like it's receiving traffic but can't get back to things on the LAN (looks like ARP is failing).

    I'd check the cables, the switch/switch port, and the NIC (generally in that order, depending on how readily available replacement parts are).



  • Hi,

    It's not the NIC or the cables, since it happened with two NICs, both bge (I prefer keeping the em for WAN). After a lot of searching I found this:

    http://lists.freebsd.org/pipermail/freebsd-questions/2007-August/156868.html

    I'm waiting for it to happen again so I can try what is suggested there. It has been almost 12 days since the last failure, which is when I posted this.

    Regards,

    Ruben.


  • Rebel Alliance Developer Netgate

    Ah.. yeah, there is someone else on here who needed to set promisc on the NIC to even get DHCP on bge. Rather strange, not sure if it's a specific chipset problem or a driver issue. Looks like it's been around a while.

