WAN Gateway - Many Intermittent Outages logged - ARP Related?



  • I have been escalating a problem with my cable ISP (Rogers in Toronto, Canada) for 3 weeks now. They have been on site and looked at the signal-to-noise ratio and at the entire neighbourhood, but there is still no resolution. It is incumbent on me to make sure my pfSense hardware and my Gigabit switches are 'clean' before I go to their executive office.

    I am posting a log here because it looks similar to some threads I have seen on this forum about an error message whose meaning I can't work out.

    Hope you folks can help.

    I am running 2.4.4 right now out of desperation, but the same errors occur on the current final release, 2.4.3 or 2.4.3-1.

    I have replaced the SSD and re-installed from the ISO and restored my XML backup. That went very smoothly BTW !

    My ISP does not offer static IPs, and I have the Hitron modem/gateway in bridged mode.

    My WAN is 'em0' in this case.

    I see the following for example in system.log:

    Aug 30 03:54:24 pfsense kernel: em0.1: link state changed to DOWN
    Aug 30 03:54:24 pfsense check_reload_status: Linkup starting em0
    Aug 30 03:54:24 pfsense check_reload_status: Linkup starting em0.1
    Aug 30 03:54:25 pfsense php-fpm[17113]: /rc.linkup: DEVD Ethernet detached event for wan
    Aug 30 03:54:25 pfsense check_reload_status: Reloading filter
    Aug 30 03:54:26 pfsense check_reload_status: Reloading filter
    Aug 30 03:54:28 pfsense kernel: arpresolve: can't allocate llinfo for 99.redacted.136.1 on em0
    Aug 30 03:54:33 pfsense check_reload_status: Linkup starting em0
    Aug 30 03:54:33 pfsense kernel: em0: link state changed to UP
    Aug 30 03:54:33 pfsense kernel: em0.1: link state changed to UP
    Aug 30 03:54:33 pfsense check_reload_status: Linkup starting em0.1
    Aug 30 03:54:33 pfsense kernel: arpresolve: can't allocate llinfo for 99.redacted.136.1 on em0
    Aug 30 03:54:34 pfsense check_reload_status: Reloading filter
    Aug 30 03:54:34 pfsense php-fpm[63690]: /rc.linkup: DEVD Ethernet attached event for wan
    Aug 30 03:54:34 pfsense php-fpm[63690]: /rc.linkup: HOTPLUG: Configuring interface wan
    Aug 30 03:54:34 pfsense check_reload_status: rc.newwanip starting em0
    Aug 30 03:54:34 pfsense check_reload_status: Restarting ipsec tunnels
    Aug 30 03:54:35 pfsense php-fpm[38632]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Aug 30 03:54:35 pfsense php-fpm[38632]: /rc.newwanip: rc.newwanip: on (IP address: 99.redacted.redacted.181) (interface: WAN[wan]) (real interface: em0).
    Aug 30 03:54:35 pfsense dhcpleases: /etc/hosts changed size from original!
    Aug 30 03:54:35 pfsense dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Aug 30 03:54:35 pfsense php-fpm[38632]: /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through 99.redacted.136.1
    Aug 30 03:54:36 pfsense dhcpleases: /etc/hosts changed size from original!
    Aug 30 03:54:36 pfsense dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Aug 30 03:54:38 pfsense php-fpm[38632]: /rc.newwanip: dpinger: timeout while retrieving status for gateway WAN_DHCP
    Aug 30 03:54:39 pfsense dhcpleases: /etc/hosts changed size from original!
    Aug 30 03:54:39 pfsense dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Aug 30 03:54:39 pfsense php-fpm[63690]: /rc.linkup: dpinger: timeout while retrieving status for gateway WAN_DHCP
    Aug 30 03:54:40 pfsense dhcpleases: kqueue error: unkown
    Aug 30 03:54:40 pfsense php-fpm[38632]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1535615680] unbound[4895:0] error: bind: address already in use [1535615680] unbound[4895:0] fatal error: could not open ports' 
    Aug 30 03:54:41 pfsense check_reload_status: updating dyndns wan
    Aug 30 03:54:41 pfsense check_reload_status: Reloading filter
    Aug 30 03:54:42 pfsense php-fpm[38632]: /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Aug 30 03:54:42 pfsense php-fpm[38632]: /rc.newwanip: Creating rrd update script
    Aug 30 03:54:44 pfsense php-fpm[38632]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 99.redacted.redacted.181 ->  99.redacted.redacted.181 - Restarting packages.
    Aug 30 03:54:44 pfsense check_reload_status: Starting packages
    Aug 30 03:54:45 pfsense php-fpm[58044]: /rc.start_packages: Restarting/Starting all packages.
    
    

    What interests me is the arp resolve error on the gateway address ending in "136.1"
    My thoughts are that the llinfo message will shed some light on the problem.
    Most of the time the link comes back up but sometimes I have to reboot the pfSense machine.

    I don't believe this is change related as I have not previously had downtime over the past year except for some ISP outages and did nothing 3 weeks ago to provide any stimuli that would induce this problem.

    I can supply other log snippets such as dmesg stuff if needed.
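
For anyone wanting to quantify how often and how long the link drops, here is a minimal sketch that pairs the kernel "link state changed" lines from system.log into outages. It is only an illustration: the function names (`link_outages`, `parse_ts`) are made up for this example, and the year is an assumption because syslog timestamps in this format do not include one.

```python
import re
from datetime import datetime

# Matches kernel link-state lines such as:
#   Aug 30 03:54:33 pfsense kernel: em0: link state changed to UP
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+(\S+): link state changed to (UP|DOWN)"
)


def parse_ts(stamp, year=2018):
    # Assumption: the log carries no year, so the caller supplies one.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def link_outages(lines, iface="em0"):
    """Return (down_time, up_time, seconds) for each DOWN -> UP pair on iface."""
    outages = []
    down_at = None
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m or m.group(2) != iface:
            continue  # not a link-state line for the interface of interest
        when = parse_ts(m.group(1))
        if m.group(3) == "DOWN":
            down_at = when
        elif down_at is not None:
            outages.append((down_at, when, (when - down_at).total_seconds()))
            down_at = None
    return outages
```

Feeding it the lines of system.log (for example `link_outages(open("/var/log/system.log"), iface="em0.1")`, assuming the log is plain text at that path) gives a list that can be lined up against the ISP's own records.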



  • The link state changing would seem to indicate a hardware problem. If you lose the connection, everything depending on it, including ARP and DHCP, fails. This could be a modem, Ethernet cable or other issue.

    What happens if you connect just a standalone computer? If it fails too, then the problem has nothing to do with pfSense. If it still fails after you've tried a different Ethernet cable, then give Rogers a call.

    BTW, I'm also on Rogers with the Hitron in bridge mode.

    I also had a problem a few years ago, where I'd temporarily lose my Internet connection and home phone. It took a while to get through the lower level support to someone who could work on the problem. The first "tech" who showed up insisted the problem was with the coax within my condo, which was installed nicely behind the drywall etc., by Rogers. He couldn't explain why the coax from the utility room, which was older, wasn't the problem.

    I then did my own testing. I have two coax cables coming up from the utility room, so I moved the cable modem to the other one and left the phone on the original. I'd leave a radio station streaming over the Internet, and when it failed, I picked up the phone and noticed it had also failed. This proved the problem was somewhere outside my unit. I also wrote a shell script to ping the Rogers router every minute, so I'd have a record of when it failed.

    Rogers then sent another tech, who worked back from the splitter in the utility room and eventually found a defective cable out next to the street. Once that was repaired, I again had solid service.

    Bottom line, don't let the first people you speak to brush you off. Do some testing on your own, if needed, to help guide them. If I had listened to that first tech, I would have had black coax cable stapled along the baseboards, around door frames, etc., instead of using the nicely installed cable and that wouldn't have done a thing about the problem.
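
The once-a-minute ping record mentioned above could look something like the following. This is a rough sketch rather than the original shell script (which is not shown in this thread): it is written in Python, the function names are hypothetical, and it assumes a `ping` binary that accepts `-c 1`, as on FreeBSD, Linux and macOS.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=5):
    """Return True if a single ICMP echo to host succeeds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply within the timeout, or ping could not be started.
        return False
    return result.returncode == 0


def format_record(when, host, ok):
    """Build one log line: timestamp, target and result."""
    status = "ok" if ok else "FAIL"
    return f"{when:%Y-%m-%d %H:%M:%S} {host} {status}"


def monitor(host, logfile, interval_s=60, probe=ping_once):
    """Append one record per interval; runs until interrupted."""
    while True:
        with open(logfile, "a") as fh:
            fh.write(format_record(datetime.now(), host, probe(host)) + "\n")
        time.sleep(interval_s)
```

Calling `monitor(gateway_ip, "wan_ping.log")` with your own ISP gateway address leaves a timestamped file where every FAIL line marks a minute the gateway did not answer, which is the kind of record that helps when talking to support.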



  • Thanks for the info, JKnott. I changed out the motherboard, did a fresh install from the latest stable ISO, and enabled DHCP6 and DHCP. That is 3 changes all at once, which is not a great way to find a root cause, but no errors so far.



  • I have walked back some changes but I thought I would now change the ENTIRE hardware platform.

    New...

    An ASUS Prime A320M-K motherboard. Together with the 2200 CPU it is cheap, and it comes with a whisper-quiet fan and heatsink glue already applied, so you just need to be careful screwing the fan to the motherboard - and of course take care inserting the CPU. Not to mention it has VGA, which is perfect for a router appliance. It also has a serial port, but I don't expect to need it.

    This motherboard has a Realtek Ethernet port onboard, and I also use an Intel 1000 card in a PCI slot for the WAN. BSD/pfSense has booted and installed correctly as far as I can observe.

    But why do I still get this message... ?

    ```
    Sep  6 19:38:24 pfsense kernel: arpresolve: can't allocate llinfo for 99.2xx.xxx.1 on em0