WAN loses connection randomly every 24-36 hours - tried everything



  • Hi all,

    My pfSense box seems to lose its WAN connection at random, roughly every 24-36 hours, over and over. It looks like it loses the WAN address.
    This issue just started a few weeks ago, otherwise I've been running pfSense for 5+ years without any issues.

    Can anyone help me solve this weird issue? I'm at my wits' end :(

    Thanks
    Jim

    I have seen multiple similar forum threads and tried various things:

    DHCP setting changes on WAN to FreeBSD default and other custom timings
    Manually setting the speed on the WAN NIC
    Swapping ethernet cables to new ones
    Removed all custom tweaks
    Reinstalled pfSense (applied previous config xml)
    Changed gateway monitor IP to 8.8.8.8
    Swapped LAN ports on the modem
    Scouring logs till I'm blind

    Timeline and useful information:
    July 6th 10:27: Internet goes down again:
      Jul 6 10:27:08 dpinger WAN_DHCP 8.8.8.8: Alarm latency 19669us stddev 3716us loss 21%
    July 6th 10:35: I restart the modem:
      Jul 6 10:35:00 pfSense dhclient[1007]: igb0 link state up -> down
    This seems to be my DHCP server at the ISP:
      DHCPREQUEST on igb0 to 80.62.121.174 port 67
    I seem to be receiving DHCP offers, but when the internet is down 0.0.0.0 is shown:
      cat /var/log/dhcpd.log | grep 80.62.121.174
      Jul 6 08:50:54 pfSense dhclient[1007]: DHCPREQUEST on igb0 to 80.62.121.174 port 67
      Jul 6 08:50:54 pfSense dhclient[1007]: DHCPACK from 80.62.121.174
    I use MANUAL outbound NAT, as I have selective routing for my VPN (worked fine for multiple years)
    I have a STATIC ip assigned via DHCP reservation at my ISP
    NO static routes setup
    NO traffic shaper

    How to resolve:
    Rebooting my cable modem solves the problem
    Rebooting the pfSense box solves the problem

    Hardware:
    Qotom 6x1Gbit Intel i211 NIC / Intel i7-7500U
    Cable modem Sagemcom Fast 3890v3 in BRIDGE MODE
    1000/100Mbit connection

    Log files:

    dhcp logs:
    https://pastebin.com/qPKSyvRv

    Gateway log:
    https://pastebin.com/NEK3WbSS

    General log:
    https://pastebin.com/fAyZnmuY

    /var/log/dhclient.leases.igb0: MOST RECENT version

    lease {
      interface "igb0";
      fixed-address 2.106.185.197;
      option subnet-mask 255.255.255.192;
      option routers 2.106.185.193;
      option domain-name-servers 193.162.153.164,194.239.134.83;
      option host-name "x1-6-40-62-31-0b-a7-d9";
      option domain-name "webspeed.dk";
      option interface-mtu 576;
      option broadcast-address 255.255.255.255;
      option dhcp-lease-time 259200;
      option dhcp-message-type 5;
      option dhcp-server-identifier 80.62.121.174;
      renew 2 2020/7/7 18:50:54;
      rebind 3 2020/7/8 21:50:54;
      expire 4 2020/7/9 06:50:54;
    }
    lease {
      interface "igb0";
      fixed-address 192.168.100.10;
      next-server 192.168.100.1;
      option subnet-mask 255.255.255.0;
      option routers 192.168.100.1;
      option dhcp-lease-time 30;
      option dhcp-message-type 5;
      option dhcp-server-identifier 192.168.100.1;
      renew 1 2020/7/6 08:36:21;
      rebind 1 2020/7/6 08:36:40;
      expire 1 2020/7/6 08:36:51;
    }
    lease {
      interface "igb0";
      fixed-address 192.168.100.10;
      next-server 192.168.100.1;
      option subnet-mask 255.255.255.0;
      option routers 192.168.100.1;
      option dhcp-lease-time 30;
      option dhcp-message-type 5;
      option dhcp-server-identifier 192.168.100.1;
      renew 1 2020/7/6 08:36:28;
      rebind 1 2020/7/6 08:36:47;
      expire 1 2020/7/6 08:36:58;
    }
    lease {
      interface "igb0";
      fixed-address 192.168.100.10;
      next-server 192.168.100.1;
      option subnet-mask 255.255.255.0;
      option routers 192.168.100.1;
      option dhcp-lease-time 30;
      option dhcp-message-type 5;
      option dhcp-server-identifier 192.168.100.1;
      renew 1 2020/7/6 08:36:59;
      rebind 1 2020/7/6 08:37:18;
      expire 1 2020/7/6 08:37:29;
    }
    lease {
      interface "igb0";
      fixed-address 2.106.185.197;
      option subnet-mask 255.255.255.192;
      option routers 2.106.185.193;
      option domain-name-servers 193.162.153.164,194.239.134.83;
      option host-name "x1-6-40-62-31-0b-a7-d9";
      option domain-name "webspeed.dk";
      option interface-mtu 576;
      option broadcast-address 255.255.255.255;
      option dhcp-lease-time 252802;
      option dhcp-message-type 5;
      option dhcp-server-identifier 80.62.121.174;
      renew 2 2020/7/7 19:44:13;
      rebind 3 2020/7/8 22:04:12;
      expire 4 2020/7/9 06:50:54;
    }
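
    As a sanity check on the first lease in the listing above, the renew and rebind times line up with the standard DHCP defaults of 50% and 87.5% of the lease time (RFC 2131). The sketch below is only arithmetic on the values shown in that lease; it assumes the lease was obtained exactly one lease time before the expiry, which the file itself does not state.

```python
from datetime import datetime, timedelta

# Values copied from the first lease in the listing above.
lease_seconds = 259200                      # option dhcp-lease-time (72 hours)
expire = datetime(2020, 7, 9, 6, 50, 54)    # expire 4 2020/7/9 06:50:54

# Assumption: the lease started exactly lease_seconds before it expires.
obtained = expire - timedelta(seconds=lease_seconds)

# RFC 2131 defaults: T1 (renew) at 50%, T2 (rebind) at 87.5% of the lease time.
renew = obtained + timedelta(seconds=lease_seconds * 0.5)
rebind = obtained + timedelta(seconds=lease_seconds * 0.875)

print("obtained:", obtained)
print("renew:   ", renew)
print("rebind:  ", rebind)
```

    The computed renew and rebind times can be compared directly with the renew and rebind lines of that lease.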
    

    /var/log/dhclient.leases.igb0: PREVIOUS version

    lease {
      interface "igb0";
      fixed-address 2.106.185.197;
      option subnet-mask 255.255.255.192;
      option routers 2.106.185.193;
      option domain-name-servers 193.162.153.164,194.239.134.83;
      option host-name "x1-6-40-62-31-0b-a7-d9";
      option domain-name "webspeed.dk";
      option interface-mtu 576;
      option broadcast-address 255.255.255.255;
      option dhcp-lease-time 257956;
      option dhcp-message-type 5;
      option dhcp-server-identifier 80.62.121.174;
      renew 1 2020/7/6 05:03:03;
      rebind 2 2020/7/7 07:55:13;
      expire 2 2020/7/7 16:52:41;
    }
    lease {
      interface "igb0";
      fixed-address 2.106.185.197;
      option subnet-mask 255.255.255.192;
      option routers 2.106.185.193;
      option domain-name-servers 193.162.153.164,194.239.134.83;
      option host-name "x1-6-40-62-31-0b-a7-d9";
      option domain-name "webspeed.dk";
      option interface-mtu 576;
      option broadcast-address 255.255.255.255;
      option dhcp-lease-time 245012;
      option dhcp-message-type 5;
      option dhcp-server-identifier 80.62.121.174;
      renew 1 2020/7/6 06:50:54;
      rebind 2 2020/7/7 08:22:10;
      expire 2 2020/7/7 16:52:40;
    }
    

    Screenshots:
    advanced_networking.png
    dchp_wan_advanced.png
    dhcp_wan.png
    gateway_details.png
    gateways.png
    monitoring_quality.png
    outbound_NAT.png
    routes.png
    packages.png
    service_watchdog.png
    system_tuneables.png
    wan_interfaces.png


  • Netgate Administrator

    There are a bunch of leases from the modem itself in that file: 192.168.100.10, handed out by 192.168.100.1.
    You should set the DHCP client to reject leases from 192.168.100.1.
    The modem usually only hands those out when it loses sync with the cable network, so it implies the modem is losing sync. Possibly some upstream issue.
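
    To see at a glance which DHCP servers have been handing out leases, something like the following rough sketch could be run against the contents of the leases file. The function name and the shortened sample text are made up for illustration; only the field names come from the lease file posted above.

```python
import re
from collections import Counter

# Hypothetical helper, not part of pfSense: counts leases per DHCP server
# in text formatted like the dhclient leases file shown earlier.
LEASE_RE = re.compile(r"lease\s*\{(.*?)\}", re.DOTALL)
SERVER_RE = re.compile(r"option\s+dhcp-server-identifier\s+([0-9.]+);")

def leases_by_server(text: str) -> Counter:
    """Return how many lease blocks each dhcp-server-identifier issued."""
    counts = Counter()
    for body in LEASE_RE.findall(text):
        match = SERVER_RE.search(body)
        if match:
            counts[match.group(1)] += 1
    return counts

# Shortened sample modelled on the leases in this thread.
sample = """
lease {
  interface "igb0";
  fixed-address 2.106.185.197;
  option dhcp-server-identifier 80.62.121.174;
}
lease {
  interface "igb0";
  fixed-address 192.168.100.10;
  option dhcp-server-identifier 192.168.100.1;
}
lease {
  interface "igb0";
  fixed-address 192.168.100.10;
  option dhcp-server-identifier 192.168.100.1;
}
"""

counts = leases_by_server(sample)
for server, n in counts.items():
    print(server, n)
```

    Any leases counted under 192.168.100.1 came from the modem rather than from the ISP's DHCP server.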

    I would use hybrid outbound NAT mode there but it shouldn't be affecting this.

    Steve



  • Thanks for your suggestion. I will try to set that, however I find it weird that this just started happening. I have not changed anything for a long time.


  • Netgate Administrator

    Yup. And that can be some upstream problem. A degraded cable causing the modem to lose sync, for example.



  • Called my ISP, they're looking into the issue as well. Will report back when/if something happens again.



  • Now there's new stuff in the logs:

    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	kernel		arpresolve: can't allocate llinfo for 2.106.185.193 on igb0
    Jul 6 15:12:34	check_reload_status		rc.newwanip starting igb0
    Jul 6 15:12:34	check_reload_status		Restarting ipsec tunnels
    Jul 6 15:12:35	php-fpm	78624	/rc.newwanip: rc.newwanip: Info: starting on igb0.
    Jul 6 15:12:35	php-fpm	78624	/rc.newwanip: rc.newwanip: on (IP address: 2.106.185.197) (interface: WAN[wan]) (real interface: igb0).
    Jul 6 15:12:35	dhcpleases		/etc/hosts changed size from original!
    Jul 6 15:12:35	dhcpleases		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 6 15:12:36	dhcpleases		/etc/hosts changed size from original!
    Jul 6 15:12:36	php-fpm	78624	/rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through 2.106.185.193
    Jul 6 15:12:36	dhcpleases		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 6 15:12:37	dhcpleases		kqueue error: unknown
    Jul 6 15:12:38	check_reload_status		updating dyndns wan
    Jul 6 15:12:39	php-fpm	27344	/rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
    Jul 6 15:12:40	dhcpleases		/etc/hosts changed size from original!
    Jul 6 15:12:40	php-fpm	67720	/interfaces.php: Removing static route for monitor 8.8.8.8 and adding a new route through 2.106.185.193
    Jul 6 15:12:40	check_reload_status		Reloading filter
    Jul 6 15:12:40	php-fpm	67720	/interfaces.php: Creating rrd update script
    Jul 6 15:12:40	snmpd	84244	disk_OS_get_disks: adding device 'ada0' to device list
    Jul 6 15:12:41	php-fpm	78624	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1594041161] unbound[22452:0] error: bind: address already in use [1594041161] unbound[22452:0] fatal error: could not open ports'
    


  • @jim82 said in WAN loses connection randomly every 24-36 hours - tried everything:

    Called my ISP, they're looking into the issue as well. Will report back when/if something happens again.

    @stephenw10 is 100% spot on. Likely upstream issue with your provider.

    When you have a cable modem, your pfSense WAN is configured to use DHCP to obtain its IP address. The cable modem is the provider of that DHCP address. The cable modem operates in two modes. The normal mode, when it is properly synchronized with the CMTS (cable modem termination system) equipment of your cable ISP, is for the cable modem to pass the IP it gets from the CMTS on to pfSense using DHCP. However, when an issue occurs out in the cable TV coax plant such that your modem loses sync with the CMTS, then the cable modem goes into "private DHCP mode" and issues attached devices a local non-routable IP address (typically in the 192.168.100.0/24 subnet).

    So long as pfSense sees an IP address on its WAN, it thinks things are fine and tries to use it. But that 192.168.100.x IP is no good for the internet, so nothing works properly. Ideally, when the modem loses sync and goes into that private DHCP mode, it should tell pfSense once it gets back in touch with the CMTS and obtains a proper IP address, and let pfSense restart the WAN interface with the new address. Unfortunately that sometimes does not happen properly, and pfSense winds up stuck with the non-routable private address.

    Telling pfSense to ignore DHCP offers from the cable modem's private DHCP pool (that 192.168.100.0/24 pool) can prevent pfSense from getting "stuck" on the private non-routable IP. That still won't stop your upstream issue that is the root cause, but it will prevent pfSense from getting stuck on one of those non-routable IP addresses whenever loss of sync happens.
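
    As a small illustration of the "stuck on the private pool" condition, the check below tests whether a WAN address falls inside the modem's fallback range. It is only a sketch: the function name is invented here, and 192.168.100.0/24 is assumed to be the modem's private pool because that is what shows up in this thread.

```python
import ipaddress

# Assumption: the modem's private DHCP pool is 192.168.100.0/24,
# as seen in the leases posted in this thread.
MODEM_FALLBACK_NET = ipaddress.ip_network("192.168.100.0/24")

def is_modem_fallback(address: str) -> bool:
    """True if the address comes from the modem's private fallback pool."""
    return ipaddress.ip_address(address) in MODEM_FALLBACK_NET

# Addresses taken from the leases in the first post.
for wan_ip in ("192.168.100.10", "2.106.185.197"):
    state = "modem fallback pool" if is_modem_fallback(wan_ip) else "looks like an ISP address"
    print(wan_ip, "->", state)
```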



  • You're running on a Yousee connection.

    The Sagem modem needs to be in bridge mode and fully transparent. Yousee support takes care of that.

    Then you configure DHCP on pfSense and leave it at that.

    Then it works.



  • @bmeeks Thanks a lot for the in-depth explanation. That sure makes sense. Hoping my ISP is able to see why it's losing sync.



  • Current status:

    Talked to a technician from my ISP today. He reports that Sagem modems apparently have issues with intermittent "freeze-ups" when OFDM channels are activated on the upstream for the DOCSIS 3.1 standard.

    For now, he has deactivated those channels on my connection, which apparently removes the "freeze" problem. Fingers crossed that it will also remove my sync problem.

    New firmware is being worked on.


  • Netgate Administrator

    Huh, that sounds fun. But that's a good response from an ISP, most would never admit any fault exists.

    Steve



  • So far, so good. It's been 48+ hours since the last dropout. Looks like the removal of the OFDM upstream channels solved the issue. Now I'm waiting for the new firmware to arrive.

    Not a single error in the logs since. Thanks for your help so far.




  • 14 days after and my connection is rock solid. Thanks for your help.



  • @jim82 said in WAN loses connection randomly every 24-36 hours - tried everything:

    14 days after and my connection is rock solid. Thanks for your help.

    👍

