Unstable WAN link, pfSense not recovering?



  • Lately I've discovered that my WAN link (TWC cable) has become unstable: the link drops for a few minutes, then comes back. Unfortunately, pfSense does NOT recover without manual intervention.

    All that's needed to fix this is a DHCP renewal. Clicking release/renew on the WAN interface restores connectivity, and I've confirmed the same behavior with a laptop by simply restarting dhcpcd on my Linux machine.

    Is there some way to make pfSense try this on its own? It seems to just leave the link marked as down, keep pinging the unreachable gateway, and never come back up.

    I'd like pfSense to keep retrying DHCP for as long as the gateway is down.

    Thanks in advance,
    Bez



  • Just a shot in the dark: have you tried enabling "State Killing on Gateway Failure"? (Uncheck the box at System: Advanced: Miscellaneous: Gateway Monitoring: State Killing on Gateway Failure to enable it.)

    At the very least this would remove stale states to the gateway IP address. (Do you get a new WAN IP each time?)

    (To resolve a similar issue, Wayhome changed his timeout and retry values: https://forum.pfsense.org/index.php?topic=11840.0)



  • Yep. I've tried that option. :(

    It does not get a new IP. From experimentation, it appears that the cable modem itself restricts connectivity after an outage until a DHCP request is made. (I.e., unplug the modem's coax, plug it back in a few minutes later, and the gateway is still down, even though the IP and gateway are still correct. Refresh dhclient (or its equivalent on my laptop) and it gets the same IP and gateway, yet connectivity is restored.)

    It's very strange, but at least fairly repeatable. I'd wager it's some kind of intentional lockout in my modem. (The link state does not need to change; a fresh DHCP request is enough.)

    Apinger just keeps pinging the gateway, but never restarts dhclient or anything else.

    Edit: I just looked at the issue Wayhome had. He's seeing dhclient time out. My dhclient seems to think everything is still OK and peachy, and there are no dhclient log entries until I came home and intervened manually.  :(



  • I suspect that adjusting the DHCP timeouts will help in your situation too. Try this test: watch the log (Status: System logs: DHCP) and then unplug the cable modem. If dhclient starts sending DHCPREQUEST (or possibly DHCPDISCOVER) messages, you should see them in the log. If that happens, all we need to do is make pfSense keep sending DHCPREQUEST until the outage passes, which we can do by adjusting the DHCP timers as Wayhome did. (If dhclient does not notice that the DHCP server is gone, we'll probably have to modify apinger manually so that it expires the dhclient lease.)
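The test above comes down to one question: does dhclient log any DHCPREQUEST or DHCPDISCOVER lines after the modem is unplugged? Here is a minimal Python sketch of that check, assuming the DHCP log has been copied out as plain-text lines and that dhclient entries look like `dhclient[PID]: DHCPREQUEST ...` or `dhclient[PID]: DHCPDISCOVER ...`. The sample lines and the interface name `em0` are made up for illustration, so adjust the pattern to whatever your log actually shows.

```python
import re
from typing import Iterable, List

# Assumed line layout, e.g.
#   "... dhclient[1234]: DHCPREQUEST on em0 to 255.255.255.255 port 67"
# Adjust the pattern if your log formats dhclient entries differently.
DHCLIENT_MSG = re.compile(r"dhclient(?:\[\d+\])?:\s+(DHCPREQUEST|DHCPDISCOVER)\b")


def dhclient_retries(lines: Iterable[str]) -> List[str]:
    """Return the DHCPREQUEST/DHCPDISCOVER message types dhclient logged, in order."""
    found = []
    for line in lines:
        match = DHCLIENT_MSG.search(line)
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Hypothetical excerpt written for illustration only, not a real pfSense log.
    sample = [
        "Jan  1 10:00:01 pfsense dhclient[4321]: DHCPREQUEST on em0 to 255.255.255.255 port 67",
        "Jan  1 10:00:03 pfsense other-daemon[99]: unrelated message",
        "Jan  1 10:00:20 pfsense dhclient[4321]: DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 1",
    ]
    retries = dhclient_retries(sample)
    if retries:
        print("dhclient is retrying:", ", ".join(retries))
    else:
        print("no dhclient retries logged; dhclient has not noticed the outage")
```

An empty result during an outage corresponds to the second case in the post: dhclient never noticed the server was gone, so timer changes alone will not help.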

    Here are the default settings in /etc/inc/interfaces.inc:
    timeout 60;
    retry 15;
    select-timeout 0;
    initial-interval 1;

    This means that pfSense will try to reach the DHCP server again 15 seconds after it "notices" that the server is not available. It will give up after 60 seconds and just keep using the most recent IP address indefinitely. If you set timeout to a value large enough to outlast the outage, dhclient should still be trying to contact the DHCP server when the outage passes. I would try 600 seconds. Remember to back up /etc/inc/interfaces.inc before you edit it.

    Ref. http://www.freebsd.org/cgi/man.cgi?query=dhclient.conf&apropos=0&sektion=5&manpath=FreeBSD+8.3-RELEASE&arch=default&format=html
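If you'd rather script the edit than make it by hand, here is a minimal Python sketch that backs the file up first and then changes only the `timeout` statement. It assumes the defaults appear in /etc/inc/interfaces.inc one statement per line, as quoted above; the pattern is anchored to the start of a line so `select-timeout 0;` is left alone. It also assumes a Python 3 interpreter, which a stock pfSense box may not have (you can edit a copy elsewhere and put it back). The 600-second value is just the suggestion from this post.

```python
import re
import shutil

# Path quoted in the post.
INTERFACES_INC = "/etc/inc/interfaces.inc"

# Matches "timeout <n>;" at the start of a line (after optional spaces/tabs).
# "select-timeout 0;" does not match because "select-" precedes "timeout".
TIMEOUT_LINE = re.compile(r"^([ \t]*)timeout[ \t]+\d+;", re.MULTILINE)


def set_dhclient_timeout(text: str, seconds: int) -> str:
    """Return text with every bare 'timeout N;' statement set to 'timeout <seconds>;'."""
    new_text, count = TIMEOUT_LINE.subn(rf"\g<1>timeout {seconds};", text)
    if count == 0:
        raise ValueError("no 'timeout N;' line found; check the file by hand")
    return new_text


def patch_file(path: str = INTERFACES_INC, seconds: int = 600) -> None:
    """Back up `path` to `path`.bak, then rewrite its timeout statement."""
    shutil.copy2(path, path + ".bak")
    with open(path) as f:
        original = f.read()
    with open(path, "w") as f:
        f.write(set_dhclient_timeout(original, seconds))


if __name__ == "__main__":
    # Demonstrates the substitution on the default values quoted above;
    # call patch_file() yourself to edit the real file.
    defaults = "timeout 60;\nretry 15;\nselect-timeout 0;\ninitial-interval 1;\n"
    print(set_dhclient_timeout(defaults, 600), end="")
```

Raising instead of silently writing an unchanged file makes it obvious when the file does not contain the defaults in the expected form.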



  • @bezerker:

    "My dhclient seems to think everything is still OK and peachy, and there are no dhclient log entries until I came home and intervened manually.  :("

    OK, my bad. I missed the part about there being no dhclient log entries. (Remember to look in Status: System logs: DHCP, not Status: System logs: General.)

    As I see it, there are three options from here:
    1. Do more troubleshooting to see whether the modem (or ISP) can be reconfigured to resume service without needing a DHCP request.
    2. Modify the apinger subsystem to dump the WAN DHCP lease when the alarm triggers.
    3. Create a cron job to renew the WAN lease every ## minutes.
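Options 2 and 3 both come down to "renew the WAN lease when the gateway stops answering". Below is a minimal Python sketch of that idea as a standalone watchdog rather than a change to apinger itself. It assumes Python 3 is available on the firewall (a stock pfSense install may not provide it; the same loop ports easily to /bin/sh). The gateway address, thresholds, and `/root/renew_wan.sh` are hypothetical placeholders; that script would have to do whatever the release/renew button does on your install, for example restart dhclient on the WAN NIC.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical values: replace with your own gateway, thresholds and renew command.
GATEWAY = "192.0.2.1"                              # documentation address, not a real gateway
RENEW_CMD = ["/bin/sh", "/root/renew_wan.sh"]      # hypothetical script doing release/renew
FAILURES_BEFORE_RENEW = 3                          # consecutive failed pings before acting
CHECK_INTERVAL_S = 20                              # seconds between pings


def gateway_up(host: str = GATEWAY) -> bool:
    """Send one ICMP echo. On FreeBSD, '-t 2' makes ping exit after 2 seconds
    (on Linux -t sets the TTL instead, so this line is FreeBSD-specific)."""
    result = subprocess.run(["ping", "-c", "1", "-t", "2", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def renew_lease(cmd: Sequence[str] = RENEW_CMD) -> None:
    subprocess.run(list(cmd), check=False)


class Watchdog:
    """Counts consecutive failed checks and calls renew() after every `threshold` failures."""

    def __init__(self, check: Callable[[], bool], renew: Callable[[], None],
                 threshold: int = FAILURES_BEFORE_RENEW) -> None:
        self.check = check
        self.renew = renew
        self.threshold = threshold
        self.failures = 0

    def step(self) -> bool:
        """Run one check; return True if a renew was triggered."""
        if self.check():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            # Reset so that, while the gateway stays down, we renew again
            # after another `threshold` failures instead of giving up.
            self.failures = 0
            self.renew()
            return True
        return False


def run_forever() -> None:
    """Ping the gateway forever, renewing the lease after repeated failures."""
    dog = Watchdog(gateway_up, renew_lease)
    while True:
        dog.step()
        time.sleep(CHECK_INTERVAL_S)
```

Resetting the counter after each renew is what makes it keep retrying for as long as the gateway stays down, which is what the OP asked for. Option 3 (a fixed cron schedule) is simpler but renews even when the link is healthy.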



  • I seem to have a related issue. However, I'm a beginner and might simply be missing something trivial or misjudging my case.

    Once in a while my ISP shuts my connection down (they do this automatically here in Taiwan when a payment is late, and my wife frequently "forgets" to transfer the money).

    In this case it seems they switch off the port on their side; the DSL modem simply cannot sync the DSL line.

    However, pfSense behaves very strangely: the apinger service basically shuts down and cannot be restarted, and the WAN port is disabled as well.

    If I manually enable the WAN port, it has no effect (it stays disabled), and apinger never recovers and cannot be restarted (clicking Apply Changes reloads the page, but nothing changes).

    Last time, re-creating the WAN interface worked, i.e. I deleted the interface and created it again.

    Since I'll have the opportunity to face the same issue again today, I plan to try a reboot first.



  • One thing that helped me was rejecting DHCP leases from my cable modem. When the link to the ISP went down, the cable modem would try to step in and help by offering a 192.168.100.x IP address (so the user could at least reach the modem's internal web page and get diagnostic info). But I believe this confused pfSense: the WAN link now appeared to be up, and when the original IP address returned, pfSense would not always notice the change in time.

    Go to Interfaces: WAN and enter your cable modem's IP address under 'Reject Leases From'.

    You can also add an OPT interface, i.e. a static IP on your modem's subnet, so you can always reach the modem's diagnostics page. That may show why your line is dropping.
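dhclient.conf(5) describes `reject` as refusing offers from a server that uses the given address as its server identifier. Assuming pfSense's 'Reject Leases From' field is passed through to that statement, the address to enter is the modem's own, not the 192.168.100.x address it hands out. A minimal Python sketch of the matching rule, assuming the modem answers as 192.168.100.1 (common on cable modems, but check what your WAN actually received during an outage); `203.0.113.1` stands in for the ISP's DHCP server, and accepting a whole subnet is this sketch's own extension:

```python
import ipaddress
from typing import Iterable


def should_reject(server_id: str, rejected: Iterable[str]) -> bool:
    """True if an offer from `server_id` matches a rejected address or subnet.

    Follows the 'reject' idea from dhclient.conf(5): the comparison is against
    the offering server's identifier, not the address being offered.
    Subnet entries (e.g. "192.168.100.0/24") are an extension for this sketch.
    """
    server = ipaddress.ip_address(server_id)
    for entry in rejected:
        # A bare address becomes a /32 network, so one membership test covers both forms.
        if server in ipaddress.ip_network(entry, strict=False):
            return True
    return False


if __name__ == "__main__":
    reject_list = ["192.168.100.1"]  # assumed cable modem address
    for server in ["192.168.100.1", "203.0.113.1"]:
        verdict = "reject" if should_reject(server, reject_list) else "accept"
        print(server, verdict)
```

With the modem's offers rejected, pfSense never switches to the modem's private fallback address, so the WAN stays on the ISP lease (or stays down) instead of looking up while actually unusable.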



  • Ok, I tested this again. Rebooting and disabling/re-enabling the WAN port did not work.

    What I noticed is that I need to reassign the interface from pppoe1 (cuau0) back to vr1 and re-enter the login/password; only then does it work. I don't know why I have to do that.

    Every time this happens, pfSense seems to increment the index of the PPPoE interface and add a new one bound to cuau0. Why PPPoE would be assigned to the serial port, I have no clue.

    However, this issue is different from the OP's, so sorry, my bad.

