DNS stops resolving following WAN IP change



  • Hi there, I have a strange issue whereby I periodically lose DNS capability but with no obvious services down. I believe that I have tracked it down to when my ISP changes my WAN IP, after which I can ping IP addresses successfully but DNS doesn't work. To get around this I can typically stop/start the WAN interface, but occasionally only a full reboot will fix it.

    I've used a slightly modified version of the script in this useful post to 1) restart the interface, or 2) reboot if a bunch of pings fail.
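    For readers who want to see the shape of such a watchdog, here is a minimal Python sketch. It is not the script from the linked post: the function names, the three action labels, and the rule "pings fine but DNS broken means restart the interface, nothing reachable means reboot" are assumptions made for illustration. It only decides what to do; actually restarting the interface or rebooting is left out.

```python
import socket
import subprocess
from typing import Sequence


def ping_ok(host: str, timeout_s: float = 5.0) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def dns_ok(name: str) -> bool:
    """Return True if the system resolver can resolve `name`."""
    try:
        socket.getaddrinfo(name, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def choose_action(ping_results: Sequence[bool], dns_works: bool) -> str:
    """Map check results to one of the hypothetical action labels."""
    if dns_works:
        return "none"
    if any(ping_results):
        # IP connectivity is fine but DNS is not: the symptom in this thread.
        return "restart_interface"
    # Nothing answers at all: fall back to the heavier fix.
    return "reboot"
```

    A caller would collect `ping_ok(...)` results for a few well-known addresses (8.8.8.8 shows up in the logs further down as the gateway monitor address), call `dns_ok(...)` with any hostname, and act on the label returned by `choose_action`.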

    The issue looks quite similar to this old one, but the solution proposed didn't work for me.

    I'd love to understand why this is happening so that I can prevent it instead. Here are some logs around the last time this happened.

    Jul 24 19:40:45 pfsense-fw dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 24 19:40:47 pfsense-fw dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 24 19:40:51 pfsense-fw php-fpm[63465]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1595590851] unbound[75943:0] error: bind: address already in use [1595590851] unbound[75943:0] fatal error: could not open ports'
    

    I'm not sure where to look for more logs or settings so please let me know if I can provide anything more helpful. Thanks in advance for any help.



  • This helped me in the past; I don't know whether it is related to your problem.

    Capture.JPG



  • That first error looks like the one in this thread.
    https://forum.netgate.com/topic/130010/could-not-deliver-signal-hup-to-process-because-its-pidfile-doesn-t-exist

    I can also vouch for the advice of NEVER enabling the DHCP registration option in the DNS resolver settings. In my experience it only causes problems with unbound (DNS resolver). If you do have that enabled, try disabling it. Go to Services > DNS Resolver and make sure DHCP registration is unchecked.



  • @Raffi_ said in DNS stops resolving following WAN IP change:

    I can also vouch for the advice of NEVER enabling DHCP registration option in the DNS resolver settings.

    Works perfect here.



  • @Raffi_ Thanks, I do have that checked so I've unchecked and we'll have to wait and see...



  • @Bob-Dig Thanks. What is the highlighted IP in your case and what is it supposed to achieve? Just wondering what I should be putting in there.



  • @DANgerous25 If you have cable, it is a typical cable-modem IP.

    This is useful for rejecting leases from cable modems that offer private IP addresses when they lose upstream sync.
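    As a rough illustration of what a "reject leases from" setting does, the Python sketch below filters DHCP offers by the address of the server that sent them. The `Offer` class, the function name, and the sample addresses are hypothetical; this is a sketch of the idea, not the DHCP client's actual implementation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Offer:
    server_ip: str   # DHCP server that made the offer
    offered_ip: str  # address offered to the client


def should_reject(offer: Offer, reject_from: Iterable[str]) -> bool:
    """Return True if the offer came from a server on the reject list."""
    rejected = {ipaddress.ip_address(ip) for ip in reject_from}
    return ipaddress.ip_address(offer.server_ip) in rejected


# Hypothetical example: the modem hands out a private lease itself
# when it has lost upstream sync; the ISP's server does not match.
modem_offer = Offer(server_ip="192.168.100.1", offered_ip="192.168.100.10")
isp_offer = Offer(server_ip="198.51.100.1", offered_ip="198.51.100.25")
```

    With `reject_from=["192.168.100.1"]`, the modem's own offer is ignored and the client keeps waiting for an offer from the ISP's server, so the WAN never picks up a useless private address.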
    


  • @Bob-Dig said in DNS stops resolving following WAN IP change:

    @DANgerous25 If you have cable, it is a typical cable-modem IP.

    This is useful for rejecting leases from cable modems that offer private IP addresses when they lose upstream sync.
    

    Thanks for your suggestion, but I don't think I have this. Your example looks to be an internal IP. My pfSense device does connect to some kind of ISP-provided device, but I can't access it and it doesn't have an internal IP on my network. My pfSense device gets its WAN IP using DHCP, which I assume is external to my network entirely.



  • @Bob-Dig said in DNS stops resolving following WAN IP change:

    @DANgerous25 If you have cable, it is a typical cable-modem IP.

    This is useful for rejecting leases from cable modems that offer private IP addresses when they lose upstream sync.
    

    This sounds like the right solution. You can find out your modem IP by typing that in your browser and seeing if you get the modem GUI to come up. My cable modem has that same exact IP. I did not have to use this option in either of my setups, but it sounds like it could be your issue.



  • @Raffi_ said in DNS stops resolving following WAN IP change:

    This sounds like the right solution. You can find out your modem IP by typing that in your browser and seeing if you get the modem GUI to come up. My cable modem has that same exact IP. I did not have to use this option in either of my setups, but it sounds like it could be your issue.

    By typing what in my browser exactly? If I put http://<my public IP> then it tries to access the pfSense web interface (but fails as I have blocked the port). I don't know what other IPs I could put in there, I can't get to any web interface on the ISP device.



  • @DANgerous25 said in DNS stops resolving following WAN IP change:

    By typing what in my browser exactly? If I put http://<my public IP> then it tries to access the pfSense web interface (but fails as I have blocked the port). I don't know what other IPs I could put in there, I can't get to any web interface on the ISP device.

    Have you tried the address @Bob-Dig had (192.168.100.1) in the browser? That is the most common one so a good place to start. If that doesn't work, look at the sticker on your modem. It might show the IP right on it. If not, you can Google "<modem model number> default IP".



  • Guys I really appreciate your help, but I don't think I can access this device. I think it's just a bridge device for the fibre-optic network. It's a HUAWEI HG8040H. From looking at the Huawei website I can't find anything about a default IP, and other searches I've come up with have suggested other IPs but none of them work.

    Aside from trying that, is there perhaps something else going on here? Any other way for me to be able to handle the WAN IP changing more gracefully?



  • @DANgerous25 said in DNS stops resolving following WAN IP change:

    I can't find anything about a default IP

    If your pfSense is using DHCP on its WAN interface, you have the HG8040H IP right here :

    2c8709a2-9277-4474-a76b-fd5286e683b9-image.png

    The info isn't worth much if you don't have access.



  • @Gertjan It's always the simple things... Indeed there is a gateway IP, it looks like an external address but I will give this a try as per the original suggestion. (btw. I can't access it, not even to get a login page, but I don't think that matters)

    Can someone help me understand what this will do and why it might fix the problem? Presumably my ISP needs to renew my IP from time to time; is this going to stop it from doing that, or what is going to happen?



  • Argh, this happened again even after putting in the gateway IP as suggested. This time my WAN IP didn't change, but the "newwanip" script was still kicked off, so I presume my internet connection dropped. However, it still needed my script to bounce the interface in order to restore connectivity.

    Any other ideas what could be preventing DNS from working without me restarting the interface?

    Logs:

    Jul 28 03:46:06 pfsense-fw check_reload_status: rc.newwanip starting re1
    Jul 28 03:46:07 pfsense-fw php-fpm[11957]: /rc.newwanip: rc.newwanip: Info: starting on re1.
    Jul 28 03:46:07 pfsense-fw php-fpm[11957]: /rc.newwanip: rc.newwanip: on (IP address: <my WAN IP>) (interface: WAN[wan]) (real interface: re1).
    Jul 28 03:46:09 pfsense-fw php-fpm[11957]: /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through <my gateway IP>
    Jul 28 03:46:10 pfsense-fw php-fpm[11957]: /rc.newwanip: Default gateway setting Interface WAN_DHCP Gateway as default.
    Jul 28 03:46:10 pfsense-fw php-fpm[11957]: /rc.newwanip: Gateway, none 'available' for inet6, use the first one configured. ''
    Jul 28 03:46:12 pfsense-fw php-fpm[11957]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1595879172] unbound[21304:0] error: bind: address already in use [1595879172] unbound[21304:0] fatal error: could not open ports'
    


  • When you see - the last line

    @DANgerous25 said in DNS stops resolving following WAN IP change:

    unbound[21304:0] error: bind: address already in use ... fatal error: could not open ports

    you know something is wrong. It concerns unbound, so DNS will be impacted and is probably not working.

    Normally, when the WAN interface changes, several processes get restarted. unbound (DNS) is one of them: unbound is stopped, then started.
    But if, when unbound is started again, the old process (the one hooked up to port 53) still has not released its resources, port 53 is still locked. The new process detects this and bails out.
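    The failure mode described above can be reproduced in a few lines of Python. This is only a sketch of the general mechanism: it uses a UDP socket on a throwaway loopback port chosen by the OS rather than port 53, and the helper names are made up for the example. unbound itself listens on both UDP and TCP.

```python
import errno
import socket
from typing import Optional, Tuple


def bind_udp(port: int) -> socket.socket:
    """Bind a UDP socket on loopback; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", port))
    except OSError:
        sock.close()
        raise
    return sock


def demo() -> Tuple[Optional[int], bool]:
    """Return (errno seen while the port was held, rebind ok after release)."""
    old = bind_udp(0)               # stands in for the old process
    port = old.getsockname()[1]

    try:
        bind_udp(port).close()      # the "new process" tries the same port
        error_while_held = None
    except OSError as exc:
        error_while_held = exc.errno

    old.close()                     # the old process finally lets go
    new = bind_udp(port)            # now the same bind succeeds
    new.close()
    return error_while_held, True
```

    While the first socket is open, the second bind fails with `EADDRINUSE`, which is the "address already in use" text in the log. Once the first socket is closed, the same bind goes through.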

    Check this out : restart unbound from the GUI.
    Then check on the Status > System Logs > System > DNS Resolver page how many seconds pass between the unbound stop message (the moment you restarted the process) and the unbound started message. Normally, on a default system, the stop and start take a second or less.

    It has been seen that :
    People try to use the forwarder (dnsmasq) and unbound at the same time. That cannot work.
    People also install the 'bind' package, which is also a resolver. That cannot work.
    These processes all try to use the same port, 53.
    Very popular is the usage of pfBlockerNG. pfBlockerNG prepares lists (DNSBL) for unbound to parse at start-up. If there are hundreds of thousands, or even millions, of DNSBL entries to parse, this will take time. Too much time: people have mentioned start-up times of over a minute. That will cripple the system. Do you use pfBlockerNG ?



  • @Gertjan Thank you for the response. Of all of those services, I am using pfBlockerNG which might be the problem. If it is the problem, it doesn't take more than a few seconds to start (I would guess) because my script simply stops/starts the WAN interface and it works again.

    Can you suggest a way to make this Unbound process start after a few seconds, if that is indeed a suitable solution?



  • @DANgerous25 said in DNS stops resolving following WAN IP change:

    Can you suggest a way to make this Unbound process start after a few seconds,

    Measure the time needed to stop and start (== restart) unbound without pfBlockerNG
    And with pfBlockerNG activated.

    Also : deactivate pfBlockerNG for a couple of days, and see if the issue goes away.

    What is the total of the "Count" column :

    d73d1037-6c0f-49b9-8651-48082aa22a86-image.png

    A couple of thousand IPs and DNSBL : the restart time doesn't really change.
    Close to a million : it will delay the restart, which might cause issues.
    It boils down to : big lists (= DNSBL feeds) need big processors and a boatload of RAM.



  • @Gertjan the unbound process restart takes <1s when pfBlockerNG is running. Hence I don't think that's the issue.

    Jul 28 16:09:39	unbound	71047:0	info: start of service (unbound 1.10.1).
    Jul 28 16:09:39	unbound	71047:0	notice: init module 0: iterator
    Jul 28 16:09:38	unbound	71047:0	notice: Restart of unbound 1.10.1.
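
    The same measurement can be scripted. The Python sketch below reads the two relevant lines from the excerpt above and reports the gap between them. The function names are made up for the example, and the year is an assumption: the log lines carry no year, and 2020 is inferred from the epoch timestamps in the earlier error messages. Timestamps have one-second resolution, so a result of 1.0 only means "about a second or less".

```python
from datetime import datetime
from typing import Iterable, Optional

LOG_LINES = [
    "Jul 28 16:09:39\tunbound\t71047:0\tinfo: start of service (unbound 1.10.1).",
    "Jul 28 16:09:39\tunbound\t71047:0\tnotice: init module 0: iterator",
    "Jul 28 16:09:38\tunbound\t71047:0\tnotice: Restart of unbound 1.10.1.",
]


def parse_time(line: str, year: int = 2020) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is assumed."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def restart_seconds(lines: Iterable[str]) -> Optional[float]:
    """Seconds between 'Restart of unbound' and 'start of service'."""
    restart = started = None
    for line in lines:
        if "Restart of unbound" in line:
            restart = parse_time(line)
        elif "start of service" in line:
            started = parse_time(line)
    if restart is None or started is None:
        return None
    return (started - restart).total_seconds()
```

    For the three lines above this gives a gap of one second, in line with the quick restart reported in this post.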
    

    Here's my pfBlockerNG widget. I don't think it's remarkable based on what you said in your post.

    3510624d-f377-4769-a58b-7c71bb636d4b-image.png

    Just looking more at the event this morning when the interface needed to be restarted, here are some extra log lines from the system log:

    Jul 28 03:46:07 pfsense-fw dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 28 03:46:08 pfsense-fw dhcpleases: Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    Jul 28 03:46:12 pfsense-fw php-fpm[11957]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1595879172] unbound[21304:0] error: bind: address already in use [1595879172] unbound[21304:0] fatal error: could not open ports'
    

    The line above related to "dhcp leases" looks like it is perhaps trying to HUP unbound but can't. A smoking gun there perhaps?



  • @DANgerous25 said in DNS stops resolving following WAN IP change:

    "dhcp leases"

    That's the other one that tries to kill the hell out of unbound - see the many (no, more !) posts about that subject.

    To make a long story very short :

    Remove the check from here :

    DNS resolver => DHCP Registration [ _] Register DHCP leases in the DNS Resolver

    What happens is that for every new lease, the dhcpleases process restarts unbound ....
    If some stupid device is asking for a lease every second, unbound will get hail-stormed.

    You can check the DHCP log, see what device is asking a lot (often) of leases.
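
    One way to spot such a device is to count requests per client MAC address in the DHCP log. The Python sketch below assumes ISC dhcpd style messages ("DHCPREQUEST for ... from <mac> ..." and "DHCPDISCOVER from <mac> ..."); the sample lines, MAC addresses, and the interface name re0 are invented for illustration and are not taken from the poster's system.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the client MAC in DHCPREQUEST / DHCPDISCOVER log messages.
REQUEST_RE = re.compile(
    r"DHCP(?:REQUEST|DISCOVER).*?from ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "Jul 30 10:00:01 pfsense-fw dhcpd: DHCPREQUEST for 192.168.1.50 from aa:bb:cc:00:00:01 via re0",
    "Jul 30 10:00:01 pfsense-fw dhcpd: DHCPACK on 192.168.1.50 to aa:bb:cc:00:00:01 via re0",
    "Jul 30 10:00:02 pfsense-fw dhcpd: DHCPREQUEST for 192.168.1.50 from aa:bb:cc:00:00:01 via re0",
    "Jul 30 10:00:05 pfsense-fw dhcpd: DHCPDISCOVER from aa:bb:cc:00:00:02 via re0",
]


def count_requests(lines: Iterable[str]) -> Counter:
    """Count DHCPREQUEST/DHCPDISCOVER messages per client MAC."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

    Calling `count_requests(lines).most_common(5)` on the real log lists the chattiest clients first; a device that dominates the list is the one to look at.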



  • @Gertjan said in DNS stops resolving following WAN IP change:

    To make a long story very short :

    Remove the check from here :

    DNS resolver => DHCP Registration [ _] Register DHCP leases in the DNS Resolver

    What happens is that for every new lease, the dhcpleases process restarts unbound ....
    If some stupid device is asking for a lease every second, unbound will get hail-stormed.

    You can check the DHCP log, see what device is asking a lot (often) of leases.

    1. I've disabled the DHCP registration.
    2. I don't have a long DHCP log, but interestingly there is one device (which happens to be a Dyson fan) which almost exclusively fills up the log with DHCP requests... so you could be right @Gertjan . I've changed its lease to a static lease for now as well. I'm not sure what it is doing so will investigate separately.

    Let's give it a few more days and see what happens again. Thanks a lot for your continued help!



  • Hi @Gertjan , it's been a couple of days and in my logs I can see that "rc.newwanip" has been kicked off a couple of times, but I haven't had the "bind: address already in use" error and hence my automatic restart script has not needed to kick in. This is most likely due to moving my Dyson fan to a static lease.

    Thanks a lot for the suggestion for checking the DHCP log, I believe that has got to the bottom of this issue.

    I'm considering this one solved! Thanks everyone for your help and suggestions.

