Default gateway incorrect



  • This is a follow-up, with more information about this problem I reported.

    I've been running pfSense as my firewall since around March.  That issue, in July, was the first time it happened.  Since then, it's happened 3 more times: August 5th, August 18th, and most recently August 20th.  On that last occasion, my system had been up for less than 2 hours when it "broke".

    Basically, I lose all connection to the internet from my internal network, and all DNS lookup attempts are blocked by the firewall.

    It looks like something happens on the WAN interface, because each time I've seen this, it appears to start with:

    Aug 18 21:29:33 roadblock kernel: re0: link state changed to DOWN
    Aug 18 21:29:35 roadblock kernel: re0: link state changed to UP
    

    At that point, and I don't entirely understand why, when it tries to do a DHCP renewal, it gets an IP from my cable modem, 192.168.100.10.

    Aug 18 21:29:53 roadblock dhclient[29217]: bound to 192.168.100.10 -- renewal in 30 seconds.
    

    I'm assuming that is the point where the "damage" is done, based on what I see later.  Unfortunately, I've never been able to catch things while it still has the 192.168.100.10 address assigned.

    A couple of minutes later, I see the dhclient binding to the correct, cable ISP, address, and everything appears to be correct.

    Aug 18 21:31:18 roadblock dhclient[29283]: bound to 98.148.127.153 -- renewal in 43199 seconds.
    Aug 18 21:31:18 roadblock dhclient[29218]: connection closed
    Aug 18 21:31:18 roadblock dhclient[29218]: connection closed
    Aug 18 21:31:18 roadblock dhclient[29218]: exiting.
    Aug 18 21:31:18 roadblock dhclient[29218]: exiting.
    Aug 18 21:31:32 roadblock dhclient[29952]: bound to 98.148.127.153 -- renewal in 43192 seconds.
    
    

    But unfortunately, it isn't.  The one piece that I think changed when the 192.168.100.10 address was given out, and that doesn't get reset when the correct address is recovered, is the "default gateway":

    [root@roadblock.bogolinux.net]/root(2): netstat -r
    Routing tables
    
    Internet:
    Destination        Gateway            Flags    Refs      Use  Netif Expire
    default            192.168.100.1      UGS         0   790210    vr0
    98.148.120.0/21    link#2             UC          0        0    re0
    98.148.120.1       00:01:5c:31:76:01  UHLW        1        0    re0   1222
    98.148.127.153     localhost          UGHS        0        1    lo0
    localhost          localhost          UH          2        0    lo0
    192.168.0.0        link#1             UC          0        0    vr0
    ...
    
    

    Notice the default gateway still points to the cable modem's IP instead of my ISP's, and it's also assigned to my LAN interface (vr0), not my WAN (re0).

    Here's the "correct" version:

    [root@roadblock.bogolinux.net]/root(2): netstat -r
    Routing tables
    
    Internet:
    Destination        Gateway            Flags    Refs      Use  Netif Expire
    default            cpe-98-148-120-1.s UGS         0    21976    re0
    98.148.120.0/21    link#2             UC          0        0    re0
    cpe-98-148-120-1.s 00:01:5c:31:76:01  UHLW        2     1070    re0   1200
    cpe-98-148-127-153 localhost          UGHS        0        0    lo0
    localhost          localhost          UH          1        0    lo0
    192.168.0.0        link#1             UC          0        0    vr0
    ...
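
The difference between the two tables above comes down to the `default` row. A minimal sketch of checking it automatically, assuming the column layout shown in the `netstat -r` output above; the function names are made up for illustration and are not part of pfSense:

```python
def default_route(netstat_output):
    """Return (gateway, netif) for the 'default' row, or None if absent."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Destination Gateway Flags Refs Use Netif [Expire]
        if len(fields) >= 6 and fields[0] == "default":
            return fields[1], fields[5]
    return None


WAN_IF = "re0"  # WAN interface in this thread; vr0 is the LAN side


def default_route_on_wan(netstat_output, wan_if=WAN_IF):
    """True only if a default route exists and uses the WAN interface."""
    route = default_route(netstat_output)
    return route is not None and route[1] == wan_if


# Rows copied from the two routing tables quoted in this post.
BROKEN = """\
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.100.1      UGS         0   790210    vr0
98.148.120.0/21    link#2             UC          0        0    re0
"""

WORKING = """\
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            cpe-98-148-120-1.s UGS         0    21976    re0
98.148.120.0/21    link#2             UC          0        0    re0
"""

print(default_route(BROKEN))
print(default_route_on_wan(BROKEN))
print(default_route_on_wan(WORKING))
```

Run against a saved copy of the routing table, a check like this would flag the broken state without having to read the whole table by eye.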
    
    

    I do have an alias, for 192.168.100.1, for my cable modem, in the LAN configuration, to stop my syslog being flooded with messages, during normal operation, as per a previous post I found in these forums:

    [root@roadblock.bogolinux.net]/root(1): ifconfig
    vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=2808<VLAN_MTU,WOL_UCAST,WOL_MAGIC>
            ether 00:16:35:06:de:b9
            inet 192.168.0.55 netmask 0xffffff00 broadcast 192.168.0.255
            inet6 fe80::216:35ff:fe06:deb9%vr0 prefixlen 64 scopeid 0x1
            inet 192.168.100.1 netmask 0xffffff00 broadcast 192.168.100.255
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
            ether 00:c0:49:fa:42:d1
            inet6 fe80::2c0:49ff:fefa:42d1%re0 prefixlen 64 scopeid 0x2
            inet 98.148.127.153 netmask 0xfffff800 broadcast 255.255.255.255
            media: Ethernet autoselect (100baseTX <full-duplex>)
            status: active
    enc0: flags=0<> metric 0 mtu 1536
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
            inet 127.0.0.1 netmask 0xff000000
            inet6 ::1 prefixlen 128
            inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
    pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1460
            pfsync: syncdev: lo0 syncpeer: 224.0.0.240 maxupd: 128
    pflog0: flags=100<PROMISC> metric 0 mtu 33204
    

    I have the full syslog saved from the last 2 occurrences of this, together with the full routing tables, if it helps.

    Does anyone have any idea why I would get an IP from my cable modem, and why, when it "corrects" itself, the routing table isn't fixed?

    Cheers.



  • That is a problem but I have a weird thought. What if it's your cable modem and not your pfsense box? Your link goes down, when it comes up it does a DHCP renew (which is supposed to happen) and more….

    Are you sure your cable modem isn't going bad? I know it's a strange way to go but is it possible to eliminate pfsense from the problem?



  • Well, the cable modem only hands out a 192.168.100/24 address when it doesn't have an Internet connection.  I'd agree with TommyBoy180 that it could be your modem going bad.
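
To tell the modem's temporary lease apart from the ISP's lease in a saved syslog, here is a minimal sketch. It assumes the `dhclient` "bound to" line format quoted earlier in the thread and takes the 192.168.100.0/24 range from the post above; `parse_leases` is a hypothetical helper name, not a pfSense tool:

```python
import ipaddress
import re

# Range the modem is said to hand out when it has no Internet connection.
MODEM_NET = ipaddress.ip_network("192.168.100.0/24")

# Matches e.g. "bound to 192.168.100.10 -- renewal in 30 seconds."
# \s+ also tolerates a line wrap between "in" and the number.
BOUND_RE = re.compile(r"bound to (\S+) -- renewal in\s+(\d+) seconds")


def parse_leases(log_text):
    """Return a list of (address, renewal_seconds, from_modem) tuples."""
    leases = []
    for match in BOUND_RE.finditer(log_text):
        addr = ipaddress.ip_address(match.group(1))
        leases.append((str(addr), int(match.group(2)), addr in MODEM_NET))
    return leases


# Log lines copied from the first post.
LOG = """\
Aug 18 21:29:53 roadblock dhclient[29217]: bound to 192.168.100.10 -- renewal in 30 seconds.
Aug 18 21:31:18 roadblock dhclient[29283]: bound to 98.148.127.153 -- renewal in 43199 seconds.
"""

for address, renewal, from_modem in parse_leases(LOG):
    source = "modem" if from_modem else "ISP"
    print(f"{address}: renew in {renewal}s (lease from {source})")
```

The short renewal time and the private range together mark the lease that came from the modem rather than the ISP.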



  • Firstly, why would the cable modem hand out an IP address?  It's not a router, it's just a bridge.

    Secondly, and more importantly, when the situation corrects itself a couple of minutes later, and I get an IP from my ISP, why isn't the routing table set to the correct gateway?  That's the more troubling part.

    Cheers.



  • @EddieA:

    Firstly, why would the cable modem hand out an IP address?  It's not a router, it's just a bridge.

    Secondly, and more importantly, when the situation corrects itself a couple of minutes later, and I get an IP from my ISP, why isn't the routing table set to the correct gateway?  That's the more troubling part.

    Cheers.

    All modems hand out a private IP address. The DHCP lease is usually set to expire in 60 seconds. This allows the cable modem to sync with the ISP. Then it will hand you a public IP. However, you are losing that connection. This makes me think the cable modem is going bad or your ISP needs to troubleshoot with you.

    If your pfsense box isn't getting the correct configuration from your ISP then you need to troubleshoot what is happening. Just like I pointed out in my first post, try to eliminate some variables first. Take the pfsense box out of the picture and see if you still have the same problem. Go from there.



  • @tommyboy180:

    All modems hand out a private IP address. The DHCP lease is usually set to expire in 60 seconds. This allows the cable modem to sync with the ISP. Then it will hand you a public IP. However, you are losing that connection. This makes me think the cable modem is going bad or your ISP needs to troubleshoot with you.

    Or it just lost connection for a while, gave me the 192.168.100.10 for a short period, and then I got the IP from the ISP, as can be seen in the logs I quoted.  OK, despite the fact I've never seen that before, on either my previous Linux gateway, or pfSense, I can live with this part.

    @tommyboy180:

    If your pfsense box isn't getting the correct configuration from your ISP then you need to troubleshoot what is happening.

    But it is.  My ISP wouldn't hand out a gateway address of 192.168.100.1.  And it works perfectly well for every other acquire and renew of the lease.

    The issue with the gateway only happens after I have the "short term" lease from the cable modem.

    BTW, I forgot to give the contents of re_router, when I captured all the other information:

    98.148.120.1
    

    So again, please explain why the gateway is not being set correctly when I get the correct lease from my ISP.

    Cheers.



  • I could be completely wrong, but I don't think your issue is pfsense. I think you have another problem. Do you have the same problem when you connect a PC directly to the modem?



  • No problems.  Other machines connect correctly.

    pfSense connects correctly when I re-boot, or power cycle it following these occurrences.

    pfSense renews its lease correctly every time, except when it follows the 192.168.100.10 lease.

    Cheers.



  • Can you try a 30-30-30 reset on your modem?



  • @tommyboy180:

    Can you try a 30-30-30 reset on your modem?

    I'm out of town for a couple of days, so can't really do too much until I return.  When I get back, I'll try that.  But, unfortunately, I can't predict if, or when, the issue might happen again.

    I'm also going to pull the cable, from the modem, to see if I can simulate an outage, to see if that reproduces it.

    Cheers.



  • OK, now I'm back home, I can play with this again.  And guess what, I was able to reproduce the issue perfectly.  I pulled the cable from the modem, without powering it off.  Obviously I lost all internet connection.  When I plugged the cable back in and checked what was happening in pfSense, I saw exactly the same issues.

    Now, as an experiment, I removed this line from my configuration:

    <shellcmd>ifconfig vr0 inet 192.168.100.1 netmask 255.255.255.0 alias</shellcmd>
    

    Rebooted, and pulled the cable again.

    This time, things worked differently.  My IP again went to 192.168.100.10, and here is my routing table:

    Routing tables
    
    Internet:
    Destination        Gateway            Flags    Refs      Use  Netif Expire
    default            192.168.100.1      UGS         0       58    re0
    98.148.127.153     127.0.0.1          UGHS        0        2    lo0
    127.0.0.1          127.0.0.1          UH          2        0    lo0
    192.168.0.0/24     link#1             UC          0        0    vr0
    
    ...
    
    

    Notice this time, that the gateway still points to the WAN interface, not the LAN.

    A couple of minutes later, the IP reverted back to my ISP's IP, and now the routing table is:

    Routing tables
    
    Internet:
    Destination        Gateway            Flags    Refs      Use  Netif Expire
    default            98.148.120.1       UGS         0       21    re0
    98.148.120.0/21    link#2             UC          0        0    re0
    98.148.120.1       00:01:5c:31:76:01  UHLW        2        0    re0   1200
    98.148.127.153     127.0.0.1          UGHS        0        4    lo0
    127.0.0.1          127.0.0.1          UH          2        0    lo0
    192.168.0.0/24     link#1             UC          0        0    vr0
    ...
    
    

    All correct, and I am able to access the internet again, without issues.

    So, it looks like the alias of 192.168.100.1 that I set up for the LAN interface causes the issues with the gateway.
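
One plausible reading of why the alias matters, not confirmed anywhere in this thread: with the alias in place, the gateway 192.168.100.1 falls inside a network that is directly attached to vr0, so a default route through it would be associated with the LAN interface. The sketch below only models that on-link matching with the addresses from the `ifconfig` output quoted earlier; it does not reproduce how FreeBSD or pfSense actually installs routes, and `interfaces_reaching` is a made-up helper name:

```python
import ipaddress


def interfaces_reaching(gateway, attached):
    """Return the interfaces whose attached networks contain the gateway."""
    gw = ipaddress.ip_address(gateway)
    return sorted(
        name
        for name, addresses in attached.items()
        # strict=False lets us pass host addresses such as 192.168.0.55/24.
        if any(gw in ipaddress.ip_network(a, strict=False) for a in addresses)
    )


# Addresses from the ifconfig output (0xffffff00 is /24, 0xfffff800 is /21).
WITH_ALIAS = {
    "vr0": ["192.168.0.55/24", "192.168.100.1/24"],
    "re0": ["98.148.127.153/21"],
}
WITHOUT_ALIAS = {
    "vr0": ["192.168.0.55/24"],
    "re0": ["98.148.127.153/21"],
}

print(interfaces_reaching("192.168.100.1", WITH_ALIAS))     # ['vr0']
print(interfaces_reaching("192.168.100.1", WITHOUT_ALIAS))  # []
print(interfaces_reaching("98.148.120.1", WITH_ALIAS))      # ['re0']
```

This matches the broken routing table, where the default route via 192.168.100.1 was listed against vr0, but it does not explain why the stale gateway was never replaced once the ISP lease arrived.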

    However, removing that alias now means that my log gets flooded with this error:

    kernel: arplookup 192.168.100.1 failed: host is not on local network
    

    Which makes it difficult to look back through what's happened, should I need to.  Is there any other way to suppress that particular error message?
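
This is not a way to stop the kernel from logging the message, but as a stopgap for reading a saved copy of the log, the noisy lines can be filtered out afterwards. A minimal sketch, assuming the exact message text quoted above; `without_arp_noise` is a hypothetical helper, and the timestamp on the sample arplookup line is invented for illustration:

```python
import re

# The message text quoted in this post, with the dots escaped.
NOISE_RE = re.compile(
    r"kernel: arplookup 192\.168\.100\.1 failed: host is not on local network"
)


def without_arp_noise(lines):
    """Return the log lines that do not contain the arplookup message."""
    return [line for line in lines if not NOISE_RE.search(line)]


SAMPLE = [
    "Aug 18 21:29:33 roadblock kernel: re0: link state changed to DOWN",
    # Illustrative line: message text from the thread, timestamp made up.
    "Aug 18 21:29:34 roadblock kernel: arplookup 192.168.100.1 failed: host is not on local network",
    "Aug 18 21:29:35 roadblock kernel: re0: link state changed to UP",
]

for line in without_arp_noise(SAMPLE):
    print(line)
```

The link state changes and DHCP events stay visible, which is the part that matters when reconstructing what happened.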

    Cheers.



  • Good post back. Thanks for sharing your findings!
    As far as the arplookup goes, I'm not sure why your modem is sending ARP requests since it already has an IP; perhaps it's how the ISP needs their modems to function. pfSense sees the ARP requests, and according to its config it is impossible to have this IP on this interface, so it reacts with the message in the log. Check your NAT entries to make sure nothing is misconfigured.

    I do have a question though, why would you set up an alias for a node that is outside of your network? What is the benefit? It just sounds really weird.



  • The reason for the alias can be found here.  It was a way to stop the errors being generated.

    I just noticed that in the "bounty" post, it was suggested to use an alias of 192.168.100.10, not the 192.168.100.1 that I used.  Maybe I'll try that next, especially as I can reproduce this now.

    Cheers.

