Default gateway incorrect
-
This is a follow-up, with more information about this problem I reported.
I've been running pfSense as my firewall since around March. That issue in July was the first time it happened. Since then, it's happened 3 more times: August 5th, August 18th, and most recently August 20th. On that last occasion, my system had been up for less than 2 hours when it "broke".
Basically, I lose all connection to the internet from my internal network, and all DNS lookup attempts are blocked by the firewall.
It looks like something happens on the WAN interface, as each time I've seen this, it appears to start with:
Aug 18 21:29:33 roadblock kernel: re0: link state changed to DOWN
Aug 18 21:29:35 roadblock kernel: re0: link state changed to UP
At that point, and I don't entirely understand why, when it tries to do a DHCP renewal, it gets an IP from my cable modem, 192.168.100.10.
Aug 18 21:29:53 roadblock dhclient[29217]: bound to 192.168.100.10 -- renewal in 30 seconds.
I'm assuming that this is the point where the "damage" is done, based on what I see later. Unfortunately, I've never been able to catch things while it still has the 192.168.100.10 address assigned.
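One way to catch these events after the fact is to scan the saved syslog for dhclient bindings in the modem's range. The following is only a rough sketch: it assumes the modem's temporary leases always come from 192.168.100.0/24 and that the log lines look like the excerpts quoted in this post. The function name `modem_leases` is made up for illustration.

```python
import ipaddress
import re

# Assumption: the modem's temporary leases come from 192.168.100.0/24,
# as in the log excerpt quoted above.
MODEM_NET = ipaddress.ip_network("192.168.100.0/24")

# Matches lines such as:
#   ... dhclient[29217]: bound to 192.168.100.10 -- renewal in 30 seconds.
BOUND_RE = re.compile(r"dhclient\[\d+\]: bound to (\d+\.\d+\.\d+\.\d+)")


def modem_leases(lines):
    """Return (line, address) pairs where dhclient bound to a modem-range address."""
    hits = []
    for line in lines:
        match = BOUND_RE.search(line)
        if match and ipaddress.ip_address(match.group(1)) in MODEM_NET:
            hits.append((line.rstrip("\n"), match.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        "Aug 18 21:29:53 roadblock dhclient[29217]: bound to 192.168.100.10 -- renewal in 30 seconds.",
        "Aug 18 21:31:18 roadblock dhclient[29283]: bound to 98.148.127.153 -- renewal in 43199 seconds.",
    ]
    for line, addr in modem_leases(sample):
        print(addr, "<-", line)
```

Feeding it the full saved syslog instead of the two sample lines would list every time the short-term modem lease was handed out, which could then be compared against the dates of the outages.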
A couple of minutes later, I see the dhclient binding to the correct, cable ISP, address, and everything appears to be correct.
Aug 18 21:31:18 roadblock dhclient[29283]: bound to 98.148.127.153 -- renewal in 43199 seconds.
Aug 18 21:31:18 roadblock dhclient[29218]: connection closed
Aug 18 21:31:18 roadblock dhclient[29218]: connection closed
Aug 18 21:31:18 roadblock dhclient[29218]: exiting.
Aug 18 21:31:18 roadblock dhclient[29218]: exiting.
Aug 18 21:31:32 roadblock dhclient[29952]: bound to 98.148.127.153 -- renewal in 43192 seconds.
But unfortunately it isn't. The one piece that I think was changed when the 192.168.100.10 address was given out, and that doesn't get reset when the correct address is recovered, is the "default gateway":
[root@roadblock.bogolinux.net]/root(2): netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.100.1      UGS         0   790210    vr0
98.148.120.0/21    link#2             UC          0        0    re0
98.148.120.1       00:01:5c:31:76:01  UHLW        1        0    re0   1222
98.148.127.153     localhost          UGHS        0        1    lo0
localhost          localhost          UH          2        0    lo0
192.168.0.0        link#1             UC          0        0    vr0
...
Notice the default gateway is still pointing to the cable modem's IP instead of my ISP's IP, and it's also assigned to my LAN interface (vr0), not my WAN (re0).
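The check described above (is the default route on the WAN interface?) can be automated against saved routing tables. This is a minimal sketch, assuming the row layout shown in the netstat output in this post (Destination, Gateway, Flags, Refs, Use, Netif) and that `re0` is the WAN interface. The function names are hypothetical.

```python
def default_route(netstat_text):
    """Return (gateway, netif) from the 'default' row of netstat -r output, or None."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed row layout: Destination Gateway Flags Refs Use Netif [Expire]
        if len(fields) >= 6 and fields[0] == "default":
            return fields[1], fields[5]
    return None


def default_route_ok(netstat_text, wan_if):
    """True if a default route exists and is bound to the WAN interface."""
    route = default_route(netstat_text)
    return route is not None and route[1] == wan_if


if __name__ == "__main__":
    broken = "default 192.168.100.1 UGS 0 790210 vr0\n192.168.0.0 link#1 UC 0 0 vr0"
    print(default_route(broken))            # gateway and interface of the default row
    print(default_route_ok(broken, "re0"))  # False when the route sits on the LAN side
```

Run against the two routing tables in this post, the first would be flagged (default route on vr0) and the "correct" one would pass (default route on re0).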
Here's the "correct" version:
[root@roadblock.bogolinux.net]/root(2): netstat -r
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            cpe-98-148-120-1.s UGS         0    21976    re0
98.148.120.0/21    link#2             UC          0        0    re0
cpe-98-148-120-1.s 00:01:5c:31:76:01  UHLW        2     1070    re0   1200
cpe-98-148-127-153 localhost          UGHS        0        0    lo0
localhost          localhost          UH          1        0    lo0
192.168.0.0        link#1             UC          0        0    vr0
...
I do have an alias, for 192.168.100.1, for my cable modem, in the LAN configuration, to stop my syslog being flooded with messages, during normal operation, as per a previous post I found in these forums:
[root@roadblock.bogolinux.net]/root(1): ifconfig
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2808<VLAN_MTU,WOL_UCAST,WOL_MAGIC>
        ether 00:16:35:06:de:b9
        inet 192.168.0.55 netmask 0xffffff00 broadcast 192.168.0.255
        inet6 fe80::216:35ff:fe06:deb9%vr0 prefixlen 64 scopeid 0x1
        inet 192.168.100.1 netmask 0xffffff00 broadcast 192.168.100.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
        ether 00:c0:49:fa:42:d1
        inet6 fe80::2c0:49ff:fefa:42d1%re0 prefixlen 64 scopeid 0x2
        inet 98.148.127.153 netmask 0xfffff800 broadcast 255.255.255.255
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
enc0: flags=0<> metric 0 mtu 1536
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1460
        pfsync: syncdev: lo0 syncpeer: 224.0.0.240 maxupd: 128
pflog0: flags=100<PROMISC> metric 0 mtu 33204
I have the full syslog saved from the last 2 occurrences of this, together with the full routing tables, if it helps.
Does anyone have any idea why I would get an IP from my cable modem, and why, when it "corrects" itself, the routing table isn't fixed?
Cheers.
-
That is a problem, but I have a weird thought. What if it's your cable modem and not your pfSense box? Your link goes down, and when it comes up it does a DHCP renew (which is supposed to happen) and more...
Are you sure your cable modem isn't going bad? I know it's a strange way to go, but is it possible to eliminate pfSense from the problem?
-
Well, the cable modem only hands out a 192.168.100/24 address when it doesn't have an Internet connection. I'd agree with TommyBoy180 that it could be your modem going bad.
-
Firstly, why would the cable modem hand out an IP address? It's not a router, it's just a bridge.
Secondly, and more importantly, when the situation corrects itself a couple of minutes later and I get an IP from my ISP, why isn't the routing table set to the correct gateway? That's the more troubling part.
Cheers.
-
All modems hand out a private IP address. The DHCP lease is usually set to expire in 60 seconds. This allows the cable modem to sync with the ISP. Then it will hand you a public IP. However, you are losing that connection. This makes me think the cable modem is going bad or your ISP needs to troubleshoot with you.
If your pfSense box isn't getting the correct configuration from your ISP, then you need to troubleshoot what is happening. Just like I pointed out in my first post, try to eliminate some variables first. Take the pfSense box out of the picture and see if you still have the same problem. Go from there.
-
@tommyboy180: All modems hand out a private IP address. The DHCP lease is usually set to expire in 60 seconds. This allows the cable modem to sync with the ISP. Then it will hand you a public IP. However, you are losing that connection. This makes me think the cable modem is going bad or your ISP needs to troubleshoot with you.
Or it just lost connection for a while, gave me the 192.168.100.10 for a short period, and then I got the IP from the ISP, as can be seen in the logs I quoted. OK, despite the fact I've never seen that before, on either my previous Linux gateway, or pfSense, I can live with this part.
@tommyboy180: If your pfSense box isn't getting the correct configuration from your ISP, then you need to troubleshoot what is happening.
But it is. My ISP wouldn't hand out a gateway address of 192.168.100.1. And it works perfectly well for every other acquire and renew of the lease.
The issue with the gateway only happens after I have the "short term" lease from the cable modem.
BTW, I forgot to give the contents of re_router, when I captured all the other information:
98.148.120.1
So again, please explain why the gateway is not being set correctly when I get the correct lease from my ISP.
Cheers.
-
I could be completely wrong, but I don't think your issue is pfSense. I think you have another problem. Do you have the same problem when you connect a PC directly to the modem?
-
No problems. Other machines connect correctly.
pfSense connects correctly when I re-boot, or power cycle it following these occurrences.
pfSense renews its lease correctly every time, except when it follows the 192.168.100.10 lease.
Cheers.
-
Can you try a 30-30-30 reset on your modem?
-
I'm out of town for a couple of days, so can't really do too much until I return. When I get back, I'll try that. But, unfortunately, I can't predict if, or when, the issue might happen again.
I'm also going to pull the cable, from the modem, to see if I can simulate an outage, to see if that reproduces it.
Cheers.
-
OK, now I'm back home, I can play with this again. And guess what, I was able to reproduce the issue perfectly. I pulled the cable from the modem, without powering it off. Obviously I lost all internet connection. When I plugged the cable back in and checked what was happening in pfSense, I saw exactly the same issues.
Now, as an experiment, I removed this line from my configuration:
<shellcmd>ifconfig vr0 inet 192.168.100.1 netmask 255.255.255.0 alias</shellcmd>
Rebooted, and pulled the cable again.
This time, things worked differently. My IP again went to 192.168.100.10, and here is my routing table:
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            192.168.100.1      UGS         0       58    re0
98.148.127.153     127.0.0.1          UGHS        0        2    lo0
127.0.0.1          127.0.0.1          UH          2        0    lo0
192.168.0.0/24     link#1             UC          0        0    vr0
...
Notice this time, that the gateway still points to the WAN interface, not the LAN.
A couple of minutes later, the IP reverted back to my ISP's IP, and now the routing table is:
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            98.148.120.1       UGS         0       21    re0
98.148.120.0/21    link#2             UC          0        0    re0
98.148.120.1       00:01:5c:31:76:01  UHLW        2        0    re0   1200
98.148.127.153     127.0.0.1          UGHS        0        4    lo0
127.0.0.1          127.0.0.1          UH          2        0    lo0
192.168.0.0/24     link#1             UC          0        0    vr0
...
All correct, and I am able to access the internet again, without issues.
So, it looks like the alias of 192.168.100.1 I set up, for the LAN interface causes the issues with the gateway.
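A plausible (unconfirmed) explanation is that the alias makes the modem's gateway address look reachable on the LAN side. The sketch below only illustrates that overlap with Python's `ipaddress` module; it does not model how FreeBSD or pfSense actually picks the interface for a route. The /24 prefix on the temporary 192.168.100.10 lease is an assumption, since the lease's netmask was never captured, and the helper name `on_link_interfaces` is made up.

```python
import ipaddress


def on_link_interfaces(gateway, interfaces):
    """Return names of interfaces whose configured subnets contain the gateway."""
    gw = ipaddress.ip_address(gateway)
    return [
        name
        for name, addrs in interfaces.items()
        if any(gw in ipaddress.ip_interface(addr).network for addr in addrs)
    ]


if __name__ == "__main__":
    # State while the modem's temporary lease is active.
    # Assumption: the 192.168.100.10 lease uses a /24 netmask.
    with_alias = {
        "vr0": ["192.168.0.55/24", "192.168.100.1/24"],  # LAN plus the alias
        "re0": ["192.168.100.10/24"],                    # WAN with the modem lease
    }
    without_alias = {
        "vr0": ["192.168.0.55/24"],
        "re0": ["192.168.100.10/24"],
    }
    print(on_link_interfaces("192.168.100.1", with_alias))     # ['vr0', 're0']
    print(on_link_interfaces("192.168.100.1", without_alias))  # ['re0']
```

With the alias in place, 192.168.100.1 falls inside a subnet configured on vr0 (it is in fact vr0's own alias address), which is consistent with the default route ending up on the LAN interface. Without it, only re0 matches, which is consistent with the second experiment.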
However, removing that alias now means that my log gets flooded with this error:
kernel: arplookup 192.168.100.1 failed: host is not on local network
That makes it difficult to look back through what's happened, should I need to. Is there any other way to suppress that particular error message?
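As a stopgap for reading the logs, the noise can be filtered out when reviewing a saved copy. This is only a sketch and does not stop the kernel from logging the message; it just hides the matching lines. The function name `without_noise` is hypothetical, and the message text is taken from the error quoted above.

```python
# Message text copied from the kernel error quoted above.
NOISE = "arplookup 192.168.100.1 failed: host is not on local network"


def without_noise(lines, noise=NOISE):
    """Return only the log lines that do not contain the noisy message."""
    return [line for line in lines if noise not in line]


if __name__ == "__main__":
    sample = [
        "kernel: arplookup 192.168.100.1 failed: host is not on local network",
        "Aug 18 21:29:33 roadblock kernel: re0: link state changed to DOWN",
        "kernel: arplookup 192.168.100.1 failed: host is not on local network",
    ]
    for line in without_noise(sample):
        print(line)
```

The same idea works from the shell with an inverted grep on the message text, but that also only cleans up the view, not the log itself.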
Cheers.
-
Good post back. Thanks for sharing your findings!
As far as the arplookup goes, I'm not sure why your modem is sending ARP requests since it already has an IP; perhaps it's how the ISP needs their modems to function. pfSense sees the ARP requests, and according to its config it is impossible to have this IP on this interface, so it reacts with the message in the log. Check your NAT entries to make sure nothing is misconfigured.
I do have a question though: why would you set up an alias for a node that is outside of your network? What is the benefit? It just sounds really weird.
-
The reason for the alias can be found here. It was a way to stop the errors being generated.
I just noticed that in the "bounty" post, it was suggested to use an alias of 192.168.100.10, not the 192.168.100.1 that I used. Maybe I'll try that next, especially as I can reproduce this now.
Cheers.