WAN interface keeps dropping after upgrade to 1.2.1
-
After upgrading to 1.2.1 a couple days ago (not sure if that's related) the WAN interface has been dying on me. It looks like my dhcp lease time is 8 hours and dhclient is never able to renew the lease, the system log fills with these:
Jan 4 11:52:16 dhclient[328]: SENDING DIRECT
Jan 4 11:52:16 dhclient[328]: DHCPREQUEST on fxp0 to [address removed] port 67Reboot helps. Any ideas how to start troubleshooting this, I'm fairly clueless with BSDs. :/
-
I have not had trouble with DHCP on my WAN interface (vr NIC or udav).
On startup the DHCP will send a message to a broadcast address; once its been assigned an address it will request a lease extension from the server that assigned it an address. It appears you may not be getting the lease extension: Here's what the lease extension looks like in my logs:
Jan 3 09:15:01 pfsense dhclient[333]: DHCPREQUEST on vr0 to [address removed] port 67
Jan 3 09:15:01 pfsense dhclient[333]: SENDING DIRECT
Jan 3 09:15:01 pfsense dhclient[333]: DHCPACK from [address removed]
Jan 3 09:15:01 pfsense dhclient[333]: bound to [address removed] – renewal in 129600 seconds.Is the DHCP server receiving the request? Is there a log on the DHCP server you can look at? Is the DHCP server sending a response? Is it logged. Is pfsense receiving a response to its request? (You could open a ssh session from the LAN side and use tcpdump to look at received packets:
tcpdump -i fxp0 udp portrange 67-68
will show traffic on interface fxp0, udp protocol using ports 67 to 68, the ports used by DHCP.
-
Thanks for your response. On reboot dhclient logs this:
Jan 4 12:06:09 dhclient[257]: bound to [address removed] – renewal in 21600 seconds.
Jan 4 12:06:09 dhclient[257]: DHCPACK from [address removed]
Jan 4 12:06:09 dhclient[257]: DHCPREQUEST on fxp0 to 255.255.255.255 port 67
Jan 4 12:06:07 dhclient[257]: DHCPOFFER from [address removed]
Jan 4 12:06:06 dhclient[257]: DHCPDISCOVER on fxp0 to 255.255.255.255 port 67 interval 1
Jan 4 12:05:50 dhclient[257]: DHCPREQUEST on fxp0 to 255.255.255.255 port 6721600 seconds is 6 hours, add an hour of unsuccessful renewal attempts and the pattern matches exactly with the connection dropouts. I haven't been able to see what's going on since the webgui only shows last 50 messages and only saves to ramdisk, guess I'll have to wait for the next drop and check the syslog file. I'm guessing it's under /var/log/?
As of the dhcp server I have no clue, it's my ISP's server for my broadband connection at home.
-
You'll find more log details than the web GUI provides in the file /var/log/system.log
-
Unfortunately there's nothing in the logs, after the 6 hour lease time dhclient tries to renew the lease but never gets a response. Last night this happened at 5AM and it kept on trying until at 11AM it successfully renewed the lease without me touching anything. I suppose it's a possibility there's a problem on the ISP's end, but the fact that the problems started immediately after upgrading to 1.2.1, and the fact that the problem is always fixed with a reboot suggests otherwise imo. Any ideas or should I just downgrade back to 1.2?
-
Did you try to get a tcpdump trace of the DHCP interactions? It could be helpful to know if you are getting a DHCP response from the ISP. If there is no response at all then you should probably discuss this with your ISP. If you are getting a response then we can look further into why dhclient is apparently not seeing it.
Did you look in the firewall logs to see if a DHCP response was blocked by the firewall?
Regarding downgrade to 1.2 - I guess that depends on how much time you are prepared to put into investigating this problem, whether or not 1.2.1 has any new features that are important to you and how much pain the problem is causing. As time passes, its more and more likely that if you report another problem people will encourage you to upgrade to 1.2.1 (or later).
-
Just had another dropout so took a trace and sure enough, no responses from the DHCP server, and they're not being blocked by the firewall. I do see the outgoing requests though.
A wild stab in the dark, it looks like my isp's server always immediately answers to a DHCPDISCOVER broadcast and the following DHCPREQUEST, but not to the DHCPREQUEST renewal. I'm probably way off base here but could it be that it requires the initial broadcast "handshake"?
I guess I'll have a chat with my ISP's tech support tomorrow, thanks for your help.
-
A wild stab in the dark, it looks like my isp's server always immediately answers to a DHCPDISCOVER broadcast and the following DHCPREQUEST, but not to the DHCPREQUEST renewal. I'm probably way off base here but could it be that it requires the initial broadcast "handshake"?
If it does its broken!
The tcpdump trace shows packets before they go to firewall.
-
Opened a ticket at my isp's support, from their end the line seems functional and they're not aware of any problems with their DHCP service, as in no other complaints.
I wonder if FreeBSD7 doesn't like my Intel 82557 Pro/100.
-
Opened a ticket at my isp's support, from their end the line seems functional and they're not aware of any problems with their DHCP service, as in no other complaints.
That's different from confirming they are actually sending DHCP responses back to you.
I wonder if FreeBSD7 doesn't like my Intel 82557 Pro/100.
The fxp has been around for a long time and gigabit interfaces have been available quite cheaply for some years so I suppose its possible the fxp driver has suffered some sort of regression that has hasn't been picked up because it hasn't been exercised in the necessary way. But I find it hard to imagine what kind of regression (apparently) affects DHCP responses and only DHCP responses and causes them to be discarded before they are traced.
Have you checked the interface error counters (# netstat -i ) for input errors on your WAN interface?
-
Currently seems to be 298 Ierrs out of 12635041 Ipkts since the last reboot (so this period doesn't include dhcp problems), this sounds a bit high?
Update: Had another connection drop, no additional Ierrs and nothing in the logs.
-
I presume your first hop from the pfsense box to the ISP is to a modem of some sort. If so, your error count is probably OK.
Suggestion: Next time pfSense attempts to renew its DHCP lease make sure there is no other traffic on the WAN link no FTPs, no web browsing etc. Then do a
tcpdump -i fxp0
for as long as it takes to see a few DHCP renewal requests go to the ISP. This will show ALL traffic on that hop. You're looking to see if there is any response at all. Check the interface counters before and after to see if there were any received packets or any errors. Perhaps you are getting a DHCP response that is not accepted by dhclient but is accepted by some other systems (e.g. Windows). If you see a consistent response try again with the command
tcpdump -p -i fxp0
Do you see a different pattern?
If you can, capture the traces and post them here.
-
Ok I'll try to do a trace of all the trafiic. And no modems involved, just direct ethernet connection (fibre-optic).
-
My gut feel, based on reported characteristics of fibre links, is that about 300 errors in about 12,000,000 packets is a "high" error rate. I doubt your system has been up long enough for the packet counter to wrap :-)
-
Ok I did the trace and still no response. After that I downgraded to 1.2 and now for the first time in a week the connection has been up for 16 hours straight, the renewals work again. I would say there's something funny with 1.2.1 and my setup, but unfortunately I don't have the time nor energy to look into it further at the moment.
Thanks for your help!
-
Apparently I'm having the same problem after upgrading from 1.2.RELEASE to 1.2.2.
The affected interface is rl1 (WAN2), which gets its external IP address from a Zyxel P-660R-D1 working in half-bridge mode.
Here is the relevant log:
Jan 15 14:40:02 dhclient[307]: SENDING DIRECT Jan 15 14:40:02 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:39:33 dhclient[307]: SENDING DIRECT Jan 15 14:39:33 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:39:20 dhclient[307]: SENDING DIRECT Jan 15 14:39:20 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:39:12 dhclient[307]: SENDING DIRECT Jan 15 14:39:12 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:39:04 dhclient[307]: SENDING DIRECT Jan 15 14:39:04 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:39:01 dhclient[307]: SENDING DIRECT Jan 15 14:39:01 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:38:58 dhclient[307]: SENDING DIRECT Jan 15 14:38:58 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:38:56 dhclient[307]: SENDING DIRECT Jan 15 14:38:56 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 14:11:12 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 14:08:56 dhclient[307]: bound to [EXTERNAL_IPADDRESS] -- renewal in 1800 seconds. Jan 15 14:08:56 dhclient[307]: DHCPACK from 192.168.1.1 Jan 15 14:08:56 dhclient[307]: DHCPREQUEST on rl1 to 255.255.255.255 port 67 Jan 15 14:08:16 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 14:05:58 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 14:03:57 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 14:02:06 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 14:01:11 dhclient[307]: SENDING DIRECT Jan 15 14:01:11 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 13:59:33 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 13:57:37 dhclient[307]: SENDING DIRECT Jan 15 13:57:37 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 13:56:13 dnsmasq[848]: reading /var/dhcpd/var/db/dhcpd.leases Jan 15 13:53:07 dhclient[307]: SENDING DIRECT Jan 15 13:53:07 dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 15 13:51:14 dhclient[307]: SENDING DIRECT [...]
-
Looks familiar, is it fixed with a reboot and starts again when your dhcp lease expires? Which network card does the interface have?
-
The firewall is a Linitx FX5620 (Fabiatech) with one GB RTL8110S network card and five RTL8100C. It used to appear on the pfsense recommended hardware list. I'm having the problem on one of the RTL8100C ports.
The connection has been cut only four or five times since the upgrade. It may take place when the dhcp lease expires, but not every time. It seems to restablish itself after some time (not yet sure of how much time neither). And yes, a reboot "fixes" temporarily the problem.
After rebooting the ADSL router and the pfsense box, the connection has been stable for the last 20 hours or so. Here goes a commented log:
[...] Jan 16 06:49:29 pfsense dhclient[307]: DHCPREQUEST on rl1 to 255.255.255.255 port 67 Jan 16 06:49:29 pfsense dhclient[307]: DHCPACK from 192.168.1.1 Jan 16 06:49:29 pfsense dhclient[307]: bound to EXTERNAL_IPADDRESS -- renewal in 1800 seconds. #(gets address) Jan 16 07:19:29 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 #(at the end of the lease, asks for renewal) Jan 16 07:19:29 pfsense dhclient[307]: SENDING DIRECT Jan 16 07:19:30 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 #(it doesn't get it and keeps asking) Jan 16 07:19:30 pfsense dhclient[307]: SENDING DIRECT Jan 16 07:19:32 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 16 07:19:32 pfsense dhclient[307]: SENDING DIRECT Jan 16 07:19:34 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 16 07:19:34 pfsense dhclient[307]: SENDING DIRECT Jan 16 07:19:36 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 [...] Jan 16 07:47:43 pfsense dhclient[307]: DHCPREQUEST on rl1 to 255.255.255.255 port 67 Jan 16 07:47:43 pfsense dhclient[307]: DHCPACK from 192.168.1.1 Jan 16 07:47:43 pfsense dhclient[307]: bound to EXTERNAL_IPADDRESS -- renewal in 1800 seconds. #(finally it gets a new lease, after about 24 minutes) Jan 16 08:17:43 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 #(and the cycle starts again) Jan 16 08:17:43 pfsense dhclient[307]: SENDING DIRECT Jan 16 08:17:45 pfsense dhclient[307]: DHCPREQUEST on rl1 to 192.168.1.1 port 67 Jan 16 08:17:45 pfsense dhclient[307]: SENDING DIRECT [...]
Could the DHCP renewal problem be provoked by the fact that dhclient is sending the renewal requests directly to the address of the DHCP server instead of the broadcast address? Is there a way to ask dhclient to avoid sending "direct" renewal requests?
-
The OPT1 (rl1) connection is falling more or less once every day and now I'm also getting this messages:
Jan 20 12:38:45 kernel: arpresolve: can't allocate route for OPT1_EXTERNAL_IPADDRESS Jan 20 12:38:45 kernel: arplookup OPT1_EXTERNAL_IPADDRESS failed: host is not on local network
They seem to be related to the load balancing monitor IP address of the OPT1 interface (wich, to make a long story short, is OPT1_EXTERNAL_IPADDRESS).
I'm beginning to get into trouble at work, so I think that I will have to downgrade to 1.2 very soon. Has anyone done a successful downgrade from 1.2.2 to 1.2 with the firmware update option?
-
Could the DHCP renewal problem be provoked by the fact that dhclient is sending the renewal requests directly to the address of the DHCP server instead of the broadcast address?
That's the way it is supposed to work: when you ask for lease renewal you ask the party that granted the lease.
Is there a way to ask dhclient to avoid sending "direct" renewal requests?
I suggest you lookup the FreeBSD man pages: go to http://www.freebsd.org/cgi/man.cgi and type dhclient in the Man page or Keyword Search box and click Submit.