(SOLVED) Link State Up Down
Seemingly at random I am having my LAN interface drop out. It immediately disconnects all of my VOIP calls. It drops out 5 to 20 times a day. It only happens during working hours, so it must be under load.
Here is one DOWN/UP grouping from the log:
Jun 12 14:14:08 check_reload_status: Linkup starting rl1
Jun 12 14:14:08 kernel: rl1: link state changed to DOWN
Jun 12 14:14:10 check_reload_status: Linkup starting rl1
Jun 12 14:14:10 kernel: rl1: link state changed to UP
Jun 12 14:14:10 php: rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.0.1 )
Jun 12 14:14:12 php: rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.0.1 )
Jun 12 14:14:12 check_reload_status: rc.newwanip starting rl1
Jun 12 14:14:15 php: rc.newwanip: rc.newwanip: Informational is starting rl1.
Jun 12 14:14:15 php: rc.newwanip: rc.newwanip: on (IP address: 192.168.0.1) (interface: LAN[lan]) (real interface: rl1).
Jun 12 14:14:15 check_reload_status: Reloading filter
I am currently set up Modem(WAN)->pfsense->DIR-615 4 Port Router(LAN). The WAN and LAN cards are both RealTek and I am on 2.1.3.
What should I be looking at to fix it? I just replaced the cables. I have had my ISP doing tests on my modem and they are coming back clear. It doesn't look like the router is power cycling or anything. Somewhere in my hardware it looks like my LAN is just dropping out then reconnecting. It started very intermittently (once or twice a day) in April when I upgraded to 2.1.3, now it is pretty regular.
I have ordered 2 new Intel Pro1000's for the firewall and they arrive in a couple of days. Is there anything I should look into to get some functionality until they arrive?
Thank you in advance,
Is there any good reason to put the D-Link router behind pfSense?
Have you considered that the D-Link might be broken?
I should run some testing on that router, good suggestion.
To explain the "why": I didn't have additional slots available in the case I am using. I built it out of an old micro Dell (Dimension 4500S) and scrap parts (hence the Realtek NICs). I wanted to install it with 3 LAN and 1 WAN, but I don't have the parts or the slots. Because of the case form factor, it only has space for 2 PCI cards installed in a cage. The two PCI Realtek NICs were ones I had lying around.
The Dlink is just working like a router should, taking one incoming connection and giving me 4 routed ports. I needed two static IPs for my phone system and one connection for my internet (goes out to a 24 port switch), so I put the firewall after the modem with one LAN out of pfSense. All the traffic gets pushed to the Dlink and routed either to the phone system or the network. I know multi-LAN in pfSense is a better solution (3 LAN/1 WAN), but I don't have a computer with at least 3 PCI slots lying around.
That's a pretty cheap router so it might be frazzing out and generating a "Link Lost" event on the PF side. I will connect direct to the router and do a continuous packet test on it. I will do the same test on the modem side and see if I am getting any packet loss. That is actually a good thing to check.
Realtek NICs supported by the rl(4) driver are pretty much the bottom of the list in terms of preference.
You could use a VLAN capable switch to get more interfaces into pfSense instead of the router. If you're feeling ambitious, and you have the right version, you could probably put OpenWRT on the D-Link and use that with VLANs. ;)
Actually, the Realtek 8139 isn't that bad, because it doesn't support any of the advanced features like offloading that the engineers could have screwed up even more. It works pretty well, assuming the CPU is powerful enough to pass the traffic that the firewall needs to pass. That is good enough for a home firewall that never has to even approach the 100Mbit/sec limit. The FreeBSD driver is very mature as well, and rl(4) devices rarely cause any trouble.
The problem in this case is very likely a speed autonegotiation issue with the modem. I would try fixing the speed on the pfSense WAN at 10/100Mbit manually.
I have two intel pro1000's coming, they will be going in stat. I am going to be testing the D-link tomorrow, I tend to think the problem is with the Realtek rl() driver and card quality in FreeBSD/pfsense 2.1.3.
I just want to make sure that I am not overlooking something simple.
I will test the d-link, and put in the two new cards. I will also check on the speed negotiation as well, that's a good idea too. Anything else I should try?
Giving an update.
I changed the speed to 100 duplex. I also moved my network traffic off the firewall. I kept the two static IPs for the phones connected to the firewall. The calls are still dropping either way. It doesn't seem to happen as often with less traffic on the line (understandably).
I am replacing the cards in my pfsense box with the intel pro1000 cards I ordered. But I don't know, or haven't figured out, if the Realtek cards are acting strange under the load, or if the D-Link is acting weird for the same reason. It does appear that once a packet or two gets rejected the calls all behave like they have been hung up. The continuous ping I have been running shows two missed pings, then back to normal.
Is there a graph of dropped packets over time? I know what times it drops out, and I know what happens when it drops (rl1 shows DOWN, then UP), but I don't know what to look for to find the reason it disconnects and reconnects.
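For anyone wanting a quick picture of when the drops happen, here is a rough sketch rather than a built-in pfSense feature. It counts the kernel "link state changed to DOWN" lines per hour, assuming the log text has been copied out of Status/System Logs as plain lines in the same format as the excerpts in this thread. The function name and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Jun 12 14:14:08 kernel: rl1: link state changed to DOWN
LINK_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"kernel:\s+(?P<iface>\w+):\s+link state changed to (?P<state>UP|DOWN)"
)


def link_drops_per_hour(lines, iface=None):
    """Count DOWN events per hour, optionally for a single interface."""
    counts = Counter()
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m or m.group("state") != "DOWN":
            continue  # not a link-down line
        if iface is not None and m.group("iface") != iface:
            continue  # a different interface than the one requested
        key = f"{m.group('month')} {m.group('day')} {m.group('hour')}:00"
        counts[key] += 1
    return counts


# Sample lines taken from the log excerpts posted in this thread.
sample = [
    "Jun 12 14:14:08 check_reload_status: Linkup starting rl1",
    "Jun 12 14:14:08 kernel: rl1: link state changed to DOWN",
    "Jun 12 14:14:10 kernel: rl1: link state changed to UP",
    "Jun 13 16:21:37 kernel: em1: link state changed to DOWN",
]

# Crude text "graph": one '#' per link drop in that hour.
for hour, n in sorted(link_drops_per_hour(sample).items()):
    print(hour, "#" * n)
```

If the busy hours line up with the tallest bars, that supports the "only under load" theory. It still won't say which end of the cable is dropping the link.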
It is still doing it with the new adapters, and only on the LAN side. It must be the router outside my firewall. Any ideas on how to check it?
Jun 13 16:21:37 check_reload_status: Linkup starting em1
Jun 13 16:21:37 kernel: em1: link state changed to DOWN
Jun 13 16:21:38 check_reload_status: Linkup starting em1
Jun 13 16:21:38 kernel: em1: link state changed to UP
Jun 13 16:21:39 php: rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.0.1 )
Jun 13 16:21:41 php: rc.linkup: Hotplug event detected for LAN(lan) but ignoring since interface is configured with static IP (192.168.0.1 )
Jun 13 16:21:41 check_reload_status: rc.newwanip starting em1
Jun 13 16:21:44 php: rc.newwanip: rc.newwanip: Informational is starting em1.
Jun 13 16:21:44 php: rc.newwanip: rc.newwanip: on (IP address: 192.168.0.1) (interface: LAN[lan]) (real interface: em1).
Jun 13 16:21:44 check_reload_status: Reloading filter
Sorry for going off-topic, but where did you get that log?
If I were you, I would connect a client to the LAN side of the router and run ping host -t
I have personally experienced a few d-link routers with bad PSUs that did 'weird' stuff.
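If you want timestamps for the misses so they can be lined up against the pfSense log, the continuous ping can be scripted. This is only a sketch. It assumes the system ping exits non-zero when no reply arrives (worth double-checking on Windows, where an "unreachable" answer may still count as success), and the function names are invented for this example.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send one echo request with the system ping; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer within the timeout counts as a miss
    return result.returncode == 0


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (first_miss, last_miss, misses) runs."""
    outages = []
    start = last = None
    misses = 0
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp  # first miss of a new run
            last = stamp
            misses += 1
        elif start is not None:
            outages.append((start, last, misses))  # run ended by a good reply
            start, misses = None, 0
    if start is not None:
        outages.append((start, last, misses))  # run still open at the end
    return outages


def monitor(host, seconds, interval_s=1.0):
    """Ping host about once per interval for the given time; return the outages."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        stamp = datetime.now().strftime("%b %d %H:%M:%S")
        samples.append((stamp, ping_once(host)))
        time.sleep(interval_s)
    return find_outages(samples)


# Example (runs for an hour against the LAN address from the logs above):
# for first, last, n in monitor("192.168.0.1", seconds=3600):
#     print(f"{n} missed ping(s) from {first} to {last}")
```

Running it from one client behind the D-Link and one plugged straight into pfSense should show whether the misses only appear on the far side of the D-Link.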
Try removing the d-link router or running it purely as a switch. You'll only have one subnet, though it's not obvious reading back whether that's what you have anyway.
OrientalSniper: it's under Status/System Logs; it's the main log. There are several other logs in there too.
Steve, I am going to try it with a gigabit switch. That makes sense. I do only have one subnet, so it should work. For the short-term I may leave the data side of the network off the firewall and focus on getting the phones constant, maybe I will put it all on that switch and see if it still goes bonkers. People don't complain about the internet dropping nearly as hard as they complain about having phone calls drop.
Are you using the DIR-615 purely to get extra ports to connect to the pfSense LAN interface then? As a wireless access point also?
If so, you should just change it into an access point and switch by connecting the pfSense LAN directly to one of the D-Link LAN jacks and disabling DHCP on the D-Link. Removing any routing/firewall function from the D-Link is one less thing that might be causing a problem. See:
It's like you can see my setup. I actually was using the dir615 as a wireless access point with the WAN port disconnected, just like you said. I had DHCP turned off and had the two phone connections, the one data connection, and the LAN from pfSense all hooked up to LAN ports, with nothing on the WAN. Basically I was using it as a switch with wireless access. I should have mentioned that.
I still think the dir615 is the failing component.
I went back in today and put in a Netgear Gigabit switch, and everything tested OK. No packet loss, and I was able to stay on a call for over an hour. I won't know for certain until the system is under high load on Monday. I connected the dir615 to my data network strictly as an access point.
Thank you for hanging with me on this one, I appreciate it more than you can imagine.
Some wireless routers have an access-point only mode; that might be worth investigating. The D-Link might be doing something odd if it thinks its WAN is down all the time. If the Netgear works out OK, it seems you've found the problem. :) You can always connect the D-Link to it to get wifi back.
Finally back with one week of testing.
I replaced my wireless router with a gigabit switch and then connected the router back into the network as a wireless access point farther downstream.
Seems to have worked well. The router must have gotten all weird with heavy load.
Thank you all for suggestions and help. It was the difference between failure and getting this thing fixed.