DNS/DHCP stop working suddenly
-
@stephenw10
Happened again a few hours later. Only difference is that [kernel{if_io_tqg_3}] is now [kernel{if_io_tqg_0}].
dtrace was the same and stopped abruptly.
-
You may have an ISP line problem. Something like that happened to me about a year ago. It turned out that the old external coax line was really bad. pfSense did not handle those short and frequent outages too well. My old ASUS router was better at it. Initially I thought that my new NETGATE box was failing completely, but no. It was actually working, but very, very slowly. It was taking 3-4 minutes to log into the Web UI and the Internet download speed was 2-3 Mb/s. I think it would be a good idea to check the internal Internet line first and, if that’s fine, ask the ISP to check the external line.
-
@michmoor said in DNS/DHCP stop working suddenly:
Seeing how the dtrace stopped no more than 15s into it due to the high load: "dtrace: processing aborted: Abort due to systemic unresponsiveness"
Ah sorry I completely missed that. That's obviously not normal!
Seeing the load on a different task queue is probably just coincidence.
-
@stephenw10
Could we be looking at something corrupted in the firmware? Would a re-install be a pathway here to resolving this?
-
You could try that but it doesn't look like that to me. A damaged filesystem would almost certainly fail immediately.
-
@stephenw10
Incident just happened about an hour ago.
I've checked each interface and there was no sudden sharp increase in bandwidth, so this doesn't seem to be traffic-related. I think... I may have found something very interesting and, to your point, traffic-related...
Restricted VLAN is time based. It is not available yet... that's a lot of packets being processed? How? Assuming the RRD graphs are accurate in terms of counting, the Restricted VLAN is the only VLAN with millions of packets being recorded, even compared to the WAN, which doesn't register that many packets.
-
Hmm, I think that's actually milli-packets per second not millions. Compared to the in-block value it's very low on the graph.
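For anyone reading the same graphs: RRDtool-style graphs scale their values with SI prefixes, so a lowercase "m" means milli (one thousandth) and an uppercase "M" means mega (one million). Here is a minimal sketch of that conversion, assuming labels shaped like "850 m" or "2 M". The helper name parse_si is hypothetical and is not part of pfSense or RRDtool.

```python
# Hypothetical helper: turn an SI-prefixed graph label into a plain number.
# Only a handful of common prefixes are covered in this sketch.
SI_PREFIXES = {
    "m": 1e-3,  # milli: lowercase m
    "k": 1e3,   # kilo
    "M": 1e6,   # mega: uppercase M
    "G": 1e9,   # giga
}


def parse_si(label: str) -> float:
    """Convert a label such as '850 m' or '2 M' to a float."""
    text = label.strip()
    if text and text[-1] in SI_PREFIXES:
        # Split the trailing prefix letter from the numeric part.
        number, scale = text[:-1], SI_PREFIXES[text[-1]]
    else:
        # No recognised prefix: the value is already in base units.
        number, scale = text, 1.0
    return float(number) * scale
```

Read this way, a value of "850 m" packets per second is under one packet per second, while "850 M" would be 850 million.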
-
Update
After moving the WAN connection from IX3 to IX1, I continued to have the firewall lock up.
Moving the internet connectivity to the igc* interface seems to have resolved the issue. pfSense has remained stable with no lockups.
IX and IGC are different NICs on the 6100.
At first, this may look like a failing ix* NIC. That's what I thought. Why else does everything work great on the igc NICs?
As a test, I found another PC and connected it directly to ix3. Ran a series of speedtests daily. pfSense hasn't locked up at all. The physical connectivity had the ATT Cable modem directly attached to ix3. No patch panel, no switch. The connectivity today is the same except it's directly connected to an igc interface.
The only conclusion that I can reach is that, for some odd reason, there is something strange taking place on the wire between the modem and the pfSense ix* interface which causes the system to freak out and lock up. With the cable modem directly connected to igc*, operation is normal. I can't explain this odd behavior. I appreciate everyone's input. @stephenw10 appreciate the additional assistance out of band.
-
Hmm, yeah that's odd. You also tested with a switch on the WAN side between the IX NIC and modem?
And you are still using the IX NICs internally?
-
@stephenw10
I did test with the switch on the WAN side between the IX NIC and the modem. Right now I'm only using the ix* interface for testing with a directly connected server. WAN is no longer on an ix interface.
-
Hmm, not something at the link layer then.
Unless that switch just happened to have the same issue, which seems very unlikely.
-
@stephenw10
This is such a crazy mystery because by all accounts it's not a burst in traffic or even a pfSense-specific problem per se. I don't think it's a cable modem issue. Everything is working as it should be.
This happened in February. Nothing of note happened in the physical setup. The last thing I'm willing to try is a tcpdump on the ix interface when the problem happens. Not sure what that will tell me or what I would be looking for.
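Because the lockup can't be predicted, one option is a rolling capture, so that the traffic from just before a lockup is still on disk afterwards. The sketch below only builds the command line. It assumes the standard tcpdump flags -i, -n, -s, -C, -W and -w; the interface name ix3, the output path, and the size limits are illustrative values, and build_capture_cmd is a hypothetical helper.

```python
import shlex


def build_capture_cmd(interface, out_path, file_mb=50, file_count=10, snaplen=128):
    """Build a tcpdump command that writes to a fixed-size ring of files."""
    return [
        "tcpdump",
        "-i", interface,        # capture on this interface
        "-n",                   # no name resolution
        "-s", str(snaplen),     # keep only the first bytes of each packet
        "-C", str(file_mb),     # start a new file after this many million bytes
        "-W", str(file_count),  # keep this many files, then overwrite the oldest
        "-w", out_path,         # write raw packets instead of printing them
    ]


if __name__ == "__main__":
    # Illustrative values only; adjust the interface and path to the system.
    print(shlex.join(build_capture_cmd("ix3", "/tmp/ix3.pcap")))
```

The printed command can be run as root from the firewall shell. With the defaults above the capture is capped at roughly 500 MB in total, and the files written closest to the lockup are the ones to open first.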
-
Yup. It might be something obvious, with any luck! The NIC LEDs indicate something is being passed, but whatever it is, pfSense doesn't see it. So it could be some invalid packet type perhaps. Or something in a loop somehow.
-
Just an update for you, Stephen. ATT offered to replace the ATT gateway (router). I didn't think it would help or hurt, so I ended up replacing it. I'll be...
It's been stable for well over a week.
Replacing ATT's equipment ended up solving the "issue". Why? I dunno.
Why did a bad gateway end up crashing my 6100? I dunno.
It's fixed tho... yay.