VDSL connectivity issues
-
Hi,
I'm running the latest pfSense on a Supermicro 1U Atom server. The setup is as follows: Netgear DM200 VDSL modem (WAN interface) > pfSense router > PoE switches
I have multiple VLANs and port forwarding for a CCTV system. Every now and then (sometimes multiple times a week, sometimes not at all in a given week), all internet and routing dies.
I do not know the logging well enough to know where to look to identify the issue. But I do know that:
- The CCTV VLAN does not have access to any other VLANs (including the management VLAN)
- The pfSense WAN interface is shown as offline, even if the WAN link is functional.
- According to my client, rebooting the VDSL modem doesn't solve the problem.
The only thing that sorts it is to reboot everything. So can someone please provide me with any insight on what to investigate?
- I thought perhaps a DoS attack from the CCTV was causing the issue, but these issues existed long before I opened up port forwarding for the CCTV, so they are perhaps not entirely connected to that.
Thanks in advance, and hopefully this is enough info as a starter.
-
Not a ton of info there, but a few things to check -
When you lose "internet and routing" is the pfSense box still accessible via the LAN and/or the console?
Anything showing under "Status->System Logs"?
Have you tried a "Diagnostics->States->Reset States" instead of a reboot to try and resolve the issue?
Does physically unplugging the WAN cable and then reconnecting make any difference?
What type of NIC is the WAN interface?
-
It's an Intel NIC, as per the Supermicro board.
The customer advises me that power cycling the VDSL modem doesn't do the trick.
I'm not sure if the pfSense box is still available; the customer cannot access it as it's on the MGMT interface. However, they advise me that other network components are not available after the VDSL modem power cycle.
I haven't tried a state reset. However, it's a SOHO environment and the state table is generally about 12% full. Should I enable kill states to see if that makes a difference?
What should I be looking for under system logs? There are hundreds of entries even in a small time window.
Thanks.
-
I'd be looking for odd log entries around the time of the failure.
Perhaps under Routing or DNS?
It may also be worthwhile searching the forums for other incidents with Supermicro and/or NIC configurations.
I seem to recall seeing a few people have some particular configuration issues.
Might be worth swapping LAN/WAN NIC assignments, especially if there's a different PCI/e NIC card available.
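With hundreds of entries per hour, one way to narrow things down is to export the log and keep only the lines stamped close to the failure. Below is a minimal Python sketch of that idea, not an official pfSense tool. It assumes the classic syslog prefix ("Feb  1 14:26:03 ..."); the function name `entries_near`, the sample lines, and the year are all hypothetical placeholders.

```python
import re
from datetime import datetime, timedelta

# Assumed prefix: classic BSD syslog style, e.g. "Feb  1 14:26:03 host proc: msg".
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")

def entries_near(lines, failure_time, window_minutes=10):
    """Return the log lines stamped within window_minutes of failure_time."""
    window = timedelta(minutes=window_minutes)
    nearby = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        month, day, clock = match.groups()
        # Syslog stamps carry no year, so borrow it from failure_time.
        try:
            when = datetime.strptime(
                f"{failure_time.year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip stamps that do not form a valid date
        if abs(when - failure_time) <= window:
            nearby.append(line)
    return nearby

# Hypothetical sample lines; the year 2024 is an arbitrary placeholder.
sample = [
    "Feb  1 14:10:00 pfSense example: hypothetical entry well before the outage",
    "Feb  1 14:24:30 pfSense example: hypothetical entry just before the outage",
    "Feb  1 14:26:05 pfSense example: hypothetical entry during the outage",
    "Feb  1 19:26:00 pfSense example: hypothetical entry hours later",
]
for line in entries_near(sample, datetime(2024, 2, 1, 14, 26), window_minutes=5):
    print(line)
```

Start with a narrow window and widen it if nothing stands out; entries just before the failure are usually more telling than the flood written during a reboot.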
-
Thanks - it's just gone down again apparently and I'm going there tonight.
The server config has the two Intel NICs onboard the motherboard. Nothing more.
I couldn't find anything specific to Supermicro NICs apart from this post about hitting the upper limit on the NIC: https://forum.pfsense.org/index.php?topic=42853.0 I don't think that's my problem though; the VDSL speed is only 50 Mb/s and there are no file servers etc. lagging up the network.
I'll feed back on the logs tonight.
-
If you've got any kind of NIC you could plug into that machine, it would probably be worth using it to isolate the problem.
Given your WAN is no faster than 50 Mbps, almost any 100 Mbps card should do.
That should help to eliminate the WAN piece of the puzzle.
-
So update as follows:
- "All internet went down again" and luckily the client left it without rebooting. No device had any LAN / WAN connectivity.
- Did not try SSH, however tried accessing MGMT interface both via switch and Supermicro LAN NIC port. No joy, could not get DHCP lease.
- Rebooted pfSense, no improvement.
- Rebooted switch, all came back online.
During the next couple of hours, I went through the logs (attached). Nothing particularly catches my novice eye as abnormal, with the exception of the items listed below.
For logs under DNS, there is nothing (to me) suspicious apart from multiple entries at 14:26, 19:26, 20:31 and 22:19 - the exact times of router reboot.
Under general - again considering a heap of messages at each reboot (times above) the only items I noticed were:
- atkbd0: [GIANT-LOCKED]
- [wan] IPADDR 172.16.11.145 under PPP process - I have nothing on my network using this IP or subnet - could this be a client logging into their workplace VPN?
- references to UDP 6969 - we believe this could be the nanny using P2P, or perhaps someone port scanning? The logs show an entry to block such ports as well.
KEY QUESTION: I need help in ruling pfSense in or out as the possible root cause. Can anyone help me determine if anything in the logs (xls attached) looks untoward?
Why the Key Question?
I also saw very odd switch behaviour. My switch setup is as follows (one connects to the other - in a chain):
Pfsense Router (serves DHCP of MGMT, LAN and GUEST vlans)
"Large" 48 Port switch (utilises MGMT, LAN and GUEST vlans)
"Small" PoE Switch (utilises MGMT, LAN and GUEST vlans) - Wifi APs connected here
I noticed that after a period of time the port connecting the "large" switch to the "small" switch would fail. There was no NIC activity light on the large switch port in question, meaning all devices on the small switch were rendered useless. The APs couldn't serve IP addresses, as they weren't getting any via the large switch.
The large switch was still serving other ports with WAN and LAN traffic (and lit up). I tried configuring another unused port on the large switch to serve the small switch. The newly assigned port made everything work for a while, but after 5 mins that port failed as well.
Trying to answer "if this is a switch problem, which switch is to blame?", I then tried connecting an AP directly to the large switch, setting all the right VLAN tags etc. and powering it by AC rather than PoE. I couldn't get it to stay online for more than a minute.
None the wiser, and after 4 hours, I rebooted everything. While this fixes things temporarily, I'm not happy as I need to get to the root cause.
Thank you all for reading!
[Feb 1 Gen and DNS logs.xls](/public/imported_attachments/1/Feb 1 Gen and DNS logs.xls)
-
What type/model are those switches and what do their logs show?
Assuming it's a switch problem, can you remotely login to pfSense or at least ping it on WAN when in error mode?
Are the switches pingable from pfSense then?
-
They are:
Netgear GS748Tv5 (48 Port)
Netgear JGS516PE (16 Port)
I can hit the pfSense box via the 48 port switch, but I can't ping the 16 port switch or any devices on it, nor see the static IPs I am expecting in the pfSense ARP table.
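The ARP table check above can be scripted so it is easy to repeat each time the fault occurs. This is only a rough Python sketch: it assumes the table has been saved as text shaped like FreeBSD's `arp -an` output, and the helper names, addresses (documentation ranges) and interface name are hypothetical placeholders, not values from this network.

```python
import re

# Assumed line shape: "? (192.0.2.10) at 00:00:5e:00:53:0a on em1 ... [ethernet]"
ARP_LINE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\) at (\S+)")

def resolved_hosts(arp_output):
    """Map IP -> MAC for entries whose MAC address has actually resolved."""
    hosts = {}
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group(2) != "(incomplete)":
            hosts[match.group(1)] = match.group(2)
    return hosts

def missing_hosts(arp_output, expected_ips):
    """List the expected static IPs that have no resolved ARP entry."""
    seen = resolved_hosts(arp_output)
    return [ip for ip in expected_ips if ip not in seen]

# Hypothetical capture; every address here is a placeholder.
sample = """\
? (192.0.2.1) at 00:00:5e:00:53:01 on em1 permanent [ethernet]
? (192.0.2.10) at 00:00:5e:00:53:0a on em1 expires in 1180 seconds [ethernet]
? (192.0.2.20) at (incomplete) on em1 expired [ethernet]
"""
expected = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]
print(missing_hosts(sample, expected))
```

If every missing address sits behind the same switch or uplink port, that points at the link or the switch rather than at pfSense.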
-
Look at the cable between those two switches. The 16P switch after that.
Since I personally have never had good experiences with Netgear switches I would replace those first thing in the morning…
-
Interesting, I've not had issues with Netgear like this before. My supplier only has like-for-like replacements.
Does anyone see anything weird in the logs I sent earlier?
-
If you can ping pfSense and your 48P switch but not the 16P switch then it's unlikely that an unbound log shows anything interesting.
Have you checked the cables already?
-
Good Morning,
Yes, the cables between the switches were fine. That same night I saw that odd behaviour even with new cables.
Switches being replaced this weekend. We'll see what happens.
-
Apologies, I really thought I had responded to this.
The issue did in fact prove to be a faulty GS748Tv5 switch, where a fan had failed, yet somehow this was not flagged in the GUI or by the status light on the front panel.
For the past 4 weeks or so, no restarts have been required.
Thanks for all your help.