[Solved] pfSense 2.1 Becomes unresponsive
-
You might want to try disabling VT-x and multi-core support in the BIOS. I also had system hangs with 2.1, and changing the BIOS settings solved the problem for me.
-
Hmm, I might have to take a look at those settings as well.
For starters, I pulled the Dell 2708 switch and replaced it with my Netgear 8-port gigabit switch. I'll watch the logs and see if that makes a difference. If not, I'll move the LAN to a different port on the pfSense box.
Beyond that, the BIOS settings are a good idea to try next.
After that, I read in another thread that it might help to add the following to /boot/loader.conf:
hw.msk.msi_disable="1"
hw.pci.enable_msi=0
hw.pci.enable_msix=0
-
I was about to dismiss the VT-x suggestion, but I looked it up and found that some P4s did have it. None at 3.2GHz, though, so I doubt your box will have that option. It may be worth disabling Hyper-Threading, another easy thing to try, though I doubt it will make a difference.
Disabling MSI or MSI-X could have some bearing here; that does seem to help some systems.
hw.msk.msi_disable="1" won't do anything for you, as your NICs are all sk(4), not msk(4). Right? What are your NICs reported as in dmesg? What do they look like in pciconf? You should be able to get the 'firmware' revision from pciconf. Here's what I did on an msk interface: https://forum.pfsense.org/index.php/topic,20095.msg203322.html#msg203322
Steve
-
For reference, these are the NICs on that box:
- mskc0: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
- mskc1: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
- mskc2: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
- mskc3: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
- skc0: Marvell Gigabit Ethernet (LED mod 0.9)
- skc1: Marvell Gigabit Ethernet (LED mod 0.9)
skc0 (sk0) is the WAN side and skc1 (sk1) is the LAN. I'm not currently using the others. The WAN side hasn't had any issues, which is why I've been suspecting the Dell switch.
$ pciconf -l | grep sk
mskc0@pci0:1:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
mskc1@pci0:2:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
mskc2@pci0:3:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
mskc3@pci0:4:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
skc0@pci0:5:3:0: class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
skc1@pci0:5:4:0: class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
So it could be a combination of that firmware not playing nicely with the Dell switch (I've heard of others having issues with those). It does look like I have the same firmware revision you mentioned, though the rev 0x19 (msk) ports are currently unused; I'm only using the sk ones at the moment. If switching over to the Netgear switch doesn't straighten things out, I guess the next easiest thing to try is moving the LAN to one of the msk ports, at least before going crazy digging into firmware etc.
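The `chip=` field in the pciconf output above packs two IDs: in this older output format, the high 16 bits are the PCI device ID and the low 16 bits are the vendor ID (0x11ab is Marvell). Below is a minimal sketch, assuming the output has been copied to a machine with Python 3 (no assumption that Python is available on the pfSense box itself), that splits those fields out and groups ports by driver, chip and revision. `parse_pciconf` and `group_by_chip` are hypothetical helper names, not part of any tool.

```python
import re
from collections import defaultdict

# One line of FreeBSD `pciconf -l` output in the older format shown above, e.g.
# mskc0@pci0:1:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
PCICONF_RE = re.compile(
    r"^(?P<driver>[a-z]+)(?P<unit>\d+)@(?P<selector>pci\d+(?::\d+)+):\s+"
    r"class=(?P<cls>0x[0-9a-f]+)\s+card=(?P<card>0x[0-9a-f]+)\s+"
    r"chip=(?P<chip>0x[0-9a-f]+)\s+rev=(?P<rev>0x[0-9a-f]+)"
)


def parse_pciconf(text):
    """Return one dict per device line; lines that don't match are skipped."""
    devices = []
    for line in text.splitlines():
        m = PCICONF_RE.match(line.strip())
        if m is None:
            continue
        chip = int(m["chip"], 16)
        devices.append({
            "name": m["driver"] + m["unit"],
            "driver": m["driver"],
            "selector": m["selector"],
            # chip packs the PCI device ID (high 16 bits) and vendor ID (low 16 bits).
            "device": chip >> 16,
            "vendor": chip & 0xFFFF,
            "rev": int(m["rev"], 16),
        })
    return devices


def group_by_chip(devices):
    """Group device names by (driver, vendor ID, device ID, revision)."""
    groups = defaultdict(list)
    for d in devices:
        groups[(d["driver"], d["vendor"], d["device"], d["rev"])].append(d["name"])
    return dict(groups)
```

Fed the six lines above, this should yield two groups: four mskc ports with device 0x4362 at rev 0x19, and two skc ports with device 0x4320 at rev 0x13, which is the msk/sk split discussed in this thread.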
-
Ah, Ok.
I see you're using my modified drivers. They were only intended for the WatchGuard X-e box. Do they correctly drive the LEDs on your box? The LED configuration is the only change from the standard driver. Your interfaces appear identical to those in the Firebox (they probably came out of the same factory in Taiwan), and in that box the sk interfaces have never given any trouble. Only the msk interfaces have a bug, and it is easily worked around.
When the box locks up do you still have serial console access?
Steve
-
I'll have to check that next time, but I suspect I do.
As for the LED indicator lights on the ports: yes, they blink with traffic etc., so they seem to be working just fine.
The only interface that seems to have the up/down issue is sk1, though I see you saw my other post about what happened when I tried to swap devices without thinking it through and scrambled the whole LAN side.
At least I'm back to where I was in this thread.
I can bring the LAN port back up by simply pulling the cable and plugging it back in. pfSense detects a hotplug event and brings everything back up on the LAN side. The most recent event in the log below is me physically unplugging the cable for a moment and plugging it back in.
Dec 17 07:57:31 check_reload_status: updating dyndns lan
Dec 17 07:57:24 php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Dec 17 07:57:24 php: rc.linkup: HOTPLUG: Configuring interface lan
Dec 17 07:57:24 php: rc.linkup: DEVD Ethernet attached event for lan
Dec 17 07:57:22 kernel: sk1: link state changed to UP
Dec 17 07:57:22 check_reload_status: Linkup starting sk1
Dec 17 07:57:20 php: rc.linkup: DEVD Ethernet detached event for lan
Dec 17 07:57:18 kernel: sk1: link state changed to DOWN
Dec 17 07:57:18 check_reload_status: Linkup starting sk1
Dec 17 07:56:25 check_reload_status: updating dyndns lan
Dec 17 07:56:18 php: rc.linkup: DEVD Ethernet detached event for lan
Dec 17 07:56:18 php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Dec 17 07:56:18 php: rc.linkup: HOTPLUG: Configuring interface lan
Dec 17 07:56:18 php: rc.linkup: DEVD Ethernet attached event for lan
Dec 17 07:56:16 check_reload_status: Linkup starting sk1
Dec 17 07:56:16 kernel: sk1: link state changed to UP
Dec 17 07:56:16 kernel: sk1: link state changed to DOWN
Dec 17 07:56:16 check_reload_status: Linkup starting sk1
Dec 17 07:52:49 login: login on console as root
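When "watching the logs" to see whether a switch or port swap helped, it can be handy to count link drops rather than eyeball them. A minimal sketch, assuming the system log has been exported as plain text in the format above; it only looks at `kernel: <iface>: link state changed to UP|DOWN` messages, and `link_events`/`count_drops` are hypothetical names.

```python
import re
from collections import Counter

# Matches kernel link-state messages such as
# "Dec 17 07:57:18 kernel: sk1: link state changed to DOWN".
# finditer() also copes with entries run together on a single line.
LINK_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+kernel:\s+"
    r"(?P<iface>[a-z]+\d+):\s+link state changed to (?P<state>UP|DOWN)"
)


def link_events(log_text):
    """Return (timestamp, interface, state) tuples in the order they appear."""
    return [(m["ts"], m["iface"], m["state"]) for m in LINK_RE.finditer(log_text)]


def count_drops(log_text):
    """Count DOWN transitions per interface."""
    return Counter(iface for _, iface, state in link_events(log_text) if state == "DOWN")
```

On the excerpt above this reports two DOWN transitions for sk1: the manual unplug at 07:57:18 and the earlier drop at 07:56:16. Comparing the per-interface counts over the same time window before and after a change gives a rough measure of whether it helped.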
-
Well, this time I was able to successfully migrate the LAN from sk1 to msk0. I'm not sure where it went wrong last time, but this time I had things split up and could watch the console if something went wrong, so I was a little more prepared to "try and see what happens."
Since the disconnect events were still happening even after swapping out the switch and hard-setting the connection type to 1000baseT full-duplex, I figured a different port (the msk ones all report a different chipset and driver) made the most sense to try next.
-
Make sure you have disabled MSI for the msk NICs, or you'll probably experience 'watchdog timeout' errors. I would always recommend using:
hw.msk.msi_disable="1"
since it leaves MSI/MSI-X available for everything else on the PCI bus. Using:
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
disables it globally for everything.
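To make the difference in scope concrete, here is a small sketch that reads loader.conf-style lines and reports which devices would lose MSI or MSI-X. It is a simplified model, assuming only the three tunables discussed in this thread, `#` comments, and optional double quotes around values; `parse_loader_conf` and `msi_scope` are hypothetical helpers, and nothing here reads settings from a running system.

```python
def parse_loader_conf(text):
    """Parse name=value lines from loader.conf-style text into a dict.

    Simplified: drops '#' comments and blank lines, strips optional double quotes.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def msi_scope(settings):
    """Report which MSI/MSI-X features the three tunables turn off, and for whom."""
    msi_global_off = settings.get("hw.pci.enable_msi") == "0"
    msix_global_off = settings.get("hw.pci.enable_msix") == "0"
    msk_off = settings.get("hw.msk.msi_disable") == "1"
    return {
        # msk(4) loses MSI either via its own tunable or via the global switch.
        "msk_msi_disabled": msk_off or msi_global_off,
        # The hw.pci.* tunables affect every device on the PCI bus.
        "all_devices_msi_disabled": msi_global_off,
        "all_devices_msix_disabled": msix_global_off,
    }
```

With only the msk-specific line, just `msk_msi_disabled` comes out True; with the global lines added, every device on the bus is affected, which is the trade-off described above.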
Steve
-
OK, I'll give that a shot. So far it's been 8 hours with no errors, but I'm only now starting to use the connection, so I'll wait and see. (I've had longer stretches without errors before.)
I entered those settings manually and also added them to /boot/loader.conf:
hw.msk.msi_disable="1"
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
If sk1 is just a bad or flaky port, hopefully this will resolve things.
-
Well, it's been a record 24 hours without a single loss of the LAN side, so I'm tempted to call this issue resolved.
In the end, I think we can conclude the errors were due to a flaky sk1 (original LAN) port on the pfSense box. Perhaps there was a quicker way to reach that conclusion, but as part of the "adventure" I certainly have a far better understanding of pfSense than I did when I dropped it in as the gatekeeper of my network.
Thanks, all, for the help in troubleshooting! I'm going to go ahead and mark this thread as solved for now. I can always change it back if I'm wrong and the error comes back... but it has survived 24 hours and a stress test without kicking out a single error, so I'm going to go with it! :)