Netgate Discussion Forum

    [Solved] pfSense 2.1 Becomes unresponsive

    Problems Installing or Upgrading pfSense Software
    17 Posts 3 Posters 6.0k Views
    • fatsailor

      You might want to try disabling VTx and multi-cores in the BIOS. I also had system hangs with 2.1, and changing the BIOS solved the problem for me.

      • olorinpc

        hmmm I might have to take a look at those settings as well.

        For starters I pulled the Dell 2708 switch and swapped in my Netgear 8-port gigabit switch.  I will watch the logs and see whether that makes a difference.  If not, I will move the LAN to a different port on the pfSense box.

        Past that, good idea on trying that next.

        After that, I saw a suggestion on another thread to add the following to /boot/loader.conf:

        hw.msk.msi_disable="1"
        hw.pci.enable_msi="0"
        hw.pci.enable_msix="0"

        • stephenw10 (Netgate Administrator)

          I was about to dismiss the VT-x suggestion but just looked it up and found that there were some P4s that had it. None at 3.2GHz though so I doubt your box will have that option. Maybe worth disabling HyperThreading though, another easy thing to try. I doubt it will do anything.

          Disabling MSI or MSI-X could have some bearing here; that does seem to help some systems.
          hw.msk.msi_disable="1" won't do anything for you as your NICs are all sk(4) not msk(4). Right?

          What are your NICs reported as in dmesg? What do they look like in pciconf? You should be able to get the 'firmware' revision from pciconf. Here's what I did on an msk interface: https://forum.pfsense.org/index.php/topic,20095.msg203322.html#msg203322

          Steve

          • olorinpc

            For reference - these are the nics that are on that box:

            • mskc0: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
            • mskc1: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
            • mskc2: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
            • mskc3: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
            • skc0: Marvell Gigabit Ethernet (LED mod 0.9)
            • skc1: Marvell Gigabit Ethernet (LED mod 0.9)

            sk0 is the WAN side and sk1 is the LAN.  I am currently not using the others.  The WAN side hasn't had any issues, which is why I have been questioning the Dell switch at this point.

            
            $ pciconf -l|grep sk
            mskc0@pci0:1:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
            mskc1@pci0:2:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
            mskc2@pci0:3:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
            mskc3@pci0:4:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
            skc0@pci0:5:3:0:	class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
            skc1@pci0:5:4:0:	class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
            
            

            So it could certainly be a combination of that firmware not playing nicely with the Dell switch (I have heard of others having issues with those).  It certainly looks like I have the same firmware you mentioned, though the rev=0x19 ports are currently unused; I am only using the sk ones at the moment.  If switching over to this Netgear switch doesn't straighten things out, I guess the next easiest thing is to move the LAN side over to one of the msk ports and try that, at least before going crazy digging into firmware etc.
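
            For comparing the two NIC families at a glance, the pciconf lines above can be reduced to device name, chip ID and revision. This is only a sketch: `summarise_nics` is a hypothetical helper name, and it assumes the `name@pci... chip=... rev=...` layout shown in the output above.

            ```shell
            #!/bin/sh
            # Sketch: summarise pciconf-style lines as "device chip rev".
            # summarise_nics is a hypothetical helper; it reads text on stdin.
            summarise_nics() {
                awk '/^m?skc[0-9]+@/ {
                    split($1, dev, "@")            # dev[1] is mskc0, skc1, ...
                    chip = ""; rev = ""
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^chip=/) chip = substr($i, 6)   # strip "chip="
                        if ($i ~ /^rev=/)  rev  = substr($i, 5)   # strip "rev="
                    }
                    print dev[1], chip, rev
                }'
            }

            # Two lines copied from the output above, fed in as sample input.
            summarise_nics <<'EOF'
            mskc0@pci0:1:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
            skc0@pci0:5:3:0: class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
            EOF
            ```

            On the box itself the same helper could presumably be used as `pciconf -l | summarise_nics`.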

            • stephenw10 (Netgate Administrator)

              Ah, Ok.
              I see you're using my modified drivers. They were only intended for the Watchguard X-e box. Do they correctly drive the LEDs on your box? The LED configuration is the only change from the standard driver.

              Your interfaces appear identical to those in the firebox (probably came out of the same factory in Taiwan) and in that box the sk interfaces have never given any trouble. Only the msk interfaces have a bug, which is easily worked around.

              When the box locks up do you still have serial console access?

              Steve

              • olorinpc

                I will have to watch that next time, I would suspect I do.

                As far as the LED indicator lights on the ports go, yes, they blink with traffic etc., so they seem to be working just fine.

                The only interface that seems to have that up/down issue is sk1… though I see you saw the other post on what happened when I tried to swap devices without thinking it through and scrambled the whole lan side.

                I am back to where I was in this thread at least.

                I can bring the LAN port back up by simply pulling the cable, and plugging it back in.  Detects a hotplug event and brings everything back up on the LAN side...  the most recent event on the below log is me actually physically unplugging the cable for a moment and plugging it back in.

                
                Dec 17 07:57:31	check_reload_status: updating dyndns lan
                Dec 17 07:57:24	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
                Dec 17 07:57:24	php: rc.linkup: HOTPLUG: Configuring interface lan
                Dec 17 07:57:24	php: rc.linkup: DEVD Ethernet attached event for lan
                Dec 17 07:57:22	kernel: sk1: link state changed to UP
                Dec 17 07:57:22	check_reload_status: Linkup starting sk1
                Dec 17 07:57:20	php: rc.linkup: DEVD Ethernet detached event for lan
                Dec 17 07:57:18	kernel: sk1: link state changed to DOWN
                Dec 17 07:57:18	check_reload_status: Linkup starting sk1
                Dec 17 07:56:25	check_reload_status: updating dyndns lan
                Dec 17 07:56:18	php: rc.linkup: DEVD Ethernet detached event for lan
                Dec 17 07:56:18	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
                Dec 17 07:56:18	php: rc.linkup: HOTPLUG: Configuring interface lan
                Dec 17 07:56:18	php: rc.linkup: DEVD Ethernet attached event for lan
                Dec 17 07:56:16	check_reload_status: Linkup starting sk1
                Dec 17 07:56:16	kernel: sk1: link state changed to UP
                Dec 17 07:56:16	kernel: sk1: link state changed to DOWN
                Dec 17 07:56:16	check_reload_status: Linkup starting sk1
                Dec 17 07:52:49	login: login on console as root
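
                To put a number on how often the link drops, the kernel lines in a log like the one above can be tallied per interface. This is a rough sketch: `count_link_down` is a hypothetical helper, and it assumes the `kernel: <if>: link state changed to DOWN` wording shown in the log.

                ```shell
                #!/bin/sh
                # Sketch: count DOWN transitions per interface in syslog-style text.
                # count_link_down is a hypothetical helper; it reads the log on stdin
                # and prints "<interface> <count>" for each interface seen.
                count_link_down() {
                    awk '/kernel: .*: link state changed to DOWN/ {
                        for (i = 1; i <= NF; i++)
                            if ($i == "kernel:") {
                                ifname = $(i + 1)          # e.g. "sk1:"
                                sub(/:$/, "", ifname)      # drop the trailing colon
                                down[ifname]++
                            }
                    }
                    END { for (n in down) print n, down[n] }'
                }

                # Kernel lines copied from the log above, fed in as sample input.
                count_link_down <<'EOF'
                Dec 17 07:57:22 kernel: sk1: link state changed to UP
                Dec 17 07:57:18 kernel: sk1: link state changed to DOWN
                Dec 17 07:56:16 kernel: sk1: link state changed to UP
                Dec 17 07:56:16 kernel: sk1: link state changed to DOWN
                EOF
                ```

                On pfSense 2.1 the input would probably come from the system log (for example via `clog /var/log/system.log`, assuming the log is still in clog format on that install).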
                
                
                • olorinpc

                  Well this time I was able to successfully migrate the LAN from sk1 to msk0.  Not sure where it went wrong last time, but I had things broken up and could watch the console if something went wrong… so was a little more prepared to "try and see what happens."

                  Since the disconnect events were still happening, even with swapping out the switch, and hard setting the connection type to 1000baseT Full-Duplex, figured trying a different port (all the msk ones report a different chipset/driver config) made the most sense to try next.

                  • stephenw10 (Netgate Administrator)

                    Make sure you have disabled MSI for the msk NICs or you'll probably experience the 'watchdog timeout' errors. I would always recommend using:

                    hw.msk.msi_disable="1"
                    

                    since it leaves MSI/MSI-X available for everything else on the PCI bus. Using:

                    hw.pci.enable_msi="0"
                    hw.pci.enable_msix="0"
                    

                    disables it globally for everything.
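
                    If it helps, the per-driver tunable can be added in a way that is safe to run more than once. This is a minimal sketch, not an official procedure: `set_tunable` is a hypothetical helper, and the demo writes to a scratch file rather than the real loader configuration.

                    ```shell
                    #!/bin/sh
                    # Sketch: append a loader tunable to a file only if it is not set there.
                    # set_tunable is a hypothetical helper, not part of pfSense or FreeBSD.
                    set_tunable() {
                        file=$1
                        name=$2
                        value=$3
                        # Create the file if missing so grep has something to read.
                        [ -e "$file" ] || : > "$file"
                        # Note: dots in the name act as regex wildcards here, which is
                        # loose but good enough for a quick duplicate check.
                        if grep -q "^${name}=" "$file"; then
                            echo "already set: $name"
                        else
                            printf '%s="%s"\n' "$name" "$value" >> "$file"
                            echo "added: $name"
                        fi
                    }

                    # Demo on a scratch file; on the firewall the target would be the
                    # loader configuration (assumption: /boot/loader.conf.local, which
                    # is commonly used so upgrades do not overwrite local settings).
                    conf=$(mktemp)
                    set_tunable "$conf" hw.msk.msi_disable 1
                    set_tunable "$conf" hw.msk.msi_disable 1
                    cat "$conf"
                    rm -f "$conf"
                    ```

                    Loader tunables are read at boot, so a reboot is needed before the setting applies; afterwards `kenv hw.msk.msi_disable` should show whether the loader picked it up.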

                    Steve

                    • olorinpc

                      Ok I will give that a shot - so far 8hrs and no errors, but now I am starting to use the connection, so I will wait and see.  (Have had longer times without errors.)

                      I entered those settings manually, and then also added them to /boot/loader.conf:

                      
                      hw.msk.msi_disable="1"
                      hw.pci.enable_msi="0"
                      hw.pci.enable_msix="0"
                      
                      

                      If sk1 is just a bad or flaky port, hopefully this will resolve things.

                      • olorinpc

                        Well, it's been a record 24hrs without a single loss of the LAN side, so I am tempted to call this issue resolved.

                        At the end of all of it, I think we can conclude the errors were due to a flaky sk1 (original LAN) port on the pfsense box.  Perhaps there would have been a quicker way to reach that conclusion, though as part of the "adventure" I certainly have a far better understanding of pfsense than I did when I dropped it in as the gatekeeper of my network.

                        Thanks all for the help in troubleshooting!  I am going to go ahead and mark this thread as solved for now.  I can always change it back if I am wrong and the error comes back... but it has survived 24hrs and a stress test without kicking out a single error, so I am going to go with it! :)

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.