[Solved] pfSense 2.1 Becomes unresponsive



  • Hey all, new to the forums and to pfSense in general.  I've been trying to troubleshoot an issue where pfSense becomes totally unresponsive and I need to do a hard reboot to regain internet and access to the box, after which it comes up fine.

    The system is based on the Portwell NAR-5500 (Intel Pentium 4 HT 3.2GHz, 2GB DDR RAM, 80GB SATA HDD, 6x Gigabit Ethernet ports).

    I was finally able to catch something in the logs, so I have included that below.  So far I have pulled off snort, squid, and pfblocker in case those were causing issues, though evidently they were not the root problem, since it is still happening.

    My last thought was that the wireless router I have bridged to the switch might have been flaky (lan/sk1 is hooked to the switch).

    However in the logs I finally caught evidence that somehow sk1 (lan port) is going down and going into a reset cycle, and then just staying down.

    I am currently trying the recommendation in this old post, changing the interface from auto to 1000baseT full duplex, but I'm not sure yet whether that's it.
    (Post: http://forum.pfsense.org/index.php?topic=48272.10;wap2 )
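
For anyone following along, the forced-media change described above can also be expressed as a shell command. This is only a rough sketch under a few assumptions: `set_media` and `DRY_RUN` are made-up names, the media keyword `1000baseT` should be checked against what `ifconfig -m sk1` lists for the driver, and a change made this way would not be expected to survive a reboot (the setting in the pfSense GUI is what persists). By default the helper only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build the ifconfig command that pins an interface
# to a fixed media type at full duplex. With DRY_RUN=1 (the default here)
# the command is only printed, so nothing on the box is changed.
set_media() {
    iface="$1"
    media="$2"
    cmd="ifconfig $iface media $media mediaopt full-duplex"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# List what the driver supports first (FreeBSD: ifconfig -m <iface>),
# then pick the matching keyword. 1000baseT is an assumption for sk1.
set_media sk1 1000baseT
```

Running it with `DRY_RUN=0` as root would actually apply the setting, so it is worth reading the printed command first.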

    It hasn't been long enough for pfsense to lock back up, unless of course that was the issue and the problem is resolved… but thought I would post here and see if anyone has any ideas of something else to check.

    
    Dec 14 14:30:00 	php: servicewatchdog_cron.php: Service Watchdog detected service dhcpd stopped. Restarting dhcpd (DHCP Service)
    Dec 14 14:29:37 	check_reload_status: updating dyndns lan
    Dec 14 14:29:31 	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 14 14:29:31 	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 14 14:29:31 	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 14 14:29:31 	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 14 14:29:28 	check_reload_status: Linkup starting sk1
    Dec 14 14:29:28 	kernel: sk1: link state changed to UP
    Dec 14 14:29:28 	kernel: sk1: link state changed to DOWN
    Dec 14 14:29:28 	check_reload_status: Linkup starting sk1
    Dec 14 14:27:36 	check_reload_status: updating dyndns lan
    Dec 14 14:27:30 	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 14 14:27:30 	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 14 14:27:30 	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 14 14:27:30 	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 14 14:27:27 	check_reload_status: Linkup starting sk1
    Dec 14 14:27:27 	kernel: sk1: link state changed to UP
    Dec 14 14:27:27 	kernel: sk1: link state changed to DOWN
    Dec 14 14:27:27 	check_reload_status: Linkup starting sk1
    Dec 14 14:27:14 	check_reload_status: updating dyndns lan
    Dec 14 14:27:08 	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 14 14:27:08 	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 14 14:27:08 	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 14 14:27:08 	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 14 14:27:05 	check_reload_status: Linkup starting sk1
    Dec 14 14:27:05 	kernel: sk1: link state changed to UP
    Dec 14 14:27:05 	kernel: sk1: link state changed to DOWN
    Dec 14 14:27:05 	check_reload_status: Linkup starting sk1
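
To put a number on how often the link is bouncing, the log above can be filtered down to the kernel link-state lines. A minimal sketch, assuming the log text is piped in on stdin (on pfSense 2.1 the system log is a circular clog file, so `clog /var/log/system.log` would be the likely source; treat that path as an assumption). `count_flaps` is a made-up helper name.

```shell
#!/bin/sh
# Count kernel link-state transitions for one interface.
# Usage: count_flaps IFACE < logfile
count_flaps() {
    iface="$1"
    awk -v ifc="$iface" '
        # Match lines such as "kernel: sk1: link state changed to DOWN"
        index($0, "kernel: " ifc ": link state changed to ") {
            n[$NF]++          # last field is UP or DOWN
        }
        END {
            printf "%s DOWN=%d UP=%d\n", ifc, n["DOWN"], n["UP"]
        }
    '
}

# Sample lines copied from the log excerpt above.
printf '%s\n' \
    'Dec 14 14:29:28 kernel: sk1: link state changed to UP' \
    'Dec 14 14:29:28 kernel: sk1: link state changed to DOWN' \
    'Dec 14 14:27:27 kernel: sk1: link state changed to UP' \
    'Dec 14 14:27:27 kernel: sk1: link state changed to DOWN' \
    'Dec 14 14:27:05 kernel: sk1: link state changed to UP' \
    'Dec 14 14:27:05 kernel: sk1: link state changed to DOWN' |
    count_flaps sk1
# prints: sk1 DOWN=3 UP=3
```

Matching only the kernel lines avoids double counting the rc.linkup and check_reload_status entries that each flap also generates.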
    
    

    I also removed the DNS package in case that was causing issues.  Minor issue, but I noticed it was preventing pfsense from checking the current version, so not quite sure what was up with that - but figured it was best to pull the bells and whistles off till I have it stable.

    That actually seemed to make things worse, with no DNS requests passing at all, so I re-enabled it.  For the moment everything seems to be working again, though the dashboard shows the error "Unable to check for updates."  That is likely unrelated to whatever is causing the LAN to go down, but I thought I would mention it.

    Additional Update:  The version dns error was resolved by flipping the following option:

    On the general settings page...

    Check the box that says to not use the dns forwarder for pfsense.


  • Netgate Administrator

    If you're using bridging at all you should try this:
    https://forum.pfsense.org/index.php/topic,66908.msg367991.html#msg367991

    This doesn't sound like the typical 'flapping' symptoms but worth trying.

    Steve



  • Thanks for the reply.  I don't think that is it, as I am not actually using bridging mode on pfSense itself.  The change was on the wireless router: setting its static WAN IP to match what I assigned via pfSense, turning off DHCP on the wireless router, then moving the "wan" cable to the LAN side of the hub.

    That way DHCP requests actually pass through to pfSense, and it's managing the network whether the clients are wireless or wired.

    14hrs uptime so far, so re-set that up, and still up with pfsense issuing dhcp leases through the wireless router…

    So far, it seems that hard setting 1000baseT full-duplex has done the trick, though that confuses me a bit.  I had always read that with gigabit you were "always" supposed to leave it on auto.  The LAN port (sk1) on the pfSense box goes into a Dell PowerConnect 2708 8-port gigabit switch (in unmanaged mode), so everything should auto-negotiate to full gigabit anyway.

    14 hrs though is the longest uptime so far without having to do a hard reset...


  • Netgate Administrator

    Yep, sorry totally mis-read that. Not enough coffee yet.  ::)

    Steve



  • No worries - me either at that point!

    Well 24hrs uptime, which is a new record on this box - though going through the logs, still seeing sk1 (lan port) going through a cycle, though now it seems to come back up at least instead of staying down.

    Not sure what is causing it to go into an up/down cycle though.  Hard setting the type seems to at least allow it to recover, but why it's cycling every few hours is a mystery.

    Logs below.  Maybe I will try swapping out the cable itself just in case.  Another update: it has happened twice since posting this, this time going down hard.  The first time I swapped the cable and it came right back up without a hard reboot.  The second time, I'm trying a different port on the switch side just in case.

    Not sure what else to try.  I can try configuring a second LAN interface on one of the other ports in case it's a problem with that internal NIC, but beyond that I'm out of ideas for keeping it from killing the network periodically and requiring manual intervention.

    
    Dec 15 18:01:46	check_reload_status: Reloading filter
    Dec 15 18:01:46	check_reload_status: Syncing firewall
    Dec 15 09:09:50	check_reload_status: updating dyndns lan
    Dec 15 09:09:48	dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such process.
    Dec 15 09:09:43	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 09:09:43	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 09:09:43	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 09:09:41	kernel: sk1: link state changed to UP
    Dec 15 09:09:41	check_reload_status: Linkup starting sk1
    Dec 15 09:09:41	kernel: sk1: 2 link states coalesced
    Dec 15 09:08:55	check_reload_status: updating dyndns lan
    Dec 15 09:08:48	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 09:08:48	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 09:08:48	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 09:08:46	kernel: sk1: link state changed to UP
    Dec 15 09:08:46	check_reload_status: Linkup starting sk1
    Dec 15 09:08:46	kernel: sk1: 2 link states coalesced
    Dec 15 08:53:52	php: /index.php: Successful login for user 'admin' from: 192.168.1.10
    Dec 15 08:53:52	php: /index.php: Successful login for user 'admin' from: 192.168.1.10
    Dec 15 07:48:02	php: servicewatchdog_cron.php: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid sk1' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.2.5-P1 Copyright 2004-2013 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 6 leases to leases file. Listening on BPF/sk1/00:90:fb:1e:18:67/192.168.1.0/24 Sending on BPF/sk1/00:90:fb:1e:18:67/192.168.1.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you did not get this software from ftp.isc.org, please get the latest from ftp.isc.org and install that before requesting help. If you
    Dec 15 07:48:02	check_reload_status: updating dyndns lan
    Dec 15 07:48:00	php: servicewatchdog_cron.php: Service Watchdog detected service dhcpd stopped. Restarting dhcpd (DHCP Service)
    Dec 15 07:47:55	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 07:47:55	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 07:47:55	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 07:47:53	kernel: sk1: link state changed to UP
    Dec 15 07:47:53	kernel: sk1: 2 link states coalesced
    Dec 15 07:47:53	check_reload_status: Linkup starting sk1
    Dec 15 06:33:47	check_reload_status: updating dyndns lan
    Dec 15 06:33:44	dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such process.
    Dec 15 06:33:40	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 06:33:40	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 06:33:40	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 06:33:40	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 06:33:38	check_reload_status: Linkup starting sk1
    Dec 15 06:33:38	kernel: sk1: link state changed to UP
    Dec 15 06:33:38	kernel: sk1: link state changed to DOWN
    Dec 15 06:33:38	check_reload_status: Linkup starting sk1
    Dec 15 05:08:42	check_reload_status: updating dyndns lan
    Dec 15 05:08:36	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 05:08:36	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 05:08:36	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 05:08:34	kernel: sk1: link state changed to UP
    Dec 15 05:08:34	kernel: sk1: 2 link states coalesced
    Dec 15 05:08:34	check_reload_status: Linkup starting sk1
    Dec 15 03:52:37	check_reload_status: updating dyndns lan
    Dec 15 03:52:30	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 03:52:30	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 03:52:30	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 03:52:30	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 03:52:30	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 03:52:27	check_reload_status: Linkup starting sk1
    Dec 15 03:52:27	check_reload_status: Linkup starting sk1
    Dec 15 03:52:27	kernel: sk1: link state changed to UP
    Dec 15 03:52:27	kernel: sk1: link state changed to DOWN
    Dec 15 03:52:27	check_reload_status: Linkup starting sk1
    Dec 15 03:52:27	kernel: sk1: 2 link states coalesced
    Dec 15 03:52:27	kernel: sk1: link state changed to DOWN
    Dec 15 02:34:08	check_reload_status: updating dyndns lan
    Dec 15 02:34:02	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 02:34:02	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 02:34:02	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 02:33:59	kernel: sk1: link state changed to UP
    Dec 15 02:33:59	kernel: sk1: 2 link states coalesced
    Dec 15 02:33:59	check_reload_status: Linkup starting sk1
    Dec 15 01:15:29	check_reload_status: updating dyndns lan
    Dec 15 01:15:23	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 01:15:23	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 01:15:23	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 01:15:23	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 01:15:20	check_reload_status: Linkup starting sk1
    Dec 15 01:15:20	check_reload_status: Linkup starting sk1
    Dec 15 01:15:20	kernel: sk1: link state changed to UP
    Dec 15 01:15:20	kernel: sk1: link state changed to DOWN
    
    

    Logs from the 2 hard crashes that were resolved by pulling the cable and putting it back in.

    
    Dec 15 21:58:14	check_reload_status: updating dyndns lan
    Dec 15 21:58:07	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 21:58:07	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 21:58:07	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 21:58:05	kernel: sk1: link state changed to UP
    Dec 15 21:58:05	check_reload_status: Linkup starting sk1
    Dec 15 21:58:02	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 21:58:01	php: servicewatchdog_cron.php: Service Watchdog detected service dhcpd stopped. Restarting dhcpd (DHCP Service)
    Dec 15 21:58:00	kernel: sk1: link state changed to DOWN
    Dec 15 21:58:00	check_reload_status: Linkup starting sk1
    Dec 15 21:57:00	php: servicewatchdog_cron.php: Service Watchdog detected service dhcpd stopped. Restarting dhcpd (DHCP Service)
    Dec 15 21:56:28	check_reload_status: updating dyndns lan
    Dec 15 21:56:26	dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such process.
    Dec 15 21:56:22	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 15 21:56:22	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 15 21:56:22	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 15 21:56:21	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 21:56:21	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 15 21:56:19	check_reload_status: Linkup starting sk1
    Dec 15 21:56:19	check_reload_status: Linkup starting sk1
    Dec 15 21:56:19	kernel: sk1: link state changed to UP
    Dec 15 21:56:19	kernel: sk1: link state changed to DOWN
    Dec 15 21:56:19	check_reload_status: Linkup starting sk1
    Dec 15 21:56:19	kernel: sk1: 2 link states coalesced
    Dec 15 21:56:19	kernel: sk1: link state changed to DOWN
    Dec 15 20:53:52	check_reload_status: updating dyndns lan
    
    

  • Netgate Administrator

    Hmm, odd. You may as well try a different NIC as it's relatively easy to do. Do you have a different switch you can try? I've seen some hardware that just doesn't play nicely together.
    Those Marvell NICs have a firmware of sorts that you could attempt to upgrade, or at least check the version of. I managed it on a box I had; it took ages and didn't help!

    Steve



  • Yeah, at this point I am thinking it has to be a hardware handling issue.  I can't point to a single setting that I have flipped or not flipped that would cause pfSense to freak out on that NIC (while not also taking down the WAN side).

    What is interesting is that I moved the port on the internal router side and haven't seen an error since… so perhaps a bad port on the switch?  If it's flaky, that could indeed manage to scramble the port and get it into that up/down error loop.

    I do have another 8-port gigabit Netgear switch that I will swap to if I see it again, but it's been 8hrs without a single error showing up in the logs.  Going to let it go a full 24hrs before making any additional configuration tweaks, to make sure things have stabilized.



  • You might want to try disabling VTx and multi-cores in the BIOS. I also had system hangs with 2.1, and changing the BIOS solved the problem for me.



  • hmmm I might have to take a look at those settings as well.

    For starters I pulled the Dell 2708 switch and swapped it with my Netgear 8-port gigabit switch.  I'll watch the logs and see if that makes a difference.  If not, I'll move the LAN to a different port on the pfSense box.

    Past that, good idea on trying that next.

    After that, I did hear on another thread to try adding the following to /boot/loader.conf:

    hw.msk.msi_disable="1"
    hw.pci.enable_msi=0
    hw.pci.enable_msix=0


  • Netgate Administrator

    I was about to dismiss the VT-x suggestion but just looked it up and found that there were some P4s that had it. None at 3.2GHz though so I doubt your box will have that option. Maybe worth disabling HyperThreading though, another easy thing to try. I doubt it will do anything.

    Disabling MSI or MSI-X could have some bearing here; that does seem to help some systems.
    hw.msk.msi_disable="1" won't do anything for you as your NICs are all sk(4) not msk(4). Right?

    What are your NICs reported as in dmesg? What do they look like in pciconf? You should be able to get the 'firmware' revision from pciconf. Here's what I did on an msk interface: https://forum.pfsense.org/index.php/topic,20095.msg203322.html#msg203322

    Steve



  • For reference - these are the nics that are on that box:

    • mskc0: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
    • mskc1: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
    • mskc2: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
    • mskc3: Marvell Yukon 88E8053 Gigabit Ethernet (LED mod 1.3)
    • skc0: Marvell Gigabit Ethernet (LED mod 0.9)
    • skc1: Marvell Gigabit Ethernet (LED mod 0.9)

    sk0 is the WAN side and sk1 is the LAN.  Currently not utilizing the others.  The WAN side hasn't had any issues, which is why I have been questioning the Dell switch at this point.

    
    $ pciconf -l|grep sk
    mskc0@pci0:1:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
    mskc1@pci0:2:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
    mskc2@pci0:3:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
    mskc3@pci0:4:0:0:	class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00
    skc0@pci0:5:3:0:	class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
    skc1@pci0:5:4:0:	class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00
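
As a side note on reading that output: in `pciconf -l`, the `chip` value packs the device ID into the upper 16 bits and the vendor ID into the lower 16 bits (0x11ab is Marvell's PCI vendor ID), and `rev` is the PCI revision ID, which is presumably what the 'firmware' revision refers to here. A small sketch that splits those fields out, where `decode_chip` is just an illustrative name:

```shell
#!/bin/sh
# Read `pciconf -l` lines on stdin and print name, vendor ID, device ID
# and revision for each. Lines that do not match are silently skipped.
decode_chip() {
    sed -n 's/^\([a-z]*[0-9]*\)@.*chip=0x\([0-9a-f]\{4\}\)\([0-9a-f]\{4\}\) rev=\(0x[0-9a-f]*\).*/\1 vendor=0x\3 device=0x\2 rev=\4/p'
}

# Two of the lines from the output above.
printf '%s\n' \
    'mskc0@pci0:1:0:0: class=0x020000 card=0x43401148 chip=0x436211ab rev=0x19 hdr=0x00' \
    'skc1@pci0:5:4:0: class=0x020000 card=0x43401148 chip=0x432011ab rev=0x13 hdr=0x00' |
    decode_chip
# prints:
# mskc0 vendor=0x11ab device=0x4362 rev=0x19
# skc1 vendor=0x11ab device=0x4320 rev=0x13
```

On the box itself the input would simply be `pciconf -l | decode_chip`.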
    
    

    So it could certainly be a combo of that firmware not playing nice with the Dell switch (I have heard of others having issues with those).  It certainly looks like I have the same firmware you mentioned, though the rev=0x19 ports are currently unused; I'm just using the sk ones at the moment.  If switching over to this Netgear switch doesn't straighten things out, I guess the next easiest thing to try is to roll the LAN side over to one of the msk ports, at least before going crazy digging into firmware etc.


  • Netgate Administrator

    Ah, Ok.
    I see you're using my modified drivers. They were only intended for the Watchguard X-e box. Do they correctly drive the LEDs on your box? The LED configuration is the only change from the standard driver.

    Your interfaces appear identical to those in the firebox (probably came out of the same factory in Taiwan) and in that box the sk interfaces have never given any trouble. Only the msk interfaces have a bug, which is easily worked around.

    When the box locks up do you still have serial console access?

    Steve



  • I will have to watch that next time, I would suspect I do.

    As far as the LED indicator lights on the ports go, yes, they blink with traffic etc., so they seem to be working just fine.

    The only interface that seems to have that up/down issue is sk1… though I see you saw the other post on what happened when I tried to swap devices without thinking it through and scrambled the whole lan side.

    I am back to where I was in this thread at least.

    I can bring the LAN port back up by simply pulling the cable, and plugging it back in.  Detects a hotplug event and brings everything back up on the LAN side...  the most recent event on the below log is me actually physically unplugging the cable for a moment and plugging it back in.

    
    Dec 17 07:57:31	check_reload_status: updating dyndns lan
    Dec 17 07:57:24	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 17 07:57:24	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 17 07:57:24	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 17 07:57:22	kernel: sk1: link state changed to UP
    Dec 17 07:57:22	check_reload_status: Linkup starting sk1
    Dec 17 07:57:20	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 17 07:57:18	kernel: sk1: link state changed to DOWN
    Dec 17 07:57:18	check_reload_status: Linkup starting sk1
    Dec 17 07:56:25	check_reload_status: updating dyndns lan
    Dec 17 07:56:18	php: rc.linkup: DEVD Ethernet detached event for lan
    Dec 17 07:56:18	php: rc.linkup: The command '/sbin/ifconfig 'sk1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Dec 17 07:56:18	php: rc.linkup: HOTPLUG: Configuring interface lan
    Dec 17 07:56:18	php: rc.linkup: DEVD Ethernet attached event for lan
    Dec 17 07:56:16	check_reload_status: Linkup starting sk1
    Dec 17 07:56:16	kernel: sk1: link state changed to UP
    Dec 17 07:56:16	kernel: sk1: link state changed to DOWN
    Dec 17 07:56:16	check_reload_status: Linkup starting sk1
    Dec 17 07:52:49	login: login on console as root
    
    


  • Well this time I was able to successfully migrate the LAN from sk1 to msk0.  Not sure where it went wrong last time, but I had things broken up and could watch the console if something went wrong… so was a little more prepared to "try and see what happens."

    Since the disconnect events were still happening, even with swapping out the switch, and hard setting the connection type to 1000baseT Full-Duplex, figured trying a different port (all the msk ones report a different chipset/driver config) made the most sense to try next.


  • Netgate Administrator

    Make sure you have disabled MSI for the msk NICs or you'll probably experience the 'watchdog timeout' errors. I would always recommend using:

    hw.msk.msi_disable="1"
    

    since it leaves MSI/MSI-X available for everything else on the PCI bus. Using:

    hw.pci.enable_msi="0"
    hw.pci.enable_msix="0"
    

    disables it globally for everything.

    Steve



  • Ok I will give that a shot - so far 8hrs and no errors, but now I am starting to use the connection, so I will wait and see.  (Have had longer times without errors.)

    I did enter those commands manually, and then also added them to /boot/loader.conf

    
    hw.msk.msi_disable="1"
    hw.pci.enable_msi="0"
    hw.pci.enable_msix="0"
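
Since the same tunables came up a few times in this thread, here is a hedged sketch of adding a loader tunable only if it is not already present. It writes to a scratch file so it can be tried safely; on a real box the target would be /boot/loader.conf as mentioned above (some pfSense setups use /boot/loader.conf.local so the entry survives upgrades, but treat that as an assumption to verify). `add_tunable` is a hypothetical helper name. Loader tunables are read at boot, so a reboot is needed before they apply.

```shell
#!/bin/sh
# Append name="value" to a loader.conf-style file unless the name is
# already set there. Note: a commented-out entry also counts as present.
add_tunable() {
    file="$1"
    name="$2"
    value="$3"
    # -F: fixed string, so the dots in the name are not regex wildcards
    if [ -f "$file" ] && grep -qF "${name}=" "$file"; then
        return 0
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$file"
}

conf=$(mktemp)                             # scratch file, not the real config
add_tunable "$conf" hw.msk.msi_disable 1
add_tunable "$conf" hw.msk.msi_disable 1   # second call is a no-op
cat "$conf"                                # prints: hw.msk.msi_disable="1"
rm -f "$conf"
```

Because the match includes the trailing `=`, an existing `hw.pci.enable_msix` entry does not block adding `hw.pci.enable_msi`.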
    
    

    If sk1 is just a bad or flaky port, hopefully this will resolve things.



  • Well, it's been a record 24hrs without a single loss of the LAN side, so I am tempted to call this issue resolved.

    At the end of all of it, I think we can conclude the errors were due to a flaky sk1 (original LAN) port on the pfsense box.  Perhaps there would have been a quicker way to reach that conclusion, though as part of the "adventure" I certainly have a far better understanding of pfsense than I did when I dropped it in as the gatekeeper of my network.

    Thanks all for the help in troubleshooting!  I am going to go ahead and mark this thread as solved for now.  I can always change it back if I am wrong and the error comes back…. but it has survived 24hrs and a stress test without kicking out a single error... so I am going to go with it! :)

