pfSense locking up



  • Hi all,

    I'm new to this forum and relatively new to pfSense.
    The move to pfSense seemed pretty smooth at first, but I ran into some issues pretty quickly.

    Every 4-5 days the PC running pfSense has to be rebooted (ungracefully) because it locks up. No traffic passes through the WAN, and the web interface does not work from the LAN (I get the login screen, but it's unresponsive).

    Since this box is located quite far away from me (50 km), it's a pain to restart it when this happens. During normal operation I can access it fine over the VPN I have set up.

    The log file fills up with the message below. I have seen several threads about it, but in those cases the posters could still access the firewall from the LAN.

    Jul 27 21:50:13	kernel		arpresolve: can't allocate llinfo for [masked ISP gateway IP] on re1
    Jul 27 21:50:10	kernel		arpresolve: can't allocate llinfo for [masked ISP gateway IP] on re1
    Jul 27 21:50:08	kernel		arpresolve: can't allocate llinfo for [masked ISP gateway IP] on re1
    

    I have a static IP from my ISP (via DHCP), so nothing has been changed in the DHCP lease. MTU is 1500 (I've seen that a low MTU has been the cause for others).

    pfSense version 2.4.4-RELEASE-p3 (amd64).

    Could the mentioned error cause it to lock up this way, or should I hunt for something else?
    Also, what could be the cause of the error in the first place? I know my ISP is not flaky (not a single drop in almost two years on this fibre link).

    Thanks for any help.

    (Hope I posted this in the correct category).



  • @omegatech said in pfSense locking up:

    arpresolve: can't allocate llinfo for

    Compare your situation with these: https://www.google.com/search?client=firefox-b-d&channel=trow&q=pfsense+arpresolve:+can't+allocate+llinfo+for&spell=1&sa=X&ved=0ahUKEwiuzaWK-NbjAhWIHRQKHSoWDf4QBQgsKAA&biw=1920&bih=916

    Also, be careful: there is a Realtek NIC (re1) inside. That NIC is known for spectacular results. If your second NIC isn't a Realtek, swap the LAN and WAN NICs.


  • Netgate Administrator

    Are you able to ssh into the firewall or use a local console directly when this happens?

    I would also guess it's a Realtek driver issue though. Check the logs for watchdog timeouts against either re NIC.
    You might try this: https://forum.netgate.com/topic/135850/official-realtek-driver-binary-1-95-for-2-4-4-release
    That has worked well for some Realtek users.
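
    As a rough illustration of the log check suggested above, the sketch below counts watchdog timeouts and the arpresolve message per `re` interface. It only works on lines of text that have already been read from the system log; how you export them from the firewall is left open. The helper name `count_nic_errors`, the sample watchdog line, and the placeholder address 192.0.2.1 are assumptions for illustration, not output from the firewall in this thread.

```python
import re
from collections import Counter

# Kernel message printed by the re(4) driver, e.g. "re1: watchdog timeout".
WATCHDOG = re.compile(r"\b(re\d+): watchdog timeout")
# The message quoted in the first post of this thread.
ARPRESOLVE = re.compile(r"arpresolve: can't allocate llinfo for .* on (re\d+)")


def count_nic_errors(lines):
    """Return a Counter keyed by (interface, kind) for the two messages above."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            counts[(match.group(1), "watchdog")] += 1
            continue
        match = ARPRESOLVE.search(line)
        if match:
            counts[(match.group(1), "arpresolve")] += 1
    return counts


# Illustrative input only: 192.0.2.1 is a documentation placeholder address
# and the watchdog line is a made-up example of the format.
sample = [
    "Jul 27 21:50:13 kernel arpresolve: can't allocate llinfo for 192.0.2.1 on re1",
    "Jul 27 21:50:10 kernel arpresolve: can't allocate llinfo for 192.0.2.1 on re1",
    "Jul 27 21:49:58 kernel re1: watchdog timeout",
    "Jul 27 21:49:00 kernel re0: link state changed to UP",
]

for (nic, kind), n in sorted(count_nic_errors(sample).items()):
    print(f"{nic}: {n} x {kind}")
```

    If one interface collects nearly all of the hits, that points at the NIC or its driver rather than at the ISP side.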

    Steve



  • Thank you for the replies, much appreciated.

    @Gertjan: That's basically the same search as I've already done and picked up some tips here and there, but unfortunately it seems like none of the cases have been conclusive.

    What I've done so far is set the Gateway Monitor IP to 8.8.8.8 instead of the ISP gateway. I've also changed the ICMP payload to 1 instead of 0.

    I can see that the Realtek NICs have some problems now and then, so I should probably order a used Intel dual or quad port NIC. Could be nice with some more ports in the future too.

    @stephenw10: I tried to SSH into it from the LAN, but I realized I forgot to enable SSH. It's activated now though, so I will try that next time (if) it happens.

    Thanks for the link about the drivers, will have a look at it.

    Other than that, while not ideal by any means, I may try to set the WAN configuration to static, since I know the DHCP values won't change anyway.


  • Netgate Administrator

    If you can, I would swap out those Realtek cards for Intel-based NICs.

    Steve



  • Yes I will swap them out as soon as possible. I believe the Intel NC360T would be a good choice. They are readily available for a reasonable price on the used market.



  • A short update:
    I changed the drivers as @stephenw10 mentioned. All seemed quite fine for a couple of days, but it started crashing on me again.

    I have managed to be locally at the router a couple of times when it failed, and the fault is (probably) isolated.
    It is the extra Realtek NIC on my WAN side that fails (re1 in this case). It suddenly believes the media is connected at 10 Mbps, not 1000 Mbps. No communication is possible over that link. I also placed a switch between the fibre modem and the re1 NIC to see which side fails, and indeed it is the Realtek.
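
    A minimal sketch of how the speed drop described above could be detected automatically, assuming the text of `ifconfig re1` has already been captured as a string. The function names and both sample strings are hypothetical; the parser only looks at the `media:` line in the form FreeBSD's ifconfig normally prints it.

```python
import re

# Matches e.g. "media: Ethernet autoselect (1000baseT <full-duplex>)"
# and also a fixed setting such as "media: Ethernet 1000baseT <full-duplex>".
MEDIA = re.compile(r"media:\s+Ethernet\s+(?:\S+\s+\()?(\d+)(G?)base")


def negotiated_mbps(ifconfig_output):
    """Return the link speed in Mbps from ifconfig text, or None if absent."""
    match = MEDIA.search(ifconfig_output)
    if match is None:
        return None
    speed = int(match.group(1))
    # A "G" before "base" (as in 10Gbase-T) means the number is in Gbps.
    return speed * 1000 if match.group(2) else speed


def link_degraded(ifconfig_output, expected_mbps=1000):
    """True when no speed can be read or it is below the expected value."""
    speed = negotiated_mbps(ifconfig_output)
    return speed is None or speed < expected_mbps


# Hypothetical captures of the relevant ifconfig lines, for illustration only.
healthy = "\tmedia: Ethernet autoselect (1000baseT <full-duplex>)\n\tstatus: active\n"
faulty = "\tmedia: Ethernet autoselect (10baseT/UTP <half-duplex>)\n\tstatus: active\n"

print(negotiated_mbps(healthy), link_degraded(healthy))
print(negotiated_mbps(faulty), link_degraded(faulty))
```

    Since a soft restart did not clear the fault here, a check like this is only useful for logging or alerting, not for recovery.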

    A soft restart won't cure it, so I have to pull the power to get it back to normal (even +5Vsb will keep it in the fault condition, probably due to WoL capabilities). It can run fine for a day, or it can err out during boot (in which case pfSense does not start). The LAN side still works OK via SSH even after the WAN NIC has failed, and the web admin will eventually come up.

    A new dual port Intel NIC should arrive next week, so hopefully that will fix things.


  • Netgate Administrator

    Well, at least you have a diagnosis and a fix. Not much else you can do there but swap out the card.

    Steve

