Hanging/Crashing every few hours



  • I recently updated pfSense to the current release (after a few years of not doing so) and ran into a pile of issues, so I wiped the drive and started fresh. Now pfSense crashes or hangs every few hours. The symptom is that ping times from any machine on the LAN to the pfSense box get worse until there is no response at all, followed by "host is down". So far only a power cycle recovers pfSense. Any ideas on where or what to look for?
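
One way to pin down when the slowdown starts is to log timestamped round-trip times from a LAN machine and compare them with the pfSense logs afterwards. This is only a minimal sketch: it assumes a Unix-like `ping` that accepts `-c 1` and prints `time=<ms> ms`, and the address 192.168.2.1 in the usage note is a hypothetical stand-in for the pfSense LAN address, which the thread does not state.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the round-trip time in typical Unix ping output, e.g. "time=0.412 ms".
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(ping_output):
    """Return the first round-trip time in ms found in ping output, or None."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=2):
    """Send one echo request; return the RTT in ms, or None if there was no reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return parse_rtt_ms(result.stdout)


def monitor(host, interval_s=5):
    """Print one timestamped line per probe until interrupted with Ctrl-C."""
    while True:
        rtt = ping_once(host)
        status = f"{rtt:.3f} ms" if rtt is not None else "no reply"
        print(f"{datetime.now().isoformat(timespec='seconds')} {host} {status}")
        time.sleep(interval_s)
```

Calling `monitor("192.168.2.1")` and redirecting the output to a file would leave a record of the latency climbing before each hang.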



  • What hardware?



  • Hardware is as follows :

    Intel(R) Atom(TM) CPU D525 @ 1.80GHz
    4 CPUs: 1 package(s) x 2 core(s) x 2 HTT threads

    1 x Transcend SO-DIMM DDR3 1600 Memory 2GB

    1 x Emphase Industrial - S1 SATA Flash Module 4 GB

    1 x Jetway 3x 1Gb Realtek LAN Module

    The hardware has run pfSense 2.1 for the past two years or so; I recently updated to the most current version.



  • @kryngle:

    Are you getting any watchdog timeouts on the console screen?

    Can you share system log files before and after the crash/hang?

    Can you share any custom sysctl (system tunables) or loader.conf.local modifications? I noticed when configuring my 2.3.2 pfSense setup that many of the FreeBSD tweaks on the web are wrong for recent versions of FreeBSD.



  • I have not touched sysctl.conf or loader.conf.local, and here they are :

    sysctl.conf :

    # $FreeBSD$
    #
    #  This file is read when going to multi-user and its contents piped thru
    #  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
    #

    # Uncomment this to prevent users from seeing information about processes that
    # are being run under another UID.
    #security.bsd.see_other_uids=0

    loader.conf.local :

    kern.cam.boot_delay=10000

    I see no console messages; the system just grinds to a halt and becomes unresponsive. Let me see if I can get the system logs from before and after a crash. It may be tricky, as I am using cron to auto-reboot every hour as a duct tape/bubblegum workaround.

    Thank you



  • @kryngle:

    That would be good.  Does the machine lock up at the console or does the NIC just fail?

    I have a feeling your Realtek NIC is experiencing a watchdog timeout, or something similar.



  • Does this help at all?

    gateways.log:Jul 26 00:46:16 pfSense dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr XX.XX.XX.XX  bind_addr YY.YY.YY.YY  identifier "GW_WAN "

    There are also a lot of these:

    dhcpd.log:Jul 27 12:49:32 pfSense dhcpd: DHCPREQUEST for 192.168.2.29 from b0:a7:37:cb:ca:73 via re0: unknown lease 192.168.2.29.
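
The dpinger entry is a flat list of name/value settings, so splitting it into a dictionary makes values such as `latency_alarm` and `loss_alarm` easier to read. A rough sketch, assuming the line format is exactly as pasted above; `parse_dpinger_settings` is a hypothetical helper, not part of pfSense or dpinger.

```python
import re

# A setting is a name followed by either a double-quoted value or a bare token.
SETTING_PATTERN = re.compile(r'(\w+)\s+(?:"([^"]*)"|(\S+))')


def parse_dpinger_settings(line):
    """Split the settings that follow "dpinger:" into a name -> value dict."""
    _, _, payload = line.partition("dpinger:")
    settings = {}
    for name, quoted, bare in SETTING_PATTERN.findall(payload):
        # Quoted values may carry padding (e.g. "GW_WAN "), so strip it.
        settings[name] = quoted.strip() if quoted else bare
    return settings


# The log line as posted in the thread (addresses were already masked there).
line = (
    'gateways.log:Jul 26 00:46:16 pfSense dpinger: send_interval 500ms  '
    'loss_interval 2000ms  time_period 60000ms  report_interval 0ms  '
    'data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  '
    'dest_addr XX.XX.XX.XX  bind_addr YY.YY.YY.YY  identifier "GW_WAN "'
)
settings = parse_dpinger_settings(line)
for name, value in settings.items():
    print(f"{name:16} {value}")
```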



  • @kryngle:

    No, the first gateways.log messages are just dpinger (the gateway monitor) telling you that you lost your WAN connection.

    The dhcpd.log entries are not the cause of this either. Are you losing WAN, LAN, or both when this issue occurs?



  • LAN stays up, WAN goes down, and I can no longer ping or otherwise communicate with pfSense.



  • @kryngle:

    Can you access the console next time the WAN goes down? I am pretty sure you are getting watchdog timeouts on your WAN ethernet adapter. What type of Realtek adapter are you using? How much traffic are you pushing through your WAN when the interface fails?



  • When the WAN goes down the box hangs: neither the webConfigurator nor SSH responds.

    The system just crashed in between auto-reboots, and the last entry in system.log was from midnight last night, which does not seem correct.

    As another clue, the system is up right now and email and web sites are responding, but pings result in immediate timeouts.



  • will get the NIC details shortly



  • @kryngle:

    This is probably due to a bad Realtek driver. Can you turn off the auto-reboot? Otherwise, there is no point debugging this.



  • I am away from the office (it's a very small company) for the next week, which is why the auto-reboot is on: the web server and email server need to be kept up. When I get back I can turn it off and reboot when need be.



  • @kryngle:

    No worries. I've been in your position before. Have a nice evening.



  • OK, I am back in the office and found the following message reported twice on the console when a crash happened:

    re0: discard frame w/o leading ethernet header ( len 4294967292 pkt len 4294967292 )

    Does that help?



  • I'm going to follow this post intensely, as I have a very similar problem.



  • @kryngle:

    It's definitely not good, but it does not always cause a crash or hang.
    Do you have polling enabled?
    Does reverting to 2.1 solve the problem?



  • I do not have device polling enabled, is it worth turning on?



  • Where can I find older versions to try reverting?



  • A new piece of the puzzle: I was comparing my old config file to the current one line by line and noticed the old config had the IPv6 configuration type for the WAN set to DHCP6 and the LAN set to track the WAN IPv6 interface. I updated the new config to match, and now I see:

    re0 : discard frame w/o leading ethernet header (…....
    re2 : watchdog timeout
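
To see which interface logs which message and how often, log lines can be filtered for the two re driver messages quoted in this thread. A minimal sketch, assuming the kernel messages are available as plain-text lines in the usual syslog layout. The timestamps and the `pfSense kernel:` prefix in the sample are invented for illustration; only the message text comes from the thread.

```python
import re
from collections import Counter

# The two re driver messages quoted in this thread, with optional space before ":".
RE_DRIVER_EVENT = re.compile(
    r"\b(re\d+)\s*:\s*(watchdog timeout|discard frame w/o leading ethernet header)"
)


def count_re_events(lines):
    """Count (interface, message) pairs for re driver events in log lines."""
    counts = Counter()
    for line in lines:
        match = RE_DRIVER_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Illustrative input: timestamps and prefixes are made up, messages are from the thread.
sample = [
    "Aug  9 03:12:01 pfSense kernel: re0: discard frame w/o leading ethernet header ( len 4294967292 pkt len 4294967292 )",
    "Aug  9 03:12:05 pfSense kernel: re2: watchdog timeout",
    "Aug  9 03:12:09 pfSense dhcpd: DHCPREQUEST for 192.168.2.29 from b0:a7:37:cb:ca:73 via re0: unknown lease 192.168.2.29.",
]
for (iface, message), n in sorted(count_re_events(sample).items()):
    print(f"{iface}: {message} x{n}")
```

The dhcpd line mentions re0 but is not counted, because the pattern only accepts the two driver messages.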



  • @kryngle:

    I do not have device polling enabled, is it worth turning on?

    No, leave it disabled; it would not help.
    For older versions, look in http://mirror.transip.net/pfsense/downloads/ and see if reverting helps.



  • Realtek driver claiming a frame was 4GiB in size? Sounds like a driver issue or memory corruption.
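
The reported length is exactly 4 short of 2^32, which is what -4 looks like when printed as an unsigned 32-bit number. That would fit a length calculation that underflowed, for example subtracting the 4-byte Ethernet CRC from a zero-length frame, but that cause is an assumption and is not confirmed anywhere in this thread. The arithmetic itself can be checked quickly:

```python
import struct

reported_len = 4294967292  # value printed in the re0 console message

# Distance from the 32-bit wrap-around point.
distance_from_wrap = 2**32 - reported_len
print(distance_from_wrap)

# Reinterpret the same 32 bits as a signed integer.
(as_signed,) = struct.unpack("<i", struct.pack("<I", reported_len))
print(as_signed)
```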



  • Try to disable the driver options (see picture). This is more likely a poorly supported, faulty driver than a hardware issue, but it could be both.




  • Following the suggestion in this thread: https://forum.pfsense.org/index.php?topic=101587.msg617211#msg617211 I set WAN to flowcontrol, master and LAN to master. This eliminated the watchdog timeout and reduced the frequency of the discard frame message from every 3-4 hours to roughly every 11-12 hours.

    I will try disabling driver options and see what happens

