PF was wedged/busy and has been reset.


  • I have an SG-2220 that has just been upgraded from 2.2.6 to 2.3. Following the upgrade, I am receiving the following two notices each time the unit boots:

    
    pf_busy
    
     PF was wedged/busy and has been reset.	@ 2016-05-03 11:58:53
    
    Filter Reload
    
     There were error(s) loading the rules: pfctl: DIOCXCOMMIT: Device busy - The line in question reads [0]: @ 2016-05-03 11:58:54
    
    

    The system log entries are:

    
    May 3 11:58:53	php-fpm	272	rc.newwanip: New alert found: There were error(s) loading the rules: pfctl: DIOCXCOMMIT: Device busy - The line in question reads [0]:
    May 3 11:58:53	php-fpm	272	rc.newwanip: New alert found: PF was wedged/busy and has been reset.
    
    

    Both WAN addresses (IPv4 & IPv6) are DHCP. If I disable IPv6 on the WAN interface, the problem still occurs.

    The only packages that are installed are AutoConfigBackup and Service_Watchdog.

    Suggestions welcome.


  • That consistently happen on every reboot? I've seen it on rare occasion but never with enough reproducibility to track it down.

    It's not harmful, but there's something that should be tracked down and fixed.


  • Can't guarantee every time, but pretty close. I rebooted the unit a lot today as part of testing. I'm not sure of the exact number, but somewhere around 20 times. Only twice did it not happen.


  • Hi Folks,
    i've got the same issue here. We have 3 SG-2220 here. All three of them give this message after reboot, but as far as i can tell its also not every reboot.

    @cmb: is it possible, that its just a task taking longer on these kind of machines, causing a timer to hit and causing this error without actually having a problem?


  • Hi Folks,

    nearly*** the same here: I got these messages right after reboot - ratio: 3 reboots / 2 times (1 time without a message).

    Cheers
    Dora

    ***Nearly means: I'm running pfsense 2.3.1, but on pcengine's apu2c4


  • I've seen wedged notification a few times too.  But not consistently.

    2.3-RELEASE (i386)
    built on Mon Apr 11 18:12:06 CDT 2016
    FreeBSD 10.3-RELEASE

    Intel(R) Pentium(R) 4 CPU 2.66GHz, 2 GB RAM


  • Anything I can offer to help track this down?


  • Could you email me a sanitized copy of your config, Denny? Via status.php would be fine, that'll trim out the most sensitive parts.


  • @tga:

    @cmb: is it possible, that its just a task taking longer on these kind of machines, causing a timer to hit and causing this error without actually having a problem?

    I don't think it's hardware-specific. It could be something where the slower the CPU the more likely you are to hit it, but a 2220 CPU runs circles around an ALIX and I've never seen it on an ALIX.

    Anyone else that would be willing to share a config backup, please email to cmb at pfsense dot org with a link to this thread.


  • Done.


  • Thanks Denny.

    I have a system internally now that's replicating that issue, though only with ALTQ and OpenVPN enabled. I think the root cause for you and others is probably the same though. Could you try adding:

    if (platform_booting()) {
       return;
    }
    
    

    to the top of /etc/rc.newwanip and rc.newwanipv6 under the require_once lines, and see if that stops happening?


  • Yes, that appears to fix it. I tested ten times or so with and without the change. With the platform_booting() check in /etc/rc.newwanip, the problem did not occur. Without the platform_booting() check in /etc/rc.newwanip, the problem occurred each time. The change for IPv6 had no effect in my testing.


  • I've also inserted the conditional-reboot-return-code and tested some reboots:
    The messages are gone and the problem seems to be solved. Thanks!

    This is the (hopefully correct applied) patch I've used:

    
    --- /etc/rc.newwanip.orig       2016-05-06 09:04:29.768137000 +0200
    +++ /etc/rc.newwanip    2016-05-06 09:05:24.885368000 +0200
    @@ -43,6 +43,11 @@
     require_once("IPv6.inc");
     require_once("rrd.inc");
    
    +if (platform_booting()) {
    +   /* see https://forum.pfsense.org/index.php?topic=111236.0 */
    +   return;
    +}
    +
     function restart_packages() {
            global $oldip, $curwanip, $g;
    
    --- /etc/rc.newwanipv6.orig     2016-05-06 09:05:57.001394000 +0200
    +++ /etc/rc.newwanipv6  2016-05-06 09:06:12.371604000 +0200
    @@ -44,6 +44,11 @@
     require_once("services.inc");
     require_once("rrd.inc");
    
    +if (platform_booting()) {
    +   /* see https://forum.pfsense.org/index.php?topic=111236.0 */
    +   return;
    +}
    +
     function restart_packages() {
            global $oldipv6, $curwanipv6, $g;
    
    

  • Great, thanks for the feedback.

    @dorap:

    This is the (hopefully correct applied) patch I've used:

    
    --- /etc/rc.newwanip.orig       2016-05-06 09:04:29.768137000 +0200
    +++ /etc/rc.newwanip    2016-05-06 09:05:24.885368000 +0200
    @@ -43,6 +43,11 @@
     require_once("IPv6.inc");
     require_once("rrd.inc");
    
    +if (platform_booting()) {
    +   /* see https://forum.pfsense.org/index.php?topic=111236.0 */
    +   return;
    +}
    +
     function restart_packages() {
            global $oldip, $curwanip, $g;
    
    --- /etc/rc.newwanipv6.orig     2016-05-06 09:05:57.001394000 +0200
    +++ /etc/rc.newwanipv6  2016-05-06 09:06:12.371604000 +0200
    @@ -44,6 +44,11 @@
     require_once("services.inc");
     require_once("rrd.inc");
    
    +if (platform_booting()) {
    +   /* see https://forum.pfsense.org/index.php?topic=111236.0 */
    +   return;
    +}
    +
     function restart_packages() {
            global $oldipv6, $curwanipv6, $g;
    
    

    That's correct, yes. It won't be what the actual fix is, but it'll workaround the issue and be fine in most circumstances.


  • hmm, some minutes ago I rebooted (only) the switch all Ports (and vlans) from my pfsense (2.3.1) box are connected to, and pfsense reports again (the above patch is still applied):

    
    pf_busy
    
        PF was wedged/busy and has been reset. @ 2016-05-08 13:04:57
        PF was wedged/busy and has been reset. @ 2016-05-08 13:05:58
    
    Filter Reload
    
        There were error(s) loading the rules: pfctl: DIOCXCOMMIT: Device busy - The line in question reads [0]: @ 2016-05-08 13:04:58
        There were error(s) loading the rules: pfctl: DIOCXCOMMIT: Device busy - The line in question reads [0]: @ 2016-05-08 13:05:59
    
    

  • That change only affects hitting that issue during booting, as that was the only time we seemed to see it occur.

    The latest 2.3.1 has a fix for the booting issue, which is what everyone else here was hitting. That appears to be the same root cause, but upon linkup of multiple NICs at exactly the same time. There's a larger architectural issue to be addressed there in the future to fully address the root issue. In the mean time, the message is ugly, but doesn't impact functionality.