Upgrade 2.4.3 to 2.4.3_1 error in firewall rules



  • I recently upgraded my pfSense SG-2440 to 2.4.3_1. When the box came back up, it started producing error messages like

    There were error(s) loading the rules: /tmp/rules.debug:273: syntax error - The line in question reads [273]: pass out route-to ( pppoe0 x.x.x.x ) from to !/ tracker 1000006967 keep state allow-opts label "let out anything from firewall host itself"
    @ 2018-05-15 07:32:39

    where x.x.x.x is the gateway IP on my PPPoE interface. I'm struggling to know where to start; obviously it's something with the firewall… it causes this error every time the rules are reloaded. Why didn't this happen before the upgrade?

    My setup is multi-WAN: PPPoE on one interface with multiple IPv4 addresses, and DHCP on the other interface.

    Thanks
    Tim
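For comparison, the generated rule normally carries the interface address after `from` and a subnet after `to !`; the syntax error comes from those two fields being empty (`from  to !/`). A well-formed version of the same rule would look roughly like this (y.y.y.y and z.z.z.0/24 are placeholder values, inferred from the field names in the fix discussed later in the thread):

```
pass out route-to ( pppoe0 x.x.x.x ) from y.y.y.y to !z.z.z.0/24 tracker 1000006967 keep state allow-opts label "let out anything from firewall host itself"
```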



  • Same here. I updated the secondary cluster member first and I get:

    pass out  route-to ( bce1.676 194.***.***.254 ) from  to !/ tracker 1000003813 keep state allow-opts label "let out anything from firewall host itself"
    

    How would we fix this?



  • Got the same issue with 2.4.3_1.

    Looks like it has to do with NAT rules, and it breaks NAT.

    I also have OpenVPN running; the NAT IPs are also CARP VIPs, or some other combination that causes this rule to be created.

    If you edit the file '/tmp/rules.debug' and remove the following line, NAT works again:

    pass out  route-to ( igb0 xxx.xxx.xxx.xxx ) from  to !/ tracker 1000004864 keep state allow-opts label "let out anything from firewall host itself"

    However, pfSense keeps putting the rule back in and breaking NAT.



  • Forgot to mention that I also removed/commented out the line and loaded the ruleset by hand. CARP was still broken afterwards: some of the IPs on the secondary showed as MASTER, and the primary also thought it was master.
    Too bad we are not able to just revert to the older pfSense version like we could with previous releases.



  • These show-stopping bugs are happening all too often.

    Out of the 20 or so pfSense boxes I manage, every other update bricks at least one of them.

    pfSense needs to bring back a way to revert to the previous release.



  • This bug also affects me. Strangely, if I activate the backup router by putting the primary into persistent maintenance mode, NAT and all other router services work fine through the backup.



  • After a little digging around on GitHub, it looks like there were recent changes in src/etc/inc/interfaces.inc. I can't be sure, but it looks like what is essentially an empty string is getting validated as an IPv4 network address.

    Here's the diff: https://github.com/pfsense/pfsense/compare/master...RELENG_2_4_3



  • I am also looking to debug this, as I am affected.



  • According to this bug report, there is a workaround: https://redmine.pfsense.org/issues/8518

    Delete and re-add the default gateways.



  • I can confirm: I had to delete an IPv6 default gateway (on the interface causing the wrong rule) to get the system back to normal behaviour. That interface also has an IPv4 gateway that is NOT a default gateway.

    After recreating the default IPv6 gateway, the problem did not occur anymore.

    The wrong rule was created by line 3623 of filter.inc



  • Erratum.
    The problem is still here.
    Investigating.



  • I tried deleting and re-adding the default IPv4 and IPv6 gateways, the CARP VIPs, and editing '/tmp/rules.debug'.

    While all of these work, they only work for a short while, as any change causes pfSense to add the rule back in again.

    I have now rolled back the affected pfSense boxes to 2.4.2, which does not have this issue.



  • I understand what is happening: I have an interface using IPv4 + IPv6 in a cluster configuration.
    I have a CARP VIP for IPv4 and one for IPv6.

    It looks like the code parsing the VIPs mistakes the IPv6 CARP VIP for an IPv4 VIP, so it enters the IPv4 loop; and because $gw = get_interface_gateway($ifdescr) returns the IPv4 GW, it then tries to generate the pass out rule with empty values…

    I removed my IPv6 CARP VIP from the WAN interface and there is no more problem.
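The failure mode described above can be sketched in a few lines. This is Python purely for illustration (the real code is PHP in filter.inc), and the VIP entries, interface name, and addresses below are all made up:

```python
import ipaddress

def is_ipaddrv4(s):
    """Rough analogue of pfSense's is_ipaddrv4(): true only for IPv4."""
    try:
        return isinstance(ipaddress.ip_address(s), ipaddress.IPv4Address)
    except ValueError:
        return False

# Hypothetical VIP entries: one IPv4 CARP VIP and one IPv6 CARP VIP.
# For the v6 VIP, the IPv4 address/subnet fields come out empty.
vips = [
    {"ip": "203.0.113.10", "sa": "203.0.113.0", "sn": "24"},
    {"ip": "2001:db8::10", "sa": "", "sn": ""},
]
gw = "203.0.113.1"  # what get_interface_gateway() returns: the IPv4 GW

buggy, fixed = [], []
for vip in vips:
    ip = vip["ip"] if is_ipaddrv4(vip["ip"]) else ""
    rule = f'pass out route-to ( igb0 {gw} ) from {ip} to !{vip["sa"]}/{vip["sn"]}'
    buggy.append(rule)           # the v6 entry yields "... from  to !/"
    if is_ipaddrv4(vip["ip"]):   # skipping non-v4 VIPs avoids the bad rule
        fixed.append(rule)

print(buggy[1])   # pass out route-to ( igb0 203.0.113.1 ) from  to !/
print(len(fixed)) # 1
```

The empty `from  to !/` in `buggy[1]` is exactly the shape of the broken rule quoted throughout this thread.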



  • Checking the diff between 2.4.2 and 2.4.3-p1:

    Before:
    if (is_ipaddrv4($gw) && is_ipaddrv4($ifcfg['ip'])) {

    After:
    if (is_ipaddrv4($gw) && is_ipaddrv4($ifcfg['ip']) && is_subnetv4("{$ifcfg['sa']}/{$ifcfg['sn']}")) {
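The added is_subnetv4() guard is what catches the empty fields: with $ifcfg['sa'] and $ifcfg['sn'] both empty, the interpolated string is just "/", which no IPv4 subnet parser should accept. A quick Python analogue of that check (illustrative only; the real function is PHP):

```python
import ipaddress

def is_subnetv4(s):
    """Rough analogue of pfSense's is_subnetv4(): true only for a
    parseable IPv4 'address/prefix' string, false for anything else."""
    try:
        net = ipaddress.ip_network(s, strict=False)
        return isinstance(net, ipaddress.IPv4Network)
    except ValueError:
        return False

# With the v6 VIP mis-handled, sa and sn are empty strings:
sa, sn = "", ""
print(is_subnetv4(f"{sa}/{sn}"))      # False -> the broken rule is skipped
print(is_subnetv4("203.0.113.0/24"))  # True  -> a valid rule is emitted
```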



  • Good to see that you have been able to track down the cause of the issue.

    I presume that the next release will have a fix for this?



  • This is all related to this bug: https://redmine.pfsense.org/issues/8408

    https://github.com/pfsense/pfsense/pull/3924

    It looks like not everything was merged into current?


  • Rebel Alliance Developer Netgate

    The commit from that PR is in master and RELENG_2_4_3, and is in 2.4.3-p1.

    I could reproduce the problem before that commit but not now. What exactly does your configuration look like (config.xml entries, at least) for the affected VIPs and gateway?

    I wanted to put some extra safety belts around that rule to make sure it couldn't be blank, but following the code through, it already appeared to be validated higher up.


  • Rebel Alliance Developer Netgate

    Since I can't reproduce this still, and I don't have any config samples to work from, try this patch:

    https://gist.githubusercontent.com/jim-p/f5fa7cf5fdfc8166f54394262386682f/raw/1ff237a9a52cef67c03db532c80fcc757969e711/8518.diff

    That doesn't fix the root cause but it will prevent the broken rules from being placed in the ruleset.

    It's still not clear how a blank entry is making it into that v4 VIP array in the first place, since it explicitly tests for v4 or v6 when building the array. That's why I need to see the config samples, so I can get closer to the root of the problem.



  • Hi jimp,

    I just sent you a PM with my config snippets. I figured you might need them unredacted, so I did not post them here.

    Thanks for looking into this!



  • Just did the same :-)


  • Rebel Alliance Developer Netgate

    OK, that did the trick.

    Somehow, when a PR was merged back from master to RELENG_2_3, it missed part of a commit, which led to this happening. The safety belt patch above also helps, so I committed that as well.

    I couldn't reproduce it initially because I was trying on 2.4.4 and the commit was OK there (master), but it was wrong on 2.4.3-p1.

    This is the real fix:
    https://github.com/pfsense/pfsense/commit/c9159949e06cc91f6931bf2326672df7cad706f4

    This is the safety belt:
    https://github.com/pfsense/pfsense/commit/63b2c4c878655746f903565dec3f34b3d410153f

    You can apply the first (or both) via the system patches package and that should get things back to normal.



  • Will try this tomorrow !

    Thank you!



  • @jimp:

    This is the real fix:
    https://github.com/pfsense/pfsense/commit/c9159949e06cc91f6931bf2326672df7cad706f4

    This is the safety belt:
    https://github.com/pfsense/pfsense/commit/63b2c4c878655746f903565dec3f34b3d410153f

    You can apply the first (or both) via the system patches package and that should get things back to normal.

    I've applied this as you described and my system is working again.

    Thank you, and the other contributors to this thread, for fixing this so quickly.

    Thanks
    Tim



  • Thanks! I applied the "real" fix only, rules loaded fine after that. I had to reboot the system to get CARP to work again without problems, though. Without a reboot the secondary still showed "Master" for some IPs (IPv4 and also IPv6, WAN and LAN). I could not find a pattern in this.



  • I can confirm the real fix seems to do the trick.  :D ;)

    Thank you Jim.



  • What is the process for upgrading to 2.4.4 in the future? Will I need to revert the patch and then issue the upgrade, or will I simply upgrade to the next release as usual?

    Do most people usually wait a while to upgrade? I'm kind of nervous now about doing upgrades, given this bug basically broke NAT.

    I will say, though, that I should have noticed the bug on the backup prior to upgrading the master. Lessons learned.


  • Rebel Alliance Developer Netgate

    @rfowler:

    What is the process for upgrading to 2.4.4 in the future? Will I need to revert the patch and then issue the upgrade, or will I simply upgrade to the next release as usual?

    Do most people usually wait a while to upgrade? I'm kind of nervous now about doing upgrades, given this bug basically broke NAT.

    This bug was never present in 2.4.4, only 2.4.3-p1. You can upgrade as usual. The patch won't reapply itself automatically unless you went out of your way to set it that way, and since the patch won't apply on 2.4.4 anyhow it wouldn't matter if you did.



  • I have not yet upgraded and am unsure how to proceed. Is this a niche issue or is every configuration affected? Will this be addressed in a 2.4.3_2 release, or would I be waiting for 2.4.4?


  • Rebel Alliance Developer Netgate

    That is as yet unclear. Apply the patch with the System Patches package and you will have the fix immediately, without having to upgrade (or wait for a release).



  • The problem is not solved. The patches only fix the problem with rules.debug. I have a scenario where OpenVPN is used: before I installed the patches, when the error occurred, IPv4 traffic was blocked over the tunnel. After installing the patches, the error message about rules.debug disappeared, but OpenVPN's IPv4 traffic is still blocked (it seems the ruleset isn't completely loaded).

    The problem came from an IPv6 virtual IP which I added to the WAN interface. I have tested this with and without the patch. If I remove the IPv6 virtual IP, the ruleset is completely loaded and OpenVPN works out of the box. If I add the IPv6 virtual IP again, the error occurs on the unpatched box and OpenVPN's IPv4 traffic is blocked on both boxes (no changes in rulesets, and yes, routing works).

    Please have a look at IPv6 virtual IP handling.


  • Rebel Alliance Developer Netgate

    @ollli said in Upgrade 2.4.3 to 2.4.3_1 error in firewall rules:

    The problem is not solved. The patches only fix the problem with rules.debug. I have a scenario where OpenVPN is used: before I installed the patches, when the error occurred, IPv4 traffic was blocked over the tunnel. After installing the patches, the error message about rules.debug disappeared, but OpenVPN's IPv4 traffic is still blocked (it seems the ruleset isn't completely loaded).

    The problem came from an IPv6 virtual IP which I added to the WAN interface. I have tested this with and without the patch. If I remove the IPv6 virtual IP, the ruleset is completely loaded and OpenVPN works out of the box. If I add the IPv6 virtual IP again, the error occurs on the unpatched box and OpenVPN's IPv4 traffic is blocked on both boxes (no changes in rulesets, and yes, routing works).

    Please have a look at IPv6 virtual IP handling.

    I have, and all problems that could be identified so far have been fixed. If something else is happening in your case, you have not provided nearly enough detail to speculate if it's even related to this.

    Try on a 2.4.4 snapshot and see if the problem can be replicated there. If so, try to find a minimal configuration that can replicate the problem exactly so we can track it down. Just having an IPv6 VIP is not enough to trigger it.



  • Hello, I have the same problem. When I add an IPv6 virtual IP in CARP, this line is added to /tmp/rules.debug:

    pass out  route-to ( em0 XX.XX.XX.XX ) from  to !/ tracker 1000005913 keep state allow-opts label "let out anything from firewall host itself"
    

    This is the line causing the syntax error. As you can see, the source and destination IP addresses are missing.
    XX.XX.XX.XX is the IPv4 address of the default gateway.
    The recovery steps are to disable IPv6 on the network adapter and re-enable it. Until I add an IPv6 CARP VIP, everything works fine.


  • Rebel Alliance Developer Netgate

    @dano-pogac said in Upgrade 2.4.3 to 2.4.3_1 error in firewall rules:

    Hello I have same problem. When i add IPv6 Virtual IP in CARP there is added this line to /tmp/rules.debug

    The real fix is posted farther up in the thread. No need for workarounds.



  • Just wanted to vent my frustration, as I was impacted by this problem as well. The worst part was that I have an HA setup: I had updated the backup unit and was going to upgrade the primary outside of business hours. Before I could apply the update to the primary, I had a hardware failure, so when all traffic moved over to my backup unit we lost connectivity because of this bug.

    At the time, my fix was to change over to a development/snapshot release. I really wish they would include an easy option to revert versions. I wasted a Saturday afternoon at the office trying to figure things out, when typically I could have waited until Monday to troubleshoot the hardware failure.



  • Hi, this caused me massive headaches too. I've reverted back to 2.4.3 (no p1); didn't fancy patching things. I'd have pulled the release and re-issued it, as I noticed that even with the issue the firewall was still passing traffic, but it was completely open in some instances. :/