Bug in traffic shaper's configuration saving corrupts pfsense's tables.



  • Running pfsense 2.1 on a production system, with pfblocker, snort, apc ups monitor.

    I went through the traffic shaping wizard twice on separate setups. After applying at the end of the wizard, then editing a floating rule created by the wizard and applying changes, all pfsense tables were corrupted. For example, snort2c was not found. Rebooting the system made all the tables disappear: the pfblocker table did not exist, and the tables page showed a blank dropdown. The result was that no traffic went through the firewall; only traffic within the same interface was OK (lan1 to lan1). Routing tables seem to be affected as well.
    As I said, this was reproduced on two separate setups, one of them a CARP cluster. The corruption also appeared on the secondary system, so it is something in the configuration that gets replicated to the secondary.

    The solution was to remove the shaper and manually delete the floating rule left behind (the one that was edited manually). After this, the network came back up with everything running as it should. Tables exist and are populated.

    Anyone else run into this problem?



  • I am having the exact same problem, totally baffled… All outbound traffic stops after I run through the shaper wizard. I can ssh in to pfsense and ping/tracert anything on the internet with 100% success, and the ping diagnostic tool in the GUI also works with 100% success, but nothing can cross from the LAN to the WAN.

    I simply removed the shaper and traffic started routing again; re-running the shaper wizard causes the same blocking again.

    This was an upgraded 2.0.3 system. I have reset to factory defaults using the wizard and still have the same issue.

    Here are the shaper rules from my config (I use a firewall alias for "VoipGateways"):

    <ezshaper>
      <step1>
        <numberofconnections>1</numberofconnections>
      </step1>
      <step3>
        <enable>on</enable>
        <provider>Asterisk</provider>
        <address>VoipGateways</address>
        <download>600</download>
        <downloadspeed>Kb</downloadspeed>
        <conn0upload>600</conn0upload>
        <conn0uploadspeed>Kb</conn0uploadspeed>
      </step3>
      <step2>
        <downloadscheduler>HFSC</downloadscheduler>
        <conn0uploadscheduler>HFSC</conn0uploadscheduler>
        <conn0upload>2700</conn0upload>
        <conn0uploadspeed>Kb</conn0uploadspeed>
        <conn0download>50000</conn0download>
        <conn0downloadspeed>Mb</conn0downloadspeed>
        <conn0interface>wan</conn0interface>
      </step2>
    </ezshaper>
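
For anyone wanting to sanity-check the numbers in a wizard config like the one above, here is a minimal Python sketch that reads the `step2` block and converts each value/unit pair to bits per second. The unit multipliers are an assumption (that `Kb`, `Mb` and `Gb` mean kilobits, megabits and gigabits per second); the function name `link_bandwidths` is made up for illustration and is not part of pfSense.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: the wizard's unit strings are decimal bits-per-second multipliers.
UNIT_BITS_PER_SECOND = {"b": 1, "Kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}

# The step2 values copied from the config posted above.
EZSHAPER_XML = """
<ezshaper>
  <step2>
    <conn0upload>2700</conn0upload>
    <conn0uploadspeed>Kb</conn0uploadspeed>
    <conn0download>50000</conn0download>
    <conn0downloadspeed>Mb</conn0downloadspeed>
    <conn0interface>wan</conn0interface>
  </step2>
</ezshaper>
"""


def link_bandwidths(xml_text):
    """Return {'upload': bits/s, 'download': bits/s} from the step2 block."""
    step2 = ET.fromstring(xml_text).find("step2")
    result = {}
    for direction in ("upload", "download"):
        value = int(step2.findtext(f"conn0{direction}"))
        unit = step2.findtext(f"conn0{direction}speed")
        result[direction] = value * UNIT_BITS_PER_SECOND[unit]
    return result


if __name__ == "__main__":
    for direction, bps in link_bandwidths(EZSHAPER_XML).items():
        print(f"{direction}: {bps / 1_000_000:g} Mbit/s")
```

Under that unit assumption the download side works out to 50,000 Mbit/s, so it may be worth double-checking whether `50000` was meant to go with `Kb` rather than `Mb`.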
    


  • Does anyone have any ideas as to how to track this down?



  • I found a solution for my issue that I think is similar. I was using the shaping wizard to prioritize VoIP traffic, then added BitTorrent just for testing.
    Under Firewall -> Rules -> Floating, I found that the DiffServ/Lowdelay/Upload rule pointing at qVoIP had no port specified.
    I edited that floating rule so that Destination Port Range pointed at the SIP port, saved, and cleared states, and my traffic started to work correctly.

    As an aside, my internet traffic was already working; it was only my VoIP traffic that was failing to pass, and this helped.

    Hopefully that helps,
    Cheers,
    ~Kc



  • The reason I mentioned the BitTorrent rule is that I used it for comparison in the floating rules. It pointed at the lower priority queue but also specified the incoming and outgoing port. That led me to believe the SIP rule needed to match the SIP port.
    I haven't submitted a ticket as I'm not sure whether this is a bug. Let me know if this works for you; it may be a workaround, or I may have gotten lucky.
    Thanks,



  • I think you just got lucky. In my case, as I said in my OP, all pfsense tables were corrupted.



  • Hi, same problem here today. Yesterday I set up the traffic shaper, and today I have no outbound traffic. After removing the traffic shaper, the network began working well again. pfSense 2.2.4.



  • Having the same issue. Saved traffic shaping for VoIP, everything was great, came back Monday and LAN traffic was not going out to WAN. Has anyone made any progress?



  • Yup similar issue here as well.

    I'll configure the traffic shaper, it'll work for a few days, and then it randomly stops working and halts all net traffic until I remove it.



  • You are all referring to the traffic-shaping wizard, yeah?

    I have no troubles, but I do not use the wizard.



  • I've experienced the same issue. In my case it appears to only be an issue when WAN is configured using DHCP and the Traffic Shaper is configured using the wizard. To verify the issue, try running this from Diagnostics > Command prompt while the issue is occurring and pfSense is not passing traffic from internal networks:

    Diagnostics > Command prompt > pfctl -f /tmp/rules.debug
    $ pfctl -f /tmp/rules.debug
    bandwidth for qInternet higher than interface
    /tmp/rules.debug:63: errors in queue definition
    parent qInternet not found for qACK
    /tmp/rules.debug:64: errors in queue definition
    parent qInternet not found for qP2P
    /tmp/rules.debug:65: errors in queue definition
    parent qInternet not found for qVoIP
    /tmp/rules.debug:66: errors in queue definition
    parent qInternet not found for qOthersHigh
    /tmp/rules.debug:67: errors in queue definition
    parent qInternet not found for qOthersLow
    /tmp/rules.debug:68: errors in queue definition
    bandwidth for qInternet higher than interface
    /tmp/rules.debug:73: errors in queue definition
    parent qInternet not found for qACK
    /tmp/rules.debug:74: errors in queue definition
    parent qInternet not found for qP2P
    /tmp/rules.debug:75: errors in queue definition
    parent qInternet not found for qVoIP
    /tmp/rules.debug:76: errors in queue definition
    parent qInternet not found for qOthersHigh
    /tmp/rules.debug:77: errors in queue definition
    parent qInternet not found for qOthersLow
    /tmp/rules.debug:78: errors in queue definition
    pfctl: Syntax error in config file: pf rules not loaded

    This prevents the rules from loading correctly, thus preventing traffic from being correctly NAT'd. This is why removing the shaper resolves the issue. Can anyone else confirm that they see the same errors? Please also let us know how you configure your WAN interface (DHCP, static, etc.).

    The config was built with the WAN interface having a 1Gbps connection and the bandwidth of the uplink configured as 100 Mbps. The interface speed has not changed, so it would seem odd that such an error would be reported since 100 Mbps < 1Gbps.
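
If it helps when comparing notes, here is a rough Python sketch that condenses output like the above into a short summary: which queues were rejected for bandwidth, which queues lost their parent, and whether the rules loaded at all. It only pattern-matches the three message formats visible in this post; the function name `summarize` and the regexes are my own assumptions, not anything pfctl provides.

```python
import re
from collections import defaultdict

# An excerpt of the pfctl output posted above.
SAMPLE_OUTPUT = """\
bandwidth for qInternet higher than interface
/tmp/rules.debug:63: errors in queue definition
parent qInternet not found for qACK
/tmp/rules.debug:64: errors in queue definition
parent qInternet not found for qP2P
/tmp/rules.debug:65: errors in queue definition
pfctl: Syntax error in config file: pf rules not loaded
"""

# Patterns derived only from the messages seen in this thread.
OVER_BANDWIDTH = re.compile(r"^bandwidth for (\S+) higher than interface$")
MISSING_PARENT = re.compile(r"^parent (\S+) not found for (\S+)$")
LOCATION = re.compile(r"^\S+:(\d+): errors in queue definition$")


def summarize(output):
    """Group the queue errors in pfctl output by kind."""
    over_bandwidth = []           # [(queue, line number in rules.debug)]
    orphaned = defaultdict(list)  # missing parent -> child queues
    rules_loaded = True
    pending_queue = None          # over-bandwidth queue awaiting its line number
    for raw in output.splitlines():
        line = raw.strip()
        over = OVER_BANDWIDTH.match(line)
        missing = MISSING_PARENT.match(line)
        location = LOCATION.match(line)
        if over:
            pending_queue = over.group(1)
        elif missing:
            parent, child = missing.groups()
            if child not in orphaned[parent]:
                orphaned[parent].append(child)
        elif location and pending_queue is not None:
            over_bandwidth.append((pending_queue, int(location.group(1))))
            pending_queue = None
        elif "pf rules not loaded" in line:
            rules_loaded = False
    return {
        "over_bandwidth": over_bandwidth,
        "orphaned": dict(orphaned),
        "rules_loaded": rules_loaded,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_OUTPUT))
```

Read this way, the first error (the bandwidth check on qInternet) looks like the root problem, and the "parent not found" lines follow from qInternet never being created.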



  • parent qInternet not found for qVoIP

    The WAN doesn't have a qInternet. Whatever part of the wizard sets up the queues should just remove the reference to qInternet, and that should fix it.



  • I've actually narrowed down the issue on my side. It has nothing to do with the WAN being configured via DHCP; that was just a coincidence. The issue is that if the traffic shaper was configured with a set of interfaces and any one of them is down, the rule set breaks. I believe this is because a down interface is assigned 0 bandwidth, and thus the rule exceeds the available bandwidth for the interface (as per the error message). This may make sense in theory, but in practice segments of a network can be down for a variety of reasons, so booting into this bad state does not seem like the desired behavior (especially losing all inbound filtering on WAN and leaving it in a fully open state). As such, I don't believe my issue (which is deterministic and due to an interface being down) is the same as the ones reported by others, which seem to be more like:

    https://redmine.pfsense.org/issues/4856

    For anyone that does encounter the issue, I'd recommend trying to run the pfctl command I posted above to capture the error output (if any), as it'd likely be useful to the pfSense team in resolving the issue.



  • I had the weirdest thing happen today. All of a sudden at 12:40 this afternoon, our internet link went down. It wasn't our supplier. The traffic shaper had been running perfectly for over a week. The weird part was that no traffic would pass and nothing hit the firewall rules, yet users' VPN was working fine, i.e. they could get logged in. After rebooting a couple of times I saw these weird entries in the log.

    Oct 21 13:31:02 php: rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-queues.rrd -t qLink:qInternet:qACK:qP2P:qVoIP N:U:U:U:U:U' returned exit code '1', the output was 'ERROR: tmplt contains more DS definitions than RRD'
    Oct 21 13:31:02 php: rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-queuedrops.rrd -t qLink:qInternet:qACK:qP2P:qVoIP N:U:U:U:U:U' returned exit code '1', the output was 'ERROR: tmplt contains more DS definitions than RRD'
    Oct 21 13:31:02 php: rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-packets.rrd N:U:U:U:U:U:U:U:U' returned exit code '1', the output was 'ERROR: expected 4 data source readings (got 8) from N:U:U:U:U:U:U:U:U'

    I saw the words lan-queues and thought: traffic shaper, maybe? Let's try turning it off. Bang, all traffic starts flowing again. I am wondering what happened, and I am not turning it on again.

    Frank



  • Those are the graph databases that pfSense records to internally. Although the errors may be a symptom of the same root cause, I don't think anything to do with RRD would be the cause itself. Did you happen to try running the pfctl command?

    Diagnostics > Command Prompt > pfctl -f /tmp/rules.debug

    When you have an issue with traffic flow, try running that command before clearing the shaper config.

    There is a known issue with HFSC (and maybe other schedulers) where the Wizard will create a configuration that has a blank value for the root queue of each non-WAN interface and the "bandwidth" is automatically determined, presumably based on the link speed. Unfortunately, if the interface is down, the link speed (I believe) is 0, thus resulting in an error (which you would see with the pfctl command) about a sub-queue specifying bandwidth greater than the interface.

    See https://redmine.pfsense.org/issues/5325 for reference.
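
To make the reasoning above concrete, here is a toy Python model of the check as I understand it from the error message. It is not pf's actual code: the fallback of a blank root bandwidth to the link speed, and the link speed being 0 when the interface is down, are both assumptions taken from this post, and every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildQueue:
    name: str
    bandwidth_bps: int


def effective_root_bandwidth(configured_bps: Optional[int], link_speed_bps: int) -> int:
    """ASSUMPTION: a blank (None) root bandwidth falls back to the link speed."""
    return link_speed_bps if configured_bps is None else configured_bps


def check_queues(root_bps: Optional[int], link_speed_bps: int,
                 children: List[ChildQueue]) -> List[str]:
    """Return an error for each child queue that asks for more than the root has."""
    root = effective_root_bandwidth(root_bps, link_speed_bps)
    return [
        f"bandwidth for {child.name} higher than interface"
        for child in children
        if child.bandwidth_bps > root
    ]


if __name__ == "__main__":
    queues = [ChildQueue("qInternet", 100_000_000)]  # 100 Mbps, as in the thread
    print("link up:  ", check_queues(None, 1_000_000_000, queues))
    print("link down:", check_queues(None, 0, queues))
    print("explicit: ", check_queues(1_000_000_000, 0, queues))
```

In this model a 100 Mbps qInternet passes while the 1 Gbps link is up and fails as soon as the link speed reads 0, which matches the error text. The last call only shows that an explicit root value removes the dependence on link state within the model; whether that holds on a real system is something to test.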



  • Yes, however at the time my entire network had stopped working, at prime time and all. First and foremost, I needed to get the link back up. I intend to revisit this; I have another pfsense router coming in, so I will be able to do some testing. We are a software development company and Internet is important lol. I am going to look at the blank value when I redo this.

