pfSense freezing with CBQ shapers



  • Hi,

    we have a problem with our pfSense: it freezes (no ping and no reaction on console) after a while or when we change something on some interfaces. It is the first of 4 pfSenses we want to go live with.

    We use HP ProLiant DL20 Gen9 servers with two 4-port network cards (so each server has 10 network ports).
    The two 4-port network cards are HPE 366T and 366FLR with Intel I350 chips (so the igb driver is used). We use the tweaks mentioned here (tested with and without MSI and MSI-X enabled).
    The two onboard ports use a 332i chip (so the bge driver is used), but they aren't used in pfSense.
    The network setup is (all with fixed ip):
    igb3 = wan
    igb4 = lan1
    igb5 = lan2
    igb6 = lan3
    igb7 = lan4
    vlan3 on igb4 = lan5
    vlan4 on igb4 = lan6
    bge0 = HP ILO (not used in pfSense)
    All other ports are not used atm.

    We tested with pfSense version 2.3.2, 2.3.3, 2.3.3_1 and 2.4beta (all amd64). We also tested with and without additional packages (Cron, mailreport, pfBlockerNG, snort).

    Our pf-setup includes

    • CBQ-traffic shapers for each interface
    • IPsec VPNs between the pfSenses and ZyXEL ZyWALL USG300s, IKEv1
    • virtual ips on wan1 and lan1
    • dhcp server on all lan interfaces
    • dns resolver for all lan interfaces
    • snort enabled without blocking on all interfaces

    On our first try, it froze after 2 days. After that it kept crashing/freezing whenever we tried to use it as our live gateway. It didn't matter whether the VPNs or the internet connection were active, or whether we did anything on the firewall.

    On our last go-live attempt two weeks ago we had crashes when we changed IPs or other settings on the interfaces (I attached 2 crash logs).
    After reading some threads about problems with CBQ shapers, I deleted the shapers and created new ones (without the wizard). Since then there have been no more crashes, but pfSense still freezes and the server needs a hard restart. Because a freeze is not a crash, there are no newer crash logs.

    As long as the 4 pfSenses run in our test lab they are stable and connected to each other with an IPsec IKEv2 VPN. Routing and everything else works without problems; only when we make them the live firewall again (by switching the IPs to the network gateways and enabling the DHCP server) do they crash/freeze again.

    Maybe someone can see in the crashlog what happened or has some ideas how to avoid the freezes.

    Crash_2.3.3_1.txt
    Crash_2.4.txt



  • Seems we found the problem:
    Whenever we set a traffic shaper on the VLAN interfaces (even if it's only a default queue), the freezes/crashes happen. It doesn't even matter whether there is traffic on that VLAN.
    We deleted the shapers on these 2 interfaces and so far no freeze or crash has happened. Let's hope it stays that way :)

    Btw: we moved from CBQ to HFSC and the freezes and crashes still happened.
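
To make the finding above easier to check on another box, here is a minimal sketch that flags shaped interfaces that sit on a VLAN. The interface table is copied from the first post; the helper name `shaped_vlan_interfaces` and the idea of passing in a set of shaped interface names are assumptions for illustration only. It does not read a real pfSense config.

```python
# Interface assignments from the first post:
# logical name -> (physical NIC, VLAN tag or None for an untagged interface).
ASSIGNMENTS = {
    "wan": ("igb3", None),
    "lan1": ("igb4", None),
    "lan2": ("igb5", None),
    "lan3": ("igb6", None),
    "lan4": ("igb7", None),
    "lan5": ("igb4", 3),
    "lan6": ("igb4", 4),
}


def shaped_vlan_interfaces(assignments, shaped):
    """Return the sorted logical names in `shaped` that are VLAN interfaces.

    `shaped` is a hypothetical set of interface names with a shaper enabled.
    """
    return sorted(name for name in shaped if assignments[name][1] is not None)


if __name__ == "__main__":
    # With a shaper on every interface, as in the original setup:
    for name in shaped_vlan_interfaces(ASSIGNMENTS, set(ASSIGNMENTS)):
        nic, tag = ASSIGNMENTS[name]
        print(f"{name}: shaper on VLAN {tag} of {nic}")
```

In this thread the workaround was to remove the shapers from exactly the interfaces such a check would list (lan5 and lan6).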



  • I had similar issues on an SG-1000 with HFSC applied to a fresh install using 2.4 firmware. The console reported a kernel panic, followed shortly by a reboot of the device, and this repeated in a loop. I would be interested in a solution to this too.



  • Do you get crashes with dummynet limiters?



  • We don't use limiters atm, but when I had two limiters active, there were no problems with them. However, the limiters were only active on the "normal" interface igb4, not on a VLAN.

    We are currently running pfSense without shapers/limiters on the VLANs and it still works without problems.
    I think we will keep it this way as long as the users on the VLANs don't generate too much traffic. If the traffic gets too high I will try limiters.



  • I have also had a box freeze up when adjusting altq, on 2.3.3

    There clearly has to be some kind of bug here



  • @moscato359:

    I have also had a box freeze up when adjusting altq, on 2.3.3

    There clearly has to be some kind of bug here

    Back when I was frequently tweaking my queues (~2.2.x or 2.1.x ?) I'd occasionally have a freeze. I think it happened when enabling or disabling ALTQ on an interface. I forget whether the entire system froze or if it was just the GUI. I think it was the whole system, which froze for 2-3 minutes or sometimes forever.

    Maybe it was fixed? Dunno.



  • Same problem here: I enabled CoDel on a VLAN and had to reinstall pfSense, as it froze and never came out of it.



  • @Chrismallia:

    Same problem here: I enabled CoDel on a VLAN and had to reinstall pfSense, as it froze and never came out of it.

    In my case a restart was the most drastic thing I had to do. No reinstall needed.



  • Rebooting did not work for me. I had no choice: looking at the console, it kept freezing while booting. Also, setting up the traffic shaper wizard sometimes freezes pfSense, but that is solved with a reboot.



  • @Chrismallia:

    Rebooting did not work for me. I had no choice: looking at the console, it kept freezing while booting. Also, setting up the traffic shaper wizard sometimes freezes pfSense, but that is solved with a reboot.

    I had that problem too: just unplug the interface with the VLANs and it boots again. If the VLAN interface is the LAN interface, unplug it, let the box start, replug it, and be quick about deactivating the shapers.

    I was also able to reproduce the error on a freshly installed VMware VM:
    I gave the VM 3 adapters (1 WAN, 1 LAN and 1 OPT), set up VLANs and shapers on LAN and OPT, and could make it crash on every IP change.



  • The real question is why does the thing freeze at all.

    It's happened to me too.

    I run a business network with 100+ users, so a firewall reboot is a bit of a pain. Have like 20 people come to me and be like WHYYY DID MY STUFF GO DOWN, WAAH

    It's not fun.

    I'm actually afraid of updating my QoS settings.

    I set fairq, with codel, and stopped.



  • @moscato359:

    The real question is why does the thing freeze at all.

    It's happened to me too.

    I run a business network with 100+ users, so a firewall reboot is a bit of a pain. Have like 20 people come to me and be like WHYYY DID MY STUFF GO DOWN, WAAH

    It's not fun.

    I'm actually afraid of updating my QoS settings.

    I set fairq, with codel, and stopped.

    I hear you. I ended up replacing pfSense at one location with a different product that has also added fq_codel, and what can I say: the clients are much happier. No freezing, nothing that breaks, much better reporting, and the clients themselves noticed a better quality connection. Sorry for saying this, but it is true for me. In fact, the clients are opening a new location and want this new firewall in the second location too.



  • I just experimented a little and found that CoDel seems to be my problem:
    if I only activate RED and ECN on the VLAN interfaces, there are at least no instant freezes/crashes anymore when I change the IPs.

    Maybe you can try that too. That way we could narrow down the reason for the freezes/crashes.



  • I can set codel sometimes, and have zero issues. I can also set hfsc sometimes and just have a total lockup.

    I've also had a crash setting ipv6 before.

    If I don't touch firewall or interface settings, the firewall runs forever without issues.



  • I'm glad I came across this thread. I have been having issues similar to those described here. I set up CoDel on the WAN interface and all LAN interfaces, including two VLANs (which share one physical interface). I also ran into trouble with intermittent freezing, followed by a crash and automatic reboot. For me this mostly occurred during the upload portion of the speed test over at DSL Reports, but it also occurred from time to time during the traffic shaping setup, usually when configuring one of the VLAN interfaces. I originally thought that I had maybe used bandwidth values that were too high, but even after lowering them the freeze-ups still occurred. What made the issue interesting is that the freeze-ups did not occur every time the speed test was run or a bandwidth reconfiguration was done on the interfaces; it was a bit random. The latest one occurred today, and like some of the other posters here I had trouble getting pfSense to boot back up after that crash. I feared that a complete reinstall might be in order, but after disconnecting all the interfaces except one of the LAN interfaces it finally booted back up and (thankfully) everything was fine (thank you to the OP for the suggestion to unplug the cables).

    So it seems like the root cause of the instability might be related to enabling traffic shaping on VLANs. Has anyone been able to look into why this might be occurring? Is this a bug, or is there an easy fix available? In my case, the VLANs handle wireless traffic, so I have the shaping disabled on them for now. Perhaps I should not have had traffic shaping enabled on them in the first place? In any case, hopefully I will see no more freeze-ups going forward. It would still be nice to get this to work on VLANs without freeze-ups. Thanks in advance for any advice and/or insight, I really appreciate it.



  • I am also having this issue on the May 18th build of 2.4. I was previously on 2.3.4 and had no problems. The LAN interface is a VLAN, with the traffic shaper enabled using the wizard. The only issue on 2.3.4 was that floating rules wouldn't trigger the shaper queues, so I had to create a LAN rule in order for it to work, but there was no crashing.

    The other day I figured I would try the 2.4 beta since it seemed to be making progress, but immediately after updating, pfSense would crash about every 1-2 minutes after booting whenever the shaper was enabled on the interfaces and my LAN rule that directs traffic to the shaper queues was enabled. If I disable the shaper on the WAN and LAN interfaces (I don't need to actually remove the shaper completely) and disable the LAN rule, the system is stable.

    Hardware info:
    Supermicro C2578
    8GB memory
    Intel 256GB SSD

    pfSense - 2.4.0-BETA (amd64)
    built on Thu May 18 15:36:14 CDT 2017
    FreeBSD 11.0-RELEASE-p10

    I did not install pfSense with swap space, so to my understanding I cannot save a crash dump. Does anyone know of a fix? Otherwise I might have to go back to 2.3.4 until it's resolved.

    Thanks!



  • I think the reason is something like this Snort problem:

    Snort puts the interface it runs on in promiscuous mode, so this means it sees everything.  Snort uses libpcap to grab copies of the packets as they fly through the interface.  Snort is also positioned within the packet chain in such a way as to see data before the VLAN routing is applied.  So since the VLANs reside on your physical LAN interface, Snort is seeing the traffic as just coming from the LAN.

    The shaper of the LAN "sees" all packets too, like Snort. That way a packet might get into two (different) shapers: the one of the VLAN and the one of the LAN. That might cause the errors.

    Has anyone tried shapers only on the VLANs, without shapers on the parent interface itself? If I'm right, that should work.
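
A rough sketch of how the "two shapers" hypothesis above could be checked against an interface list: it reports VLAN interfaces whose parent NIC also carries a shaper. The interface table is the one from the first post; the function name `double_shaped` and the `shaped` set are made up for illustration, and whether this overlap is really the cause of the freezes is unconfirmed.

```python
# Interface assignments from the first post:
# logical name -> (physical NIC, VLAN tag or None for an untagged interface).
ASSIGNMENTS = {
    "wan": ("igb3", None),
    "lan1": ("igb4", None),
    "lan2": ("igb5", None),
    "lan3": ("igb6", None),
    "lan4": ("igb7", None),
    "lan5": ("igb4", 3),
    "lan6": ("igb4", 4),
}


def double_shaped(assignments, shaped):
    """Return (vlan_name, parent_name) pairs where both interfaces are shaped.

    `shaped` is a hypothetical set of logical interface names with a shaper.
    """
    # Map each physical NIC to the logical interface assigned directly to it.
    parent_by_nic = {
        nic: name for name, (nic, tag) in assignments.items() if tag is None
    }
    pairs = []
    for name, (nic, tag) in sorted(assignments.items()):
        if tag is None or name not in shaped:
            continue  # not a VLAN, or the VLAN itself has no shaper
        parent = parent_by_nic.get(nic)
        if parent in shaped:
            pairs.append((name, parent))
    return pairs


if __name__ == "__main__":
    for vlan, parent in double_shaped(ASSIGNMENTS, set(ASSIGNMENTS)):
        print(f"{vlan} and its parent {parent} both have a shaper")
```

If the hypothesis holds, a setup where this list is empty (shapers only on the VLANs, none on the parent) should not freeze.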



  • @Birke:

    The shaper of the LAN "sees" all packets too, like Snort. That way a packet might get into two (different) shapers: the one of the VLAN and the one of the LAN. That might cause the errors.
    Has anyone tried shapers only on the VLANs, without shapers on the parent interface itself? If I'm right, that should work.

    In my case the LAN interface is actually a LAG. I could not apply the shaper to the interface without first creating a VLAN on the LAG and assigning the LAN interface to that VLAN.
    I'm not sure if my issue is exactly the same though. I used the shaper wizard to set it up and it's using HFSC queuing. On 2.3.4 I had no issues with pfSense crashing, but after updating to 2.4 I can't leave the shaper and shaper rules enabled without it crashing every 1-2 minutes after pfSense starts up. Bummer for sure.



  • Has anyone come up with a fix? This is still happening to me. I was modifying the traffic shaper tonight and the server crashed again. I saw https://redmine.pfsense.org/issues/7351 on Redmine, and they say it's hardware. Is there a recommended network card that we should be using, then? I need a fix or I am going to have to switch to FortiGate.



  • @haaser:

    Has anyone come up with a fix? This is still happening to me. I was modifying the traffic shaper tonight and the server crashed again. I saw https://redmine.pfsense.org/issues/7351 on Redmine, and they say it's hardware. Is there a recommended network card that we should be using, then? I need a fix or I am going to have to switch to FortiGate.

    How are you setting up the shaper? How does it crash? Does it crash while changing a setting? You need to share more info. And what NICs are you using? Intel NICs are recommended.



  • @Chrismallia:

    @haaser:

    Has anyone come up with a fix? This is still happening to me. I was modifying the traffic shaper tonight and the server crashed again. I saw https://redmine.pfsense.org/issues/7351 on Redmine, and they say it's hardware. Is there a recommended network card that we should be using, then? I need a fix or I am going to have to switch to FortiGate.

    How are you setting up the shaper? How does it crash? Does it crash while changing a setting? You need to share more info. And what NICs are you using? Intel NICs are recommended.

    I was trying to get this working on an Intel NUC with a single NIC using VLANs. I imported my config and the NUC froze hard, with no kernel panic on-screen. After a reboot, it would freeze within a few seconds of booting. I then disabled the shaper in the config, re-imported it, and could reproduce a hard crash by enabling the shaper.

    Intel NUC, Intel (em) NIC, em0=LAN, em0.100=WAN, CBQ shapers

    So I moved on to a Dell Optiplex 380 (Core 2 Duo) with a single NIC. On importing the config, I got a spam of text on-screen. I had to take a video, as a photo would only show unreadable overlapping text.

    See the short video at https://youtu.be/-LcRSjzZLt4

    My pfSense box at the time was a Xen PVM with Intel emulated NICs bridged to VLANs on the host, so pfSense itself wasn't aware that it was on VLANs. However, every few days either the WAN or the LAN would stop receiving traffic. Running ifconfig <nic> down, then up, would bring it back, so I was determined to get this working on a physical host. Worth noting that my venture into traffic shaping is recent, and the Xen HVM setup had been working fine for a couple of years.

    My next try was the same Optiplex 380 with an Intel PCI-E dual-NIC card, now with no VLANs on pfSense, instead using the switch to do the VLAN'ing. Touch wood, I've had no issues for 7 hours; time will tell if it's stable.

    So to summarise:

    Xen PVM with CBQ: unstable
    Intel NUC CBQ & VLAN: unstable
    Optiplex 380 CBQ & VLAN: unstable
    Optiplex 380 CBQ no VLAN: stable – so far.

    I'm beginning to think that maybe the whole ALTQ portion of FreeBSD needs to be avoided. I even tried OPNsense in desperation, and whilst it worked, without ALTQ queues the QoS just isn't nearly as good. Now working again on pfSense, and I hope it stays stable.



  • Hi,

    I experienced the same full crash last weekend. I've completely switched my main gateway to pfSense, with pfBlockerNG, Snort, ntopng … and that worked perfectly. My last to-do was to implement traffic shaping with CBQ. I'm also using VLANs, and my pfSense is running on ESXi 6.0U3.

    Shortly after creating the CBQ queues, during the first speed tests, my Zabbix suddenly sent me some alarms even before I noticed that all outbound connections had gone down. The web server was not responding anymore, I could not get access via the shell, and pings to my pfSense did not work. Finally I fired up the vSphere remote console to get direct access to the system and it was not reacting to any keys either!

    So the complete system just fully crashed! It did not even produce a crash log, it just froze and I needed to hard reset the VM.  :o

    I went back to a snapshot I had created before my traffic shaper configuration and started reading ... for now I'm using traffic limiters to achieve my QoS and guest VLAN limitation. These are working properly, although CBQ is more advanced.

    I'm surprised that nothing is listed on the roadmap in terms of fixing this ... I can't imagine there are only a few people using the ALTQ stuff and VLANs?



  • I can't imagine there are only a few people using the ALTQ stuff and VLANs

    Perhaps, like me, they've had to find workarounds. I'd love to use the NUC as a pfSense gateway, but with a single NIC and pfSense's VLAN issue, it's a no-go.

    There are some bug reports related to shaping and VLANs; however, I believe they are super low priority for the dev team.

    https://redmine.pfsense.org/issues/6295
    https://redmine.pfsense.org/issues/7351
    https://redmine.pfsense.org/issues/7606

    In your case, couldn't you create ESX virtual NICs and do the VLANs from the hypervisor instead? I did that with Xen and it worked great for a couple of years. My issues only came about once I added traffic shaping to the mix. On a side note, my old Dell Optiplex is still running okay. The NUC is sat on top of that screaming "use me" :P