Crash when enabling traffic shaper on more than 1 port
-
@avink:
Did someone test the traffic shaping on 2.4.1?
I didn’t have the time to do it this week, and next week I don’t have the time either.
It needs you guys with the problem to test. There is no point in me testing, as I don't get the crash on 2.4 anyway.
It's far easier for you to test than for someone to backport the driver (note the fixes are also in the kernel rather than the driver).
Do a config backup.
Download 2.4.1 snapshot from pfsense servers.
Install 2.4.1
Restore the backup.
-
But the main question is how stable 2.4.1 is. My own box on 2.4 doesn't crash, and that is the one I can test on. The box that crashes is a client box in a remote location, so if 2.4.1 isn't stable it will cause me a lot of headache bringing it back to 2.4.
-
I tried 2.4.1 with 'em' (E1000) NICs on ESXi. For me it crashes.
Version 2.4.1-DEVELOPMENT (amd64) built on Sat Sep 23 19:46:59 CDT 2017 FreeBSD 11.1-RELEASE-p1
amd64 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #299 r313908+13ee0afae40(RELENG_2_4): Sat Sep 23 19:54:08 CDT 2017 root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
Crash report details:
No PHP errors found.
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device: /dev/da0s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 86016
Blocksize: 512
Dumptime: Sun Sep 24 19:07:58 2017
Hostname: pfsense_3.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p1 #299 r313908+13ee0afae40(RELENG_2_4): Sat Sep 23 19:54:08 CDT 2017 root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
Panic String: hfsc_dequeue:
Dump Parity: 1670814533
Bounds: 0
Dump Status: good
Filename: /var/crash/info.last
Dump header from device: /dev/da0s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 86016
Blocksize: 512
Dumptime: Sun Sep 24 19:07:58 2017
Hostname: pfsense_3.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p1 #299 r313908+13ee0afae40(RELENG_2_4): Sat Sep 23 19:54:08 CDT 2017 root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
Panic String: hfsc_dequeue:
Dump Parity: 1670814533
Bounds: 0
Dump Status: good
Filename: /var/crash/minfree
2048
Filename: /var/crash/textdump.tar.0
ddb.txt06000014000013161763156 7103 ustarrootwheel
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe00981a3200
curthread = 0xfffff80003215560: pid 0 "em4 taskq"
curpcb = 0xfffffe002b4f4b80
fpcurthread = none
idlethread = 0xfffff800031df560: tid 100004 "idle: cpu1"
curpmap = 0xffffffff82a3a7c0
tssp = 0xffffffff82a738f8
commontssp = 0xffffffff82a738f8
rsp0 = 0xfffffe002b4f4b80
gs32p = 0xffffffff82a7a150
ldt = 0xffffffff82a7a190
tss = 0xffffffff82a7a180
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100036 td 0xfffff80003215560
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe002b4f3cd0
vpanic() at vpanic+0x1a3/frame 0xfffffe002b4f3d50
panic() at panic+0x43/frame 0xfffffe002b4f3db0
hfsc_dequeue() at hfsc_dequeue+0x1a6/frame 0xfffffe002b4f3df0
tbr_dequeue() at tbr_dequeue+0xea/frame 0xfffffe002b4f3e40
vlan_start() at vlan_start+0x196/frame 0xfffffe002b4f3e90
if_transmit() at if_transmit+0x15c/frame 0xfffffe002b4f3ed0
ether_output() at ether_output+0x718/frame 0xfffffe002b4f3f60
ip_output() at ip_output+0x154a/frame 0xfffffe002b4f40b0
ip_forward() at ip_forward+0x323/frame 0xfffffe002b4f4150
ip_input() at ip_input+0x75a/frame 0xfffffe002b4f41b0
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe002b4f4200
ng_iface_rcvdata() at ng_iface_rcvdata+0x11a/frame 0xfffffe002b4f4230
ng_apply_item() at ng_apply_item+0xde/frame 0xfffffe002b4f42b0
ng_snd_item() at ng_snd_item+0x18c/frame 0xfffffe002b4f42f0
ng_apply_item() at ng_apply_item+0xde/frame 0xfffffe002b4f4370
ng_snd_item() at ng_snd_item+0x18c/frame 0xfffffe002b4f43b0
ng_ppp_rcvdata() at ng_ppp_rcvdata+0xe6b/frame 0xfffffe002b4f4430
ng_apply_item() at ng_apply_item+0xde/frame 0xfffffe002b4f44b0
ng_snd_item() at ng_snd_item+0x18c/frame 0xfffffe002b4f44f0
ng_apply_item() at ng_apply_item+0xde/frame 0xfffffe002b4f4570
ng_snd_item() at ng_snd_item+0x18c/frame 0xfffffe002b4f45b0
ng_apply_item() at ng_apply_item+0xde/frame 0xfffffe002b4f4630
ng_snd_item() at ng_snd_item+0x18c/frame 0xfffffe002b4f4670
ether_demux() at ether_demux+0x240/frame 0xfffffe002b4f46a0
ether_nh_input() at ether_nh_input+0x310/frame 0xfffffe002b4f4700
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe002b4f4750
ether_input() at ether_input+0x26/frame 0xfffffe002b4f4770
vlan_input() at vlan_input+0x1f0/frame 0xfffffe002b4f47f0
ether_demux() at ether_demux+0x156/frame 0xfffffe002b4f4820
ether_nh_input() at ether_nh_input+0x310/frame 0xfffffe002b4f4880
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe002b4f48d0
ether_input() at ether_input+0x26/frame 0xfffffe002b4f48f0
if_input() at if_input+0xa/frame 0xfffffe002b4f4900
lem_rxeof() at lem_rxeof+0x3df/frame 0xfffffe002b4f49a0
lem_handle_rxtx() at lem_handle_rxtx+0x32/frame 0xfffffe002b4f49e0
taskqueue_run_locked() at taskqueue_run_locked+0x127/frame 0xfffffe002b4f4a40
taskqueue_thread_loop() at taskqueue_thread_loop+0xc8/frame 0xfffffe002b4f4a70
fork_exit() at fork_exit+0x85/frame 0xfffffe002b4f4ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002b4f4ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
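For anyone wanting to compare traces from different boxes, here is a rough Python sketch that pulls the frames out of a pasted `bt` like the one above and skips the debugger and panic frames, so the first entry is the function that panicked. The helper names (`parse_backtrace`, `frames_below_panic`) are made up for illustration and are not part of pfSense or FreeBSD. The sample text is just the first six frames of this trace, and the regex assumes the `func() at func+0xNN/frame 0x...` layout seen here.

```python
import re

# One ddb frame looks like:
#   hfsc_dequeue() at hfsc_dequeue+0x1a6/frame 0xfffffe002b4f3df0
# \s+ is used between tokens so frames that wrapped across lines still match.
FRAME_RE = re.compile(
    r"(\w+)\(\)\s+at\s+\1\+(0x[0-9a-f]+)/frame\s+(0x[0-9a-f]+)"
)

# Frames added by the panic path itself (assumption: these three names,
# as they appear at the top of the trace in this thread).
PANIC_PATH = {"kdb_enter", "vpanic", "panic"}


def parse_backtrace(text):
    """Return (function, offset, frame_address) tuples, innermost frame first."""
    return [
        (m.group(1), int(m.group(2), 16), m.group(3))
        for m in FRAME_RE.finditer(text)
    ]


def frames_below_panic(frames):
    """Drop the debugger/panic frames so the list starts at the panicking function."""
    return [f for f in frames if f[0] not in PANIC_PATH]


# First six frames of the trace posted above.
sample = (
    "kdb_enter() at kdb_enter+0x3b/frame 0xfffffe002b4f3cd0 "
    "vpanic() at vpanic+0x1a3/frame 0xfffffe002b4f3d50 "
    "panic() at panic+0x43/frame 0xfffffe002b4f3db0 "
    "hfsc_dequeue() at hfsc_dequeue+0x1a6/frame 0xfffffe002b4f3df0 "
    "tbr_dequeue() at tbr_dequeue+0xea/frame 0xfffffe002b4f3e40 "
    "vlan_start() at vlan_start+0x196/frame 0xfffffe002b4f3e90"
)

frames = frames_below_panic(parse_backtrace(sample))
for name, offset, frame in frames:
    print(f"{name:<16} +{offset:#x}  frame {frame}")
```

Running the same two functions over the `bt` output of another crash gives a list that can be compared directly, which makes it easier to see whether two boxes are dying in the same place.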
-
I'm afraid this is really an uphill battle and ALTQ is pretty much dead… I'm not sure what the plan is in FreeBSD, but this buggy mess is not maintainable by Netgate alone.
-
Yes, it's reliant on FreeBSD upstream. pfSense may refine the code, but you are entering a dangerous situation if you start making large en masse changes to the upstream product, as these would need maintaining outside of upstream and be a pain on every upstream update.
FreeBSD made a decision to not keep up with OpenBSD in regards to PF and ALTQ, so they made a rod for their own back with that decision. The good news is that they have started to fix FreeBSD-specific PF issues now, and I would hope this includes ALTQ as well.
The problem in this case is that there is no specific confirmed bug, just a few guys reporting crashes at high load. It needs lots of testing to dig deeper.
My advice was on the basis that I am aware of scheduler panic fixes and igb panic fixes in FreeBSD 11.1 upstream, hence what I posted.
-
FreeBSD made a decision to not keep up with OpenBSD in regards to PF and ALTQ, so they made a rod for their own back with that decision. The good news is that they have started to fix FreeBSD-specific PF issues now, and I would hope this includes ALTQ as well.
ALTQ has been gone from OpenBSD for years. http://undeadly.org/cgi?action=article&sid=20140419151959
-
My crash log is attached as follows. I'm also having the same issue on a client box when I enable shaping. Sorry, the log was too long to capture completely.
-
A quick update from my side.
Both ALTQ and HFSC crash on 2.4.1 as well.
It doesn't need any load. It crashes only seconds after reboot. Will try to upload the crash info tonight.
-
FreeBSD made a decision to not keep up with OpenBSD in regards to PF and ALTQ, so they made a rod for their own back with that decision. The good news is that they have started to fix FreeBSD-specific PF issues now, and I would hope this includes ALTQ as well.
ALTQ has been gone from OpenBSD for years. http://undeadly.org/cgi?action=article&sid=20140419151959
It is gone now, yes, but before it was removed it was updated and improved beyond the version in FreeBSD.
-
Luiz fixed it! ;D
https://redmine.pfsense.org/issues/7879#note-3
At least that's how it looks after initial testing and pushing some traffic through my queues, which seemed like a sure way to crash it before. I'm going to see how it holds up tomorrow :)
2.4.0-RC (amd64) built on Mon Sep 25 15:04:23 CDT 2017 FreeBSD 11.0-RELEASE-p12
Perhaps some of you guys here can also give it another test run? 8)
-
I tried on 2 APU2 boxes and it all works well now. Can anyone point me to the actual commit that fixed this?
-
It's been running for a few hours now.
No crash since the reboot, so this looks promising.