6100 Boot Loop w/ Traffic Shaper on PPPoE WAN
-
@stephenw10 I do not have it enabled. CenturyLink is my ISP and I believe they offer it, though I'm unsure.
-
There are some issues with IPv6 + PPPoE that might have come into play here, but that seems unlikely.
Just trying to replicate it and looking for anything unusual you might have set.
-
@stephenw10 the only other thing I can think of is a handful of packages I've got installed. I might be missing something, but I didn't configure a whole lot after performing the factory reset. Packages were retained, obviously, so if they have the potential to cause a crash, there could be an issue there. I can do some uninformed testing when I'm feeling grouchy enough to inconvenience my users/friends. I'll make sure to connect to the console and grab some of the output this time around.
-
Yeah if you can grab the console output when it boot loops that would confirm it. I'll see if I can replicate it here.
-
@stephenw10 Here is the output from the crash: crashlog.txt
-
Ok, great. And it's the same backtrace every time?
The odd thing there is that it doesn't appear to be in the traffic shaper.
-
@stephenw10 I can't say for certain, I figured a crash log would be a crash log, so I didn't really try to give it more than one go.
pfSense became unresponsive for quite some time after I enabled a few shaper queues, so I had to pull the plug to reboot it, let it boot without WAN plugged in, plug in WAN, and then I received that dump. So I suppose it could be related to me bringing the OS to an abrupt halt, but that leaves me still stuck on my traffic shaper woes. Afterwards, I rebooted, factory reset, then restored my backup that doesn't use the traffic shaper. And just in case there are any known issues I might not have known about, I'm using current versions of the following packages:
- Netgate_Firmware_Upgrade
- pfBlockerNG-devel
- Service_Watchdog
- WireGuard
-
Do you have any of the console output while it was looping? It would be good to see where it panics in the boot process and if it's the same panic as that in the crash report.
-
@stephenw10 DM'd more crash logs that were triggered by adding new queues. Unfortunately, it doesn't seem any of the changes are being saved this time, as things return to the most recent setting and boot normally after the crash.
-
Do you have any further details of the queues you enabled and how they were configured?
Simply enabling the shaper with a few PRIQ queues on a PPPoE WAN is not triggering it here.
-
@stephenw10 I enabled three queues each on the WAN and LAN interfaces (last forced crash happened specifically when adding the LAN ones, funny enough). Priorities 3, 7 and 13 on each side I think.
All CoDel Active Queue, with one default queue on each interface. The queue limit was 50 most of the time, I believe; I also tried setting it to 1000 when things originally crashed. Bandwidth was set to 940 Mbps on either interface.
-
Hmm, do you have the actual config queues section that was generated?
-
@stephenw10 not on hand, the last crash would revert the save so I wouldn't have the full thing. I can try to grab something again in a few days here.
-
OK, great. I haven't been able to replicate it here yet.
-
@stephenw10 Sorry for the delayed reply, I wish I had an easier way to test this without inconveniencing some people. Alright, I sent you a log file and a config file. I created the shaper config, saved a copy of it, then applied it. The router stalled for 15 minutes, at which point I disconnected the power and plugged it back in. The boot stalled at "Bootstrapping clock" for more than a handful of minutes, then I sent an Enter keystroke through PuTTY's console connection and the attached crash began. After collecting the evidence I unplugged the SFP+ connection, rebooted the router again, let the console fully come up, removed the traffic shaper via the PHP shell, and plugged the SFP+ connection back in. Everything's back to normal.
I grabbed the full router config this time before forcing the crash, so please let me know if you need anything else.
-
Ok that looks like something we should be able to work with:
Bootstrapping clock...
codel_should_drop: could not found the packet mtag!
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 04
fault virtual address = 0x5010410
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cd789d
stack pointer = 0x0:0xfffffe00c4c04ae0
frame pointer = 0x0:0xfffffe00c4c04b60
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
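
For anyone comparing several of these panics to see whether they match, a rough Python sketch like the one below can pull the key fields out of the console text so they can be compared side by side. It is only an illustration: the function name parse_trap and the field list are my own choices, not part of any pfSense or FreeBSD tool, and it assumes the trap header looks like the one pasted above.

```python
import re

# Field labels as they appear in the trap header above (an assumed, fixed list).
FIELDS = [
    "fault virtual address",
    "fault code",
    "instruction pointer",
    "stack pointer",
    "frame pointer",
    "code segment",
]

_FIELD_RE = re.compile(r"(%s)\s*=\s*" % "|".join(re.escape(f) for f in FIELDS))


def parse_trap(text):
    """Return a dict of trap header fields from pasted console output.

    Works whether the output kept its line breaks or was flattened
    onto a single line, because whitespace is normalised first.
    """
    flat = " ".join(text.split())
    info = {}

    header = re.search(r"Fatal trap (\d+): (.*?) cpuid", flat)
    if header:
        info["trap"] = int(header.group(1))
        info["description"] = header.group(2)

    # re.split with a capturing group yields: [prefix, name, value, name, value, ...]
    parts = _FIELD_RE.split(flat)
    for name, value in zip(parts[1::2], parts[2::2]):
        info[name] = value.strip()
    return info


SAMPLE = """Bootstrapping clock...
codel_should_drop: could not found the packet mtag!
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 04
fault virtual address = 0x5010410
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cd789d
stack pointer = 0x0:0xfffffe00c4c04ae0
frame pointer = 0x0:0xfffffe00c4c04b60
"""

info = parse_trap(SAMPLE)
for key, value in info.items():
    print(f"{key}: {value}")
```

Running parse_trap on two different crash logs and comparing the "instruction pointer" and "fault code" entries is a quick first check for "same panic or not", though the full backtrace is still the better evidence.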
-
Opened a bug to track it: https://redmine.pfsense.org/issues/14497
-
@stephenw10 awesome, glad to hear it. And thanks for tending to this and walking me through it as well.
-
@TheGrimPickler I'm failing to reproduce this problem so far.
The backtrace suggests that the unmapped pages feature is in use. Can you confirm the value of this sysctl?
sysctl kern.ipc.mb_use_ext_pgs
Also, I appear to have missed what version you're running here.
-
@kprovost via Diagnostics>Command Prompt:
kern.ipc.mb_use_ext_pgs: 0
Edit: I'm on version 23.05 of pfSense Plus.
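
If it helps anyone collecting the same value from several boxes, here is a small Python sketch for reading a pasted "name: value" sysctl line like the one above. The helper name parse_sysctl_line is hypothetical, and treating the value as an on/off flag is my assumption about this particular sysctl, not something confirmed in this thread.

```python
def parse_sysctl_line(line):
    """Split one 'name: value' line of sysctl output into (name, value)."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a sysctl 'name: value' line: {line!r}")
    return name.strip(), value.strip()


name, value = parse_sysctl_line("kern.ipc.mb_use_ext_pgs: 0")

# Assumption: the sysctl is a simple toggle where 0 means "off".
in_use = value != "0"
print(name, "in use" if in_use else "not in use")
```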