Netgate Discussion Forum

    Super Micro C2758 crashes

    2.4 Development Snapshots
    • athurdent

      Fatal trap 12: page fault while in kernel mode
      cpuid = 3; apic id = 06
      fault virtual address = 0x78
      fault code = supervisor read data, page not present
      instruction pointer = 0x20:0xffffffff80e33224
      stack pointer         = 0x28:0xfffffe01ed0977e0
      frame pointer         = 0x28:0xfffffe01ed097860
      code segment = base 0x0, limit 0xfffff, type 0x1b
      = DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags = interrupt enabled, resume, IOPL = 0
      current process = 12 (irq269: igb1:que 3)

      I have uploaded the crash info. This is the second time it has happened in the last 24 hours.
      Uploading IP ends with .143.1.27

      Edit: kern.ipc.nmbclusters="1000000" has been configured

      • chrismacmahon

        Looking at both crash reports (yesterday/today), you are getting different call outs.

        Potentially add hw.igb.num_queues=1 to loader.conf.
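
For reference, a minimal sketch of adding that tunable without leaving duplicate lines behind. `set_tunable` is a hypothetical helper, not something pfSense or FreeBSD ships, and the file path in the usage line is an assumption (on pfSense, custom loader tunables are commonly kept in /boot/loader.conf.local); adjust both to your setup.

```shell
#!/bin/sh
# set_tunable FILE KEY VALUE  (hypothetical helper for illustration)
# Rewrites FILE so that it contains exactly one KEY="VALUE" line.
set_tunable() {
    file=$1
    key=$2
    value=$3
    # Create the file if it does not exist yet.
    [ -f "$file" ] || : > "$file"
    # Copy every line that does not start with "KEY=" (literal match, so the
    # dots in names like hw.igb.num_queues are not treated as wildcards).
    awk -v k="$key=" 'index($0, k) != 1' "$file" > "$file.tmp"
    # Append the new assignment in loader.conf style.
    printf '%s="%s"\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}
```

Example: `set_tunable /boot/loader.conf.local hw.igb.num_queues 1`, followed by a reboot so the loader picks it up.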

        Hope that helps.

        Need help fast? Our support is available 24/7 https://www.netgate.com/support/

        Do Not PM For Help!

        • athurdent

          Thanks, I have added this and rebooted. Will report back.

          • w0w

            If the "safe" hw.igb.num_queues=1 works for you, then you can try
            hw.igb.num_queues=2
            The C2758 is an 8-core CPU and you have 4 Intel ports, so that is 2 queues per port, going by the FreeBSD tuning guide: https://calomel.org/freebsd_network_tuning.html
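
The arithmetic behind that suggestion, as a small sketch: divide the core count by the number of ports and never go below one queue. `queues_per_port` is a made-up name, and the rule itself is only the rule of thumb described in this post, not something the driver enforces.

```shell
#!/bin/sh
# queues_per_port CORES PORTS  (hypothetical helper for illustration)
# Prints CORES / PORTS using integer division, with a floor of 1.
queues_per_port() {
    cores=$1
    ports=$2
    q=$((cores / ports))
    # Never suggest fewer than one queue per port.
    if [ "$q" -lt 1 ]; then
        q=1
    fi
    echo "$q"
}
```

On FreeBSD the core count could come from `sysctl -n hw.ncpu`, for example `queues_per_port "$(sysctl -n hw.ncpu)" 4`.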

            • athurdent

              The board has been stable, so I modified loader.conf to use 2 queues. While there, I also installed the latest snapshot and rebooted.
              I forgot to mention that on Sunday I upgraded to BIOS 1.1a; the board was on 1.1 before. I don't know if anything relevant has changed with this BIOS, as I cannot find any changelog or release notes from Supermicro.

              • athurdent

                OK, no luck with 2 queues.

                I can reboot the board with

                hping3 -c 100 -d 120 -S -w 64 -p 443 --flood 192.168.x.10
                

                from LAN to my DMZ (produces about 130000 states).
                I just sent in the crash report for that.

                With only one queue I can even use "--rand-source" until the state table is full, no reboot.
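
For anyone reproducing this on their own lab firewall, here is the same command with each option spelled out. `print_repro_cmd` is a hypothetical wrapper that only prints the command so it can be reviewed before running it; the option meanings are taken from the hping3 man page, and the target is whatever test host you own.

```shell
#!/bin/sh
# print_repro_cmd TARGET [EXTRA_OPTION]  (hypothetical helper, prints only)
print_repro_cmd() {
    target=$1
    extra=${2:-}
    # -c 100   packet count
    # -d 120   data size in bytes
    # -S       set the TCP SYN flag
    # -w 64    TCP window size
    # -p 443   destination port
    # --flood  send packets as fast as possible, without showing replies
    set -- hping3 -c 100 -d 120 -S -w 64 -p 443 --flood
    # Optional extra flag, e.g. --rand-source to randomize the source address.
    if [ -n "$extra" ]; then
        set -- "$@" "$extra"
    fi
    echo "$@" "$target"
}
```

Example: `print_repro_cmd 192.168.x.10 --rand-source` prints the variant mentioned above, with the address left as elided in the post.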

                • w0w

                  I'll do the test on similar hardware later this week.

                  • athurdent

                    Thanks w0w, looking forward to your results.

                    BTW the hping3 crash looked different:

                    panic: bpf_mcopy
                    cpuid = 6
                    KDB: enter: panic

                    Edit: Just found this on Reddit, so I guess only one queue per interface should be OK for everyone just passing traffic through?

                    https://www.reddit.com/r/PFSENSE/comments/5obhlm/what_are_the_ramifications_of_less_nic_queues/dci77px/

                    • w0w

                      Tested, but I could not replicate the issue.
                      How much physical RAM do you have installed?
                      Do you have any custom tunables enabled, other than those you have mentioned in this topic?
                      Below are the network stack tunables I have enabled:
                      #some magic numbers :)
                      kern.ipc.nmbjumbo9="20000"
                      kern.ipc.nmbclusters="1000000"
                      kern.ipc.maxsockbuf="256000000"
                      #some more igb tune for GIG links
                      hw.igb.rxd="4096"
                      hw.igb.txd="4096"
                      net.inet.tcp.syncache.hashsize=1024
                      net.inet.tcp.syncache.bucketlimit=100
                      net.isr.defaultqlimit=4096
                      net.link.ifqmaxlen=10240
                      hw.igb.rx_process_limit="-1"
                      hw.igb.num_queues=2
                      #disable flow control on all igb interfaces
                      dev.igb.0.fc=0
                      dev.igb.1.fc=0
                      dev.igb.2.fc=0
                      dev.igb.3.fc=0
                      I'll do some tests with all tunables disabled and will try to use all the bandwidth I have on WAN…
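
Since the question here is which tunables differ between two boxes, a small sketch for comparing them: normalize each file to sorted key=value lines, then diff the results. `list_tunables` is a hypothetical helper, and the file names in the usage note are placeholders.

```shell
#!/bin/sh
# list_tunables FILE  (hypothetical helper for illustration)
# Prints the key=value lines of a loader.conf-style file, sorted, with
# comments and blank lines removed and surrounding double quotes stripped.
# Simplification: a "#" inside a quoted value would be cut off as a comment.
list_tunables() {
    sed -e 's/#.*$//' \
        -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/="\(.*\)"$/=\1/' "$1" | LC_ALL=C sort
}
```

For example: `list_tunables mine.conf > a; list_tunables theirs.conf > b; diff a b`.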

                      • athurdent

                        Thanks for looking into this. I tend to stick with the defaults until I hit a problem, so I only have nmbclusters defined. I have also disabled Hardware Checksum Offloading, though I can't remember why.
                        Among other things I'm using CARP, VLANs, the on-board SNMP and Snort. Those could also be responsible for problems, I guess.

                        • chrcoluk

                          Did you try with 1 igb queue? If not, please test with it.

                          --edit--

                          I see you did. Thanks for reporting the findings; it does seem igb on FreeBSD 11 currently has issues with multiqueue.

                          pfSense CE 2.8.0

                          • athurdent

                            @chrcoluk:

                            I see you did. Thanks for reporting the findings; it does seem igb on FreeBSD 11 currently has issues with multiqueue.

                            Thanks, do you have any FreeBSD Bugtracker or forum reference for this?

                            • chrcoluk

                              Sadly I did not bookmark it, but I will have a look later and post it here if I find it.

                              • w0w

                                Tested with over 196015 states, but nothing happened. However, I found that my test setup is somewhat flawed: my testing machine has a PPPoE link enabled on the WAN test interface, which means WAN uses only one queue by FreeBSD design.
                                Sorry, I cannot test it any other way. It looks like the igb(4) drivers are not that good, but at least they are not as bad as the stock Realtek FreeBSD drivers ;)
                                I have not researched it yet, but there are some custom patches to be found in the FreeBSD community.
                                But if your SuperMicro C2758 is doing its job and you have no performance issues, then just leave it alone :)

                                • chrcoluk

                                  The problem seems to be triggered by one of these two things or perhaps even both.

                                  FreeBSD 11 changes regarding the igb driver:

                                  RSS awareness has been added to the igb(4) driver (r268028)
                                  Automatic disabling of multi queue if ALTQ is enabled in kernel.

                                  The latter I think is the most likely, as it seems it is not fully disabling multiqueue, as evidenced by the fact that the loader tunable defaults to the number of CPU cores, and the problem goes away when it is forced to 1.

                                  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212413

                                  That problem report also links to another couple as well.
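
One way to see how many queues igb actually brought up, regardless of what the tunable says, is to count the per-queue interrupts. A sketch, assuming interrupt names look like the `igb1:que 3` seen in the panic at the top of this thread; `count_igb_queues` is a hypothetical helper that reads `vmstat -i` style output on stdin.

```shell
#!/bin/sh
# count_igb_queues  (hypothetical helper for illustration)
# Reads interrupt listings on stdin and prints "<interface> <queue count>"
# for every igb interface that has at least one "que" interrupt.
count_igb_queues() {
    awk '
        # Match interrupt names such as "igb1:que 3" anywhere on the line.
        match($0, /igb[0-9]+:que/) {
            # Drop the trailing ":que" (4 characters) to get the interface name.
            ifname = substr($0, RSTART, RLENGTH - 4)
            count[ifname]++
        }
        END { for (i in count) print i, count[i] }
    ' | LC_ALL=C sort
}
```

On the firewall this would be run as `vmstat -i | count_igb_queues`. If every interface reports 1 after setting hw.igb.num_queues=1, the tunable took effect.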

                                  • athurdent

                                    @chrcoluk: Thank you!

                                    • athurdent

                                      Might be fixed now: https://redmine.pfsense.org/issues/7149

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.