Netgate Discussion Forum

    New Version 2.4.4 - Interface Error --> aq_add_macvlan err -53, aq_error 14

    Problems Installing or Upgrading pfSense Software
    59 Posts 11 Posters 10.0k Views
      xciter327 @Juve
      last edited by xciter327

      @Juve

      I have to say that in my case the firewall did eventually freeze after those "macvlan" errors appeared. I've disabled WOL in the BIOS, which apparently controls about 10 different power-saving options. It can also be disabled with an Intel utility without a reboot. Since then I've had a firewall under test on a 10G link and have not been able to crash or freeze it.

      • J
        Juve
        last edited by Juve

        Interesting. The hardware I'm working with is an HPE ProLiant DL360. The BIOS tuning is
        maximum performance, no virtualization, no C-states, no hyper-threading, no boot on LAN.

        It looks like IXL and LACP don't play well together, at least on 11.x: https://lists.freebsd.org/pipermail/freebsd-net/2016-April/045091.html

        @stephenw10, I will do so once I have tested the system for long enough to say it is working as expected. We need to be sure.

        • X
          xciter327 @Juve
          last edited by

          @Juve said in New Version 2.4.4 - Interface Error --> aq_add_macvlan err -53, aq_error 14:

          Interesting. The hardware I'm working with is an HPE ProLiant DL360. The BIOS tuning is
          maximum performance, no virtualization, no C-states, no hyper-threading, no boot on LAN.

          It looks like IXL and LACP don't play well together, at least on 11.x: https://lists.freebsd.org/pipermail/freebsd-net/2016-April/045091.html

          @stephenw10, I will do so once I have tested the system for long enough to say it is working as expected. We need to be sure.

          I'm under the same impression. Check out the advisory document from Intel regarding bugs with that adapter. Many, many things are broken.

          • J
            Juve
            last edited by

            One day of testing my setup:

            • 2x HPE DL360 with HPE NC562 SFP+ (Intel X710 DA2), 6 cores @ 3.4 GHz, 48 GB of RAM.

            I stressed the boxes with iperf3 because that was all I had at hand.

            Common tuning:
            - earlyshellcmd doing "ifconfig ixl -vlanhwtso"
            - tunable hw.intr_storm_threshold set to 0
            - IXL cards are configured in two LAGGs in failover mode
            - each interface is a VLAN tagged on one of those LAGGs
            - /boot/loader.conf.local with
            net.pf.source_nodes_hashsize="1048576"
            net.pf.states_hashsize="67108864"
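
The earlyshellcmd item above could be scripted roughly as follows. This is only a sketch, not the author's actual script: the port list, the IXL_PORTS and DRY_RUN variables and the function name are assumptions made for the example. By default it only prints the commands it would run.

```sh
#!/bin/sh
# Sketch: turn off VLAN hardware TSO on a set of ixl(4) ports.
# IXL_PORTS and DRY_RUN are hypothetical names introduced for this example.

IXL_PORTS="${IXL_PORTS:-ixl0 ixl1 ixl2 ixl3}"   # assumed port names
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = really run ifconfig

disable_vlanhwtso() {
    for ifn in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            # Show the command instead of touching the interface.
            echo "ifconfig $ifn -vlanhwtso"
        else
            ifconfig "$ifn" -vlanhwtso
        fi
    done
}

# Word splitting of the port list is intentional here.
disable_vlanhwtso $IXL_PORTS
```

Running it with DRY_RUN=0 would apply the change for real, so it is best tried on a test box first.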

            Testing the box with the 1.9.9-k driver (stock FreeBSD 11.2):
            - iperf3, 5 threads for 900 s with pf enabled: 6.13 Gbit/s, peak at 4.25 Mpps
            - iperf3, 5 threads for 900 s with pf disabled: 9.45 Gbit/s
            - one "queue hung" on the slave node at boot in dmesg, but nothing after
            - no problem with CARP failover: going from master to slave and failing back worked as expected, in a timely manner

            Testing the box with the 1.11.9 driver (latest on the Intel website):
            - iperf3, 5 threads for 900 s with pf enabled: 6.55 Gbit/s, peak at 4.58 Mpps
            - iperf3, 5 threads for 900 s with pf disabled: 9.45 Gbit/s
            - some "queue hung" on the slave node (mainly at boot)
            - problem with CARP failover: the slave node randomly stays master on different interfaces (IPv4 or IPv6) for anywhere from a few seconds to 10 minutes. A tcpdump on the slave node shows it can't see the advertisement packets from the master. A tcpdump on the master shows packets from both master and slave!
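
For anyone wanting to repeat a comparable run, the iperf3 client invocation was probably something along these lines. The exact command was not posted, so this is a guess: the server address is a documentation placeholder, and the function name and IPERF_SERVER variable are made up for the example. The helper only prints the command line.

```sh
#!/bin/sh
# Sketch: build an iperf3 client command for 5 parallel streams over 900 s.
# IPERF_SERVER and build_iperf3_cmd are hypothetical names for this example.

build_iperf3_cmd() {
    server="$1"
    streams="${2:-5}"     # -P: number of parallel client streams
    seconds="${3:-900}"   # -t: test duration in seconds
    echo "iperf3 -c $server -P $streams -t $seconds"
}

# Assumption: pf was toggled between runs with "pfctl -d" (disable)
# and "pfctl -e" (enable); the thread does not say how it was done.
build_iperf3_cmd "${IPERF_SERVER:-192.0.2.10}"
```

The other box would run "iperf3 -s" as the server side.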

            To sum up, the newest driver seems to improve performance a little when using pf.
            Disabling TSO on VLANs seems to have solved a lot of problems for me, and I also gained around 900 Mbit/s of throughput and 300 Kpps.
            I decided to stay on the stock 1.9.9-k driver without LACP (see the LLDP agent problem in the Intel document) and with vlanhwtso disabled. I will continue stress testing that setup soon.

            Side note for those struggling with IPv6 CARP refusing to configure (the "file exists" issue): the rule is to not use capital letters in hexadecimal digits and to remove leading zeroes. OK, but that rule does not always win. If you have only one all-zero group in the address, you should keep that zero. You will see in the shell that even if you configured the address using "::" in the UI, FreeBSD imposes the 0. If you have more than one zero group, you are good.
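
The first two parts of that rule (lowercase hex, no leading zeroes) can be illustrated with a small shell helper. It is a sketch only: the function name is made up, the sample address uses the 2001:db8::/32 documentation prefix, and it deliberately leaves any "::" alone rather than trying to compress or expand zero groups. The single-zero behaviour described above matches RFC 5952, which says "::" must not be used to shorten just one 16-bit zero field; whether pfSense applies exactly that rule for CARP addresses is an assumption.

```sh
#!/bin/sh
# Sketch: lowercase an IPv6 address and strip leading zeroes in each group.
# normalize_v6_basic is a hypothetical helper name; "::" is passed through untouched.

normalize_v6_basic() {
    printf '%s\n' "$1" \
        | tr 'A-F' 'a-f' \
        | sed -E 's/(^|:)0+([0-9a-f])/\1\2/g'
}

# The lone all-zero group stays as "0"; the existing "::" is kept as written.
normalize_v6_basic "2001:0DB8:0000:0001::0010"   # prints 2001:db8:0:1::10
```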

            • J
              Juve
              last edited by

              Some followup because it will help others for sure.

              After a lot more testing I can confirm that the only working solution for me (= 10 days of uptime without any issue) is:

              • stock driver 1.9.9-k
              • failover LAGG
              • no IPv6 at all

              As soon as you use IPv6 you will get hanging queues.

              • X
                xciter327
                last edited by

                Thanks for the update. I'm somewhat convinced we are hitting a similar issue, but in different ways.

                Do you have "Allow IPv6" under Advanced/Networking enabled or disabled? We don't actually have IPv6 at any of the locations where we have the issue, but in our office, where we do have IPv6, there isn't an issue. Also, we are not using LAGG.

                • J
                  Juve
                  last edited by

                  Yes, we do have "Allow IPv6" enabled.

                  Remember that I have a shell script run at boot that does an "ifconfig -vlanhwtso ixl0..1..2..3".
                  Without disabling vlanhwtso we quickly get hung queues.

                  We are still stable after 12 days of uptime. I'll keep you posted if we encounter any issue.

                  • X
                    xciter327 @Juve
                    last edited by xciter327

                    @Juve said in New Version 2.4.4 - Interface Error --> aq_add_macvlan err -53, aq_error 14:

                    Yes, we do have "Allow IPv6" enabled.

                    Remember that I have a shell script run at boot that does an "ifconfig -vlanhwtso ixl0..1..2..3".
                    Without disabling vlanhwtso we quickly get hung queues.

                    We are still stable after 12 days of uptime. I'll keep you posted if we encounter any issue.

                    Do you actually use VLANs on the interfaces?

                    • J
                      Juve
                      last edited by

                      Yes, we do.
                      There are dozens of VLANs in use.

                      • J
                        Juve
                        last edited by

                        We experienced a hard reboot during VLAN configuration on the slave node.
                        It looks like the problem only occurs when reconfiguring interfaces.

                        I'll keep you updated.

                        • J
                          Juve
                          last edited by Juve

                          Regarding the crash, the message was:
                          Fatal trap 9: general protection fault while in kernel mode
                          cpuid = 0; apic id = 00
                          instruction pointer = 0x20:0xffffffff80e38d40
                          stack pointer = 0x28:0xfffffe0b9ceba130
                          frame pointer = 0x28:0xfffffe0b9ceba170
                          code segment = base 0x0, limit 0xfffff, type 0x1b
                          = DPL 0, pres 1, long 1, def32 0, gran 1
                          processor eflags = interrupt enabled, resume, IOPL = 0
                          current process = 12 (swi4: clock (0))

                          the trace is :
                          Tracing pid 12 tid 100034 td 0xfffff8000a331620
                          carp_master_down_locked() at carp_master_down_locked+0xf0/frame 0xfffffe0b9ceba170
                          carp_master_down() at carp_master_down+0x21/frame 0xfffffe0b9ceba190
                          softclock_call_cc() at softclock_call_cc+0x13a/frame 0xfffffe0b9ceba240
                          softclock() at softclock+0x79/frame 0xfffffe0b9ceba260
                          intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0b9ceba2a0
                          ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0b9ceba2f0
                          fork_exit() at fork_exit+0x83/frame 0xfffffe0b9ceba330
                          fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0b9ceba330
                          --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

                          Regarding the IXL driver, I noticed that version 1.11.20 was released in September, but the download page wasn't offering the right driver (IAVF instead of IXL).
                          So I emailed Intel today; they fixed the issue and we can now download version 1.11.20.
                          We will try it tomorrow.

                          • J
                            Juve
                            last edited by

                            So here we are after a lot of trials:

                            • we know for sure that reconfiguring interface capabilities (TSO) while transmitting traffic is what causes the queue hang (as explained in the FreeBSD ticket)

                            • we know that adding a VLAN with 1.9.9-k produces a queue hang and a macvlan error

                            • we know IPv6 was causing hangs, but we did not test it again in the latest setup (see below)

                            • the hard reboot is caused by an issue not directly related to the interface problem, but is the result of a lock when a queue hangs

                            • we compiled and installed version 1.11.20 of the IXL driver, and we removed our boot script that was disabling HWVLANTSO, because by the time it ran the NIC was already transmitting traffic and we got one or two hung queues at boot. This would cause issues later on.

                            • we tested adding a VLAN this morning and nothing bad happened: no error message, no queue hang.

                            So we are starting a new phase of monitoring to see how it goes.
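
For a monitoring phase like this, one simple check is to count kernel message lines that mention either symptom. A minimal sketch, assuming the messages contain the strings "queue hung" and "aq_add_macvlan err" as quoted in this thread; the exact driver wording may differ between versions, and the function name is made up for the example.

```sh
#!/bin/sh
# Sketch: count log lines mentioning the two symptoms discussed in this thread.
# count_ixl_errors is a hypothetical helper; it reads log text on stdin.

count_ixl_errors() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches.
    grep -c -E 'queue hung|aq_add_macvlan err' || true
}

# Demo on strings taken from the thread (not verbatim driver output).
printf '%s\n' \
    'aq_add_macvlan err -53, aq_error 14' \
    'queue hung' \
    'unrelated line' | count_ixl_errors   # prints 2
```

On a live box the input would come from the kernel message buffer, for example "dmesg | count_ixl_errors".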

                            • J
                              Juve
                              last edited by

                              A quick follow-up running 1.11.20:

                              • 9 days of uptime
                              • no errors so far
                              • throughput is good, no issues
                              • we added 6 new VLANs in the last 9 days and everything works as expected for the moment
                              • we tested multiple master/slave failovers and failbacks, no issues

                              I'll continue to keep you informed.

                              • JeGrJ
                                JeGr LAYER 8 Moderator @stephenw10
                                last edited by

                                @stephenw10 said in New Version 2.4.4 - Interface Error --> aq_add_macvlan err -53, aq_error 14:

                                Please add that info to the bug report if you have confirmed it.
                                https://redmine.pfsense.org/issues/9123

                                Steve

                                @Juve could you add that info to the ticket Stephen mentioned?
                                It would provide additional help/info.

                                Also @stephenw10, if that driver release seems to solve the problem, what are the chances of it being included in a 2.4.5 release? We currently have reports from two customers seeing similar problems (one with a "no response - dead" pfSense when adding VLANs, one with a reboot and queue and VLAN errors shown), so that would be a great thing.

                                best regards

                                Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

                                If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  If it's in 11 stable it would likely make it into a 2.4.X release. The drivers in 12 look to be significantly different, I'm not sure that could be brought back.

                                  Steve

                                  • J
                                    Juve
                                    last edited by

                                    I did update the ticket :-)

                                    • J
                                      Juve
                                      last edited by

                                      Quick update:
                                      50 days uptime, no error so far.
                                      Looks like the driver was the issue.

                                      • JeGrJ
                                        JeGr LAYER 8 Moderator @Juve
                                        last edited by JeGr

                                        @Juve said in New Version 2.4.4 - Interface Error --> aq_add_macvlan err -53, aq_error 14:

                                        Quick update:
                                        50 days uptime, no error so far.
                                        Looks like the driver was the issue.

                                        What were the latest changes you were running? Latest driver as you wrote (1.11.20) and HWVLANTSO removal?

                                        If so, any chance @stephenw10 to bring that driver version into the mix for the upcoming 2.4.5?


                                        • J
                                          Juve
                                          last edited by

                                          It is still super stable.

                                          The only two things we did at the end are:

                                          • use the latest driver
                                          • use failover LAGG and not LACP LAGG (but we did not test LACP with the latest driver, so we can't confirm that it does not work with it)

                                          no VLANTSO removal etc.

                                          • stephenw10S
                                            stephenw10 Netgate Administrator
                                            last edited by

                                            Is anyone still seeing this with the 1.11.9 driver in 2.4.5?

                                            Steve

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.