Netgate Discussion Forum

    2.3 stops routing traffic every 1 or 2 days.

    General pfSense Questions
    27 Posts 10 Posters 6.7k Views
    • R
      rlrobs

      I have the same problem with pfSense 2.3 on a Dell PowerEdge 2900.

      Dell PowerEdge 2900
      32 GB RAM
      Quad-core
      HD SAS: 512 GB
      4 Intel interfaces

      Packages:
      Suricata
      PFBlocker
      Zabbix-agent-LTS
      OpenVPN Client Export

      • A
        adam65535

        The Dell R320 system I updated from 2.1.5 to 2.3 also showed a similar symptom with the igb driver.  It is the secondary of a pair of Dell R320s in an HA (primary/secondary failover) setup; the primary is still on 2.1.5.  The secondary ran fine as master on 2.3 for about 10 hours, and then half of the connections to some IPs stopped working.  It was weird: I could ping some systems on the network but not others, and the same was true of remote systems.  I could not ping the ISP's router (pfSense's default route), yet I could pass traffic through it.  Network traces on the other systems and routers showed that packets went out and replies were sent back, but pfSense didn't see them (or dropped them?).

        Interrupts went to about 30% when the problem started, around 5 am several days ago.  Even after I moved traffic back to the primary (CARP), interrupts on the secondary stayed pegged at 30%, although no traffic was going through it as far as I could tell.  Unfortunately it was very early in the morning for me, so I didn't think to run netstat -i to see which IRQs were maxed or to look at dropped packets.  Keep in mind that this is a backup site that is not active, so not much traffic goes through it except transaction logs, etc.

        I noticed that I still had a hw.igb.num_queues set to 2 trying to optimize the drivers on pfsense 2.1.5 to limit nmbclusters from what I remember (my memory is not that good though :)).  It seemed like a big coincidence that 2 is half the number of CPUs in the system and that maybe around half or a quarter of the connections were failing (hyperthreading is disabled in the BIOS).  The driver was creating 4 queues according to netstat -i.

        I removed that setting and am hoping it was related.  I will be running tests during business hours over the next week or so to determine whether the problem is resolved.

        Do you have num_queues set also by chance?  It is not needed any more from everything I read.

        (I will update this post with the network card model numbers when I get to work in the morning).
        I have kern.ipc.nmbclusters="131072" set and am using about 43000 of them in my setup, which has 4 cores and 8 interfaces (two 4-port Intel cards).
        I am running 3 site-to-site IPsec tunnels, OpenVPN (though it was not in use at any time), port forwards, CARP (of course), and the built-in load balancer.
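Editor's sketch: the mbuf cluster figures quoted above (about 43000 in use against a limit of 131072) can be turned into a utilization ratio. The snippet below is a minimal Python sketch that assumes the input is text in the format FreeBSD's `netstat -m` prints for clusters (`current/cache/total/max mbuf clusters in use`). The sample line is illustrative: only 43000 and 131072 come from the post, the cache and total values are made up, and `cluster_usage` is a hypothetical helper name.

```python
import re

# Assumed format of the cluster line in `netstat -m` output on FreeBSD:
#   <current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)
CLUSTER_RE = re.compile(r'^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use')


def cluster_usage(netstat_m_output):
    """Return (current, maximum, ratio) for mbuf clusters, or None if not found."""
    for line in netstat_m_output.splitlines():
        match = CLUSTER_RE.match(line.strip())
        if match:
            current, _cache, _total, maximum = map(int, match.groups())
            return current, maximum, current / maximum
    return None


# Illustrative line: 43000 and 131072 are from the post, the rest is invented.
sample_output = "43000/2000/45000/131072 mbuf clusters in use (current/cache/total/max)\n"

usage = cluster_usage(sample_output)
if usage is not None:
    current, maximum, ratio = usage
    print(f"{current} of {maximum} clusters in use ({ratio:.1%})")
```

With the figures from the post, usage sits at roughly a third of the configured limit, which suggests (but does not prove) that cluster exhaustion is not the cause here.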

        • D
          denmly

          Yesterday I tried upgrading the firewall to 4 GB of RAM…

          But it died last night at about 21.00...

          I'm almost ready to downgrade to 2.2.6, because this is driving me crazy...

          Is there a place to get the old ISO files online?

          • C
            cmb

            @adam65535:

            I noticed that I still had a hw.igb.num_queues set to 2 trying to optimize the drivers on pfsense 2.1.5 to limit nmbclusters from what I remember (my memory is not that good though :)).

            There were problems in igb multi-queue in the old drivers, that's why people ended up setting num_queues to 1 or a small number. In all FreeBSD 10.x and newer base versions (2.2.0-2.3.1+), you shouldn't specify hw.igb.num_queues at all. Remove that from loader.conf and/or loader.conf.local to let it use the default (1 queue per CPU core).

            It's possible setting num_queues to some non-default number causes problems, especially if a low number, as I doubt much testing happens in those circumstances.
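Editor's sketch: the advice above amounts to finding and dropping any `hw.igb.num_queues` assignment from the loader configuration. The following is a minimal Python sketch of that check, working on the file contents as text rather than touching any file. The helper names are hypothetical and the sample contents are made up for illustration (the nmbclusters line echoes a setting mentioned earlier in the thread).

```python
import re

# Matches an active (uncommented) assignment such as: hw.igb.num_queues="2"
TUNABLE_RE = re.compile(r'^\s*hw\.igb\.num_queues\s*=')


def find_num_queues_overrides(text):
    """Return (line_number, line) pairs that set hw.igb.num_queues."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if TUNABLE_RE.match(line):
            hits.append((number, line.strip()))
    return hits


def strip_num_queues_overrides(text):
    """Return the text with every hw.igb.num_queues assignment removed."""
    kept = [line for line in text.splitlines() if not TUNABLE_RE.match(line)]
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical loader.conf.local contents, for illustration only.
sample = 'kern.ipc.nmbclusters="131072"\nhw.igb.num_queues="2"\n'

print(find_num_queues_overrides(sample))
print(strip_num_queues_overrides(sample), end="")
```

Commented-out lines are deliberately ignored, since they have no effect at boot. Any change to the real files only takes effect after a reboot, so the sketch stops at producing the cleaned text.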

            • C
              cmb

              denmly: I PMed you a link to a kernel to try with instructions.

              • D
                denmly

                @cmb:

                denmly: I PMed you a link to a kernel to try with instructions.

                I'll try this kernel right away

                • D
                  denmly

                  The new kernel is installed, and now it's just wait and see… :-)

                  • U
                    ulicky

                    I have the same problem on 2 identical machines; before 2.3 it was OK.

                    Supermicro board + 4x igb interfaces
                    Intel(R) Xeon(R) CPU X3430 @ 2.40GHz - 4 CPUs: 1 package(s) x 4 core(s)
                    Memory usage 2% of 8148 MiB

                    Any solution for that?

                    • B
                      byusinger84

                      Having the same issue on this post: https://forum.pfsense.org/index.php?topic=110710

                      • M
                        mer

                        ulicky and byusinger84, can you console in or ssh to the box?  Assuming the interfaces are igb or em, see if there are any messages related to "watchdog timeout".
                        I don't have any fixes, but if you have those interfaces, it may be related to something a few other folks are seeing.
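Editor's sketch: one way to run the check suggested above is to filter captured kernel or system log text for "watchdog timeout" and note which igb/em interface each message names. The Python below is a minimal sketch. `watchdog_lines` is a hypothetical helper, and the sample log lines are invented for illustration, not taken from any system in this thread.

```python
import re

# Case-insensitive match for the phrase mentioned in the post.
WATCHDOG_RE = re.compile(r'watchdog timeout', re.IGNORECASE)
# Interface names of the two driver families mentioned, e.g. igb0 or em1.
IFACE_RE = re.compile(r'\b(?:igb|em)\d+\b')


def watchdog_lines(log_text):
    """Return (interface_or_None, line) for each line mentioning a watchdog timeout."""
    results = []
    for line in log_text.splitlines():
        if WATCHDOG_RE.search(line):
            iface = IFACE_RE.search(line)
            results.append((iface.group(0) if iface else None, line.strip()))
    return results


# Invented sample text, for illustration only.
sample_log = (
    "igb0: link state changed to UP\n"
    "igb1: Watchdog timeout -- resetting\n"
    "em0: link state changed to UP\n"
)

for iface, line in watchdog_lines(sample_log):
    print(iface, "->", line)
```

The function takes plain text, so it can be fed whatever was captured from the console or an SSH session.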

                        • B
                          byusinger84

                          @mer:

                          ulicky and byusinger84, can you console in or ssh to the box?  Assuming the interfaces are igb or em, see if there are any messages related to "watchdog timeout".
                          I don't have any fixes, but if you have those interfaces, it may be related to something a few other folks are seeing.

                          I'll check this out the next time the LAN interface freezes again.

                          • T
                            thx2000

                            I'm pretty sure I'm experiencing the same problem, as mentioned in this post: https://forum.pfsense.org/index.php?topic=110320.0

                            I've noticed that if I leave the system on long enough the LAN interface will eventually drop offline after 2-3 days even without any SIP traffic through the VPN.  I'll try to check for watchdog timeout messages the next time it occurs.

                            • D
                              denmly

                              Just experienced another of these fw breakdowns even with the new kernel from CMB :-(

                              it came at the same time that a big transfer of data started through a site to site vpn tunnel…

                              • C
                                cmb

                                @denmly:

                                Just experienced another of these fw breakdowns even with the new kernel from CMB :-(

                                it came at the same time that a big transfer of data started through a site to site vpn tunnel…

                                That's not good, maybe something different in your case. Others have had promising results with the no-netmap kernel, though it hasn't been long enough yet to have a lot of confidence. What type of VPN?

                                • D
                                  denmly

                                  IPsec VPN to another pfSense 2.3…

                                  • C
                                    cmb

                                    Could you get me a status tgz from your system? Browse to status.php and click the link to download the tgz. Email the file or a link to it to cmb at pfsense dot org.

                                    • D
                                      denmly

                                      Email is now sent to you.

                                      • B
                                        byusinger84

                                        @denmly:

                                        Just experienced another of these fw breakdowns even with the new kernel from CMB :-(

                                        it came at the same time that a big transfer of data started through a site to site vpn tunnel…

                                        I also experienced the same issue even using the new kernel. Also I don't think this is related to SIP traffic because one of the sites that's had the issue doesn't use SIP.

                                        • T
                                          thx2000

                                          @byusinger84:

                                          @denmly:

                                          Just experienced another of these fw breakdowns even with the new kernel from CMB :-(

                                          it came at the same time that a big transfer of data started through a site to site vpn tunnel…

                                          I also experienced the same issue even using the new kernel. Also I don't think this is related to SIP traffic because one of the sites that's had the issue doesn't use SIP.

                                          I don't think it's strictly related to SIP traffic, but there are tons of RTP UDP packets that are sent during a call.  So for whatever reason that payload over the VPN is exacerbating the underlying issue.

                                          • D
                                            denmly

                                            Just had yet another incident. It seems that every time it happens, it is on the hour: 21.00, 23.00, 01.00, and 05.00 are the times I've noticed the problem start.

                                            I can see it on my MRTG traffic graphs, when traffic stops coming through the firewall.
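Editor's sketch: the "on the hour" observation can be checked against recorded incident times. The Python below is a minimal sketch that assumes times are written as in the post (`HH.MM`). `all_on_the_hour` is a hypothetical helper, and the tolerance parameter is an added assumption to allow for coarse graph resolution.

```python
from datetime import datetime


def minutes_from_hour(stamp, fmt="%H.%M"):
    """Distance in minutes from the nearest full hour for a time like '21.00'."""
    minute = datetime.strptime(stamp, fmt).minute
    return min(minute, 60 - minute)


def all_on_the_hour(stamps, tolerance_minutes=0):
    """True if every timestamp is within the tolerance of a full hour."""
    return all(minutes_from_hour(s) <= tolerance_minutes for s in stamps)


# Incident times as reported in the post.
incidents = ["21.00", "23.00", "01.00", "05.00"]
print(all_on_the_hour(incidents))
```

If the pattern holds over more incidents, it may be worth comparing these times against anything scheduled to run hourly on the firewall, though nothing in the thread so far confirms such a link.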

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.