Netgate Discussion Forum

    504 Gateway timeout and full network loss periodically

      michmoor LAYER 8 Rebel Alliance @euantorano

      @euantorano
      I’m having a similar issue and this is what I am watching for

      1. Take a look at the monitoring graph for CPU on pfSense. How does system utilisation look?
      2. Which process is consuming the most CPU during the incident? Check with top -aSH.
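When the GUI is unreachable, the same check can be captured as plain text over SSH. The following is a minimal sketch, not a pfSense tool: `cpu_idle_pct` is a hypothetical helper name, and the batch flags in the usage comment are an assumption that should be checked against top(1) on your release.

```shell
# Print the idle percentage from the last "CPU:" summary line of
# top output supplied on stdin. Prints nothing if no such line exists.
cpu_idle_pct() {
    awk '
        /^[ \t]*CPU:/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^idle/) {
                    v = $(i - 1)      # e.g. "99.9%"
                    sub(/%/, "", v)
                    found = 1
                }
            }
        }
        END { if (found) print v }
    '
}

# Hypothetical usage on the firewall (flags assumed, see top(1)):
#   top -aSH -b -d 2 | cpu_idle_pct
```

A value close to 100 during an outage suggests the box is idle rather than CPU-starved, which points the search elsewhere.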

      In my case I’m leaning towards a corrupted filesystem, because there aren’t any other indicators of what the issue could be aside from the kernel process consuming everything to the point that the network can’t forward packets.

      edit.
      When my system becomes unresponsive to the point that DNS resolution doesn't work and inter-VLAN routing is extremely slow, this is what my chart looks like.

      f3fcb154-4500-40a2-836a-9455f5b6b047-image.png

      Firewall: NetGate,Palo Alto-VM,Juniper SRX
      Routing: Juniper, Arista, Cisco
      Switching: Juniper, Arista, Cisco
      Wireless: Unifi, Aruba IAP
      JNCIP,CCNP Enterprise

        euantorano @michmoor

        @michmoor Great to hear I'm not alone at least! I must confess this is my first pfSense system using ZFS - I've always used UFS in the past.

        Looking at the monitoring graph at the moment, the utilisation is pretty low under normal use:

        9fd25c3e-b14b-454b-abdf-c891fcbfff51-image.png

        (the drop in processes corresponds with a system reboot around 12:15)

        I'm hoping that when it next fails I'll be able to either access the system via SSH or via the monitor I've now got hooked up. Based on past experience I shouldn't have to wait too long until it next fails - it tends to only be a week or two at the most between failures.

          euantorano

          At some point since Thursday (we had a long weekend here in the UK for public holidays) the system has failed again. SSH is working, so I've managed to grab the output from top -aSH:

          last pid: 18852;  load averages:  0.00,  0.00,  0.00                                                                                                      up 11+18:22:31  08:32:33
          54 processes:  1 running, 53 sleeping
          CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
          Mem: 24M Active, 376M Inact, 655M Wired, 6360M Free
          ARC: 138M Total, 22M MFU, 111M MRU, 16K Anon, 779K Header, 4804K Other
               102M Compressed, 253M Uncompressed, 2.47:1 Ratio
          Swap: 1024M Total, 1024M Free
          

          I couldn't immediately check the graphs as the "504 Gateway Time-out" error on login prevented me from accessing them. I've since run /etc/rc.php-fpm_restart and /etc/rc.restart_webgui and now cannot even get to the login page...

            euantorano @euantorano

            Interestingly, if I try to log in from a private window or another browser, I do see a syslog message in my SSH session saying that the login was successful, but the 504 Gateway Time-out still occurs:

            Message from syslogd ...
            <32>1 2024-04-02T08:53:24.974210+01:00 PFSENSEBOX php-fpm 16785 - - /index.php: Successful login for user 'euant' from: IP_ADDRESS (Local Database)
            

            So php-fpm is at least kind of working up to that point.

              LaFlamaBlanca @euantorano

              @euantorano I've been running into the same issue for the past couple of months. I migrated in late 2023 from a virtualized setup in Hyper-V that ran without issue for a couple of years. I'm now on a Protectli FW4C on CE 2.7.2. I have a pretty small home setup with 1 WAN, 1 LAN, and another port I'm using with two VLANs. OpenVPN is running on UDP, plus a TCP instance behind HAProxy (for connections from work/locations that block UDP).

              I can't find anything meaningful in System Logs under General, Gateways, DHCP, DNS Resolver, etc. There are some alarms periodically through the day, but not around the outage:

              send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr ***** bind_addr ***** identifier "WAN_DHCP "

              Some things I've tried:

              • Place a switch between modem and protectli/pfsense
              • Swapped all cables
              • TSO to disabled/0
              • Disabled Gateway Monitoring Action
              • System -> Advanced -> Networking - KEA DHCP
              • System -> Advanced -> Miscellaneous - Memory Limit - 1024
              • System -> Routing -> Gateways - Monitor IP - 1.1.1.1
              • Default Gateways to WAN_DHCP (from automatic)
              • DHCP Client Configuration to FreeBSD Default
              • Reject Leases from 192.168.100.1 (modem)
              • Lease Requirements and Requests : Options modifiers - supersede dhcp-server-identifier 255.255.255.255
              • Interfaces -> * -> Speed and Duplex set explicitly 1000baseT full-duplex (and 2500 because of 2.5G intel ports)

              I've done a backup restore but I'm trying everything I can to avoid a full fresh install while I try to work on other projects, but the inability to restart pfsense/fix the issue while I'm away from home is breaking me. Have you had any luck?

                euantorano @LaFlamaBlanca

                @LaFlamaBlanca That sounds extremely familiar. I too have tried similar steps, including putting a switch between the modem and pfSense and swapping out cables.

                Unfortunately I’ve not had any luck yet, but at the moment it looks like the failures have become slightly less frequent since I turned on some of the hardware offloading settings.

                  euantorano @euantorano

                  Well everything had been fairly smooth sailing since my previous post in April 2024 until I applied the 25.07-RELEASE update and I'm now back to where I was previously, with sporadic lock-ups happening somewhere between every couple of days and once a week.

                  No other settings were changed, except to move to the new DHCP server (Kea).

                  Any ideas on how to troubleshoot what's happening? There are no obvious logs to report what the problem is.
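One way to have evidence on disk after the next lock-up is to record timestamped command output at intervals. This is a minimal sketch under stated assumptions: `snapshot` is a hypothetical helper, and the log path and the commands in the usage comment are illustrative choices, not something pfSense provides.

```shell
# Append a timestamped header and the output of a command to a log file.
# Usage: snapshot LOGFILE COMMAND [ARGS...]
snapshot() {
    log="$1"
    shift
    {
        echo "=== $(date '+%Y-%m-%dT%H:%M:%S') $* ==="
        "$@" 2>&1
    } >> "$log"
}

# Hypothetical usage (path and interval are assumptions):
#   while :; do
#       snapshot /root/lockup.log netstat -m
#       sleep 60
#   done
```

Comparing the last few entries before a failure against a healthy period may show what changed, even if the system logs stay quiet.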

                    euantorano @euantorano

                    The system has fallen over again overnight. The login screen loads fine, but as soon as you submit the login, the request takes a long time and you eventually get this screen:

                    6e57e9f4-9645-4b21-a682-3dc0443698ef-image.png

                    Clicking on the link to the Crash Reporter is pretty useless, as it only contains the following:

                    Crash report begins.  Anonymous machine information:
                    
                    amd64
                    15.0-CURRENT
                    FreeBSD 15.0-CURRENT #0 plus-RELENG_25_07_1-n256513-49844af35a5d: Fri Aug 15 19:21:04 UTC 2025     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-25_07_1-main/obj/amd64/DZizCvOj/var/jenkins/workspace/pfSense-Plus-snapshots-25_07_1-main/sources
                    
                    Crash report details:
                    
                    No PHP errors found.
                    
                    No FreeBSD crash data found.
                    			
                    

                    And I can't access any other pages within the system to check logs or diagnostics etc. until we perform a shutdown and start the system back up again.

                      euantorano @euantorano

                      Interestingly, I've just SSHed into the system, and running the ifconfig command hangs in the sbwait state for a very long time:

                      96993 root          1  68    0    15M  3504K sbwait  10   0:00   0.00% ifconfig
                      91059 root          1  68    0    15M  3520K sbwait   6   0:00   0.00% ifconfig
                      38972 root          1  68    0    15M  3516K sbwait   8   0:00   0.00% ifconfig
                      

                      I found a topic on the FreeBSD community discussing this, and I wonder if this may be related: https://forums.freebsd.org/threads/ifconfig-needs-16s-mainly-sbwait-on-freebsd-14-2-p1.97931/
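A hang like this can be detected from a script by putting a time limit on the command. The sketch below assumes timeout(1) is available (it ships with FreeBSD base and GNU coreutils); `check_hang` is a hypothetical helper name, and a process stuck in an uninterruptible state may not respond to the signal timeout sends.

```shell
# Run a command with a time limit and report whether it finished.
# Usage: check_hang SECONDS COMMAND [ARGS...]
check_hang() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    # timeout(1) exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    else
        echo "finished (exit $status): $*"
    fi
}

# Hypothetical usage on the firewall:
#   check_hang 5 ifconfig
```

Running this at intervals would give a rough timestamp for when ifconfig first starts blocking.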

                        Gertjan @euantorano

                        @euantorano

                        uname -r
                        

                        will tell you what you already knew:

                        a6038559-76e2-46bb-b08f-e048e12fb02a-image.png

                        So, beware: (your) pfSense doesn't use FreeBSD 14.2.

                        About ifconfig: I've never seen the sbwait issue, but it isn't unknown. Look here, read some of the posts, and check what matches what you see.

                        I tend to think: an interface issue? And somehow this impacts your PHP daemon, which uses a socket for the communication between nginx (the web server) and the PHP interpreter, hence the 50x error.

                        No "help me" PM's please. Use the forum, the community will thank you.
                        Edit : and where are the logs ??

                          euantorano @Gertjan

                          @Gertjan Yes, I suspect an interface or driver issue. It's interesting that I can SSH in (from a remote location, over a WireGuard connection), but devices connected directly to the LAN side of the pfSense box cannot reach out, and PHP/nginx seem to have issues.

                          This box has a 2.5GbE NIC, and a separate Intel PCI card with 4 network interfaces. I've included some of the output from pciconf -lv below:

                          igb0@pci0:1:0:0:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
                              vendor     = 'Intel Corporation'
                              device     = 'I350 Gigabit Network Connection'
                              class      = network
                              subclass   = ethernet
                          igb1@pci0:1:0:1:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
                              vendor     = 'Intel Corporation'
                              device     = 'I350 Gigabit Network Connection'
                              class      = network
                              subclass   = ethernet
                          igb2@pci0:1:0:2:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
                              vendor     = 'Intel Corporation'
                              device     = 'I350 Gigabit Network Connection'
                              class      = network
                              subclass   = ethernet
                          igb3@pci0:1:0:3:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
                              vendor     = 'Intel Corporation'
                              device     = 'I350 Gigabit Network Connection'
                              class      = network
                              subclass   = ethernet
                          none6@pci0:88:0:0:      class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2725 subvendor=0x8086 subdevice=0x0024
                              vendor     = 'Intel Corporation'
                              device     = 'Wi-Fi 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak]'
                              class      = network
                          igc0@pci0:89:0:0:       class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f2 subvendor=0x8086 subdevice=0x3019
                              vendor     = 'Intel Corporation'
                              device     = 'Ethernet Controller I225-LM'
                              class      = network
                              subclass   = ethernet
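In pciconf -lv output, a device with no driver attached is listed under a name starting with "none", as with none6 above. A minimal sketch for picking those entries out of saved output follows; `unattached_pci` is a hypothetical helper name, and the parsing assumes the layout shown in this post.

```shell
# List PCI devices that have no driver attached, reading
# `pciconf -lv` output on stdin. Prints "<name>: <device string>".
unattached_pci() {
    awk '
        # Header line of a device entry, e.g. "none6@pci0:88:0:0: ..."
        /^[ \t]*[a-z][a-z0-9_]*@pci/ {
            split($1, parts, "@")
            show = (parts[1] ~ /^none[0-9]+$/)
            name = parts[1]
            next
        }
        # Detail line naming the device, only for unattached entries
        show && $1 == "device" {
            sub(/^[^=]*= */, "")
            print name ": " $0
        }
    '
}

# Hypothetical usage on the firewall:
#   pciconf -lv | unattached_pci
```

For the output above this would report only the Wi-Fi card, since every other listed device is claimed by igb or igc.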
                            Gertjan @euantorano

                            @euantorano

                            Wild guess: it's not assigned (?), and if you can live without it, disable it in the BIOS:

                            7a57d0e9-85ff-4be6-9bf6-8cd3f5dfb91a-image.png

                              euantorano @Gertjan

                              @Gertjan Yeah, the Wi-Fi isn't assigned at all. I'll try disabling it and see if it has any effect.

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.