504 Gateway timeout and full network loss periodically
- 
 We've got a system running Netgate pfSense Plus 23.09.1-RELEASE on an Intel NUC 11 Extreme. We're using root on ZFS, and our resource usage is very low (e.g. 0% CPU usage most of the time according to the dashboard, around 11% memory usage and 1% disk usage). Periodically the system will enter a fully unresponsive state where:
- Attempting to log in to the web GUI configurator fails with a 504 Gateway Time-out error from Nginx.
- Clients on the LAN cannot access the Internet or other networks. However, the WireGuard tunnel that we have configured stays up, and we can ping the box from a host on the other side of the tunnel.
- The system does not show any display output when a monitor is connected to the HDMI port on the unit and a keyboard is connected via USB.
 When this happens, the only way to recover is to hard shut down the system (by pressing and holding the power button) and then boot it again. If I check the system logs after the reboot, I can see errors from Nginx such as the following:
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET / HTTP/2.0" 200 4542 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /vendor/bootstrap/css/bootstrap.min.css HTTP/2.0" 200 25180 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /css/login.css?v=1701893452 HTTP/2.0" 200 1077 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /csrf/csrf-magic.js HTTP/2.0" 200 7313 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /vendor/jquery/jquery-3.5.1.min.js?v=1701893452 HTTP/2.0" 200 89476 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /vendor/bootstrap/js/bootstrap.min.js?v=1701893452 HTTP/2.0" 200 39680 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /js/pfSense.js?v=1701893452 HTTP/2.0" 200 11595 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /css/logo.css HTTP/2.0" 200 106 "https://10.2.7.1/css/login.css?v=1701893452" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:10 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:10 +0000] "GET /favicon.ico HTTP/2.0" 200 15086 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:51:20 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:20 +0000] "POST / HTTP/2.0" 302 0 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 08:54:20 PFSENSEBOX nginx: 2024/03/21 08:54:20 [error] 38624#100562: *3 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.2.7.11, server: , request: "GET / HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.2.7.1", referrer: "https://10.2.7.1/"
Mar 21 08:54:20 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:54:20 +0000] "GET / HTTP/2.0" 504 562 "https://10.2.7.1/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
Mar 21 10:48:59 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:10:48:59 +0000] "GET / HTTP/2.0" 200 4545 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
 So Nginx is clearly running at this point, but it looks like PHP-FPM may not be. I can't find any logs from PHP-FPM at all that might shed light on what has happened to the PHP-FPM process.
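 For anyone sifting through a longer log for the same pattern, here is a minimal sketch that picks out the 504 responses and the matching "upstream timed out" errors. It assumes the syslog-wrapped nginx format shown in the excerpt above; the function name `find_gateway_timeouts` and the regular expressions are my own, and the sample lines are copied from the excerpt with the user-agent field trimmed.

```python
import re

# Access-log entry as wrapped by syslog in the excerpt above.
ACCESS_RE = re.compile(
    r'nginx: (?P<client>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)
# Error-log entry reporting that the FastCGI upstream did not answer in time.
UPSTREAM_RE = re.compile(
    r'\[error\].*upstream timed out.*upstream: "(?P<upstream>[^"]+)"'
)


def find_gateway_timeouts(lines):
    """Return (access_504s, upstream_errors) found in an iterable of log lines."""
    access_504s, upstream_errors = [], []
    for line in lines:
        err = UPSTREAM_RE.search(line)
        if err:
            upstream_errors.append(err.group("upstream"))
            continue
        hit = ACCESS_RE.search(line)
        if hit and hit.group("status") == "504":
            access_504s.append((hit.group("time"), hit.group("method"), hit.group("path")))
    return access_504s, upstream_errors


# Sample lines taken from the excerpt above (user-agent field trimmed).
SAMPLE = [
    'Mar 21 08:51:20 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:51:20 +0000] '
    '"POST / HTTP/2.0" 302 0 "https://10.2.7.1/"',
    'Mar 21 08:54:20 PFSENSEBOX nginx: 2024/03/21 08:54:20 [error] 38624#100562: *3 '
    'upstream timed out (60: Operation timed out) while reading response header from '
    'upstream, client: 10.2.7.11, server: , request: "GET / HTTP/2.0", upstream: '
    '"fastcgi://unix:/var/run/php-fpm.socket", host: "10.2.7.1", referrer: "https://10.2.7.1/"',
    'Mar 21 08:54:20 PFSENSEBOX nginx: 10.2.7.11 - - [21/Mar/2024:08:54:20 +0000] '
    '"GET / HTTP/2.0" 504 562 "https://10.2.7.1/"',
]

timeouts, upstreams = find_gateway_timeouts(SAMPLE)
print(timeouts)
print(upstreams)
```

 In this excerpt the POST at 08:51:20 returns a 302 and the follow-up GET fails exactly three minutes later, which suggests that nginx accepted the request and then waited on the php-fpm socket until the time-out expired.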
 The only packages that we have installed are:
- openvpn-client-export
- sudo (I've only just installed this today, after the most recent reboot - I enabled SSH at this point so that I might be able to SSH in the next time it fails)
- WireGuard (our WireGuard tunnel is still
 We previously had SSH disabled, but I have enabled it this morning so that next time the box fails I may be able to SSH in and try to run some diagnostics. This has been an ongoing problem for a while, at least since the last quarter of 2023. We've been struggling along with it and just rebooting, as I've not had time to investigate. 
- 
 @euantorano 
 I’m having a similar issue, and this is what I am watching for:
- Take a look at the monitoring graph for CPU on pfSense. How does system utilisation look?
- What is the top process consuming CPU during the incident? Check with top -aSH.
 In my case I’m leaning towards a corrupted filesystem, because there aren’t any other indicators of what the issue could be aside from the kernel process consuming everything to the point that the network can’t forward packets.
 Edit: When my system becomes unresponsive to the point that DNS resolution doesn't work and inter-VLAN routing is extremely slow, this is what my chart looks like. 
- 
 @michmoor Great to hear I'm not alone at least! I must confess this is my first pfSense system using ZFS - I've always used UFS in the past. Looking at the monitoring graph at the moment, the utilisation is pretty low under normal use:  (the drop in processes corresponds with a system reboot around 12:15) I'm hoping that when it next fails I'll be able to either access the system via SSH or via the monitor I've now got hooked up. Based on past experience I shouldn't have to wait too long until it next fails - it tends to only be a week or two at the most between failures. 
- 
 At some point since Thursday (we had a long weekend here in the UK for public holidays) the system has failed again. SSH is working, so I've managed to grab the output from top -aSH:
last pid: 18852; load averages: 0.00, 0.00, 0.00 up 11+18:22:31 08:32:33
54 processes: 1 running, 53 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle
Mem: 24M Active, 376M Inact, 655M Wired, 6360M Free
ARC: 138M Total, 22M MFU, 111M MRU, 16K Anon, 779K Header, 4804K Other
     102M Compressed, 253M Uncompressed, 2.47:1 Ratio
Swap: 1024M Total, 1024M Free
 I couldn't immediately check the graphs, as the "504 Gateway Time-out" error on login prevented me from accessing them. I've since run /etc/rc.php-fpm_restart and /etc/rc.restart_webgui and now cannot even get to the login page...
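 As a small aid for comparing this header between incidents, the sketch below parses the "CPU:" summary line into numbers. The function name `parse_cpu_line` is hypothetical and the parsing assumes the comma-separated "N% state" layout shown in the output above.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'CPU:' summary line into a {state: percent} dict."""
    if not line.startswith("CPU:"):
        raise ValueError("not a CPU summary line")
    # Each field looks like '99.9% idle'; capture the number and the state name.
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", line)}


# CPU line as reported in the post above.
CPU_LINE = "CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle"

usage = parse_cpu_line(CPU_LINE)
print(usage["idle"])
```

 With the box almost entirely idle and load averages at zero during the failure, CPU starvation looks unlikely in this case, which is a different picture from the kernel-bound behaviour described earlier in the thread.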
- 
 Interestingly, if I try to log in from a private window or another browser, I do see a message from syslog in my SSH session saying that the login was successful, but the 504 Gateway Time-out still occurs:
Message from syslogd ...
<32>1 2024-04-02T08:53:24.974210+01:00 PFSENSEBOX php-fpm 16785 - - /index.php: Successful login for user 'euant' from: IP_ADDRESS (Local Database)
 So php-fpm is at least kind of working up to that point. 
- 
 @euantorano I've been running into the same issue for the past couple of months. I migrated in late 2023 from a virtualized setup in Hyper-V that ran without issue for a couple of years. I'm now on a Protectli FW4C on CE 2.7.2. I have a pretty small home setup with 1 WAN, 1 LAN, and another port that I'm using with two VLANs. OpenVPN is running on UDP, plus a TCP instance behind HAProxy (for connections from work/locations that block UDP). I can't find anything meaningful in System Logs under General, Gateways, DHCP, DNS Resolver, etc. There are some alarms periodically through the day, but not around the outage:
send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr ***** bind_addr ***** identifier "WAN_DHCP "
 Some things I've tried:
- Place a switch between modem and Protectli/pfSense
- Swapped all cables
- TSO to disabled/0
- Disabled Gateway Monitoring Action
- System -> Advanced -> Networking - KEA DHCP
- System -> Advanced -> Miscellaneous - Memory Limit - 1024
- System -> Routing -> Gateways - Monitor IP - 1.1.1.1
- Default Gateways to WAN_DHCP (from automatic)
- DHCP Client Configuration to FreeBSD Default
- Reject Leases from 192.168.100.1 (modem)
- Lease Requirements and Requests : Options modifiers - supersede dhcp-server-identifier 255.255.255.255
- Interfaces -> * -> Speed and Duplex set explicitly 1000baseT full-duplex (and 2500 because of 2.5G intel ports)
 I've done a backup restore, and I'm trying everything I can to avoid a full fresh install while I work on other projects, but the inability to restart pfSense or fix the issue while I'm away from home is breaking me. Have you had any luck? 
- 
 @LaFlamaBlanca sounds extremely familiar. I too have tried similar steps, including putting a switch between the modem and pfSense and swapping out cables. Unfortunately I’ve not had any luck yet, but at the moment it looks like the failures have become slightly less frequent since I turned on some of the hardware offloading settings. 
- 
 Well everything had been fairly smooth sailing since my previous post in April 2024 until I applied the 25.07-RELEASE update and I'm now back to where I was previously, with sporadic lock-ups happening somewhere between every couple of days and once a week. No other settings were changed, except to move to the new DHCP server (Kea). Any ideas on how to troubleshoot what's happening? There are no obvious logs to report what the problem is. 
- 
 The system has fallen over again overnight. The login screen loads fine, but as soon as you try to submit the login, the request takes quite a long time and then you eventually get this screen:  Clicking on the link to the Crash Reporter is pretty useless, as it only contains the following:
Crash report begins. Anonymous machine information:
amd64
15.0-CURRENT
FreeBSD 15.0-CURRENT #0 plus-RELENG_25_07_1-n256513-49844af35a5d: Fri Aug 15 19:21:04 UTC 2025 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-25_07_1-main/obj/amd64/DZizCvOj/var/jenkins/workspace/pfSense-Plus-snapshots-25_07_1-main/sources
Crash report details:
No PHP errors found.
No FreeBSD crash data found.
 And I can't access any other pages within the system to check logs or diagnostics etc. until we perform a shutdown and start the system back up again. 
- 
 Interestingly, I've just SSHed into the system, and running the ifconfig command hangs in the sbwait state for a very long time:
96993 root 1 68 0 15M 3504K sbwait 10 0:00 0.00% ifconfig
91059 root 1 68 0 15M 3520K sbwait 6 0:00 0.00% ifconfig
38972 root 1 68 0 15M 3516K sbwait 8 0:00 0.00% ifconfig
 I found a topic on the FreeBSD community forums discussing this, and I wonder if it may be related: https://forums.freebsd.org/threads/ifconfig-needs-16s-mainly-sbwait-on-freebsd-14-2-p1.97931/ 
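 To keep an eye on how many processes pile up like this, here is a rough sketch that counts commands by wait state from saved top output. It assumes the 12-column layout of the lines above (PID, USERNAME, THR, PRI, NICE, SIZE, RES, STATE, C, TIME, WCPU, COMMAND); the helper name `stuck_in_state` is hypothetical.

```python
from collections import Counter


def stuck_in_state(top_lines, state="sbwait"):
    """Count commands whose STATE column equals `state`.

    Assumed column order: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND.
    Lines that do not fit this layout (headers, blanks) are skipped.
    """
    counts = Counter()
    for line in top_lines:
        fields = line.split(None, 11)  # keep the command, with any arguments, in one field
        if len(fields) == 12 and fields[0].isdigit() and fields[7] == state:
            counts[fields[11]] += 1
    return counts


# Process lines as posted above.
SAMPLE = [
    "96993 root 1 68 0 15M 3504K sbwait 10 0:00 0.00% ifconfig",
    "91059 root 1 68 0 15M 3520K sbwait 6 0:00 0.00% ifconfig",
    "38972 root 1 68 0 15M 3516K sbwait 8 0:00 0.00% ifconfig",
]

print(stuck_in_state(SAMPLE))
```

 Running it against snapshots taken a few minutes apart would show whether the number of blocked ifconfig processes keeps growing.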
- 
 uname -r will tell you what you already knew:  so, be aware: (your) pfSense doesn't use FreeBSD 14.2. About ifconfig: I've never seen the sbwait issue myself, but it isn't unknown. Look here, read some of the posts, and check what matches what you see. I tend to think it's an interface issue that somehow impacts the communication between nginx (the web server) and the PHP daemon, which takes place over a socket, hence the 50x error. 
- 
 @Gertjan Yes, I'm suspecting an interface or driver issue. It's interesting that I can SSH in (from a remote location, over a WireGuard connection), but devices connected directly to the LAN from the pfSense install cannot reach out and PHP/nginx seem to have issues. This box has a 2.5GbE NIC, and a separate Intel PCI card with 4 network interfaces. I've included some of the output from pciconf -lv below:
igb0@pci0:1:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb1@pci0:1:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb2@pci0:1:0:2: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
igb3@pci0:1:0:3: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
none6@pci0:88:0:0: class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2725 subvendor=0x8086 subdevice=0x0024
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak]'
    class      = network
igc0@pci0:89:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f2 subvendor=0x8086 subdevice=0x3019
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-LM'
    class      = network
    subclass   = ethernet
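 In this listing the entry named none6 is a device with no driver attached, which is how pciconf labels such devices. A minimal sketch for pulling those out of a saved pciconf -lv dump follows; the function names `parse_pciconf` and `unattached` are hypothetical, and the sample is limited to two of the entries from the output above.

```python
import re

# Header line, e.g. 'igc0@pci0:89:0:0: class=0x020000 ...'
HEADER_RE = re.compile(r"^(?P<name>\w*?)(?P<unit>\d+)@(?P<selector>pci\S+?):\s")
# Indented attribute line, e.g. "    device     = 'Ethernet Controller I225-LM'"
ATTR_RE = re.compile(r"^\s+(\w+)\s*=\s*(.*)$")


def parse_pciconf(text):
    """Turn 'pciconf -lv' text into a list of dicts, one per device."""
    devices = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            devices.append({
                "name": header["name"],
                "unit": int(header["unit"]),
                "selector": header["selector"],
            })
            continue
        attr = ATTR_RE.match(line)
        if attr and devices:
            # Attach the attribute to the most recent device header.
            devices[-1][attr[1]] = attr[2].strip("'")
    return devices


def unattached(devices):
    """Devices listed as 'noneN', i.e. with no driver attached."""
    return [d for d in devices if d["name"] == "none"]


# Two entries copied from the output above.
SAMPLE = """\
none6@pci0:88:0:0: class=0x028000 rev=0x1a hdr=0x00 vendor=0x8086 device=0x2725 subvendor=0x8086 subdevice=0x0024
    vendor     = 'Intel Corporation'
    device     = 'Wi-Fi 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak]'
    class      = network
igc0@pci0:89:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f2 subvendor=0x8086 subdevice=0x3019
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-LM'
    class      = network
    subclass   = ethernet
"""

devices = parse_pciconf(SAMPLE)
for dev in unattached(devices):
    print(dev["selector"], dev["device"])
```

 Here the only unattached network device is the AX210 Wi-Fi card, while the I350 and I225-LM ports are claimed by the igb and igc drivers.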
- 
 
- 
 @Gertjan Yeah, the Wi-Fi isn't assigned at all. I'll try disabling it and see if it has any effect. 

