The WAN interface stops working at random times, often overnight or after periods when the internet hasn't been used much (we don't use much traffic anyway, since we are on a 4G modem from Teltonika).
-
This problem started BEFORE I changed hardware. I thought it was my Realtek NICs, so I moved to another server with a Dell LOM quad-port gigabit NIC and started virtualising pfSense instead, and the problem persists. I also swapped my old D-Link 4G modem for a Teltonika RUT950, and the problem is still there. I have tried both with and without hardware offloading.

My temporary fix is to force a different speed/duplex setting. For example, if the interface is on Default, I change it to 1000BaseT Full-Duplex and it works again; as soon as it stops working, I change it to another option (for example 1000BaseT) and it starts working again. Then it stops again after some time, which could be 10 minutes, 1 day or 4 days. Literally random. This is really frustrating, since I'm not always at the location where this pfSense box is hosted.
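For reference, the same flip can probably be done from a shell (SSH or Diagnostics > Command Prompt) instead of the GUI. A minimal sketch, assuming FreeBSD ifconfig(8) media syntax and igb0 as the WAN; the script name `toggle-media.sh` and its function names are hypothetical. A change made this way is not saved to the pfSense config, so the GUI setting will likely be reapplied the next time the interface is reconfigured.

```sh
#!/bin/sh
# Hypothetical helper: flip the WAN NIC between autoselect and forced
# 1000baseT full-duplex, mirroring the manual GUI workaround.
# Assumes FreeBSD ifconfig(8); igb0 is an assumption, override with IFACE=.

# Succeeds if the ifconfig output on stdin shows the NIC on autoselect.
is_autoselect() {
    grep -q 'media: Ethernet autoselect'
}

# Switch interface $1 to whichever media setting it is not currently using.
toggle_media() {
    iface="$1"
    if ifconfig "$iface" | is_autoselect; then
        ifconfig "$iface" media 1000baseT mediaopt full-duplex
    else
        ifconfig "$iface" media autoselect
    fi
}

# Only act when called as "sh toggle-media.sh run", so the file can be sourced.
if [ "${1:-}" = "run" ]; then
    toggle_media "${IFACE:-igb0}"
fi
```

Usage would be something like `IFACE=igb0 sh toggle-media.sh run`.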
2022-07-28 23:41:16 notice rtr01 CHECK_RELOAD_STATUS Restarting OpenVPN tunnels/interfaces
2022-07-28 23:41:16 notice rtr01 CHECK_RELOAD_STATUS Reloading filter
2022-07-28 23:41:19 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:41:20 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:41:24 notice rtr01 RADIUSD (229) Login OK: [arvid] (from client AxelClosetAP01E port 0 via TLS tunnel)
2022-07-28 23:41:24 notice rtr01 RADIUSD (230) Login OK: [arvid] (from client AxelClosetAP01E port 0 cli DC-A2-66-66-0A-39)
2022-07-28 23:41:48 notice rtr01 RADIUSD (242) Login OK: [arvid] (from client AxelClosetAP01E port 0 via TLS tunnel)
2022-07-28 23:41:48 notice rtr01 RADIUSD (243) Login OK: [arvid] (from client AxelClosetAP01E port 0 cli DC-A2-66-66-0A-39)
2022-07-28 23:41:49 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:41:50 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:42:19 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:42:21 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:42:50 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:42:51 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:43:20 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:43:21 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:43:50 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:43:51 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:44:00 err rtr01 NGINX 2022/07/28 23:44:00 [error] 78894#100710: *12076 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.14.30.60, server: , request: "POST /widgets/widgets/system_information.widget.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "rtr01.prd.se-mmx.zyner.net", referrer: "https://rtr01.prd.se-mmx.zyner.net/"
2022-07-28 23:44:06 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:44:07 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:44:15 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:44:16 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:44:19 info rtr01 KERNEL igb0: link state changed to DOWN
2022-07-28 23:44:19 notice rtr01 CHECK_RELOAD_STATUS Linkup starting igb0
2022-07-28 23:44:19 warning rtr01 DPINGER WAN_DHCP 100.105.216.1: sendto error: 50
2022-07-28 23:44:20 warning rtr01 DPINGER WAN_DHCP 100.105.216.1: sendto error: 50
2022-07-28 23:44:20 err rtr01 PHP-FPM /rc.linkup: DEVD Ethernet detached event for wan
2022-07-28 23:44:20 warning rtr01 DPINGER WAN_DHCP 100.105.216.1: sendto error: 50
2022-07-28 23:44:21 warning rtr01 DPINGER WAN_DHCP 100.105.216.1: sendto error: 50
2022-07-28 23:44:21 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb1: Operation not permitted
2022-07-28 23:44:21 notice rtr01 CHECK_RELOAD_STATUS Reloading filter
2022-07-28 23:44:21 warning rtr01 DPINGER send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 2a06:a004:7010::1 bind_addr 2a06:a004:7010::2 identifier "ROUTE48_MALMOv6 "
2022-07-28 23:44:22 warning rtr01 LLDPD unable to send second SONMP packet on real device for igb2: Operation not permitted
2022-07-28 23:44:23 notice rtr01 CHECK_RELOAD_STATUS Linkup starting igb0
2022-07-28 23:44:23 info rtr01 KERNEL igb0: link state changed to UP
2022-07-28 23:44:23 warning rtr01 DPINGER ROUTE48_MALMOv6 2a06:a004:7010::1: Alarm latency 0us stddev 0us loss 100%
2022-07-28 23:44:23 info rtr01 RC.GATEWAY_ALARM >>> Gateway alarm: ROUTE48_MALMOv6 (Addr:2a06:a004:7010::1 Alarm:1 RTT:0.000ms RTTsd:0.000ms Loss:100%)
2022-07-28 23:44:23 notice rtr01 CHECK_RELOAD_STATUS updating dyndns ROUTE48_MALMOv6
2022-07-28 23:44:23 notice rtr01 CHECK_RELOAD_STATUS Restarting IPsec tunnels
2022-07-28 23:44:23 notice rtr01 CHECK_RELOAD_STATUS Restarting OpenVPN tunnels/interfaces
2022-07-28 23:44:23 notice rtr01 CHECK_RELOAD_STATUS Reloading filter
2022-07-28 23:44:24 err rtr01 PHP-FPM /rc.linkup: DEVD Ethernet attached event for wan
2022-07-28 23:44:24 err rtr01 PHP-FPM /rc.linkup: HOTPLUG: Configuring interface wan
2022-07-28 23:44:24 notice rtr01 CHECK_RELOAD_STATUS rc.newwanip starting igb0
2022-07-28 23:44:24 err rtr01 PHP-FPM /rc.linkup: calling interface_dhcpv6_configure.
2022-07-28 23:44:24 err rtr01 PHP-FPM /rc.linkup: Accept router advertisements on interface igb0
2022-07-28 23:44:24 err rtr01 PHP-FPM /rc.linkup: Starting rtsold process
2022-07-28 23:44:25 err rtr01 PHP-FPM /rc.newwanip: rc.newwanip: Info: starting on igb0.
2022-07-28 23:44:25 err rtr01 PHP-FPM /rc.newwanip: rc.newwanip: on (IP address: 100.105.216.234) (interface: WAN[wan]) (real interface: igb0).
These are the logs from when it happens. The LLDPD messages are not the problem: I have fixed the LLDP issue, and the WAN problem was already happening before I made the LLDP change, so the two are not related.
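To see how often the link actually drops without the LLDPD noise, the log can be filtered down to the WAN events. A minimal sketch, assuming plain-text logs (on pfSense 2.5 and later the system log is usually /var/log/system.log, but check your install) and igb0 as the WAN; the script name `wan-events.sh` is hypothetical. Matching is case-insensitive because program names are uppercased in the excerpt above but are typically lowercase in the raw syslog.

```sh
#!/bin/sh
# Hypothetical log filter: keep only WAN link and gateway events, drop LLDPD noise.
# Assumes plain-text logs; the LOG path and IFACE defaults are assumptions.

IFACE="${IFACE:-igb0}"

# Read log lines on stdin and print link-state, dpinger and rc.linkup entries.
wan_events() {
    grep -iv 'lldpd' | grep -iE "$IFACE: link state changed|dpinger|rc\.linkup"
}

# Only act when called as "sh wan-events.sh run", so the file can be sourced.
if [ "${1:-}" = "run" ]; then
    wan_events < "${LOG:-/var/log/system.log}"
fi
```

Piping the result through `grep -i 'changed to DOWN'` gives one line per drop with its timestamp, which makes it easier to check whether the drops really cluster at night or during idle periods.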
I have passed my Intel NIC through to the VM with PCI passthrough, disabled memory ballooning and given the VM a higher CPU share than the other VMs, so it is basically running like bare metal. And this happens ONLY on the WAN interface: if I plug WAN into igb0, it starts happening after some time; if I then move the WAN cable to igb3 (for example), it starts working again, and when that stops working I can move it back to igb0. Each time, I also assign igb0 or igb3 to the WAN interface in pfSense accordingly. So I think what's needed is to take the WAN interface DOWN and then UP again, the way I would on Linux with
ip link set dev <interface> down; ip link set dev <interface> up
(or whatever the exact command is). I would really, really appreciate some help here, I am very frustrated. Thanks! I can share more if needed.
-
Yeah, it shows it actually losing link but then recovering.
At the end of that log are you still unable to pass traffic?
You can run
ifconfig igb0 down; ifconfig igb0 up
and it will likely restore it. You might try not using PCI passthrough, if you have not already. If it's losing link because of something upstream, that would isolate it from that.
Steve
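Since the box is often unattended, that bounce could be automated as a stopgap. A minimal watchdog sketch, not a fix: it assumes FreeBSD ping(8), where `-t` is an overall timeout in seconds (on Linux `-t` sets the TTL instead), and takes igb0 and the gateway 100.105.216.1 from the log above, although a DHCP-assigned gateway can change. The script name `wan-watchdog.sh` is hypothetical, and scheduling it (cron or otherwise) is left to you.

```sh
#!/bin/sh
# Hypothetical WAN watchdog: if the gateway stops answering, bounce the NIC.
# IFACE and GATEWAY defaults are assumptions taken from the posted log.

IFACE="${IFACE:-igb0}"
GATEWAY="${GATEWAY:-100.105.216.1}"

# Succeeds if any of 3 pings to $1 is answered (FreeBSD: -t = overall timeout, s).
gateway_reachable() {
    ping -c 3 -t 5 "$1" >/dev/null 2>&1
}

# Take interface $1 down and up again, logging each bounce via logger(1).
bounce_iface() {
    logger -t wan-watchdog "no reply from $GATEWAY, bouncing $1"
    ifconfig "$1" down
    sleep 2
    ifconfig "$1" up
}

# One check: bounce only when the gateway is unreachable.
watchdog_once() {
    gateway_reachable "$GATEWAY" || bounce_iface "$IFACE"
}

# Only act when called as "sh wan-watchdog.sh run", so the file can be sourced.
if [ "${1:-}" = "run" ]; then
    watchdog_once
fi
```

This only hides the symptom; the logger entries at least leave a record of when each bounce happened.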
-
@stephenw10 Hey! Thanks for trying to help me out :D I don't think disabling PCI passthrough will change anything, since it had the same problem back when I ran it on bare metal. Any other ideas?
-
Mmm, well in this instance using a VM without PCI passthrough would isolate the VM from link changes on the real NIC, so it might well behave differently from bare metal.
-
@stephenw10 Yeah but that seems like a dumb way of "solving" it, doesn't it?
-
Yes, long term, I agree. But if it makes any difference at all, then that's a clue as to what the actual cause might be.
Otherwise wait for it to fail again and then start digging into what's actually not working.
What does ifconfig show?
Do you see anything in a pcap?
Steve
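When it next fails, something along these lines could grab the ifconfig output, interface counters and a short pcap before anyone touches the interface. A minimal sketch, assuming the FreeBSD base tools pfSense ships (ifconfig, netstat, tcpdump, timeout) and igb0 as the WAN; the output directory and the script name `wan-snapshot.sh` are hypothetical.

```sh
#!/bin/sh
# Hypothetical snapshot of WAN state for debugging a dead link.
# IFACE and OUTDIR defaults are assumptions; run as root (tcpdump needs it).

IFACE="${IFACE:-igb0}"
OUTDIR="${OUTDIR:-/tmp/wan-debug}"

# Save interface status and counters, then capture traffic.
# Prints the path of the status file.
collect_state() {
    ts=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$OUTDIR"
    {
        echo "### ifconfig -v $IFACE"
        ifconfig -v "$IFACE"
        echo "### netstat -I $IFACE"
        netstat -I "$IFACE"
    } > "$OUTDIR/state-$ts.txt" 2>&1
    # Up to 500 packets, no name resolution, stop after 60 s regardless.
    timeout 60 tcpdump -ni "$IFACE" -c 500 -w "$OUTDIR/wan-$ts.pcap" >/dev/null 2>&1
    echo "$OUTDIR/state-$ts.txt"
}

# Only act when called as "sh wan-snapshot.sh run", so the file can be sourced.
if [ "${1:-}" = "run" ]; then
    collect_state
fi
```

The `status:` and `media:` lines in the saved ifconfig output show what the NIC thinks the link is doing, and the pcap (e.g. opened in Wireshark) shows whether DHCP/ARP requests go out and whether anything comes back from the modem.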