very strange problem



  • Hi all,

    I've experienced a very odd problem a few times on an SG-3100 running 2.4.4p2.

    What happens is that, out of nowhere, I cannot browse Internet web pages from my LAN (this affects both wired and wireless computers). I log in to pfSense and everything in the management display looks normal. It shows no packet loss or high latency, at least at the time I log in. I have an OpenVPN tunnel that still shows as "up", and traffic through the VPN seems to be fine during this issue.

    If I unplug the WAN ethernet cable and plug it straight back in, everything starts working again immediately. I have done this on two previous occasions, and it fixed the issue both times.

    Last night, around 9:20 PM eastern, this happened again. I decided to leave it alone and see what the network would be doing in the morning. In the morning, the problem was still there, no web browsing from the LAN.

    I have monitoring set up on the firewall, and attached is a graph of packet loss and latency for the WAN (Spectrum Internet) over the last 2 days. I have highlighted a spike that corresponds to the loss of browsing ability last night around 9:20 PM. As you can see, the stats seem to go back to normal soon after this spike; however, I still could not browse.

    This morning, I replaced the cable between the SG-3100 and my cable modem. This fixed the issue again, but I expect it to recur. So far, these events seem to occur 7-10 days apart.

    Does anybody have an idea about this? Would you say this is likely an upstream issue? Could it be a hardware problem?

    Thanks for any advice.

    [Attached screenshot: WAN packet loss and latency graph, 2019-02-21]



  • @vronp said in very strange problem:

    Does anybody have an idea about this? Would you say this is likely an upstream issue? Could it be a hardware problem?

    No, maybe, and yes.

    Was there anything relevant in the System log? What packages do you have installed? When everything dies, can you still ping out to 8.8.8.8? In other words, might it be a DNS issue?
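
The ping check suggested here can be scripted so it is ready the next time browsing fails. This is only a minimal sketch, not a pfSense tool: it assumes Python 3 on a LAN client, uses a TCP connection to 8.8.8.8 port 53 as a stand-in for ping (ICMP needs raw-socket privileges), and the hostname `example.com` and all function names are placeholders chosen for illustration.

```python
import socket

def can_reach_ip(ip="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds (no DNS lookup involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def can_resolve(hostname="example.com"):
    """Return True if the system resolver can look up hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def diagnose(ip_ok, dns_ok):
    """Classify the outage from the two check results."""
    if not ip_ok:
        return "no-connectivity"  # traffic by IP address fails, so DNS is not the root cause
    if not dns_ok:
        return "dns"              # IP traffic works but names do not resolve
    return "ok"                   # both work; look elsewhere (browser, proxy, rules)
```

Running `print(diagnose(can_reach_ip(), can_resolve()))` during an outage would print `dns` if only name resolution is broken, and `no-connectivity` if traffic to a bare IP address also fails.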



  • @kom said in very strange problem:

    @vronp said in very strange problem:

    Does anybody have an idea about this? Would you say this is likely an upstream issue? Could it be a hardware problem?

    No, maybe, and yes.

    Was there anything relevant in the System log? What packages do you have installed? When everything dies, can you still ping out to 8.8.8.8? In other words, might it be a DNS issue?

    Hi,

    Thank you for the reply.

    Packages are openvpn-client-export, vnstat, and freeradius3 (not doing anything with this one yet). I will have to wait for the next occurrence to try a ping. Obviously, that was something basic I should have tried, but I didn't.

    I took a closer look at the monitoring graph. I do not know the details of how the graph is created in terms of time slices, etc, but it appears the "event" started at 02:10 UTC, peaked at 02:20 UTC, and concluded at 02:35 UTC. Again, the truly odd thing is that I could not browse hours later.
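
As a quick cross-check that the 02:10-02:35 UTC window matches the "around 9:20 PM eastern" report, the conversion can be sketched as below. It assumes the outage was on the evening of 2019-02-20 (the night before the Feb 21 log entries) and uses a fixed UTC-5 offset, since Eastern time is on standard time in February; `eastern_to_utc` is an illustrative helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset for Eastern Standard Time (assumption: no DST in February).
EST = timezone(timedelta(hours=-5), "EST")

def eastern_to_utc(local):
    """Interpret a naive datetime as EST and convert it to UTC."""
    return local.replace(tzinfo=EST).astimezone(timezone.utc)

reported = eastern_to_utc(datetime(2019, 2, 20, 21, 20))
print(reported.isoformat())  # 2019-02-21T02:20:00+00:00
```

That lands on the 02:20 UTC peak read off the graph.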

    Here are 2 sections of System logs that seem to be at least somewhat relevant. They look very much like the logs I see when Spectrum is having issues. Again, the odd thing here is that Spectrum went back to normal but I could not browse hours after the fact.

    section 1

    Feb 21 02:14:25 rc.gateway_alarm 36096 >>> Gateway alarm: OVPNRSV_VPNV4 (Addr:172.30.1.1 Alarm:1 RTT:285.137ms RTTsd:146.835ms Loss:21%)

    Feb 21 02:14:25 check_reload_status updating dyndns OVPNRSV_VPNV4

    Feb 21 02:14:25 check_reload_status Restarting ipsec tunnels

    Feb 21 02:14:25 check_reload_status Restarting OpenVPN tunnels/interfaces

    Feb 21 02:14:25 check_reload_status Reloading filter

    Feb 21 02:14:27 php-fpm 36752 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use OVPNRSV_VPNV4.

    Feb 21 02:14:41 rc.gateway_alarm 70842 >>> Gateway alarm: WAN_DHCP (Addr:XX.XXX.XXX.X Alarm:1 RTT:10.923ms RTTsd:5.455ms Loss:21%)

    Feb 21 02:14:41 check_reload_status updating dyndns WAN_DHCP

    Feb 21 02:14:41 check_reload_status Restarting ipsec tunnels

    Feb 21 02:14:41 check_reload_status Restarting OpenVPN tunnels/interfaces

    Feb 21 02:14:41 check_reload_status Reloading filter

    section 2

    Feb 21 02:14:25 dpinger OVPNRSV_VPNV4 172.30.1.1: Alarm latency 285137us stddev 146835us loss 21%

    Feb 21 02:14:41 dpinger WAN_DHCP X.X.X.X: Alarm latency 10923us stddev 5455us loss 21%

    Feb 21 02:14:56 dpinger WAN_DHCP X.X.X.X: Clear latency 10660us stddev 5058us loss 20%

    Feb 21 02:15:01 dpinger OVPNRSV_VPNV4 172.30.1.1: Clear latency 237297us stddev 152114us loss 15%

    Feb 21 02:17:15 dpinger OVPNRSV_VPNV4 172.30.1.1: Alarm latency 259192us stddev 153655us loss 21%

    Feb 21 02:17:36 dpinger OVPNRSV_VPNV4 172.30.1.1: Clear latency 262989us stddev 142673us loss 20%

    Feb 21 10:21:37 dpinger WAN_DHCP X.X.X.X: sendto error: 65

    Feb 21 10:21:37 dpinger WAN_DHCP X.X.X.X: sendto error: 65

    Feb 21 10:21:38 dpinger WAN_DHCP X.X.X.X: sendto error: 65

    Feb 21 10:21:38 dpinger WAN_DHCP X.X.X.X: sendto error: 65

    Feb 21 10:21:50 dpinger OVPNRSV_VPNV4 172.30.1.1: Alarm latency 17214us stddev 2441us loss 21%

    Feb 21 10:21:50 dpinger WAN_DHCP X.X.X.X: Alarm latency 8056us stddev 953us loss 22%

    Feb 21 10:22:01 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr X.X.X.X bind_addr X.X.X.X identifier "WAN_DHCP "

    Feb 21 10:22:01 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.30.1.2 bind_addr 172.30.1.2 identifier "OVPNGW "

    Feb 21 10:22:01 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.30.1.1 bind_addr 172.30.1.2 identifier "OVPNRSV_VPNV4 "

    Feb 21 10:22:18 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr X.X.X.X bind_addr X.X.X.X identifier "WAN_DHCP "

    Feb 21 10:22:18 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.30.1.2 bind_addr 172.30.1.2 identifier "OVPNGW "

    Feb 21 10:22:18 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 172.30.1.1 bind_addr 172.30.1.2 identifier "OVPNRSV_VPNV4 "
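
For reading dpinger entries like the ones above, a small parser can help. This is a sketch that assumes the line layout shown in this log; the regular expressions and the names `parse_dpinger` and `FREEBSD_ERRNO` are illustrative and not part of pfSense or dpinger. It converts microseconds to milliseconds and decodes the `sendto error` number using FreeBSD's errno numbering, in which 65 is EHOSTUNREACH ("No route to host").

```python
import re

# Small subset of FreeBSD errno values (pfSense is FreeBSD-based).
# Linux uses different numbers, so Python's errno module is not used here.
FREEBSD_ERRNO = {
    50: "ENETDOWN (Network is down)",
    51: "ENETUNREACH (Network is unreachable)",
    64: "EHOSTDOWN (Host is down)",
    65: "EHOSTUNREACH (No route to host)",
}

STATUS_RE = re.compile(
    r"dpinger (?P<gateway>\S+) (?P<addr>\S+): (?P<state>Alarm|Clear) "
    r"latency (?P<latency>\d+)us stddev (?P<stddev>\d+)us loss (?P<loss>\d+)%"
)
SENDTO_RE = re.compile(
    r"dpinger (?P<gateway>\S+) (?P<addr>\S+): sendto error: (?P<errno>\d+)"
)

def parse_dpinger(line):
    """Parse one dpinger log line; return a dict, or None if unrecognized."""
    m = STATUS_RE.search(line)
    if m:
        return {
            "kind": "status",
            "gateway": m["gateway"],
            "state": m["state"],
            "latency_ms": int(m["latency"]) / 1000,  # us -> ms
            "stddev_ms": int(m["stddev"]) / 1000,    # us -> ms
            "loss_pct": int(m["loss"]),
        }
    m = SENDTO_RE.search(line)
    if m:
        code = int(m["errno"])
        return {
            "kind": "sendto_error",
            "gateway": m["gateway"],
            "errno": code,
            "meaning": FREEBSD_ERRNO.get(code, "unknown"),
        }
    return None  # e.g. the startup lines that list send_interval, loss_alarm, ...
```

In the log above, every Alarm line reports loss above the configured `loss_alarm 20%` (21% or 22%), while the Clear lines report 20% or less; latency never approaches the `latency_alarm 500ms` setting.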



  • That log doesn't look very good. High latency and packet loss.

    What type of NICs are you using?





  • @kom said in very strange problem:

    That log doesn't look very good. High latency and packet loss.

    What type of NICs are you using?

    As stated, this is an SG-3100, from Netgate. So, I believe they are Intel.

    The concern I have is not this "event", but the ongoing result after the packet loss and latency appear to have returned to normal levels.

    I think the next time this happens, after I do what you suggest regarding ping tests, I will disconnect/reconnect the coax cable from the modem and leave the ethernet cable to the pfsense box alone.



  • @nogbadthebad said in very strange problem:

    Realtek by the looks of things

    https://forum.netgate.com/topic/117295/pfsense-ignoring-dhcp-offer-on-wan/7

    Really? That 2-year-old post concerned a completely different device, a Partaker fanless PC, and that particular issue was solved by installing Realtek drivers in pfSense.

    This thread involves a Netgate SG-3100. I am pretty sure that's on the list of supported devices.



  • @vronp

    I stand corrected, but I did say "Realtek by the looks of things".



  • @vronp said in very strange problem:

    @kom said in very strange problem:

    That log doesn't look very good. High latency and packet loss.

    What type of NICs are you using?

    As stated, this is an SG-3100, from Netgate. So, I believe they are Intel.

    The concern I have is not this "event", but the ongoing result after the packet loss and latency appear to have returned to normal levels.

    I think the next time this happens, after I do what you suggest regarding ping tests, I will disconnect/reconnect the coax cable from the modem and leave the ethernet cable to the pfsense box alone.

    This appears to be resolved and was apparently due to a faulty modem (Netgear CM600).

