[SOLVED] XG-7100 1U WAN Gateway goes offline after 10-20 min, 100% Packet loss
-
Hi all, thank you for any help in advance. Hopefully I have this post in the correct place on the forums.
I am replacing a Unifi USG-Pro with a Netgate XG-7100 1U. My WAN gateway in pfsense keeps dropping and reporting 100% packet loss. Below is a rundown of what is going on and the things I've tried. I'm at a loss as to what to try next.
The current setup is:
CenturyLink CPE Adtran router -> Switch -> pfsense -> LAN
I know that the Adtran isn't going down because I have a Sonicwall and the USG-Pro plugged into the same switch as the pfsense box, and neither of them ever loses internet. Other devices can still ping the Adtran while this happens.
Network setup is (IPs changed for obvious reasons)
- Network: 71.23.12.80/29
- Adtran: 71.23.12.81
- USG-Pro: 71.23.12.82
- Sonicwall: 71.23.12.83
- pfsense: 71.23.12.84
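As a quick sanity check on the addressing above, here is a minimal Python sketch using the standard-library ipaddress module. It assumes only the (changed) example addresses from this post and confirms that each device sits on a usable host address in the /29. The `devices` dict and the `check_devices` helper are illustrative names, not anything from pfsense.

```python
import ipaddress

# Example /29 from this post (the poster changed the real addresses).
network = ipaddress.ip_network("71.23.12.80/29")

# Device addresses as listed in the post.
devices = {
    "Adtran": "71.23.12.81",
    "USG-Pro": "71.23.12.82",
    "Sonicwall": "71.23.12.83",
    "pfsense": "71.23.12.84",
}

# Usable host addresses exclude the network and broadcast addresses.
usable = list(network.hosts())


def check_devices(net, devs):
    """Return {name: bool} telling whether each address is a usable host in net."""
    hosts = set(net.hosts())
    return {name: ipaddress.ip_address(ip) in hosts for name, ip in devs.items()}


if __name__ == "__main__":
    print("network:", network.network_address)
    print("broadcast:", network.broadcast_address)
    print("usable hosts:", [str(h) for h in usable])
    for name, ok in check_devices(network, devices).items():
        print(name, "ok" if ok else "NOT a usable host in this subnet")
```

A /29 leaves six usable addresses, so four devices sharing it is fine from an addressing point of view.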
If I reboot pfsense, the WAN will be online for a few minutes (10-20) before dropping. If I let it run for a while, it will occasionally come back online for a few minutes before dropping again.
Things I've tried so far
- Looked into others' issues on the forums, including this one
- Put the pfsense WAN behind the USG on its own network to confirm the pfsense setup and port were ok: it never showed any issues and was online 100% of the time
- Disabled Block private networks and loopback addresses and Block bogon networks
- Disabled Gateway Monitoring...internet still dropped
- Changed Monitor IP to a Ubuntu Digital Ocean Droplet
- Disabled Gateway Monitoring Action
- Changed Data Payload to 1
- Changed WAN from Port1 to Port3, same symptoms as the original issue
- Manually set the Switch port to 1000baseT full-duplex
- Replaced patch cable from pfsense to switch
Other bits of info
When pfsense reports the gateway is offline I get 100% packet loss on anything I try to ping (not shockingly). So far I've tried to ping:
- Adtran: 71.23.12.81
- google.com
I am able to ping the USG-Pro (71.23.12.82) from the pfsense box's WAN port (I temporarily enabled ICMP on WAN Local of the USG to troubleshoot)
Gateway Logs
All of the log messages for the gateway are similar to this when the issues arise:
Apr 9 13:48:10 dpinger WANGW 71.23.12.81: Alarm latency 0us stddev 0us loss 100%
Apr 9 13:48:08 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.23.12.81 bind_addr 71.23.12.84 identifier "WANGW "
Apr 9 13:29:09 dpinger WANGW 71.23.12.81: Alarm latency 0us stddev 0us loss 100%
Apr 9 13:29:07 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.23.12.81 bind_addr 71.23.12.84 identifier "WANGW "
Apr 9 13:24:56 dpinger WANGW 71.23.12.81: Alarm latency 555us stddev 134us loss 22%
Apr 9 13:05:06 dpinger WANGW 71.23.12.81: Clear latency 42911us stddev 222477us loss 14%
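For anyone wanting to pull the Alarm/Clear transitions out of a longer gateway log, a rough Python sketch follows. It assumes the lines look exactly like the ones pasted above. The regex and the `parse_dpinger_line` helper are illustrative only, not part of dpinger or pfsense, and other log formats would likely need adjusting.

```python
import re

# Matches status lines such as:
#   Apr 9 13:24:56 dpinger WANGW 71.23.12.81: Alarm latency 555us stddev 134us loss 22%
# The dpinger settings lines (send_interval ...) intentionally do not match.
DPINGER_STATUS = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+dpinger\s+"
    r"(?P<gateway>\S+)\s+(?P<address>[\d.]+):\s+"
    r"(?P<state>Alarm|Clear)\s+latency\s+(?P<latency_us>\d+)us\s+"
    r"stddev\s+(?P<stddev_us>\d+)us\s+loss\s+(?P<loss_pct>\d+)%$"
)


def parse_dpinger_line(line):
    """Return a dict for an Alarm/Clear status line, or None for anything else."""
    match = DPINGER_STATUS.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("latency_us", "stddev_us", "loss_pct"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = [
        "Apr 9 13:48:10 dpinger WANGW 71.23.12.81: Alarm latency 0us stddev 0us loss 100%",
        "Apr 9 13:24:56 dpinger WANGW 71.23.12.81: Alarm latency 555us stddev 134us loss 22%",
        "Apr 9 13:05:06 dpinger WANGW 71.23.12.81: Clear latency 42911us stddev 222477us loss 14%",
    ]
    for entry in filter(None, map(parse_dpinger_line, sample)):
        print(entry["timestamp"], entry["state"], f"loss={entry['loss_pct']}%")
```

Listing only the state changes with their timestamps makes it easier to see how long the gateway stays up between drops.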
-
So in doing more digging, it is coming back online for ~10 min and then dropping again. If I delete the ARP record for the gateway in the ARP Table, the ping will go back down and internet will be restored until the problem comes back in a few minutes. So it seems the internet comes back on for a few minutes whenever that record expires.
I have a call out to CenturyLink techs for Monday
-
I spent most of the day on the phone with CenturyLink techs and a tech who supports the SonicWall that is on-site. The CenturyLink tech noticed that, in the Adtran's ARP table, the Sonicwall's MAC address would start showing up for all of the IPs in our static block (except .82 for some reason), and whenever it did, the internet dropped on the XG-7100. I followed up with the Sonicwall team and they read me the config for the WAN, and it all seems fine.
Has anyone else heard of anything like this? I am going to follow up with them again tomorrow and verify they don't have any IP aliases set up in there for some reason. If they can't help me, I will probably just have to 1:1 NAT their box behind the pfsense box, but I wanted to see if anyone out there has run into this before.
I have had the SonicWall unplugged for some time now and the issue hasn't happened so I feel confident it is a configuration issue in their box somewhere
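The symptom described above (one MAC address answering for several IPs in the static block) can be spotted mechanically once an ARP table has been copied out as IP/MAC pairs. Below is a minimal Python sketch, assuming the table is already available as a list of tuples. The MAC addresses are made up for illustration, and `find_multi_ip_macs` is a hypothetical helper, not a pfsense, Adtran, or SonicWall command.

```python
import ipaddress
from collections import defaultdict


def find_multi_ip_macs(arp_entries):
    """Group (ip, mac) pairs by MAC and return the MACs that claim more than one IP."""
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        ips_by_mac[mac.lower()].add(ip)
    return {
        mac: sorted(ips, key=ipaddress.ip_address)
        for mac, ips in ips_by_mac.items()
        if len(ips) > 1
    }


if __name__ == "__main__":
    # Hypothetical ARP table snapshot; the MACs are invented for this example.
    arp_table = [
        ("71.23.12.82", "aa:bb:cc:00:00:82"),  # USG-Pro
        ("71.23.12.83", "00:11:22:33:44:55"),  # Sonicwall's own address
        ("71.23.12.84", "00:11:22:33:44:55"),  # pfsense IP claimed by the same MAC
        ("71.23.12.85", "00:11:22:33:44:55"),
        ("71.23.12.86", "00:11:22:33:44:55"),
    ]
    for mac, ips in find_multi_ip_macs(arp_table).items():
        print(mac, "is answering for", ", ".join(ips))
```

A MAC listed against several IPs is not always a fault (IP aliases and virtual IPs do this legitimately), so the output is only a pointer to what needs a closer look.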
-
What subnet is the Sonicwall using on WAN?
-
/29. I too suspected that was the issue; unfortunately, they confirmed to me that it was correct.
-
I was able to get the login for the SonicWall. Nothing in the configuration or logs is jumping out at me. The only thing I noticed was that the MTU size on the WAN port was set to 1404 instead of the usual 1500. They are having another tech look to see if they spot anything.
-
Some of the equipment is trying to be the main FW for the /29 subnet.
Try using a /32 subnet for the Sonicwall and everything connected to wan.
-
@Cool_Corona This definitely seems to be the case. We did some digging, and after searching for ARP Proxy and SonicWall, a lot of other people seem to have this issue (https://www.reddit.com/r/networking/comments/4ijdl7/why_sonicwall_took_over_the_arp_for_the_whole_wan/)
Unfortunately, it doesn't look like SonicWalls support /32 WAN subnets, so I followed some suggestions on that Reddit post and will report back. If this fails and SonicWall support can't help me, then we plan on setting up a static ARP table in the Adtran, which isn't ideal but is a workaround for the time being.
-
pfsense hasn't dropped the internet once since they made a few changes in the Sonicwall. I asked what the Sonicwall Tech had to change. If I hear back I will post the solution for my issue in here in case anyone else runs into something similar. Thank you @Cool_Corona for your input