WAN Packet Loss
-
Hi everyone,
After about 3 weeks of searching, I am pretty helpless.
I have an SG-3100 and have now bought a 6100. At first I transferred my backup from the 3100 to a large HPE ProLiant server, but after about 1 day we had packet loss of about 30% or more. It's also strange that the packet loss doesn't persist. It just comes and goes...
So I thought maybe it was the old LAN cards in the ProLiant and bought the 6100. I transferred the config and everything seemed fine, but now, one day later, I have the same issue again. The WAN loses packets.
So here is some information about my setup:
- Version: 21.05.2-RELEASE (amd64) (pfSense+)
- Packages: Suricata, pfblockerng-devel, ntopng, iperf
- Gateway: Because of our provider, we have to use a FritzBox as the modem (running in exposed host mode)
- Internet speed: We have a 100/50 fiber connection
- Other: Using the traffic shaper (but I also had the problem without it)
What I have already tried to fix the problem:
- Disabled flow control on the new 6100
- Randomly disabled packages, in case one of them causes the issue (sometimes it then works again for a while, but the problem comes back a few hours later)
- Swapped the cable between the FritzBox and the 6100
- Tried other ports on the FritzBox
- Disabled the Gateway Monitoring Action in pfSense
I also attached a screenshot of what the general and gateway logs look like when the gateway goes down.
It may also be worth mentioning that I sometimes see errors 55 and 64 in the gateway logs, but not every time the gateway loses packets, so I don't know whether that could be the reason.
I have also wondered whether it matters that the 3100 is based on an ARM chip, while both of the other systems I transferred the config to are AMD64. But I really don't want to start all over again, because we have a VPN server running with certificates for it, so setting all of that up without restoring the config would be really time-consuming!
I am a techie, but definitely no pro when it comes to firewalls. I know how to build LANs for small businesses and set up pfSense for normal usage. So I am open to any suggestions and will try them out and then provide feedback. Thanks in advance for any help. I could really use it.
-
@derx05
So your gateway monitoring pings the FritzBox itself?
Check Status > Interfaces to see whether there are errors or collisions on WAN.
There should be something in the system log hinting at the issue.
But the section of the log you've posted doesn't show much. Maybe you can post the whole log.
Also check the FritzBox log.
-
@viragomann
Thanks for the reply.
About the monitoring IP: the custom monitoring IP field is empty, and according to Wireshark, pfSense pings the FritzBox IP, which is 192.168.177.1. So that seems to work fine.
I also attached a screenshot of the interface status.
About the logs: the screenshot I sent shows everything there is about the problem. Before the entry at 10:39 there is a huge time gap, so those really are all the log entries about the problem, because our Internet dropped at 10:39. That's also why I can't find the cause: as far as I can see, there is no useful information in the logs when the problem occurs.
I just found something that I need to test over the next few days.
I just changed the Suricata settings so that the WAN interface is no longer in promiscuous mode. (As far as I understand it, this can mean that Suricata can't see all packets.) When I disabled Suricata today, the problem went away instantly, and the log said promiscuous mode was deactivated. So now I am running Suricata on the WAN interface without it. If the problem doesn't show up over the next few days, then I wonder why this mode can cause these problems. But sadly, I have often been at the point where I thought I had found the cause, and a few hours later it came back.
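To confirm whether the WAN NIC is actually out of promiscuous mode after changing the Suricata setting, one could look at the interface flags that `ifconfig` prints from the pfSense shell (Diagnostics > Command Prompt or SSH) and paste that output into a small check on any machine with Python. The sketch below only tests whether `PROMISC` appears in the `flags=...<...>` list; it assumes FreeBSD-style `ifconfig` output, and the interface name `igc0` in the sample is a hypothetical placeholder, not necessarily the 6100's WAN port.

```python
import re

# Assumed FreeBSD ifconfig format, e.g.:
#   igc0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
FLAGS_RE = re.compile(r"flags=[0-9a-fA-F]+<([^>]*)>")


def is_promiscuous(ifconfig_output: str) -> bool:
    """Return True if the first flags=<...> list in the output contains PROMISC."""
    match = FLAGS_RE.search(ifconfig_output)
    if match is None:
        # No flags line found (empty or unexpected output).
        return False
    # Compare exact flag names so that e.g. PPROMISC is not mistaken for PROMISC.
    flags = match.group(1).split(",")
    return "PROMISC" in flags


if __name__ == "__main__":
    # Hypothetical sample; replace with the real `ifconfig <wan_if>` output.
    sample = "igc0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    print(is_promiscuous(sample))
```

Checking it once while Suricata is running and once after a loss episode would show whether the flag comes back on by itself.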
About posting the whole logs: I am just a bit worried that there is some information in the logs I don't want to share, like public IPs or MAC addresses. Do I need to worry about posting the logs in the forum? Like I said, I know a lot about networking but am new to pfSense.
Thanks for your help.
-
@derx05
I don't really think Suricata can cause this issue, as the gateway monitoring is an outbound connection from pfSense itself. But I'm not experienced with it, since I don't use it or anything like it.
Yes, when posting log files, you should remove or hide public IPs. But as your pfSense is behind a NAT router, it won't have any anyway.
Since the interface status doesn't show any errors or collisions, I'd also consider replacing the FritzBox for testing.
The whole gateway monitoring happens between pfSense and the FritzBox. As you mentioned, you have already replaced the pfSense hardware and the network cable. So the only remaining piece of hardware is the FritzBox.
-
Can you eliminate the FritzBox or put it in modem mode?
-
@viragomann
Oh, I think I forgot to write in my first post that I don't have these problems with the 3100. If it were the FritzBox, I think it would also cause these problems with the 3100. I will download the log files to my PC later and post them here after I've looked through them!
-
@nogbadthebad Sadly, a modern FritzBox doesn't even have a modem mode anymore, and because of our provider I can't use another modem. They are a local fiber provider and use the FritzBoxes for metrics and debugging, so I am not allowed to plug the WAN cable from the FritzBox directly into the pfSense. That's really a shame. Of course I put a static route in the FritzBox and disabled NAT on pfSense, so port forwarding works as if there were no FritzBox in between. Also, like I said, with the same settings it worked on the SG-3100… So I am a bit helpless.
-
So here are the logs!
I just want to mention that all logs from before about 12 o'clock on 10.02 are useless, because that's when I switched the hardware from the ProLiant server to the 6100. So I think you should look at the logs from this morning, because we had about 3 packet-loss episodes today!
I looked through them and don't think there are any public IPs or anything else sensitive in them, so it should be safe to post them here!
-
30% packet loss like that 'feels' like an IP conflict, especially when it's inside the same subnet.
I would expect to see that reported in the logs though.
Steve
-
@stephenw10
You mean an IP conflict in the subnet between the FritzBox and the pfSense? Well, there are only 2 devices in this subnet: the FritzBox and the pfSense, directly connected via a LAN cable. So I don't know how an IP conflict could happen here. And again: with the 3100 it worked. I am worried that this is a driver- or software-related issue, which shouldn't happen since it's a Netgate device.
-
Yes, that's what I mean. 30% loss is massive.
Something connected via Wi-Fi, maybe?
It's very unlikely to be a driver issue with the 6100.
Have you actually swapped back in the 3100 since this started to prove it is still unaffected?
So you have the WAN MAC spoofed?
Steve
-
@stephenw10
Sometimes it’s about 50%!
Wi-Fi on the FritzBox is off, so there are no Wi-Fi connections!
Yes, for the last 2 days before the 6100 arrived, I switched back to the 3100 and everything worked fine!
No, I don't have the MAC spoofed. When I switch firewalls, I change which MAC is set as the exposed host in the FritzBox.
-
Hmm, well, as I say, I would expect numerous entries in the system log if it were an IP conflict.
I would run a packet capture when it happens, then, and see if there is anything unusual happening with the traffic.
Steve
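If the capture is exported as text, one quick check for the IP-conflict theory is whether more than one MAC address answers ARP for the same IP. The sketch below assumes text lines in the format `tcpdump -n` prints for ARP replies (for example from `tcpdump -n -r capture.pcap arp`, where `capture.pcap` is a hypothetical file name). It only looks at replies, so a conflicting device that only sends gratuitous ARP requests would be missed.

```python
import re
from collections import defaultdict

# Assumed tcpdump -n text format for ARP replies, e.g.:
#   10:39:01.123456 ARP, Reply 192.168.177.1 is-at 3c:a6:2f:00:00:01, length 28
ARP_REPLY = re.compile(
    r"ARP, Reply (\d{1,3}(?:\.\d{1,3}){3}) is-at ((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})"
)


def find_ip_conflicts(lines):
    """Return {ip: set_of_macs} for every IP answered by more than one MAC."""
    seen = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            ip, mac = match.group(1), match.group(2).lower()
            seen[ip].add(mac)
    # Only IPs claimed by two or more distinct MACs indicate a possible conflict.
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical file produced with: tcpdump -n -r capture.pcap arp > arp.txt
    with open("arp.txt") as f:
        for ip, macs in find_ip_conflicts(f).items():
            print(ip, sorted(macs))
```

An empty result over a capture that spans a loss episode would make an IP conflict on the FritzBox subnet less likely.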
-
@stephenw10
OK, I already did a packet capture this morning while we were experiencing the packet loss and couldn't find any suspicious traffic, but I have to admit I am not very familiar with Wireshark, so it's possible I missed something.
I am not sure if it's wise to post the data here, because it contains raw network traffic. On the other hand, all important traffic is encrypted these days, and IPs of public servers can be shared?
Shall I just post it here?
Also, a note on something I said before: since I disabled promiscuous mode in Suricata, I haven't had any loss according to the logs. So maybe it has something to do with promiscuous mode?
-
Hmm, well, that seems likely, though I would not expect it.
You can PM me a link to the capture if you want.
-
Update: So over the weekend the problem sadly didn't occur again.
@stephenw10 As promised, I will send you a much longer packet capture if it happens again. I didn't change any settings, so it should only be a matter of time. I also swapped the LAN cable between the FritzBox and the pfSense again, just to make sure.
Will keep you updated here.
-
So, as promised, an update:
The error occurred again, twice today.
Again, there are no useful logs before the loss starts. I just sent @stephenw10 my logs and a 10-minute packet capture. If we can't solve the problem with these logs and the capture, I honestly don't know what else I could try...
-
Can you try monitoring a public IP like 1.1.1.1 for the affected WAN gateway?
It could be an upstream interferer on your cable segment. Cable devices can then lose their heads if they lose sync.
-
@nocling By an upstream interferer, do you mean a problem on the WAN side? Just for your information: we have fiber, not cable. I don't think an upstream interferer can happen with fiber, or am I wrong? But it's worth a shot. I think I will change that when I am back at the company next week.
-
@nocling
Today I also changed the monitoring IP, but I still have the packet loss. So I will now reset the 6100 and then set it back up without restoring the config.
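One way to use both monitoring targets together is to ping the FritzBox (192.168.177.1) and a public IP such as 1.1.1.1 at the same time during an episode (for example with `ping -c 100` against each) and compare the loss. The sketch below is only a rough triage heuristic under that assumption: loss to the FritzBox points at the pfSense-to-FritzBox segment, while loss only to the public IP points beyond the FritzBox. The 5% threshold and the function names are hypothetical choices, not pfSense or dpinger settings.

```python
def loss_percent(sent: int, received: int) -> float:
    """Packet loss in percent, computed from ping's sent/received counts."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def locate_loss(near_loss: float, far_loss: float, threshold: float = 5.0) -> str:
    """Rough triage: near = FritzBox LAN IP, far = public IP such as 1.1.1.1.

    threshold is a hypothetical cut-off in percent, not a pfSense setting.
    """
    if near_loss >= threshold:
        # Packets already get lost before reaching the FritzBox.
        return "pfSense <-> FritzBox link (or either device)"
    if far_loss >= threshold:
        # The FritzBox answers fine, but traffic beyond it is lost.
        return "beyond the FritzBox (FritzBox WAN side or provider)"
    return "no significant loss"


if __name__ == "__main__":
    # Hypothetical counts from two simultaneous ping -c 100 runs.
    near = loss_percent(100, 70)
    far = loss_percent(100, 68)
    print(f"near {near:.0f}%, far {far:.0f}%:", locate_loss(near, far))
```

Running both pings from pfSense and, in parallel, from a PC plugged directly into the FritzBox would further separate a pfSense-side problem from a FritzBox-side one.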