Strange packet loss - pfSense 2.0

djsmiley2k

Hi all, I'm experiencing some weird issues while trying to set up a new set of routers.

We have a single WAN coming into two pfSense boxes, with CARP between them on its own interface. On the WAN side there is a shared public IP plus a public IP assigned to each box; on the LAN side there is a shared LAN IP plus a LAN IP assigned to each box. Nothing too "complicated" as far as I'm aware.

If I log in to the router directly, I can ping any site with no loss: google.com, slashdot, my own home servers. However, if I connect from a system on the LAN side I get anywhere between 15% and 60% packet loss, even though pinging the router itself from that system shows no loss.

tim.bowers@TimPad:~$ ping www.google.com
PING www.l.google.com (209.85.229.103) 56(84) bytes of data.
64 bytes from 209.85.229.103: icmp_req=1 ttl=49 time=22.3 ms
64 bytes from 209.85.229.103: icmp_req=2 ttl=49 time=17.3 ms
64 bytes from 209.85.229.103: icmp_req=3 ttl=49 time=17.7 ms
64 bytes from 209.85.229.103: icmp_req=5 ttl=49 time=17.2 ms
64 bytes from 209.85.229.103: icmp_req=7 ttl=49 time=17.4 ms
64 bytes from 209.85.229.103: icmp_req=8 ttl=49 time=23.9 ms
64 bytes from 209.85.229.103: icmp_req=9 ttl=49 time=17.3 ms
64 bytes from 209.85.229.103: icmp_req=10 ttl=49 time=17.3 ms
^C64 bytes from 209.85.229.103: icmp_req=11 ttl=49 time=20.9 ms

--- www.l.google.com ping statistics ---
11 packets transmitted, 9 received, 18% packet loss, time 42255ms
rtt min/avg/max/mdev = 17.291/19.076/23.943/2.454 ms
tim.bowers@TimPad:~$ ping 10.2.1.5
PING 10.2.1.5 (10.2.1.5) 56(84) bytes of data.
64 bytes from 10.2.1.5: icmp_req=1 ttl=64 time=0.435 ms
64 bytes from 10.2.1.5: icmp_req=2 ttl=64 time=0.422 ms
64 bytes from 10.2.1.5: icmp_req=3 ttl=64 time=0.279 ms
64 bytes from 10.2.1.5: icmp_req=4 ttl=64 time=0.256 ms
64 bytes from 10.2.1.5: icmp_req=5 ttl=64 time=0.250 ms
64 bytes from 10.2.1.5: icmp_req=6 ttl=64 time=0.321 ms
64 bytes from 10.2.1.5: icmp_req=7 ttl=64 time=0.277 ms
64 bytes from 10.2.1.5: icmp_req=8 ttl=64 time=0.233 ms
64 bytes from 10.2.1.5: icmp_req=9 ttl=64 time=0.285 ms
64 bytes from 10.2.1.5: icmp_req=10 ttl=64 time=0.287 ms
64 bytes from 10.2.1.5: icmp_req=11 ttl=64 time=0.286 ms
64 bytes from 10.2.1.5: icmp_req=12 ttl=64 time=0.260 ms
64 bytes from 10.2.1.5: icmp_req=13 ttl=64 time=0.378 ms
^C
--- 10.2.1.5 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 11998ms
rtt min/avg/max/mdev = 0.233/0.305/0.435/0.064 ms
tim.bowers@TimPad:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.2.1.0        0.0.0.0         255.255.255.0   U     1      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
0.0.0.0         10.2.1.5        0.0.0.0         UG    0      0        0 eth0
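
One way to see exactly which probes went missing in a capture like the one above is to look for gaps in the icmp_req numbers. A minimal sketch in Python, assuming the Linux ping output format shown above; the function names are made up for illustration:

```python
import re
from typing import List


def missing_sequences(ping_output: str) -> List[int]:
    """Return the sequence numbers that never got a reply.

    Assumes replies carry an icmp_req= (or icmp_seq=) field, as in the
    output pasted above, and that numbering starts at 1.
    """
    seen = {int(n) for n in re.findall(r"icmp_(?:req|seq)=(\d+)", ping_output)}
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]


def loss_percent(transmitted: int, received: int) -> float:
    """Packet loss as a percentage of transmitted probes."""
    return 100.0 * (transmitted - received) / transmitted


# Reply lines copied from the google.com ping above.
sample = """\
64 bytes from 209.85.229.103: icmp_req=1 ttl=49 time=22.3 ms
64 bytes from 209.85.229.103: icmp_req=2 ttl=49 time=17.3 ms
64 bytes from 209.85.229.103: icmp_req=3 ttl=49 time=17.7 ms
64 bytes from 209.85.229.103: icmp_req=5 ttl=49 time=17.2 ms
64 bytes from 209.85.229.103: icmp_req=7 ttl=49 time=17.4 ms
64 bytes from 209.85.229.103: icmp_req=8 ttl=49 time=23.9 ms
64 bytes from 209.85.229.103: icmp_req=9 ttl=49 time=17.3 ms
64 bytes from 209.85.229.103: icmp_req=10 ttl=49 time=17.3 ms
^C64 bytes from 209.85.229.103: icmp_req=11 ttl=49 time=20.9 ms
"""

if __name__ == "__main__":
    print("missing:", missing_sequences(sample))
    print("loss: %.1f%%" % loss_percent(11, 9))
```

For the capture above this reports probes 4 and 6 as missing, which matches the 11 transmitted / 9 received summary line. Isolated single drops like these look different from a long run of consecutive misses, which can help when comparing runs.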

I've disabled all services other than NTPsync and the DNS forwarder.

I'm kind of at a complete loss. The network itself is a bit more complicated: since this is a new set of routers, there is also an existing pfSense box that everyone currently uses as their gateway (so I can't turn it off or remove it). However, I don't see why this would have any effect on the new systems.

Your help is very much appreciated :)

djsmiley2k

Kind of disappointed there have been no responses… is that normal?

Anyway, the show goes on. I rebuilt the boxes after finding that the heatsink had broken off the RAID card in one of them.

I now have CARP fully set up, and it fails over beautifully when using automatic NAT. However, after switching to manual NAT I'm once again seeing the loss I was seeing before.

I have rules that match those in other posts (translating on the WAN to replace the IP with the gateway IP), yet this doesn't seem to make any difference. I have yet to check which IP is seen from outside, but I believe it'll be the correct one (i.e. the virtual CARP IP).

hytek

Do you have any traffic shapers enabled on either the LAN or WAN? If so, try putting ICMP packets in your ACK queue (which should have the highest priority), and try again.

Another thing that can cause it (and did in our case): we had the pfSense box sitting between the LAN and a Cisco router that connected to our ISP. The Cisco router actually caused us to have an average of 15% packet loss. We removed the Cisco router from the equation, and now we average around 0.1% during heavy traffic.

hytek

Here is a comparison from the actual transition, from using the Cisco router on the edge to using only the pfSense box now.

The gateway used for quality monitoring was the internal IP of the Cisco router (low latency, but horrible packet loss).

Notice that Monday through Friday the packet loss was bad, but the latency was very low.

The gateway now used for quality monitoring is the ISP gateway.

On Sunday you can see a small break; that was the transition time when the Cisco router was removed. The spike in latency on Sunday was from downloading four 1080p HD movies from YouTube at the same time. Notice, however, that there was little to no packet loss during those downloads.

quality.jpg

Tikimotel

Pardon the question but,

A single WAN --> two different pfSense boxes interconnected with CARP --> 2 LANs?
Where is the redundancy in this configuration?

Why not use:

WAN --> pfsense --> larger switch --> single LAN

As a home user, I've never had packet loss because of pfSense not working properly; I've only experienced packet loss because of faulty ISP hardware.

Last Thursday, my ISP updated the software on my cable modem, but forgot to reset it afterwards.

status_rrd_graph_img.php.png

jimp

That kind of loss could also be explained by CARP flapping up/down back and forth between the master and slave boxes. I would especially suspect a CARP issue (probably at layer 2, meaning your switch).

Check the system log and the CARP status on both boxes.
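
As a rough illustration of what to look for in the system log: flapping would show up as repeated MASTER/BACKUP state changes in a short window. A minimal sketch in Python that counts such transitions, assuming the log lines contain text of the form "MASTER -> BACKUP"; the exact message format varies between versions, and the sample lines below are hypothetical, not taken from this thread:

```python
import re
from collections import Counter

# Matches a state change such as "MASTER -> BACKUP" anywhere in a line.
TRANSITION = re.compile(r"\b(INIT|BACKUP|MASTER)\s*->\s*(INIT|BACKUP|MASTER)\b")


def count_transitions(log_lines):
    """Count CARP state transitions, keyed by (old_state, new_state)."""
    counts = Counter()
    for line in log_lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    "Jan  1 10:00:01 fw1 kernel: vip1: MASTER -> BACKUP (more frequent advertisement received)",
    "Jan  1 10:00:04 fw1 kernel: vip1: BACKUP -> MASTER (master down)",
    "Jan  1 10:00:09 fw1 kernel: vip1: MASTER -> BACKUP (more frequent advertisement received)",
    "Jan  1 10:00:10 fw1 ntpd[123]: time reset",
]

if __name__ == "__main__":
    for (old, new), n in sorted(count_transitions(sample_log).items()):
        print(f"{old} -> {new}: {n}")
```

A healthy pair should show very few transitions outside of deliberate failover tests; many transitions within seconds of each other on either box would point towards the flapping described above.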

djsmiley2k

It was a network card issue, as far as I can tell.

With the new boxes there's no loss; however, I haven't tried the onboard Broadcom cards (which is what I was partly using before).

The reason for CARP is to cover the case where either router falls over; it also means we can upgrade one while the other keeps running happily.

We do have a second ISP, but no IP range with them… it was never set up correctly in the past and I doubt it will happen now: too much chance of knocking everything offline by accident.