RC2.0 - Dual WAN failover + traffic shaping issues

stephenw10

I can't help you with the limiter problem, but on the multi-WAN side: what do you have the trigger level set to in your load balancing gateway?

Steve

FlashBlue

Steve,

For both groups the trigger is set to "member down". I could try experimenting with the other options, but I would assume that "member down" includes physical link failure, high latency and packet loss.

Assuming that my observations from earlier are correct and pfSense kicks the cable line offline due to the high latency (I've seen 900 ms and more before it kicks that connection from the group), then all I could try is the "packet loss" trigger. Correct? Or am I making wrong assumptions here?

Thanks for your assistance! :)

stephenw10

I'm not 100% sure on this. In fact, when I set up my own multi-WAN I had assumed just the opposite: that 'member down' implied completely down and not simply high latency or some packet loss. ~~However, looking at other posts here it seems that 'high latency' is for high-latency connections such as satellite links.~~
My own setup behaves similarly to yours: if I try to max it out for a test, I get the full bandwidth on both WANs for around 15 seconds and then a lot less. I had assumed it was ISP-level throttling, but this is a much nicer explanation, something I can act on! :)

Steve

Edit: See http://forum.pfsense.org/index.php/topic,37451.0.html

I have confirmed my second WAN is going down but I'm seeing nothing in the logs.

EDIT: I re-read that post and found I had completely misunderstood it! :-[

stephenw10

Hmm. OK.
I've played around with the trigger level settings, including the actual values (in the advanced settings for each gateway), but have come to the conclusion that it is an ISP-level restriction in my case. I'm not seeing any alarm messages in the system log.

Steve

FlashBlue

If it drops down after a few secs but keeps going, it's a cap somewhere indeed.

The phenomenon I'm having is that it maxes out for a while, and then totally stops because that line is kicked from the failover group and all connections on it are terminated.
Which is a bit more annoying than just slowing down :P

stephenw10

My connection slows down because it stops using one WAN interface completely. :(

If you are sure it's failing over, are you seeing apinger alarm messages in the system log?
If you are, it should say why, and then you can increase the ping time/packet loss/down time thresholds accordingly.

Steve

FlashBlue

Yes, I actually do get notifications about the connection supposedly being down, both by mail (I have alerts set up) and in the syslog.

Jun 4 12:07:17 apinger: ALARM: GW_OPT1(84.192.64.1) *** delay ***
Jun 4 12:07:28 apinger: ALARM: GW_OPT1(84.192.64.1) *** down ***
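
For anyone who wants to pull these alarms out of a longer system log, here is a minimal sketch in Python. The pattern is inferred only from the two sample lines above, so it is an assumption that other apinger messages follow the same layout; the names `ALARM_RE` and `parse_alarms` are made up for illustration.

```python
import re

# Pattern inferred from the two sample lines above; other apinger
# versions or alarm types may format their messages differently.
ALARM_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+apinger:\s+"
    r"ALARM:\s+(?P<gateway>[^\s(]+)\((?P<address>[^)]+)\)\s+"
    r"\*\*\*\s+(?P<alarm>.+?)\s+\*\*\*\s*$"
)


def parse_alarms(lines):
    """Return one dict per line that matches ALARM_RE; skip everything else."""
    alarms = []
    for line in lines:
        match = ALARM_RE.match(line.strip())
        if match:
            alarms.append(match.groupdict())
    return alarms


sample = [
    "Jun 4 12:07:17 apinger: ALARM: GW_OPT1(84.192.64.1) *** delay ***",
    "Jun 4 12:07:28 apinger: ALARM: GW_OPT1(84.192.64.1) *** down ***",
]

# Print when each alarm fired, for which gateway, and what kind it was.
for alarm in parse_alarms(sample):
    print(alarm["timestamp"], alarm["gateway"], alarm["alarm"])
```

Listing the alarms in order makes it easier to see that the "delay" alarm fires first and the "down" alarm follows about eleven seconds later.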

FlashBlue

I have switched the routing group trigger from "member down" to "packet loss", which does prevent the connection from being removed from the group (even with ping times as high as 5000 ms…).

But I still find it a mystery why one single HTTP connection can actually max out a line that has traffic shaping rules enabled and a limiter set to 50% of its bandwidth for origin and destination limiting.

I still assume that I am doing something wrong, but I am unable to find what :-) Any suggestions on that matter? :-)

FlashBlue

Right, it's been more than a month since my last reply. I'm currently running 2.0-RC3 (i386), 23 Jul build, and still having the same issues…

One single HTTP connection succeeds in maxing out one of the WAN connections. I am at a loss as to why it refuses to follow the shaper settings and allows this to happen.

I never had any of those issues with 1.*; that worked like a charm, respected the limits set for the interfaces and followed the shaper rules. Since 2.0, nothing but issues.

Tried and failed:
-reinstall of 2.0
-restore of an old config file from a 1. machine
-reinstall of a newer 2.0 image, created config from scratch
-and so on...

What it comes down to is that every setting about bandwidth and shaping is being utterly ignored: a single HTTP connection can max out a line (30 Mbit/s) that has 20 Mbit/s set up as usable in the shaper, no matter if there is other traffic or not.
Instead of limits being applied, either everything grinds to a maddening slowness, or the routing group decides it's had enough and kicks the interface offline, taking all open connections with it. (And yes, messing with the trigger levels of the routing group helps a bit, but it's not a solution; it's a half-effective stopgap that doesn't always prevent the issue either.)

I would highly appreciate and welcome any advice, either pointing out that I'm doing something totally wrong, or telling me that it's a known issue.

eri--

If you would show how you have configured the shaper and your router, then some help can be given.