DualWAN set-up Still continues to not work… [ my x-mas wish]



  • Hey all,

    I've been bitching about this on twitter and exchanged a few comments with @sullrich & @cbuechler that the dual-wan solution is just not working properly with my pfsense set-up. Yet I am not doing anything very fancy and I've followed the guide at http://doc.pfsense.org/index.php/MultiWanVersion1.2

    I've already had to default SSL sessions onto one link to be able to support authenticated ssl (which to me makes no sense).

    My set-up is quite simple. I have two DSL links, both running at about 21 Mbps down and 2-3 Mbps up. They are connected to my gateway box through 2x 3com ethernet (same card and version) on a really nice rig with 2mb ram, an Intel E3300 & gigabyte mobo [exact hw is present in pfsenseHWconfig.txt]. Both DSL links are set to get IPs through DHCP.

    I don't have many rules and most of them are there to handle xbox, wow, torrent and a bit of homeweb traffic.

    I have a number of issues with performance, especially around loading web pages and accessing media across the web. When I call them issues, it is because I have compared the same site/load with dual-wan pfsense, single-wan pfsense and no pfsense. Each time I get acceptable if not better performance without the dual-wan pfsense combo. I've also ruled out the pc I am using, as I am getting this from 2x macbook, a laptop, an iphone and an ipod touch, not to mention my main gaming rig [a corei7 p6 mobo and gigaether].

    So what are my issues when pfsense dual wan is on:

    • media and especially pics & video take forever to load, for example:
      -- youtube will start to load and then stall for 5 to 10 mins before continuing
      -- tweetdeck avatars can sometimes take 10 mins to load
      -- google reader items and graphics can take at least 5 minutes to load
    • downloads can become really, really slow, especially over http
      -- yesterday I tried to download a 15 MB pdf; it took close to 45 mins [off pfsense it took less than a couple]
    • pages will often time out before loading and I have to refresh the page 3 to 4 times before it actually loads.
    • if a link goes down, all new traffic fails - pages don't load, can't start any downloads, …
    • I noticed that if a link has bad responses on the loadbalancer ping checks it can also knock the traffic out

    Anyway, I am at a loss as to why the performance is this sucky. I also strongly suspect that there is no load-balancing happening (or very little), and I get the impression that the 2nd link is being throttled, since the RRD graphs never show anything above 2 Mbps on it versus 8 Mbps on the 1st link.

    I am attaching my config in XML. I've previously posted a graphical version of my ruleset; it hasn't changed. I can post other info but need to know what.

    I upgraded Friday to version 1.2.3. I would like to make this work, but if I can't resolve this by my next vacation day I may just have to spend the time switching to another platform.

    what I would like for X-mas is a pfsense set-up that works properly and successfully load-balances traffic out of both my links! >:(
    pfsenseHWconfig.txt
    mypfsenseconfig2.xml.txt



  • Sorry, your load-balancing wishes can't be granted with pfSense. If there are other solutions out in the snow that will work, I do not know of them.

    But one way things will work fine with pfSense is to send all traffic out on WAN except torrent traffic.



  • You're using Realtek NICs, some of which have all kinds of hardware bugs. Try disabling hardware checksum offloading under System -> Advanced. Realtek are cheap junk in general, I'd change to some better quality NICs regardless.



  • @fvter:

    I've already had to default SSL sessions onto one link to be able to support authenticated ssl (which to me makes no sense).

    Blame the site, not your firewall.
    http://doc.pfsense.org/index.php/Multi-WAN_and_Compatibility#Web_site_incompatibility_with_changing_IP_addresses

    @fvter:

    – youtube for example will start to load and then stall for 5 to 10 mins before continuing

    Many things just plain don't work when requests come from multiple public IPs, again blame the site, not the firewall. See same link above.

    I believe your problems are a combination of issues induced by your crappy NICs and not understanding how networking and the web function (some things just can't be load balanced with anything, short of big $$ products with big monthly fees that tunnel your traffic out of their datacenter to combine bandwidth behind a single public IP over any number of different Internet connections).



  • So the realtek is the on-board msi gigabyte ethernet! It's a really really common card!

    As for the issue with the web sites, I do know how networking works. And I find your answer actually a little too biased for a solution that is trying to be a professional grade implementation!

    I agree that the responding web site ain't going to send traffic back to both links. The issue stems from the fact that my calls to the webtraffic from my home network are going out the pfsense box and in most cases should be routed via the same interface for each pair internal-external network. The problem is having the ability to ensure that those internal-external networks are properly distributed across both interfaces in a timely and fashionable manner ensuring proper session stability.

    Based on your answer I get more the impression that the pfSense team has no idea what true load-balanced networking is…



  • The pfSense team knows exactly what true load balancing is.
    You just expect something else than what it provides.

    As Perry wrote: With your expectations, pfSense is probably not the solution for you.



  • Your frustration is apparent and clear, but your tone will not find you much help. The people who have posted in this thread are and always have been very helpful; you'd be wise to heed their advice.

    "if a link goes down, all new traffic fails - pages don't  load cant start any downloads" implies you have configuration issues. Also, if you have bad monitoring IPs in your load balancing setup, an interface could be flapping.

    Realtek interfaces are just plain shit! I learnt the hard way after having just shelled out on a jetway board: just because it came with two GB interfaces doesn't mean they are any good. I purchased another, more expensive MSI board that came with two Intel GB NICs and most of my problems disappeared there and then, with any remaining issues being down to configuration.

    I currently (as do many others) have three wan interfaces working quite happily. Not all web traffic behaves in the same way over multiple wan interfaces. In an ideal world the youtube video would be downloaded to your ISP, then split up and shoved down your two modems, where you would join the pieces back together seamlessly. This is available but expensive to implement.

    What if you have two wan links from different ISPs? What happens when one chunk of the video arrives later than the bit sent after it? What happens if a 'chunk' sent to one modem fails after the following chunk has already been sent to the other modem? It needs to be re-transmitted, causing a train wreck. You can't expect LAN class loadbalancing over the wan without getting the ISP involved and spending some money.

    Where pfSense excels is when you have many users: you can fire their wan requests out of many interfaces in a round robin fashion, spreading the load. How you manage this traffic is up to you. Also, the more you use applications that can take advantage of this fragmentation, the better the results you can expect, e.g. uTorrent's requests for a large .iso file are chopped up at the application level, so you can easily fire these out of many interfaces.

    I suspect the brunt of your issues are due to hardware and configuration.

    Merry Christmas.



  • You know, I have been struggling with understanding the implications of Advanced Outbound Nat (AON) on this sort of setup.

    I put in a post asking about directing certain ports over certain wan connections.
    In my case, I believe things were working correctly when AON was turned off, and started having troubles when my setup required AON rules turned on.

    I find the XML config file a bit hard to read, but it looks like you also have AON…  I re-read the how-to that you used to set yourself up and didn't see anything about AON in there.

    Did you have another resource you used to learn how to properly use AON (I only ask because I am having troubles with that end of things) ... but would your setup allow for Automatic outbound NAT rule generation turned back on for testing?

    Also, as for performance issues... is there some way to see, after you have had 'slowness', if anything is showing up at a kernel level (maybe dmesg, from the command prompt, like in linux?) or if something is turning up in the system log that doesn't look right? I have also had bad experiences using Realtek (they happened to be older 10/100 cards), but I would hate to throw hardware at a problem without some sort of errors to help substantiate a hardware problem (rather than a configuration problem).

    Well, that's my two cents. I hope you can get it to work. Other than a few little quirky things, I have been enjoying learning something a little more complicated than my standard home office style routers, but it sure can be frustrating when you start hitting limitations.



  • You don't need to change outbound NAT. The automatic rules are fine unless you need to NAT to multiple public IPs without doing 1:1, or need to use static port, which are the minority of cases. The rules only affect how traffic gets translated; they have no impact on where it's routed.

    @fvter:

    So the realtek is the on-board msi gigabyte ethernet! It's a really really common card!

    common != good. common on desktop motherboards == cheap. the rtl8139 cards are probably the most common 10/100 chipset ever made, and many/most of them have something broken in the hardware, broken hardware checksum offloading, or promiscuous mode is broken, or any number of other hardware issues. The gig cards aren't much different. Broken hardware checksum offloading causes connectivity issues that tend to exhibit themselves partly as what you're describing.

    @fvter:

    I agree that the responding web site ain't going to send traffic back to both links.

    That's not what I'm saying. When you browse a site, you have multiple HTTP connections out, and some of them are going to get routed out different public IPs. It's round robin on a per-connection basis. That breaks some websites.
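
    As a rough illustration (a toy sketch, not how pf is actually implemented): the gateway names and public IPs below are made-up placeholders, and `assign_round_robin` is a hypothetical helper. It only shows that with per-connection round robin, one client's connections to the same site leave from both public IPs.

```python
from itertools import cycle

# Hypothetical gateways and public IPs (documentation address ranges).
WAN_PUBLIC_IP = {"wan": "198.51.100.10", "opt1": "203.0.113.20"}


def assign_round_robin(connections, gateways):
    """Assign each new connection to the next gateway in turn."""
    rotation = cycle(gateways)
    return [(conn, next(rotation)) for conn in connections]


# One browser on one LAN host opening four HTTP connections to the same site.
connections = [("192.168.1.100", 50000 + i, "site.example", 80) for i in range(4)]
assignments = assign_round_robin(connections, ["wan", "opt1"])

# The set of source addresses the remote site sees for this single client.
seen_public_ips = {WAN_PUBLIC_IP[gateway] for _, gateway in assignments}

for conn, gateway in assignments:
    print(conn, "->", gateway, WAN_PUBLIC_IP[gateway])
print("public IPs seen by the site:", sorted(seen_public_ips))
```

    A site that ties a login or a media session to one source address will see two different addresses here, which is the kind of breakage described above.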

    @fvter:

    The issue stems from the fact that my calls to the webtraffic from my home network are going out the pfsense box and in most cases should be routed via the same interface for each pair internal-external network.

    And that's not how PF functions. There are additional mechanisms added that we may support in the future.



  • Sorry getting back to this so late but I've been very busy and haven't had time to get back to this…

    Now I admit that I may have been a bit harsh, but I've been faced with this situation ever since I switched to pfSense! One of the reasons I switched was because @SUllrich on twitter specifically said that HW support on pfSense was more generic than some of the other solutions out there. But I find this is not the case. Let me go back to some of what's been said:

    • The first time I installed and reported these same issues, the same group said «it's a hardware problem». So I completely changed the hardware and am still having problems. And now once again I am being told it's a hardware problem. So if pfSense isn't hardware agnostic then maybe it should say so and restrict the hardware… I turned off the checksum offloading in the advanced options and that hasn't changed anything. This weekend I even swapped in a DGE-530T to stop using the realtek and am still having the same problems. So again, my question is how do I get this to work properly? Both my DSL lines have matching speeds, so it's unlikely to be a speed/response problem…

    • The only special set-up that I use for AON is there because that is what was written in a lot of the howtos and even in this forum to get better performance and compatibility with xbox-live and other gaming instances. I don't see how that would really affect the problem since I am only doing it for specific IPs and ports.

    • As for the general load-balancing issues, it has been mentioned to me at least a couple of times on this same forum that certain things don't perform the way they should, more specifically when talking about link failures. It just happens that right now I am facing intermittent link failures on one of my lines. Sure enough, the dual-wan system doesn't do as expected: when the link goes down, all traffic stops instead of rerouting to the other line! I don't understand why this should be the case.

    It's actually a shame, because this whole situation has become a sore spot! There were a lot of things that made switching to pfSense appealing to me, including some good feedback, the open nature and support for both dual-wan and ipv6.

    But overall, I find that the performance on dual-wan just isn't there. I spent some time looking through past posts on this forum and I get the impression that most users are using dual-wan in a simpler set-up, as a network separation and backup solution, and less for the situation I am looking for: a LAN going out over multiple WANs. All I am asking the set-up to do (and it's really not much) is:

    • balance all traffic outbound except for specific protocols going out on one link only
    • failover to a single link if one goes down

    I was hoping to get a little more insight into how to troubleshoot this than just the typical "it's your hardware!" Honestly, I have to deal with these types of issues every day at work and don't particularly want to have to deal with them at home! I would just like to have a working platform!



  • FYI: embedded youtube will not work properly, or will fail n% of the time, when using the default round robin method of load balancing.

    I have an alias, something along these lines.
    ext_youtube  64.15.112.0/20, 208.65.152.0/20, 208.117.224.0/20, 208.65.152.0/20, 64.233.167.99/20, 64.233.187.99/20, 72.14.207.99/20, 74.125.8.167/20

    Basically a summary of the Youtube netblocks. I do my load balancing by having 2 failover pools and sending a number of things out one link, and other sites out the other.
    I do have a balance-all pool, but I rarely use it in rules. There are just too many frigging sites that always expect you to come from the same address.
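
    If you build an alias like this, it may be worth normalizing the entries first. A small sketch, taking the list above as-is (`normalize` and `in_alias` are hypothetical helpers, and whether pfSense treats such entries the same way is an assumption): several entries have host bits set and one block appears twice, so the eight entries collapse to seven distinct /20 networks.

```python
import ipaddress

# The alias entries exactly as listed in the post above.
EXT_YOUTUBE = [
    "64.15.112.0/20", "208.65.152.0/20", "208.117.224.0/20", "208.65.152.0/20",
    "64.233.167.99/20", "64.233.187.99/20", "72.14.207.99/20", "74.125.8.167/20",
]


def normalize(entries):
    """Mask off host bits and drop duplicates, returning sorted networks."""
    # strict=False accepts entries such as 64.233.167.99/20 and masks them.
    return sorted({ipaddress.ip_network(entry, strict=False) for entry in entries})


def in_alias(ip, networks):
    """Return True if the address falls inside any network of the alias."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


networks = normalize(EXT_YOUTUBE)
for network in networks:
    print(network)
```

    A rule that sends everything matching the alias out one link could then be checked with `in_alias("208.65.153.238", networks)` for any address you are unsure about.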



  • So granted that there are restrictions on how the target site may handle incoming traffic, i've never actually denied that. but this is essentially true of almost all environments.

    My issue is more with how the load-balancer engine handles this. Why do I have to specifically create rules to manage the load-balancing and exclude badly responding sites? Doesn't that break the whole principle of having a load-balancing engine?

    I don't understand why the engine can't just have enough intelligence to say:

    • internal_ip starting a session to subnet x.y.z.u
    • route traffic internal_ip to subnet x.y.z.u on outgoing int-1
    • keep routing that traffic for n time (secs, minutes)
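
    Roughly, the logic I have in mind could be sketched like this. It is only an illustration, not anything pfSense provides: `StickyRouter`, the interface names and the fixed /24 used to guess the destination subnet are all assumptions made up for the example.

```python
import ipaddress
from itertools import cycle


class StickyRouter:
    """Pin each (internal IP, destination subnet) pair to one interface for ttl seconds."""

    def __init__(self, interfaces, ttl=300, prefix_len=24):
        self._rotation = cycle(interfaces)
        self.ttl = ttl
        self.prefix_len = prefix_len  # assumed subnet size; the real one is not known
        self._table = {}  # (src_ip, dst_subnet) -> (interface, expiry time)

    def route(self, src_ip, dst_ip, now):
        subnet = ipaddress.ip_network(f"{dst_ip}/{self.prefix_len}", strict=False)
        key = (src_ip, subnet)
        entry = self._table.get(key)
        if entry is None or entry[1] <= now:
            # No live mapping: take the next interface in the rotation.
            interface = next(self._rotation)
        else:
            interface = entry[0]
        # Each use keeps the mapping alive for another ttl seconds.
        self._table[key] = (interface, now + self.ttl)
        return interface


router = StickyRouter(["int-1", "int-2"], ttl=300)
first = router.route("192.168.1.100", "203.0.113.5", now=0)
same_subnet = router.route("192.168.1.100", "203.0.113.77", now=10)
other_host = router.route("192.168.1.101", "203.0.113.5", now=20)
print(first, same_subnet, other_host)
```

    The first two lookups stay on the same interface because they share the guessed subnet, while the second internal host gets the other interface.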

    I've set up load-balancing on cisco routers and this type of performance issues didn't play a role!

    I am also still perplexed as to why the load-balancing/fail-over would completely fail when one link goes down. It doesn't make sense!

    just my 0.02€ worth



  • @fvter:

    I am also still perplexed as to why the load-balancing/fail-over would completely fail when one link goes down. It doesn't make sense!

    just my 0.02€ worth

    This means your config is wrong.
    I use dual wan with failover, and it works fine.  Including when a link fails.



  • I looked at your config, and here is where you have fubared it:

    <lbpool><type>gateway</type>
    <behaviour>balance</behaviour>
    <monitorip>208.67.220.220</monitorip>
    <name>WANLoadB</name>
    <desc>Load Balancing on the WAN Links</desc>
    <port><servers>wan|RRR.TTT.UUU.254</servers>
    <servers>wan|208.67.222.222</servers>
    <servers>opt1|XXX.YYY.ZZZ.1</servers>
    <servers>opt1|208.67.220.220</servers></port></lbpool>

    Why are your interfaces in there twice?

    The monitor IPs you have in there are wrong: because any interface can ping either 208.67.222.222 or 208.67.220.220, pfsense can't tell if a link is down.
    Remove those extra items from the pool list for all of your lbpools. You have them in all of the pools.
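
    A rough way to spot this in a saved config, assuming each `<servers>` entry has the `interface|monitor IP` layout shown in the fragment above (this is not pfSense's own parser, and `monitors_by_interface` / `duplicated_interfaces` are hypothetical helpers):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The pool from the post, re-indented; the masked addresses are kept as-is.
LBPOOL_XML = """
<lbpool>
  <type>gateway</type>
  <behaviour>balance</behaviour>
  <monitorip>208.67.220.220</monitorip>
  <name>WANLoadB</name>
  <desc>Load Balancing on the WAN Links</desc>
  <port>
    <servers>wan|RRR.TTT.UUU.254</servers>
    <servers>wan|208.67.222.222</servers>
    <servers>opt1|XXX.YYY.ZZZ.1</servers>
    <servers>opt1|208.67.220.220</servers>
  </port>
</lbpool>
"""


def monitors_by_interface(lbpool_xml):
    """Group the monitor addresses of one pool by interface name."""
    pool = ET.fromstring(lbpool_xml)
    monitors = defaultdict(list)
    for entry in pool.iter("servers"):
        interface, monitor_ip = entry.text.strip().split("|", 1)
        monitors[interface].append(monitor_ip)
    return dict(monitors)


def duplicated_interfaces(monitors):
    """Return the interfaces that appear more than once in the pool."""
    return sorted(name for name, ips in monitors.items() if len(ips) > 1)


monitors = monitors_by_interface(LBPOOL_XML)
duplicates = duplicated_interfaces(monitors)
print("interfaces listed more than once:", duplicates)
```

    For the pool above both `wan` and `opt1` are reported, which matches the problem described.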



  • ok, here is all u have to do, this should fix 90% of the probs you are having… very simple. i experienced alot of the probs u have also. and everything works great now!
    goto SYSTEM -> ADVANCED , then ENABLE  Use sticky connections.

    that should help alot with the timeouts, and vids not loading/playing correctly…
    give it a shot, and see how that works 4 u...

    btw, i'm running 3 modems, 2 are load balanced....

    -r0b



  • @fvter:

    So granted that there are restrictions on how the target site may handle incoming traffic, i've never actually denied that. but this is essentially true of almost all environments.

    My issue is more with how the load-balancer engine handles this. Why do I have to specifically create rules to manage the load-balancing and exclude badly responding sites? Doesn't that break the whole principle of having a load-balancing engine?

    I don't understand why the engine can't just have enough intelligence to say:

    • internal_ip starting a session to subnet x.y.z.u
    • route traffic internal_ip to subnet x.y.z.u on outgoing int-1
    • keep routing that traffic for n time (secs, minutes)

    And how is it going to know what that subnet is? You have nothing to tell you whether that's a /8, /15, /30 etc. without performing lookups (probably WhoIs), which take non-trivial amounts of time and may not even give you accurate information. I know a number of very large load balancing setups that do nothing more advanced than push a single internal unit (whether that's an IP, subnet or whatever) through a single external IP, attempting to balance the load that way, not dynamically.
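
    To make that concrete, here is a small sketch using a documentation address (203.0.113.77, picked arbitrarily): the same destination IP belongs to very different "subnets" depending on the prefix length you guess. `candidate_subnets` is just an illustrative helper.

```python
import ipaddress


def candidate_subnets(ip, prefix_lengths=(8, 15, 30)):
    """Return the network containing ip for each guessed prefix length."""
    return [ipaddress.ip_network(f"{ip}/{length}", strict=False)
            for length in prefix_lengths]


subnets = candidate_subnets("203.0.113.77")
for subnet in subnets:
    print(subnet, "covers", subnet.num_addresses, "addresses")
```

    Nothing in the packet itself says which of these is the right one, so any engine keying on "destination subnet" has to guess or look it up.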



  • @althornin:

    I looked at your config, and here is where you have fubared it:

    <lbpool><type>gateway</type>
    <behaviour>balance</behaviour>
    <monitorip>208.67.220.220</monitorip>
    <name>WANLoadB</name>
    <desc>Load Balancing on the WAN Links</desc>
    <port><servers>wan|RRR.TTT.UUU.254</servers>
    <servers>wan|208.67.222.222</servers>
    <servers>opt1|XXX.YYY.ZZZ.1</servers>
    <servers>opt1|208.67.220.220</servers></port></lbpool>

    Why are your interfaces in there twice?

    The monitor IPs you have in there are wrong - because any interface can ping either 208.67.222.222 or 208.67.220.220, pfsense can't tell if a link is down.
    remove those extra items from the pool list for all of your lbpools - you have them in all of the pools.

    I thought about that as well! The original intention was to have two addresses to ping in case of failure, but even when I removed the OpenDNS servers from the config, it still doesn't work.
    The other ip addresses are the nexthop (ie. gateway) of each DSL connection.



  • Your interfaces should not be in there twice.



  • @cmb:

    That's not what I'm saying. When you browse a site, you have multiple HTTP connections out, and some of them are going to get routed out different public IPs. It's round robin on a per-connection basis. That breaks some websites.

    You don't have round robin by source IP address?

    What I mean by that is that all traffic from e.g. source IP 192.168.1.100 will go via one route, and 192.168.1.101 will then be routed to a different public IP, and so on.

    How does that work with FTP or VOIP then, as they both use multiple control and data paths? Wouldn't those also get routed to different public IPs, which would break them?

    I don't get that.  Round Robin by source IP would fix that issue.

    Or am I missing something ?



  • You just described why FTP and voip are such problematic protocols…
    It is usually solved by forcing these protocols onto only one WAN and not balancing them.

    The other possibility is to use sticky connections.

    Use sticky connections
    Successive connections will be redirected to the servers in a round-robin manner with connections from the same source being sent to the same web server. This "sticky connection" will exist as long as there are states that refer to this connection. Once the states expire, so will the sticky connection. Further connections from that host will be redirected to the next web server in the round robin.

    However, I don't know what the status of that feature is.
    The last I heard, it doesn't work like it should.
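
    For what it's worth, the behaviour that help text describes can be sketched roughly as below. This is a toy model of the description only, not pf's actual code; the class name, gateway names and client IPs are invented for the example.

```python
from itertools import cycle


class StickyBalancer:
    """Round robin per source IP; the choice sticks while that source has open states."""

    def __init__(self, gateways):
        self._rotation = cycle(gateways)
        self._sticky = {}  # source IP -> gateway it is currently pinned to
        self._states = {}  # source IP -> number of open states

    def open_connection(self, src_ip):
        if src_ip not in self._sticky:
            # No live states for this source: take the next gateway in turn.
            self._sticky[src_ip] = next(self._rotation)
            self._states[src_ip] = 0
        self._states[src_ip] += 1
        return self._sticky[src_ip]

    def close_connection(self, src_ip):
        # Assumes src_ip currently has at least one open state.
        self._states[src_ip] -= 1
        if self._states[src_ip] == 0:
            # Last state expired, so the sticky entry goes away too.
            del self._sticky[src_ip]
            del self._states[src_ip]


lb = StickyBalancer(["wan", "opt1"])
a1 = lb.open_connection("192.168.1.100")
a2 = lb.open_connection("192.168.1.100")
b1 = lb.open_connection("192.168.1.101")
lb.close_connection("192.168.1.100")
lb.close_connection("192.168.1.100")
c1 = lb.open_connection("192.168.1.102")
a3 = lb.open_connection("192.168.1.100")
print(a1, a2, b1, c1, a3)
```

    Both connections from 192.168.1.100 share a gateway while its states exist; once they are gone, its next connection simply takes whatever comes next in the rotation.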

