WAN load balancer keeps choosing wrong gateway for 1 host

TuxTiger

Hi,

I have configured the gateways to form a group at the 'routing' menu. I have made a load-balancing group and 2 failover groups. I had a little trouble finding a good ping target, but now everything is running fine. Port forwards, preferred gateways for some protocols etc. TOP!

Using 'Firewall -> Rules' I have directed an internal host to use OPT1 as gateway for all targets and all protocols, but for 1 particular destination host pfSense chooses the WAN route instead of OPT1.

I thought of some 'state' which I should clear but whatever I try....even reboot/shutdown...when I telnet from this OPT1-host to this particular destination host...pfsense immediately chooses to exit via the WAN gateway.

What can I do to research this problem? Can I do some detective work in the shell?

I am using: pfSense-2.0-BETA1-1g-20100301-2151-nanobsd.img on an ALIX board.

Cheers!

TuxTiger

Oopsss…when I started with 2.0 I tried the mirror for download, now I see the 'regular' site does have more recent builds. I will try to upgrade and test further. ;D

TuxTiger

Installed the latest version (pfSense-2.0-BETA1-1g-20100324-1414-nanobsd-upgrade.img.gz) but no difference.

I may have found a clue. Some external IPs are routed via the right gateway and some are not:

87.230.22.223 is OK
87.230.22.224 is NOT OK
87.230.22.225 is NOT OK
…
...
87.230.22.238 is NOT OK
87.230.22.239 is NOT OK
87.230.22.240 is OK AGAIN

I don't know how the 'gateway picking routine' works unfortunately...

dusan

@TuxTiger:

Hi,

I have configured the gateways to form a group at the 'routing' menu. I have made a load-balancing group and 2 failover groups. I had a little trouble finding a good ping target, but now everything is running fine. Port forwards, preferred gateways for some protocols etc. TOP!

Using 'Firewall -> Rules' I have directed an internal host to use OPT1 as gateway for all targets and all protocols, but for 1 particular destination host pfSense chooses the WAN route instead of OPT1.

I thought of some 'state' which I should clear but whatever I try....even reboot/shutdown...when I telnet from this OPT1-host to this particular destination host...pfsense immediately chooses to exit via the WAN gateway.

Is WAN your default gateway?

@TuxTiger:

87.230.22.223 is OK
87.230.22.224 is NOT OK
87.230.22.225 is NOT OK
…
...
87.230.22.238 is NOT OK
87.230.22.239 is NOT OK
87.230.22.240 is OK AGAIN

Did you try 80.254.71.228 ( http://m0n0.ch )? Is it OK?

I'm not a pfSense dev. I'm asking you to test just to be sure that we share the same problem.

TuxTiger

@dusan:

Is WAN your default gateway?

Yes

@dusan:

Did you try 80.254.71.228 ( http://m0n0.ch )? Is it OK?

I'm not a pfSense dev. I'm asking you to test just to be sure that we share the same problem.

No, it's not ok. 80.254.71.228 gets routed to the default gateway instead of the gateway mentioned in the 'Firewall -> Rules'. When I try different host in 80.254.71.xxx I can make the same list:

87.254.71.1 is OK
…
...
87.254.71.223 is OK
87.254.71.224 is NOT OK
87.254.71.225 is NOT OK
...
...
87.254.71.238 is NOT OK
87.254.71.239 is NOT OK
87.254.71.240 is OK AGAIN
...
87.254.71.254 is OK

I did a quick test on 2.2.2.239 and 2.2.2.240 and the same behavior (239 default gw, 240 correct policy gw)

dusan

@TuxTiger:

@dusan:

Is WAN your default gateway?

Yes

@dusan:

Did you try 80.254.71.228 ( http://m0n0.ch )? Is it OK?

I'm not a pfSense dev. I'm asking you to test just to be sure that we share the same problem.

No, it's not ok. 80.254.71.228 gets routed to the default gateway instead of the gateway mentioned in the 'Firewall -> Rules'. When I try different host in 80.254.71.xxx I can make the same list:

87.254.71.1 is OK
…
...
87.254.71.223 is OK
87.254.71.224 is NOT OK
87.254.71.225 is NOT OK
...
...
87.254.71.238 is NOT OK
87.254.71.239 is NOT OK
87.254.71.240 is OK AGAIN
...
87.254.71.254 is OK

I did a quick test on 2.2.2.239 and 2.2.2.240 and the same behavior (239 default gw, 240 correct policy gw)

Indeed we share the same problem. Very comprehensive check. Thank you very much.

Now it is clear that there is an endian-related bug in the way pfSense recognizes multicast IP addresses. A multicast IP(v4) address is AFAIK one matching the mask 224/4, i.e. the highest octet belongs to the range 224..239. Your test shows that pfSense is checking the lowest octet instead.
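
To illustrate the kind of byte-order mistake described above, here is a minimal Python sketch. It is not pfSense's or pf's actual code; the function names are made up for this example, and the "buggy" variant simply assumes the four address bytes get interpreted in little-endian order before the 224/4 test is applied.

```python
import socket
import struct

MULTICAST_MASK = 0xF0000000  # top four bits of the address
MULTICAST_NET = 0xE0000000   # 224.0.0.0/4, i.e. first octet 224..239


def is_multicast(value: int) -> bool:
    """Apply the 224/4 test to a 32-bit integer."""
    return (value & MULTICAST_MASK) == MULTICAST_NET


def correct_check(ip: str) -> bool:
    """Interpret the four address bytes in network (big-endian) order."""
    (value,) = struct.unpack("!I", socket.inet_aton(ip))
    return is_multicast(value)


def buggy_check(ip: str) -> bool:
    """Hypothetical bug: the same bytes read in little-endian order,
    so the LAST octet ends up in the top byte of the integer."""
    (value,) = struct.unpack("<I", socket.inet_aton(ip))
    return is_multicast(value)


if __name__ == "__main__":
    for ip in ("87.230.22.223", "87.230.22.224", "87.230.22.239",
               "87.230.22.240", "80.254.71.228", "224.0.0.1"):
        print(ip, "correct:", correct_check(ip), "buggy:", buggy_check(ip))
```

Under that assumption, the byte-swapped check flags exactly the addresses whose last octet is 224..239, which matches the pattern in the test lists in this thread.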

TuxTiger

@dusan:

Now it is clear that there is an endian-related bug in the way pfSense recognizes multicast IP addresses. A multicast IP(v4) address is AFAIK one matching the mask 224/4, i.e. the highest octet belongs to the range 224..239. Your test shows that pfSense is checking the lowest octet instead.

Yes, I was also looking at the multicast high-octet range in relation to this bug. It seemed too much of a coincidence not to be related :)

TuxTiger

Seems to me if BSD states:

pfctl -sr
pass in quick on vr0 route-to (vr2 83.85.124.1) inet from <ziggohosts> to any flags S/SA keep state label "USER_RULE: Some hosts prefer ZIGGO"

and packets are only matching if the rightmost byte of the IP address is < 224 or > 239, the bug is in the BSD stack?

dusan

@TuxTiger:

Seems to me if BSD states:
pfctl -sr
pass in quick on vr0 route-to (vr2 83.85.124.1) inet from <ziggohosts> to any flags S/SA keep state label "USER_RULE: Some hosts prefer ZIGGO"
and packets are only matching if the rightmost byte of the IP address is < 224 or > 239, the bug is in the BSD stack?

I don't know if the bug is in the "stack" but I know that it is new.

On March 9th my mail server was still able to communicate with a certain business partner's mail server whose IP address belongs to the affected class. A few days later users started complaining about message non-delivery, and there were logged events about lost contact during communication with the business partner, etc. I've been updating to the latest snapshot almost every day, so the bug must have been introduced some day around March 9th.

TuxTiger

Ok, the only thing I can confirm is that my first experience with 2.0 (after endless vyatta v6 trials) was with snapshot 20100301-2151-nanobsd, and the problem was already there. :(

eri--

Should be fixed on latest snapshots!
Problem in one patch with host byte ordering.

TuxTiger

@ermal:

Should be fixed on latest snapshots!
Problem in one patch with host byte ordering.

Oh man that would be great, I'm absolutely loving the 2.0 beta! :)

dusan

Me too. I'm eagerly waiting for the fix. Thank you very much, Ermal.

TuxTiger

@ermal:

Should be fixed on latest snapshots!
Problem in one patch with host byte ordering.

The latest snapshot did fix it! Thanks very much!

dusan

Confirmed on Mar 26 01:13:43 snapshot. The bug was fixed.

Many thanks.