Connected successfully to a Sonicwall TZ170 but…

HaloGray

I turned off bitmap caching on the Experience tab of the Remote Desktop settings and now it's working great. I'm still a bit confused as to why this would be needed, or any different than the other IPSec VPN where this step is not required… but hey, I'll take the fix.

If anybody has any tests or theories I'm still all ears.

cmb

@HaloGray:

I'm still a bit confused as to why this would be needed, or any different than the other IPSec VPN where this step is not required…

because path MTU discovery (PMTUD) with IPsec in FreeBSD is just plain broken. IPsec adds 20 bytes of overhead, so you can't send packets bigger than 1480 bytes on a typical network where the MTU is 1500 everywhere. Commercial VPNs (likely Sonicwall too, though I've never worked with one; this is how Cisco works) will either fragment the packet and transmit it in two IPsec packets, or properly use PMTUD to make sure the host never sends anything bigger than 1460 or 1480 bytes (depending on whether your IPsec uses tunnel or transport mode; tunnel mode adds another 20 bytes per packet for the outer IP header). What FreeBSD really should do, any time it gets a packet over 1460 bytes that needs to traverse a VPN tunnel, is bounce back an ICMP "frag needed but DF bit set" message. Alternatively, MSS clamping in the firewall packages (both pf here and ipfilter in m0n0wall do this for normal Internet traffic) needs to take IPsec overhead into account for VPN-destined packets, which it can't differentiate at this point.
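The arithmetic in this post can be sketched in a few lines of Python. The 20-byte figures are the ones quoted above, not measured values; real ESP overhead depends on the cipher, padding, and authentication settings and is usually larger. The function names are hypothetical and exist only for illustration.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options


def inner_mtu(link_mtu: int, ipsec_overhead: int, tunnel_mode: bool) -> int:
    """Largest inner packet that fits in one IPsec packet on the link."""
    # Tunnel mode wraps the packet in a second (outer) IP header.
    outer_header = IP_HEADER if tunnel_mode else 0
    return link_mtu - ipsec_overhead - outer_header


def clamped_mss(link_mtu: int, ipsec_overhead: int, tunnel_mode: bool) -> int:
    """TCP MSS a firewall would have to advertise for VPN-bound traffic."""
    return inner_mtu(link_mtu, ipsec_overhead, tunnel_mode) - IP_HEADER - TCP_HEADER


# Overhead value (20) is the figure assumed in the post above.
print(inner_mtu(1500, 20, tunnel_mode=False))   # 1480
print(inner_mtu(1500, 20, tunnel_mode=True))    # 1460
print(clamped_mss(1500, 20, tunnel_mode=True))  # 1420
```

With a larger, more realistic overhead value the same functions simply return smaller limits, which is why a fixed MSS clamp for plain Internet traffic is not enough for tunnel traffic.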

This is a major limitation, and unfortunately there is no easy workaround at this time. If it becomes a major issue, the best "solution" for now is to lower the MTU of your systems to 1460. That can hurt host-to-host throughput on the LAN, though: more packets have to be processed to transfer the same amount of data, which chews up more system resources.
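To get a feel for the cost of the lower MTU, here is a rough Python sketch that counts full-size TCP segments for a bulk transfer. It assumes 40 bytes of IP and TCP headers per packet with no options, and the 100 MB transfer size is an arbitrary example, not a figure from this thread.

```python
import math


def segments_needed(total_bytes: int, mtu: int, header_bytes: int = 40) -> int:
    """Number of TCP segments needed to carry total_bytes at a given MTU."""
    mss = mtu - header_bytes  # payload bytes per segment
    return math.ceil(total_bytes / mss)


transfer = 100 * 1024 * 1024  # hypothetical 100 MB transfer
at_1500 = segments_needed(transfer, 1500)
at_1460 = segments_needed(transfer, 1460)
extra = (at_1460 - at_1500) / at_1500
print(at_1500, at_1460, f"{extra:.1%} more packets")
```

Under these assumptions the difference comes out at roughly 3% more packets, which supports the point that the penalty exists but is small on a lightly loaded LAN.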

sullrich

I'm still curious why I never need to do this. I connect to any machine across the pfSense tunnel and connect to RDP, etc.

Why is it different in this case?

cmb

@sullrich:

I'm still curious why I never need to do this. I connect to any machine across the pfSense tunnel and connect to RDP, etc.

Why is it different in this case?

That's one I've never figured out. :D I've also successfully run RDP and about everything else over VPN tunnels using m0n0wall and pfSense. Client-side group policy processing is the only issue I can easily replicate, because it sends a 2000-byte ping to determine link speed for GPO processing purposes. Because this never gets a reply, the machine aborts group policy processing. You can disable this pretty stupid means of link speed testing via group policy (though it's kinda hard to apply a fix through group policy to machines that can't apply group policies ;) ) or a registry change.

But it's easy to see and replicate the problem. Do a "ping -l 1400 10.x.x.x" to something across the tunnel, and it'll work. Increase it to 1470 or something like that, and it won't. You should be able to ping our servers at BGN with any size packet you want (it'll frag, so only if you allow frags out your LAN). The same won't work over VPN once you get over 1460 or so (though I've also seen oddities where it worked up into the 1460's before the packets started disappearing, which is another oddity).

HaloGray

@cmb:

This is a major limitation, and unfortunately there is no easy workaround at this time. If it becomes a major issue, the best "solution" for now is to lower the MTU of your systems to 1460. That can hurt host-to-host throughput on the LAN, though: more packets have to be processed to transfer the same amount of data, which chews up more system resources.

I knocked my WAN MTU down to 1472 and saw no difference. I'll try going as far as 1460 and see what happens. The extra resources shouldn't be much of an issue; the networks I'm talking about have fewer than 10 users each.

@sullrich:

I'm still curious why I never need to do this. I connect to any machine across the pfSense tunnel and connect to RDP, etc.

Why is it different in this case?

Perhaps it's something to do with the networks?

The pfSense box is located in my home where I'm playing around with it. I'm connecting across an SBC Yahoo 1.5 Mb ADSL WAN link (dynamic IP) to a Comcast cable 5 Mb WAN link (static IP).

My home connection is looking for an IP, and the remote connection is looking for my dyndns address (the TZ170 is capable of doing this, unlike the SoHo provided in m0n0wall's example). My identifier is my IP, and the Sonicwall's identifier is the unit's serial number.

Have you had a setup similar to this and had it work ok?

sullrich

I must have a magic pfSense:

C:\Documents and Settings\GeekGod.SULLRICH>ping -l 3000 10.0.0.26

Pinging 10.0.0.26 with 3000 bytes of data:

Reply from 10.0.0.26: bytes=3000 time=188ms TTL=63
Reply from 10.0.0.26: bytes=3000 time=208ms TTL=63

:) :)

HaloGray

@cmb:

But it's easy to see and replicate the problem. Do a "ping -l 1400 10.x.x.x" to something across the tunnel, and it'll work. Increase it to 1470 or something like that, and it won't. You should be able to ping our servers at BGN with any size packet you want (it'll frag, so only if you allow frags out your LAN). The same won't work over VPN once you get over 1460 or so (though I've also seen oddities where it worked up into the 1460's before the packets started disappearing, which is another oddity).

C:\WINDOWS|► ping -l 1472 10.50.1.4

Pinging 10.50.1.4 with 1472 bytes of data:

Reply from 10.50.1.4: bytes=1472 time=88ms TTL=127
Reply from 10.50.1.4: bytes=1472 time=91ms TTL=127

Seems to work ok? That's what my WAN MTU is set up at… yet I still get the black screen unless I disable bitmap caching. Am I not understanding properly?

cmb

@sullrich:

I must have a magic pfSense:

C:\Documents and Settings\GeekGod.SULLRICH>ping -l 3000 10.0.0.26

Pinging 10.0.0.26 with 3000 bytes of data:

Reply from 10.0.0.26: bytes=3000 time=188ms TTL=63
Reply from 10.0.0.26: bytes=3000 time=208ms TTL=63

:) :)

what the…

goes off to check FreeBSD 6 and hope to be proven wrong. ;D

I haven't done any extensive testing of this since the 4.x and 5.x versions, though I doubt if it's changed.

sullrich

@HaloGray:

@cmb:

But it's easy to see and replicate the problem. Do a "ping -l 1400 10.x.x.x" to something across the tunnel, and it'll work. Increase it to 1470 or something like that, and it won't. You should be able to ping our servers at BGN with any size packet you want (it'll frag, so only if you allow frags out your LAN). The same won't work over VPN once you get over 1460 or so (though I've also seen oddities where it worked up into the 1460's before the packets started disappearing, which is another oddity).

C:\WINDOWS|► ping -l 1472 10.50.1.4

Pinging 10.50.1.4 with 1472 bytes of data:

Reply from 10.50.1.4: bytes=1472 time=88ms TTL=127
Reply from 10.50.1.4: bytes=1472 time=91ms TTL=127

Seems to work ok? That's what my WAN MTU is set up at… yet I still get the black screen unless I disable bitmap caching. Am I not understanding properly?

Does a -l 1500 work by chance? How about -l 3000 ?

Just curious as I think there is something else at play here.

sullrich

@cmb:

@sullrich:

I must have a magic pfSense:

C:\Documents and Settings\GeekGod.SULLRICH>ping -l 3000 10.0.0.26

Pinging 10.0.0.26 with 3000 bytes of data:

Reply from 10.0.0.26: bytes=3000 time=188ms TTL=63
Reply from 10.0.0.26: bytes=3000 time=208ms TTL=63

:) :)

what the…

goes off to check FreeBSD 6 and hope to be proven wrong. ;D

I haven't done any extensive testing of this since the 4.x and 5.x versions, though I doubt if it's changed.

SHRUGS. I'm not sure what's going on here but it seems that this is no longer a problem (MTU path discovery)

cmb

@cmb:

though I doubt if it's changed.

I guess I should say "it sure looks like it's changed", because that never would have worked before.

HaloGray

It seems to be a problem with the command given ;)

C:\WINDOWS|► ping -l 3000 10.50.1.4

Pinging 10.50.1.4 with 3000 bytes of data:

Reply from 10.50.1.4: bytes=3000 time=147ms TTL=127
Reply from 10.50.1.4: bytes=3000 time=168ms TTL=127

Works fine for me too, but when I set the -f option (do not fragment) it fails.

C:\WINDOWS|► ping -f -l 3000 10.50.1.4

Pinging 10.50.1.4 with 3000 bytes of data:

Packet needs to be fragmented but DF set.

However… even with the -f option set 1472 still succeeds...

C:\WINDOWS|► ping -f -l 1472 10.50.1.4

Pinging 10.50.1.4 with 1472 bytes of data:

Reply from 10.50.1.4: bytes=1472 time=83ms TTL=127
Reply from 10.50.1.4: bytes=1472 time=127ms TTL=127

:edit:

Perhaps it has something to do with the 20 bytes of overhead you mentioned?

C:\WINDOWS|► ping -f -l 1492 10.50.1.4

Pinging 10.50.1.4 with 1492 bytes of data:

Packet needs to be fragmented but DF set.
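One way to read these results is to convert the `-l` payload size into the size of the IP packet that actually goes on the wire. The sketch below assumes a 1500-byte MTU on the local interface (not stated in the thread) and plain IPv4 and ICMP headers with no options; the function names are hypothetical.

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def ip_packet_size(ping_payload: int) -> int:
    """Total IP packet size for a ping with the given payload size."""
    return ping_payload + ICMP_HEADER + IP_HEADER


def fits_without_fragmentation(ping_payload: int, mtu: int = 1500) -> bool:
    """True if the ping fits in one packet at the given interface MTU."""
    return ip_packet_size(ping_payload) <= mtu


for payload in (1472, 1492, 3000):
    print(payload, ip_packet_size(payload), fits_without_fragmentation(payload))
```

Under that assumption a 1472-byte payload is exactly a 1500-byte packet, while 1492 bytes already makes a 1520-byte packet, so the `-f -l 1492` failure may simply reflect the local interface MTU rather than anything the tunnel adds.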

sullrich

Do you have anything in the event viewer that pertains to this?

cmb

@HaloGray:

Works fine for me too, but when I set the -f option (do not fragment) it fails.

When you set -f on a Windows ping with anything over your MTU, the packet never leaves the NIC of the box (I've verified this with tcpdump on another host). Windows realizes "hey, I can't send this without fragmenting", and should give you "Packet needs to be fragmented but DF set." without the packet ever touching the network.

This is indeed not an issue any more with pings.

From my house to Scott's (this is even with m0n0wall on my end; a little strange that this now works…):

root@s2# ping -s 3000 10.0.250.1
PING 10.0.250.1 (10.0.250.1): 3000 data bytes
3008 bytes from 10.0.250.1: icmp_seq=0 ttl=63 time=236.648 ms
3008 bytes from 10.0.250.1: icmp_seq=1 ttl=63 time=241.168 ms
3008 bytes from 10.0.250.1: icmp_seq=2 ttl=63 time=229.937 ms

But ping is actually a different issue - that was FreeBSD b0rking somewhere on fragmented packets (I guess on the receiving end, based upon the above result).

But with stuff like this still happening regularly with things like RDP, TCP definitely still seems to be an issue. Off to see what I can figure out with that.

It really shouldn't frag any TCP like it does with ICMP as we've exhibited here, because on any host OS with PMTUD enabled, the DF bit will always be set on packets. You don't want any of your network devices frag'ing DF packets, and always want to completely avoid frags if at all possible.
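For reference, the way a large ping without DF gets split can be sketched as follows. This assumes a 1500-byte MTU and 20-byte IPv4 headers without options; the function name is hypothetical. Every fragment except the last must carry a multiple of 8 bytes, because the IP fragment offset is counted in 8-byte units.

```python
from typing import List

IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def fragment_payloads(ping_payload: int, mtu: int = 1500) -> List[int]:
    """IP payload bytes carried by each fragment of one ping."""
    remaining = ping_payload + ICMP_HEADER
    # Largest per-fragment payload, rounded down to a multiple of 8.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(fragment_payloads(3000))  # [1480, 1480, 48]
print(fragment_payloads(1472))  # [1480]
```

So the 3000-byte pings in this thread would travel as three fragments under these assumptions, which only works when every hop, including the IPsec endpoints, passes and reassembles fragments correctly.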

cmb

@sullrich:

SHRUGS. I'm not sure what's going on here but it seems that this is no longer a problem (MTU path discovery)

You can't test that with ping: with -f and a packet larger than the MTU, it never leaves the box (on Windows at least; I can't say that I've tried any other OS). So this test doesn't tell us that the problem is resolved at all. It just shows us that a somewhat related other stupid issue with IPsec isn't a problem anymore. I was thinking maybe there would be a way to use ping in another regard to further test this, but that doesn't seem to be the case either because of Windows stupidity.

HaloGray

@sullrich:

Do you have anything in the event viewer that pertains to this?

I just tried connecting without bitmap caching and got an error about a missing printer driver (unrelated, but it serves as a sort of placeholder).
Turned on bitmap caching and tried to reconnect: black screen again.
Disabled bitmap caching and reconnected successfully, and the printer error repeated, with nothing in between the two events.

So the short answer here is… no.

Regarding Windows and ping being a poor test: I have a Linux box in the house, and I could get to the shell on my pfSense box itself to run some ping tests, if you would find that helpful.
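If the Linux box is used for this, the search for the largest payload that passes with DF set could be automated along these lines. This is only a sketch: the function names are hypothetical, `linux_df_probe` assumes the iputils `ping` (where `-M do` sets DF and `-s` sets the payload size), and the binary search assumes that if one size passes, every smaller size passes too, which the oddities mentioned earlier in the thread suggest may not always hold.

```python
import subprocess
from typing import Callable, Optional


def largest_df_payload(probe: Callable[[int], bool],
                       low: int = 0, high: int = 1472) -> Optional[int]:
    """Binary-search the largest payload size for which probe() succeeds."""
    if not probe(low):
        return None  # even the smallest size fails
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, look higher
        else:
            high = mid - 1  # mid fails, look lower
    return low


def linux_df_probe(host: str) -> Callable[[int], bool]:
    """Build a probe that sends one DF ping (assumes iputils ping on Linux)."""
    def probe(size: int) -> bool:
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(size), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


# Offline demonstration: a simulated path that passes payloads up to 1432 bytes.
simulated = lambda size: size <= 1432
print(largest_df_payload(simulated))  # 1432
```

Against a real tunnel the call would be something like `largest_df_payload(linux_df_probe("10.50.1.4"))`, using the remote address from the earlier tests.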

cmb

My Windows box is being dumb - can either/both of you try this and see what you get?

I was thinking, I should be able to send a ping packet at 1499 bytes (or something like that) with DF, and it should hit the network. It doesn't. I can't even send a 1300 byte ping with DF.

try a
ping -l 1300 -f 10.x.x.x

to something on your LAN and see what you get.

HaloGray

From Windows it works fine:
C:\WINDOWS|► ping -l 1300 -f 192.168.0.1

Pinging 192.168.0.1 with 1300 bytes of data:

Reply from 192.168.0.1: bytes=1300 time<1ms TTL=64

sullrich

C:\Documents and Settings\GeekGod.SULLRICH>ping -l 1300 -f 10.0.0.26

Pinging 10.0.0.26 with 1300 bytes of data:

Reply from 10.0.0.26: bytes=1300 time=82ms TTL=63
Reply from 10.0.0.26: bytes=1300 time=82ms TTL=63

billm

@cmb:

@sullrich:

I'm still curious why I never need to do this. I connect to any machine across the pfSense tunnel and connect to RDP, etc.

Why is it different in this case?

That's one I've never figured out. :D I've also successfully run RDP and about everything else over VPN tunnels using m0n0wall and pfSense. Client-side group policy processing is the only issue I can easily replicate, because it sends a 2000-byte ping to determine link speed for GPO processing purposes. Because this never gets a reply, the machine aborts group policy processing. You can disable this pretty stupid means of link speed testing via group policy (though it's kinda hard to apply a fix through group policy to machines that can't apply group policies ;) ) or a registry change.

But it's easy to see and replicate the problem. Do a "ping -l 1400 10.x.x.x" to something across the tunnel, and it'll work. Increase it to 1470 or something like that, and it won't. You should be able to ping our servers at BGN with any size packet you want (it'll frag, so only if you allow frags out your LAN). The same won't work over VPN once you get over 1460 or so (though I've also seen oddities where it worked up into the 1460's before the packets started disappearing, which is another oddity).

Works for me?

ping -s 1600 192.168.177.254

PING 192.168.177.254 (192.168.177.254): 1600 data bytes
1608 bytes from 192.168.177.254: icmp_seq=0 ttl=63 time=61.553 ms
1608 bytes from 192.168.177.254: icmp_seq=1 ttl=63 time=31.591 ms
1608 bytes from 192.168.177.254: icmp_seq=2 ttl=63 time=33.823 ms

–Bill