Web server pages rendering slowly going out WAN but fine internally - HELP PLS!

Gandalf

This isn't a pfSense problem but a DNS problem (perhaps you forgot to allow port 53 TCP/UDP?).
Take a look at http://www.dnsstuff.com/tools/dnsreport.ch?domain=barfield.com and you will see that the nameserver at 12.107.230.110 is not responding. Since it's your primary DNS server, the delay is to be expected. Fixing your DNS will fix your website :)

sullrich

Yes, that would be my 3rd test.

Next test would be to lower the MTU to around 1400 on the WAN. Test again, if the situation improves, keep moving the MTU back higher and higher until you find the sweet spot that works the best.
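A quicker way to hunt for that sweet spot, rather than re-saving the WAN MTU each time, may be to probe the path with pings that are not allowed to fragment. The sketch below only does the size arithmetic (20-byte IPv4 header without options plus 8-byte ICMP header). The helper name is made up for illustration, and the ping flags in the comments are the FreeBSD and Linux spellings as I understand them.

```shell
# Hypothetical helper: largest ICMP echo payload that fits in a single
# unfragmented IPv4 packet of the given MTU
# (MTU - 20 bytes IPv4 header - 8 bytes ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage, run from a remote host toward the web server:
#   FreeBSD: ping -D -s "$(icmp_payload_for_mtu 1500)" www.barfield.com
#   Linux:   ping -M do -s "$(icmp_payload_for_mtu 1500)" www.barfield.com
# Step the MTU argument down (1500, 1492, 1460, 1400, ...) until replies
# come back.
```

If the full-size probe fails while a smaller one gets through, something on the path probably has an MTU below 1500 or is dropping the ICMP "fragmentation needed" messages. That would hurt large HTTP responses far more than small requests.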

jobsoft

I can see DNS being a factor on the initial lookup, but that should be cached once things get rolling. I will check on that, but, again, nothing has changed except swapping fedora/shorewall for pfsense. Well, what used to be servers on WAN under shorewall are now servers on DMZ. I am sure if I moved them back to WAN they would work fine.

I forgot to mention that my DMZ is setup with 1:1 NAT with a public IP mapped to each DMZ server in the same way they were originally setup and known on WAN under shorewall. I first was thinking only of DMZ being the culprit, but, then it occurred to me to try a web server that was setup with Inbound NAT to a LAN box and the same behavior was present.

I will try the MTU suggestion too.

Mark

jobsoft

Also, I am certain that I have no double NAT going on, as the SMC switches automatically to bridge mode as soon as it detects one of the configured public IPs on the LAN ports (which I thought was pretty slick). The SMC gateway has been in the picture for a long time anyway.

One thing that I wonder about is if the MTU corrected/compensated for the problem, a) why would it even be an issue and b) how does that play into causing the problematic behavior with delivering HTML to remote browsers? Why would it be an issue on the WAN but not on the LAN? Just strikes me as curious and I would like to get my hands around it.

jobsoft

OK, DNS is not an issue on other web pages on the "home domain", and they have the same issues. So, while it may be a contributing factor initially on www.barfield.com, it is still an aside to the main issue.

I tried various settings on the MTU and it made no difference at all. :-(

Thanks though for all your suggestions and thoughts so far!

sullrich

Couple other things that I would check:

Status -> Interfaces .. See any errors or collisions?

jobsoft

There are a few In errors on the xl2 (LAN) and some collisions on each. The web server with www.barfield.com is on xl1 (DMZ), so, the In errs on LAN should not affect DMZ–>WAN. I did not display WAN again here as it has no errors and no collisions.

I suppose an error or collision would trash an outbound http packet, but would it cause it to delay so much??? I suppose also that a stream of http would attempt to max out the packet size, so this could be a problem that manifests itself near or at the MTU. I need to look at some tcpdumps and see.

LAN interface (xl2)
Status up
MAC address 00:50:04:76:95:f5
IP address 192.168.1.254
Subnet mask 255.255.255.0
Media 100baseTX
In/out packets 90732/106145 (40.64 MB/16.73 MB)
In/out errors 24/0
Collisions 74

DMZ interface (xl1)
Status up
MAC address 00:10:4b:37:d3:5d
IP address 172.21.0.2
Subnet mask 255.255.255.0
Media 100baseTX
In/out packets 76319/74887 (11.61 MB/16.21 MB)
In/out errors 0/0
Collisions 142

sullrich

One other thing is to verify that the speed and duplex are matching up on all pieces of equipment.

BTW: both sites loaded in under 15 seconds here.
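For the speed/duplex check, the negotiated media can be read from `ifconfig` on the pfSense box. The helper below is a hypothetical sketch that assumes the usual FreeBSD `media:` line format, for example `media: Ethernet autoselect (100baseTX <full-duplex>)`. It only classifies whatever text it is fed.

```shell
# Hypothetical helper: read `ifconfig` output on stdin and print the
# duplex reported on each "media:" line.
report_duplex() {
    awk '/media:/ {
        if ($0 ~ /full-duplex/)      print "full"
        else if ($0 ~ /half-duplex/) print "half"
        else                         print "unknown"
    }'
}

# Usage on the firewall:
#   ifconfig xl1 | report_duplex
```

The same check would need to be made on the switch port or device at the other end of each cable, since both sides have to agree.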

billm

FWIW, both those sites come up instantly for me. Seems like your customers shouldn't notice. The issue is only when you try to access them from behind the same firewall right?

–Bill

jobsoft

No, from behind the same firewall (all on LAN), they are fine. The problem shows up from the WAN side: from my house and from my partner's office in Vermont (I had him try; both remote PCs are themselves behind NAT), and it was the same in both places. Sometimes the pages do pop up much quicker and at other times they drag. That was why I suggested the F5 to refresh and see the varying performance.

While I agree they may not notice, when it does take a while to load, it looks broken and some times I have even had the browser time out and just leave the spots with broken images icons. Not good. Some times it has timed out when not enough HTML was delivered to even render the page intelligently.

The crux of the issue here is that the previous Fedora/Shorewall setup had no problems. Clearly SOMETHING in the chain with pfsense is involved (and this very well could descend through m0n0wall to FreeBSD to the xl drivers). It could also be something else hardware-wise. Nonetheless, there is a degradation and the new setup has to be contributing to it.

pfsense and/or m0n0wall are super cool tools!! And what I am doing is nothing major. Surely others have similar setups without issues, or all kinds of heck would be all over these forums. So my culprit is, I think, atypical, which is going to make it all the more elusive!

I really want to try and stay with pfsense, AND there has to be some way to at least define why the pages are rendering the way they are from a packet-level view.

yoda715

I went to both of your webpages and both appeared within 3 seconds. I continually hit shift-refresh (which reloads entire webpage) and noticed no hit in performance. Can you confirm that this happens at all times of day? Only thing I can think of based upon what I've read so far is that it might be utilization related. Meaning that there might be 10,000 people trying to pull up your webpage at the same time you were, and that caused it to slow down. Just a theory. Test this by going to the webpage at different times of the day. 12pm, 9pm, 1am, etc. See if that points to anything.

jobsoft

Very interesting indeed. What is your Internet setup there? I ask because the two places that I tested it from (here and from Vermont) were both on cable internet and both behind NAT routers (each a Linksys WRT54G running dd-wrt v23 SP2!). Both behaved the same way. I did ask the people at Barfield to test it out and advise of any performance or other problems, and they said it looked great to them too.

So, I wonder if the WRT54G's could be a factor in this anomaly??? I will have to rig my laptop direct to my cable this morning and see.

Also, as yet another "try this", I pulled up firefox here at my house from a fedora linux desktop and tried www.jobsoft.com. same thing! :-( But, I went ahead and captured some screen shots for the page render "progress" after the 1st, 2nd and 3rd minutes:

http://www.jobsoft.com/Screenshot_Jobsoft_Design_and_Development_1st_Minute.png
http://www.jobsoft.com/Screenshot_Jobsoft_Design_and_Development_2nd_Minute.png
http://www.jobsoft.com/Screenshot_Jobsoft_Design_and_Development_3rd_Minute.png

This is what I get no matter when I try it and from where and what here in the house behind the WRT54G NAT router. Notice in the 3rd minute the browser had given up and was "Done".

I can also remote VNC to a linux desktop at a customer's site that has a T1 and a cisco router this morning as well and see what it does from there.

Thanks for the feedback! It has helped to shift focus a bit.

jobsoft

One quick followup. Since I can packet capture on each end of this through the same event period, can anyone suggest what I might look for in wireshark that would be enlightening as to not necessarily what caused the problem to begin with, but what packet situation is resulting in the delays???
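As one possible starting point for that question, Wireshark's TCP analysis flags mark retransmissions, duplicate ACKs and apparent gaps in the stream. The sketch below just builds a display filter string for one address. The helper name is hypothetical and 192.0.2.10 is a documentation placeholder, not an address from this thread.

```shell
# Hypothetical helper: build a Wireshark display filter that keeps only
# TCP trouble indicators for traffic to or from one IP address.
tcp_trouble_filter() {
    printf 'ip.addr == %s && (tcp.analysis.retransmission || tcp.analysis.duplicate_ack || tcp.analysis.lost_segment)\n' "$1"
}

# Usage: paste the output into Wireshark's filter bar, or with tshark
# (newer versions take the display filter with -Y, older ones with -R):
#   tshark -r <capfile>.cap -Y "$(tcp_trouble_filter 192.0.2.10)"
```

Running the same filter on captures taken at each hop and comparing timestamps should show where a segment was last seen before its retransmission, which would narrow down where the loss or delay is introduced.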

jobsoft

OK,

I have done the tcpdumps from 3 places:

http://www.jobsoft.com/packet-watch-dmz-filtered.cap
http://www.jobsoft.com/packet=watch-eth0-filtered.cap
http://www.jobsoft.com/packet-watch-wan-filtered.cap

All tcpdumps were 'tcpdump -s 1500 -i <iface> -w <capfile>.cap' and run simultaneously while I exercised the web pages from windows and linux here at my house. The anomalies did manifest themselves.

I then brought all 3 into Wireshark and filtered out only the packets to/from the web server and my external cable ip address and then saved those filtered sets back to the files above. I am making them available above as well in case anyone else wants to peek at them too, however, I certainly am already! :-)

DMZ was on the pfsense box xl1/DMZ interface. WAN was on the xl0/WAN interface. ETH0 was on the linux server at my house that I had firefox running from and it was listening on the inside wired lan.

What stood out on ETH0 was that several of the larger packets with HTTP payloads had checksum errors. I have only just looked at these, but something like that would trigger a retry. I also saw some "TCP DUP ACKs". What I will have to go back and do is trace one of these packets with the failed checksum back through WAN and DMZ and then see what followed. Ideally, if I could correlate the pauses in page rendering with the HTML contained in these retried packets, that would at least tie the browser behavior to the packet conditions. I will also hook up my laptop directly to the cable modem and capture packets in the same way (just Wireshark directly off the laptop on my house side).

The whole thing still puzzles me. ???
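One caveat on the capture command itself: tcpdump's -s value counts from the start of the link-layer header, so -s 1500 cuts the last 14 bytes off a full-size Ethernet frame carrying a 1500-byte IP packet. Whether that has anything to do with the checksum complaints on the larger packets is only a guess, but capturing whole frames would rule it out. The helper name below is hypothetical.

```shell
# Hypothetical helper: bytes needed to capture a whole untagged Ethernet
# frame for a given IP MTU (14-byte Ethernet header, no VLAN tag, no FCS).
snaplen_for_mtu() {
    echo $(( $1 + 14 ))
}

# Usage on the pfSense box (xl1 is the DMZ interface in this thread):
#   tcpdump -s "$(snaplen_for_mtu 1500)" -i xl1 -w <capfile>.cap
# or simply use -s 0, which tells tcpdump to capture full packets.
```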

hoba

Do you see lots of errors or collisions at status>interfaces at one of the nics?

jobsoft

netstat -i

Name   Mtu   Network       Address             Ipkts Ierrs   Opkts Oerrs  Coll
xl0    1500  <link#1>      00:60:97:d0:14:fe  792325     0  813918     0     0
xl0    1500  fe80:1::260:9 fe80:1::260:97ff:       0     -       2     -     -
xl0    1500  70-90-228-184 70-90-228-189-Nas    4167     -    5948     -     -
xl1    1500  <link#2>      00:10:4b:37:d3:5d  595782    26  554584     0   842
xl1    1500  fe80:2::210:4 fe80:2::210:4bff:       0     -       1     -     -
xl1    1500  172.21/24     172.21.0.2           3187     -    7657     -     -
xl2    1500  <link#3>      00:50:04:76:95:f5  280118   133  294471     0   402
xl2    1500  fe80:3::250:4 fe80:3::250:4ff:f       0     -       1     -     -
xl2    1500  192.168.1     gate                 1885     -    2897     -     -
pflog  33208 <link#4>                              0     0       0     0     0
lo0    16384 <link#5>                              9     0       9     0     0
lo0    16384 your-net      localhost             445     -       0     -     -
lo0    16384 localhost     ::1                     0     -       0     -     -
lo0    16384 fe80:5::1     fe80:5::1               0     -       0     -     -
pfsyn  2020  <link#6>                              0     0       0     0     0

Some Ierrs on xl1 (DMZ) and xl2 (LAN - not being considered at the moment) - none on xl0 (WAN)
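To put those counters in proportion, the link-layer rows of `netstat -i` can be turned into percentages. This is a rough sketch with a made-up helper name. It assumes the FreeBSD column order shown above and reads fields from the right-hand end, so rows with an empty Address column still parse.

```shell
# Hypothetical helper: read `netstat -i` output on stdin and, for each
# link-layer row (<Link#N>), print input errors as a percentage of input
# packets and collisions as a percentage of output packets.
link_error_rates() {
    awk '$3 ~ /^<[Ll]ink#/ && NF >= 8 {
        ipkts = $(NF-4); ierrs = $(NF-3); opkts = $(NF-2); coll = $NF
        if (ipkts + 0 == 0 && opkts + 0 == 0) next   # idle pseudo-interfaces
        printf "%s ierr%%=%.4f coll%%=%.4f\n", $1,
            (ipkts > 0 ? 100 * ierrs / ipkts : 0),
            (opkts > 0 ? 100 * coll / opkts : 0)
    }'
}

# Usage on the firewall:
#   netstat -i | link_error_rates
```

Collisions only occur on half-duplex links, so non-zero Coll counts on xl1 and xl2 may be worth checking against the duplex setting on whatever those ports plug into.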

sullrich

Ahh yes. Checksum offloading errors.

From a shell:

ifconfig xl0 -rxsum
ifconfig xl1 -rxsum
ifconfig xl2 -rxsum

These seem like older cards, eh? I bet the checksum offloading is busted in FreeBSD.

jobsoft

Well, this gets even more interesting. I did what someone else suggested and hooked up my laptop directly to the cable modem. I had wireshark on it and started a capture, then remoted with putty to my office, logged onto the pfsense box and started tcpdumps on DMZ and WAN. I then pulled up the browser on my laptop and accessed the same pages. Here are the PCAP files:

http://www.jobsoft.com/packet-watch-dmz-lapdir-take2-filtered.pcap
http://www.jobsoft.com/packet-watch-eth0-lapdir-take2-filtered.pcap
http://www.jobsoft.com/packet-watch-wan-lapdir-take2-filtered.pcap

The amazing thing to me is the web pages rendered FLAWLESSLY! Yeah, a dropped packet here and there, but, NO delay in rendering!! 1-2 SECONDS tops! And none of the corrupt packets from when testing behind the WRT54G NAT.

OK, now, before PFSENSE, these sites also rendered in 1-2 seconds FROM BEHIND the WRT54G NAT!! So, when in combination of:

WEBSERVERS<-->DMZ<-->PFSENSE<-->WAN<-->WRT54G<-->LAPTOP

problems with packet turnaround/delays arise.

Again, before PFSENSE, it was Fedora6/Shorewall-IPTABLES. Under that scenario, some servers were on LAN and some on WAN, but all were accessed with no problems whatsoever. With PFSENSE and the WRT54G taken out, things seem fine too. It is almost as if some packets are getting corrupted and/or held up such that, at the TCP level, no retransmission occurs until the browser (maybe?) retries the URL it is seeking. I haven't dug down that deep yet.

My next steps are 1) swap back in Fedora and capture/test, 2) different NICs and/or 3) "modern" PC hardware for PFSENSE.

Mark

jobsoft

@sullrich:

Ahh yes. Checksum offloading errors.

From a shell:

ifconfig xl0 -rxsum
ifconfig xl1 -rxsum
ifconfig xl2 -rxsum

These seem like older cards, eh? I bet the checksum offloading is busted in FreeBSD.

Slightly! :-)

xl0: <3Com 3c905-TX Fast Etherlink XL> port 0xd800-0xd83f irq 10 at device 18.0 on pci0
xl1: <3Com 3c905B-TX Fast Etherlink XL> port 0xdc00-0xdc7f mem 0xe8801000-0xe880107f irq 5 at device 19.0 on pci0
xlphy0: <3Com internal media interface> on miibus1
xl2: <3Com 3cSOHO100-TX OfficeConnect> port 0xe000-0xe07f mem 0xe8800000-0xe880007f irq 11 at device 20.0 on pci0
xlphy1: <3Com internal media interface> on miibus2

I have 3 newer AON-325 (Realtek 8139C) I was going to try. I also have a Compaq Netexpress (sp?) and can get my hands on another Intel Pro 100.

I had seen some reference to the issue with offloading in another thread. What still concerns me, though, is that I did try all 3 of these cards in different combinations of roles, so I would expect that at least one might work. Secondly, when I remove the WRT54G and go laptop direct, there are no checksum errors at all (best I can tell). This could be an anomaly deriving from PFSENSE/FreeBSD and the WRT54G that only manifests itself when they come together (like chocolate and peanut butter)! :-)

jobsoft

@sullrich:

Ahh yes. Checksum offloading errors.

From a shell:

ifconfig xl0 -rxsum
ifconfig xl1 -rxsum
ifconfig xl2 -rxsum

These seem like older cards, eh? I bet the checksum offloading is busted in FreeBSD.

forgot to add the results of the above commands you suggested:

ifconfig xl0 -rxsum

ifconfig: -rxsum: bad value

ifconfig xl1 -rxsum

ifconfig: -rxsum: bad value

ifconfig xl2 -rxsum

ifconfig: -rxsum: bad value
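The "bad value" replies look consistent with a misspelled flag rather than a driver limitation: FreeBSD's ifconfig(8) names the receive-checksum capability `rxcsum` (with a "c"), so `-rxcsum` is presumably what was intended. A minimal sketch follows, with a made-up wrapper name. Whether these particular xl cards actually offer the capability is an assumption to verify in the `options=` line of plain `ifconfig xl0` output.

```shell
# Hypothetical wrapper: turn off receive checksum offload on each named
# interface, using the FreeBSD ifconfig spelling "-rxcsum".
disable_rx_csum() {
    for ifc in "$@"; do
        ifconfig "$ifc" -rxcsum || return 1
    done
}

# Usage from a root shell on the firewall:
#   disable_rx_csum xl0 xl1 xl2
```

A change made this way does not persist across a reboot, so it is only suitable for testing whether the symptoms change.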