10 second delay on new TCP connections to specific IP address

danswartz

If the lines tagged "Server" are being captured on the LAN the server is on, it's got to be something wrong on the server itself, since we see the inbound SYN segments but no outbound SYN/ACK segments. I agree with Havok.

foxdie

Thanks for the replies. iptables is blank on the server, and the server is a public IP server in a remote datacenter, not on the local LAN.

I've spoken with some technical bods in #centos on Freenode IRC, done some more verbose dumps, which you can find here: http://pastebin.centos.org/31628

Someone there has suggested that one or both of the pfSense firewalls may be mangling the packets (incorrect checksums)?

danswartz

That doesn't sound very likely. I don't know why pfSense would be mangling checksums and then stop doing so after 10 seconds. Also, the TCP checksums in the trace you posted look correct. Were the traces captured on the server itself or on another host on that LAN?
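For anyone who wants to double-check the checksums in a capture by hand, here is a minimal Python sketch of the standard TCP checksum calculation (one's-complement sum over the IPv4 pseudo-header plus the TCP segment). The function names are made up for illustration, and it assumes you have already extracted the raw TCP segment bytes and the two IP addresses from the capture. Bear in mind that with checksum offload enabled, outbound segments captured on the sending host can show bogus checksums even though the packets on the wire are fine.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then inverted."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus TCP header and payload."""
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo + segment)


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """True if a captured segment (checksum field intact) verifies."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0
```

A segment whose checksum field is intact should make tcp_checksum_ok return True; any corruption of the header or payload in transit should make it return False.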

Cry Havok

Packet mangling wouldn't explain the 10/20 second delay. If it was packet mangling I'd expect to see it on UDP too.

I'd suggest your next test will be to plug into your Netgear managed switch and test from there, or set up a VPN between the 2 pfSense boxes. That should eliminate many sources of potential issue. As for iptables, don't forget to check TCPWrappers - that's entirely separate.

foxdie

The trace on the server was run on the server itself, not any intermediary / third party device.

It was also mentioned that as soon as the WSCALE flag was dropped, the connection started working immediately.

I can't reproduce the connection issue from the switch, I've tried between hosts (2 servers connected to the same switch), or from another server.

I could try setting up a VPN, but isn't that skating around the issue?

TCPWrapper, where should I be looking at that? I've tried googling for it, there's no tcpwrapper command on the server either.

Thanks in advance,

danswartz

@Jason:

The trace on the server was run on the server itself, not any intermediary / third party device.

It was also mentioned that as soon as the WSCALE flag was dropped, the connection started working immediately.

I can't reproduce the connection issue from the switch, I've tried between hosts (2 servers connected to the same switch), or from another server.

I could try setting up a VPN, but isn't that skating around the issue?

TCPWrapper, where should I be looking at that? I've tried googling for it, there's no tcpwrapper command on the server either.

Thanks in advance,

Umm, I just searched this thread and found no reference to wscale being an issue?

foxdie

@danswartz:

Umm, I just searched this thread and found no reference to wscale being an issue?

Sorry, this was mentioned in the CentOS IRC channel.

Cry Havok

@Jason:

It was also mentioned that as soon as the WSCALE flag was dropped, the connection started working immediately.

I can't reproduce the connection issue from the switch, I've tried between hosts (2 servers connected to the same switch), or from another server.

That does point to an intermediate device, though obviously not which one.

@Jason:

I could try setting up a VPN, but isn't that skating around the issue?

Yes, but it allows you to narrow down the potential problem sources.

@Jason:

TCPWrapper, where should I be looking at that? I've tried googling for it, there's no tcpwrapper command on the server either.

That's because the package is called TCP Wrappers ;) The library is libwrap and the config files are /etc/hosts.*
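As a rough illustration of checking those files, the Python sketch below lists the active (non-comment) lines in /etc/hosts.allow and /etc/hosts.deny. The function name is hypothetical, it does not handle backslash line continuations, and it only tells you what rules exist; they only apply to daemons that are linked against libwrap or started through tcpd.

```python
from pathlib import Path


def active_wrapper_rules(paths=("/etc/hosts.allow", "/etc/hosts.deny")):
    """Return {path: [rule lines]} for each file that exists.

    Blank lines and lines starting with '#' are skipped. Files that are
    missing are simply left out of the result.
    """
    rules = {}
    for name in paths:
        path = Path(name)
        if not path.is_file():
            continue
        stripped = (line.strip() for line in path.read_text().splitlines())
        rules[name] = [line for line in stripped if line and not line.startswith("#")]
    return rules
```

If both files come back empty (or absent), TCP Wrappers is not restricting anything on that server.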

That said, reviewing this thread, the evidence currently points towards an issue local to your office network. If you use a VPN between the pfSense hosts you'll eliminate your ADSL modem and both ISPs as potential sources of the problem. If it all works at that point, with WSCALE enabled, you'll know the problem isn't related to pfSense but to some device between the 2 pfSense hosts.

foxdie

Okay, I'm a bit lost at this point. I've been trying to set up a VPN as you suggested, using OpenVPN. I've already configured both the local and remote pfSense boxes to connect with PKI and the connection is successful; the problem is getting it to work with our network (all IPs anonymised, btw).

The remote pfSense box has 1.2.3.128/27 as its IP address assignment. It's configured as a transparent firewall with NAT disabled, so both internal and external interfaces have a static IP (1.2.3.130 and 1.2.3.131), and all the servers have IPs in the same subnet but slightly higher.

The local pfSense box has a public range of 4.5.6.0/29 as its IP address assignment. It's configured as a NAT gateway / firewall; all the LAN workstations are assigned IPs in the 192.168.0.0/24 range, with the local pfSense's LAN IP being 192.168.0.1.

I tried configuring OpenVPN on the remote pfSense box as a server, setting the "Address Pool" to "1.2.3.153/30", enabled "Use Static IPs", and set the "Local Network" field to "1.2.3.128/27". On our end, I configured our local pfSense box as an OpenVPN client, connecting to 1.2.3.130, with the "Interface IP" field set as "1.2.3.154/30".

When the VPN comes up, no one in the office, including the local pfSense box, can communicate with 1.2.3.128/27, with the exception that the local pfSense box can reach 1.2.3.153 by ping / SSH.

I have to be careful what I try because the remote pfSense box is responsible for several mission-critical websites, so I can't take too many risks as it's an hour's drive away. That said, I don't suppose anyone can see what I'm doing wrong here? ;)

Cry Havok

It's worth reading the sticky posts at the top of the OpenVPN forum, and the OpenVPN documentation. In this case, set the address pool to an RFC1918 IP range you don't use (e.g. 10.11.12.0/24). With that done you can add the local network (192.168.0.0/24) and the remote network (server.ip.add.ress/32).
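To illustrate why an unused range matters, here is a small Python sketch using the standard ipaddress module to check whether a proposed tunnel pool overlaps networks already in use. The function name is made up for illustration, and the overlap is only a likely contributor to the routing trouble, not a confirmed diagnosis.

```python
import ipaddress


def pool_conflicts(pool, networks):
    """Return the networks from `networks` that overlap the tunnel pool.

    strict=False lets a host address such as 1.2.3.153/30 be accepted and
    normalised to the network that contains it (1.2.3.152/30).
    """
    pool_net = ipaddress.ip_network(pool, strict=False)
    return [
        net
        for net in networks
        if pool_net.overlaps(ipaddress.ip_network(net, strict=False))
    ]


in_use = ["1.2.3.128/27", "192.168.0.0/24"]
print(pool_conflicts("1.2.3.153/30", in_use))   # pool sits inside the /27
print(pool_conflicts("10.11.12.0/24", in_use))  # no overlap
```

The pool 1.2.3.153/30 falls inside 1.2.3.128/27, so both ends of the tunnel would see the same addresses as directly connected and as reachable via the VPN, whereas 10.11.12.0/24 clashes with neither network.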

foxdie

Well, at the risk of being flamed a little, we replaced our local pfSense box and Netgear ADSL modem with a Draytek Vigor 2820n, the problem still remains (but at least our routing cupboard is cleaner, hehe).

I'm quite worried now as this pretty much eliminates everything in our office, bar the ISP itself, although they deny any foul play. This shifts focus to the pfSense firewall in our datacenter, which means this problem could be affecting other people too that access the websites we host.

I'm going to try setting up a site-to-site VPN between our new Vigor and remote pfSense box, not sure how to do that just yet, but heh, I'll try the search function first ;)

Kind regards,

Cry Havok

@Jason:

From another ISP? Yes, I can't reproduce this from a server in another datacenter for example

That eliminates anything at the datacenter.

foxdie

Well, I said that, but I've only tried it from one remote location. I can't reproduce it from there, but that doesn't mean it's not happening for others.

Battling on trying to set up this VPN...

Cry Havok

Try another location - before you spend time chasing red herrings you need to be able to narrow down the issues.

As for VPN - try this about setting up site to site with OpenVPN. Before you do that though, do check that it is happening from more than just your office.
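One way to compare locations is to time the TCP handshake itself from each vantage point. The following is a minimal Python sketch, assuming Python 3 is available wherever you test from; time_connect and sample are names invented for this example. Pass an IP address rather than a hostname so DNS lookups don't skew the timing.

```python
import socket
import time


def time_connect(host, port, timeout=30.0):
    """Return the seconds taken to establish a TCP connection."""
    start = time.monotonic()
    # create_connection returns once the three-way handshake has completed
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start


def sample(host, port, attempts=5, timeout=30.0):
    """Time several fresh connections; failed attempts are recorded as None."""
    results = []
    for _ in range(attempts):
        try:
            results.append(time_connect(host, port, timeout))
        except OSError:
            results.append(None)
        time.sleep(0.2)  # short pause so attempts are separate connections
    return results
```

For example, running sample("1.2.3.140", 80) from the office and again from another site (1.2.3.140 is a placeholder for the anonymised server address) and comparing the numbers would show whether the roughly 10 second stall is specific to one path.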