Odd Craigslist Issue



  • Greetings,

    Okay, so I'm having an odd issue with a network at work. This is a fairly small LAN with about 30-50 workstations / devices / printers / etc. The issue, in essence, is that only one of the workstations on the network can access Craigslist. All others time out.

    This is not a DNS issue as a ping to craigslist.org resolves correctly to 208.82.237.226 on all workstations. Also a tracert completes on all workstations with nothing remarkable.
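    Worth noting: ping and Windows tracert both use ICMP, so their success proves that DNS resolves and that ICMP gets through, but not that a TCP connection to the web server completes. A minimal sketch of a test that separates the two steps is below, assuming Python 3 is available on a workstation and assuming the site is reached over HTTPS on port 443 (the thread does not say which port). The function name `probe` is hypothetical.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Separate DNS resolution from the TCP handshake.

    Returns "dns_failed", "connected", "refused", "timed_out" or "error: ...".
    """
    try:
        # Step 1: DNS. This is roughly what a successful ping/tracert already showed.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    family, socktype, proto, _, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as s:
        s.settimeout(timeout)
        try:
            # Step 2: TCP three-way handshake (SYN -> SYN-ACK -> ACK).
            # ICMP-based tools never exercise this step.
            s.connect(sockaddr)
        except socket.timeout:
            return "timed_out"
        except ConnectionRefusedError:
            return "refused"
        except OSError as exc:
            return f"error: {exc}"
    return "connected"
```

    If the symptom is as described, `probe("craigslist.org")` on an affected workstation would presumably return `"timed_out"`, while the working machine would return `"connected"`.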

    This is not an ISP issue as one of the workstations can connect and browse Craigslist without issue. Also one of the workstations that is unable to connect from the LAN was able to connect when directly linked to the ISP modem.

    Okay, so that points to an issue with one of the routers or perhaps a switch.

    And to add to the confusion... all of the workstations that "time out" DO receive a cookie from Craigslist. The cookie, 'cl_b', is one of the same cookies that the workstation that CAN connect also receives.

    Let me also add that this issue of timing out is ONLY happening with Craigslist. No other sites have this issue.

    The basic topology of the network is: Cable Modem (in bridge mode) -> Bridged PFSENSE Router v2.4.3-RELEASE-p1 -> Zyxel Switch (GS1900) -> TPLink Router (TL-R470T+) -> HP Switch (Procurve 2824) -> Workstations / Printers / AP's / Etc.
    The reason for the two routers is that some servers with external IP addresses are wired to the first switch. The first router acts as a firewall / IDS / IPS. Router two does limited NAT.

    I have scoured the logs of the switches and routers. I have also done a packet capture from one of the afflicted workstations, please see attachment. It's not browser specific, as the problem exists with all browsers with the exception of Tor Browser.

    Any help in this matter would be greatly appreciated. I am stumped. What makes Craigslist different than all other sites? What makes the one workstation that can connect different than the others?

    Thanks

    0_1542231948693_Untitled.png



  • @sabyre said in Odd Craigslist Issue:

    Zyxel Switch (GS1900) -> TPLink Router (TL-R470T+)

    Does it work if you remove these two devices from the network chain? I am not sure why you need these when you could remove them and go from pfSense to your Procurve and simplify the network.


  • Netgate Administrator

    Are you running any dynamic package on the firewall like Snort or Suricata or pfBlocker?

    Do you have multiwan? Policy routing?

    Steve



  • Sorry for the delay,

    @tim-mcmanus I haven't tested yet with a laptop directly connected to the Zyxel switch. I can't remove the TP-Link router and Zyxel switch. There are some servers off the Zyxel switch with exclusive external IP's. Behind the TP Link router is a local network (local IP's).

    @stephenw10 The packages running on the pfSense device are: dpinger, ntpd, syslogd, unbound, and firewall rules. It's a single WAN connection, in bridged mode, with no NAT.


  • Netgate Administrator

    It's unclear what the TP-Link router is doing in this setup. Is that doing all the NAT? Is pfSense in 'transparent mode', WAN bridged to LAN?

    Where was that packet capture taken?

    Steve



  • @stephenw10 The TPLink is acting as the router/gateway for the local network. The local network is assigned 1 external IP address. The rest of the IP block goes to various servers. See image. pfSense is bridged WAN to LAN.

    The packet capture was from pfSense while trying to access CL from a workstation behind the TP Link.

    0_1542654189714_topo.png



  • You can simplify your network by removing the TPlink and creating a VLAN for the PC network. pfSense can do the same work as the TPLink by providing DHCP/DNS/routing services.

    I checked your Zyxel switch model, and it does have VLAN capabilities. So you should be able to do this fairly easily, unless I've misunderstood your network architecture.



  • @tim-mcmanus Thank you for the reply. Yes, it could be done that way however, the circumstances of the equipment ownership will not make that possible.



  • @sabyre said in Odd Craigslist Issue:

    @tim-mcmanus Thank you for the reply. Yes, it could be done that way however, the circumstances of the equipment ownership will not make that possible.

    Oh, it's "complicated". :)

    The challenge is mitigating the double-NAT that can be created using two routers in series. I think @stephenw10 noted that above, and it could be the root cause of your issue.



  • @tim-mcmanus Double NAT? pfSense is not setup for NAT.



  • @sabyre said in Odd Craigslist Issue:

    @tim-mcmanus Double NAT? pfSense is not setup for NAT.

    Okay, so they are bridged and Outbound NAT is disabled?

    Sorry for the seemingly obvious questions. Just trying to get an understanding of the configs.



  • @tim-mcmanus Correct


  • Netgate Administrator

    The bridge is more likely the issue here. Seems like it could be some TCP flag oddness perhaps. No clue why it would not affect one workstation though.
    Do you have a browser on any of those servers? Or can you put a client there to test?

    Steve



  • @stephenw10 I can plug a laptop into the Zyxel and give it a public IP and test. I'll go do that now and see how that goes.


  • Netgate Administrator

    Cool. Check that it fails on the LAN side of the Zyxel just to be sure.

    Steve



    @stephenw10 Okay, so I tested this morning with a laptop that times out on the LAN. I gave it a public-facing IP with the proper subnet and gateway. I used 8.8.8.8 for DNS. I plugged it into the Zyxel switch. It has connectivity, I can visit websites, etc. Craigslist times out as it did before. I am able to ping CL and tracert CL without issues. This test connection would traverse the switch and the pfSense box as outlined in the topo image above.


  • Netgate Administrator

    Hmm, well that's odd. Did you grab a packet capture of the failure again? Still looks the same? Client not sending ACKs?

    Can you run a packet capture at the client to compare it? Does it see the SYN-ACKs and reply?

    You might also check the MACs are correct for those packets. That would not affect devices behind the other router though obviously.

    It's hard to see what could be causing this on the firewall though. Those packets are not special in any way I can see, not very large for example.
    This seems far more likely to be some odd client-side setting, though the evidence seems to exclude that.

    Steve



  • @stephenw10 said in Odd Craigslist Issue:

    Hmm, well that's odd.
    Steve

    Exactly....

    It's definitely an issue with the pfSense box. A direct connection to the ISP modem and CL works as it should. What has me really scratching my head is why one of the workstations doesn't exhibit the issue at all. Granted, it is the only Windows 10 machine here.


  • Netgate Administrator

    Really I think comparing pcaps of what works and what doesn't and what the firewall sees vs what the client sees is the only way to get to the root of this.

    About the only thing I could imagine is that some part of the CL site is doing something it shouldn't for non-Windows clients only. FreeBSD/pfSense sticks rigidly to the rules, whereas other OSes are more flexible. However, in bridge mode that would really have to be something pf sees as invalid. You might try disabling pf scrub in System > Advanced > Firewall & NAT.

    Steve



  • @stephenw10 I took a pcap on the workstation that doesn't have issue. There are no retransmits like in the earlier pcap from a system that cannot connect. Let me see if I can upload them both. Perhaps you could have a look and something will jump out at you.

    I did disable scrub, but that didn't have any effect. Oddly enough while I was on a system that times out I did a browser refresh and got some of the CL data to appear, but not all. That was short lived as the next couple of refreshes again timed out.

    0_1542829932802_fail.pcap
    0_1542829946153_good.pcap



  • I just tested with a phone and had no issues. It was connected to the wifi (LAN) with the cell data turned off. After that I tested with a tablet over wifi and had no issues with that either.

    Seems to be affecting only Windows systems older than Windows 10. I'll bring in my Debian laptop and my MacBook on Monday to test.

    I should add that the workstations are on a windows domain with one DC. The DC also has DNS, DHCP, and fileserver roles. The DC is Windows Server 2016 and it also times out trying to connect to CL.


  • Netgate Administrator

    It's hard to see how this can be anything other than a client-side issue. When it fails, the client never ACKs the server's SYN-ACK. It fails the initial TCP handshake. Either the SYN-ACK from the server never makes it back to the client or the client never responds to it. A capture actually on the failing client would show which.
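    To make the comparison between a firewall-side and a client-side capture concrete, here is a minimal sketch that classifies how far one TCP connection got through the handshake. It assumes the packets for a single connection have already been extracted (for example, exported from Wireshark) into a hypothetical list of `(direction, flags)` tuples, where `"out"` means client to server; the representation and the name `handshake_stage` are illustrative only.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (direction, flags), where direction is
# "out" (client -> server) or "in" (server -> client) and flags is a
# string of TCP flag letters such as "S", "SA", "A" or "PA".
Packet = Tuple[str, str]


def handshake_stage(packets: Iterable[Packet]) -> str:
    """Report how far one TCP connection got through the 3-way handshake."""
    saw_syn = False
    saw_synack = False
    for direction, flags in packets:
        f = set(flags.upper())
        if direction == "out" and f == {"S"}:
            saw_syn = True  # client SYN
        elif direction == "in" and f == {"S", "A"} and saw_syn:
            saw_synack = True  # server SYN-ACK
        elif direction == "out" and "A" in f and "S" not in f and saw_synack:
            return "established"  # client ACK completes the handshake
    if not saw_syn:
        return "no_syn_sent"
    if not saw_synack:
        return "no_synack_received"
    return "synack_not_acked"
```

    Reading the two results together: if the firewall capture yields `"synack_not_acked"` but the client capture yields `"no_synack_received"`, the SYN-ACK is being lost between the firewall and the client; if both yield `"synack_not_acked"`, the client is receiving it and not answering.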
    Are those pcaps on the pfSense LAN? Since they are bridged they should be the same but...

    Steve



  • @stephenw10 The pcaps are on the pfSense WAN. I will run a pcap on a failing system.



  • @stephenw10 I ran a pcap from one of the systems that time out. Refreshed craigslist.org. Applied filter: ip.addr == 208.82.237.226 and get a blank result. How can packets not be leaving the NIC?



  • @stephenw10 See attached. These are pcaps from a system that times out and a system that successfully connects. Both are on the LAN. Note how the "Not Working" starts with .17 and the "Working" starts with .2 and never shows packets to or from .17.

    0_1543243594421_working.pcapng

    0_1543243612068_Not Working.pcapng
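    This may also explain the earlier blank result: a display filter on the single address `208.82.237.226` will show nothing if the client's traffic actually goes to a different Craigslist address such as `.17`. Why the client talks to `.17` is not established in the thread (a different Craigslist hostname is one possibility, but that is an assumption). Filtering on a prefix instead of one host catches this. A minimal sketch follows; the prefix `208.82.237.0/24` is only a /24 that happens to contain every Craigslist address reported in this thread, not Craigslist's published allocation, and `matching_hosts` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List

# Assumption: a /24 that covers every CL address seen in this thread
# (.2, .17, .18, .226, .242). It is NOT Craigslist's actual allocation.
CANDIDATE_NET = ipaddress.ip_network("208.82.237.0/24")


def matching_hosts(addresses: Iterable[str], net=CANDIDATE_NET) -> List[str]:
    """Return distinct addresses inside `net`, in first-seen order."""
    seen: List[str] = []
    for a in addresses:
        ip = ipaddress.ip_address(a)
        # Mixed IPv4/IPv6 membership tests simply return False.
        if ip in net and str(ip) not in seen:
            seen.append(str(ip))
    return seen
```

    The same idea works directly in Wireshark, which accepts CIDR notation in display filters, e.g. `ip.addr == 208.82.237.0/24`.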


  • Netgate Administrator

    Interesting. Do they both resolve craigslist.org to the same IP?

    Steve



  • @stephenw10 Yes 208.82.237.226



    Is there any security or AV software, or any browser plugins, on the affected machine?



  • @grimson No, all have been disabled. Let me also reiterate that the machines that time out can successfully connect if plugged directly into the modem. This very much seems to be a pfSense problem, but I cannot for the life of me understand what the issue is or even where to look.



  • Any other suggestions?



  • @sabyre said in Odd Craigslist Issue:

    Any other suggestions?

    Wipe your pfSense installation. Leave a basic, default configuration on it and then connect a client to it. See if you can repeat the issue.

    It could be a config issue buried deep somewhere that we're not looking. A default install turned into a bridge would eliminate a config issue (in theory). You can save your old configs and re-import them after the test.

    It shouldn't be behaving like it is, and if it truly is a pfSense issue, testing a default install with minimal configurations may help resolve this.



  • @tim-mcmanus I was trying to avoid that, but thank you for the response. It seems that may be the best option at this point.



  • I know you've kind of covered this but could Squid or some proxy caching be causing the issue? Or did you have it and remove the package where there may be some remnants? One PC could be set to be ignored and allow all traffic. Could explain why that one PC can connect but the others can't?

    Just throwing something out there.



  • @stewart Excellent reply, thank you. I did have Suricata installed at one point, however it would crash and need a restart every couple of weeks. Downtime tends to make customers angry. So I disabled it. It is still installed, just not running.



    @sabyre I've had a lot of experience with Suricata doing odd things. Under Diagnostics > Tables, is there anything in the snort2c table?



  • @stewart Good call, I didn't think of that, but alas it is empty.


  • Netgate Administrator

    Yeah disabling Suricata (or Snort) or even uninstalling it does not necessarily remove any blocks.

    At this point I would be setting it to a basic config to test. It's easy to restore your current config if it doesn't help.

    Steve



  • @stephenw10 Yeah, I've been bit by that before.

    @Sabyre On an affected machine, what does a traceroute show? Also, I've used a program called PingPlotter (there is an old freeware version floating on the internet) that graphically combines Ping and Traceroute. I'm curious what a trace would show since you said you don't see packets going to the router.



  • @stewart That's a nice program. I hadn't used it before. So I ran a trace with the program on the working machine and on the fail machine. Both results are identical with the exception of the final destination.

    On the working machine the trace ends at 208.82.237.2
    On the fail machine the trace ends at 208.82.237.242

    Both IP's belong to CL. On either machine there is only one CL IP in the trace.



  • @stewart And when running it again on both they both end with 208.82.237.18
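    Since the traces end at different Craigslist addresses from run to run (.2, .242, and then .18, besides the .226 that DNS returns), one further check is to attempt the TCP handshake against each of those addresses directly, bypassing DNS, from both a failing and a working workstation. A minimal sketch, assuming Python 3 on the workstations and port 443 (the thread doesn't say which port); `probe_each` is a hypothetical helper.

```python
import socket
from typing import Dict, Iterable


def probe_each(addresses: Iterable[str], port: int = 443,
               timeout: float = 5.0) -> Dict[str, str]:
    """Attempt a TCP handshake to each IPv4 address separately."""
    results: Dict[str, str] = {}
    for addr in addresses:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                s.connect((addr, port))  # completes SYN / SYN-ACK / ACK
                results[addr] = "connected"
            except socket.timeout:
                results[addr] = "timed_out"
            except ConnectionRefusedError:
                results[addr] = "refused"
            except OSError as exc:
                results[addr] = f"error: {exc}"
    return results
```

    Example: `probe_each(["208.82.237.2", "208.82.237.18", "208.82.237.226", "208.82.237.242"])`. If only some addresses time out from the failing machines, that would narrow the problem to particular Craigslist endpoints rather than the site as a whole.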