NAT or networking issues



  • I'm running a web application for home automation (Windows 2012 R2, IIS 8.5) behind a pfSense 2.2.2 firewall. I've NAT'ed port 80 for IPv4 and forward both IPV4 and IPv6 on port 80.

    I've run all the tests I've found:
    The web site is reachable from around the world:
    https://www.site24x7.com/public/t/results-1431343994872.html

    I've run DNS checks as well. It resolves both on IPv4 and IPv6. Both are accessible.

    A few users have notified me that they cannot reach the web site from certain networks. Most often it is from their work networks. From home it works fine. One has even involved their IT department to try to figure out the issue. They do not use a proxy, and there is no filtering or blacklisting that should block the site. When you open it in a browser, it just times out with no response.

    nslookup works
    ping works
    traceroute works

    The IT department thinks it's an MTU issue and that TCP Path MTU Discovery (PMTUD) isn’t working. I see this was an issue when using NAT a few releases back (2.2), but it is supposedly fixed in 2.2.2.

    Website in question: tellmon.net
    Network connectivity is via fibre from Altibox.

    Any ideas?


  • Netgate

    The IT department thinks it's an MTU issue and that TCP Path MTU Discovery (PMTUD) isn’t working.

    Pass ICMPv6 through to the server if that's the case.


  • Rebel Alliance Global Moderator

    I show it loading here both ipv4 and ipv6

    But you don't "forward both IPV4 and IPv6 on port 80."

    You allow it. So what does a traceroute show? Do they fail to resolve the FQDN, or do they resolve it to something else? If there is something wrong with their IPv6, then sure, they could have an issue. Browsers and OSes like to pick IPv6 first if they have an address, whether or not it is 100% working.

    Have them disable IPv6 in their browser and see if that fixes the problem.

    Have them do a sniff: are they going IPv4 or IPv6, and do they get a handshake, or do they not even get a SYN,ACK back? That would have little to do with the MTU between you and them.



  • This is from a machine outside of my network where it works:

    IPv6:
    $ telnet tellmon.net 80
    Trying 2a01:79d:3e86:5a18:5945:87fd:f332:83c…
    Connected to tellmon.net

    IPv4
    $ telnet -4 tellmon.net 80
    Trying 79.161.150.134...
    Connected to 134.79-161-150.customer.lyse.net

    When one of the users with issues tries the same, they get connection timeouts, both for IPv4 and IPv6.

    $ telnet tellmon.net 80
    Trying 79.161.150.134...
    telnet: connect to address 79.161.150.134: Operation timed out
    Trying 2a01:79d:3e86:5a18:5945:87fd:f332:83c...
    telnet: connect to address 2a01:79d:3e86:5a18:5945:87fd:f332:83c: No route to host
    telnet: Unable to connect to remote host

    Traceroute from a "nonworking" location:
    Traceroute to tellmon.net (79.161.150.134), 64 hops max, 72 byte packets
    1 10.4.4.1 (10.4.4.1) 3.298 ms 1.915 ms 2.128 ms
    2 192.36.162.9 (192.36.162.9) 2.676 ms 2.291 ms 2.279 ms
    3 ilikr2-ge-0-0-3-1.ilik.net (192.71.20.21) 5.031 ms 509.863 ms 4.442 ms
    4 telia-gw.ilik.net (192.71.20.25) 2.745 ms 2.808 ms 2.957 ms
    5 193.181.252.67 (193.181.252.67) 3.065 ms 3.087 ms 2.987 ms
    6 sesbww01-r12.ericsson.net (193.180.17.236) 2.984 ms 3.087 ms 3.340 ms
    7 78.77.163.217 (78.77.163.217) 3.033 ms 4.801 ms 3.232 ms
    8 s-b6-link.telia.net (80.91.250.43) 3.828 ms 4.179 ms 4.517 ms
    9 level3-ic-155475-s-b2.c.telia.net (213.248.99.134) 42.783 ms 31.932 ms 31.786 ms
    10 * ae-1-3.bear1.oslo2.level3.net (4.69.202.245) 28.569 ms 29.383 ms
    11 ae-1-3.bear1.oslo2.level3.net (4.69.202.245) 28.428 ms 28.698 ms 28.451 ms
    12 62.140.27.6 (62.140.27.6) 28.766 ms 28.999 ms 28.393 ms
    13 216.213-167-114.customer.lyse.net (213.167.114.216) 29.179 ms 30.508 ms 31.162 ms
    14 237.79-160-112.customer.lyse.net (79.160.112.237) 30.061 ms 28.175 ms 28.393 ms
    15 2.79-160-49.customer.lyse.net (79.160.49.2) 29.889 ms 29.706 ms 29.963 ms
    16 65.79-160-49.customer.lyse.net (79.160.49.65) 32.049 ms 31.082 ms 31.180 ms
    17 203.79-160-49.customer.lyse.net (79.160.49.203) 31.332 ms 31.903 ms 31.391 ms
    18 134.79-161-150.customer.lyse.net (79.161.150.134) 31.324 ms 31.248 ms 31.329 ms

    So traceroute works, ping works, connecting to port 80 not so much.

    I notice the error from telnet "No route to host" for IPv6. Could there be an IPv6 routing issue at their end?
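
For anyone who wants to repeat the telnet test above without telnet, here is a rough sketch in Python. It tries a TCP connect to every address the name resolves to and reports the outcome per address family, so a timeout on IPv4 can be told apart from "no route" on IPv6. The function name `probe_tcp` and the outcome strings are made up for illustration; only the standard `socket` module is used.

```python
import socket


def probe_tcp(host, port=80, timeout=5.0):
    """Try a TCP connect to each resolved address of host.

    Returns a list of (family_label, address, outcome) tuples.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            outcome = "connected"
        except socket.timeout:
            # No SYN,ACK came back within the timeout.
            outcome = "timed out"
        except ConnectionRefusedError:
            # The far end (or a firewall) answered with a reset.
            outcome = "refused"
        except OSError as exc:
            # Covers local errors such as "No route to host".
            outcome = "error: %s" % (exc.strerror or exc)
        finally:
            sock.close()
        results.append((label, sockaddr[0], outcome))
    return results

# Example call (needs network access): probe_tcp("tellmon.net", 80)
```

A "timed out" result on IPv4 together with an "error" result on IPv6 would match what the user's telnet session showed. It does not say where the packets are being dropped; a capture on both ends is still needed for that.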


  • Rebel Alliance Global Moderator

    Yeah, that sure looks like an IPv6 issue. Have them just ping your IPv6 address; I show it responding:

    C:\>ping -6 2a01:79d:3e86:5a18:5945:87fd:f332:83c

    Pinging 2a01:79d:3e86:5a18:5945:87fd:f332:83c with 32 bytes of data:
    Reply from 2a01:79d:3e86:5a18:5945:87fd:f332:83c: time=147ms
    Reply from 2a01:79d:3e86:5a18:5945:87fd:f332:83c: time=150ms
    Reply from 2a01:79d:3e86:5a18:5945:87fd:f332:83c: time=145ms



  • Had them disable IPv6, no change. Also note that they cannot connect via IPv4 either.


  • Rebel Alliance Global Moderator

    Well, have them do a tracepath -n to the IPv4 address. What do they get for the MTU size?

    Or on Windows you can grab mturoute:

    D:\Dropbox\tools>mturoute.exe 79.161.150.134

    • ICMP Fragmentation is not permitted. *
    • Speed optimization is enabled. *
    • Maximum payload is 10000 bytes. *
    • ICMP payload of 1472 bytes succeeded.
    • ICMP payload of 1473 bytes is too big.
      Path MTU: 1500 bytes.

    Or grab mtupath:

    D:\Dropbox\tools>mtupath.exe 79.161.150.134

    MTU path scan to 79.161.150.134, ttl=64, limit=48

    16 processing - best MSS 1472 (estimated MTU 1500) [pPPPPpPppPpppppp]

    01 nearest minimum MTU on local interface

    #1 MSS IN RANGE     1 <==  1471 ==>  1472
    #2 MSS EXCEEDED  1473 <== 14911 ==> 16384

    D:\Dropbox\tools>mtupath.exe 2a01:79d:3e86:5a18:5945:87fd:f332:83c

    MTU path scan to 2a01:79d:3e86:5a18:5945:87fd:f332:83c, ttl=64, limit=48

    16 processing - best MSS 1232 (estimated MTU 1280) [uUUUUuUUuuUUuuuu]

    08 nearest minimum MTU on 2001:470:<snipped>::1 (2 hops away)

    #1 MSS IN RANGE     1 <==  1231 ==>  1232
    #2 MSS EXCEEDED  1233 <== 15151 ==> 16384
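
The arithmetic behind the tool output above, as a small sketch: the largest unfragmented ICMP echo payload plus the ICMP and IP headers gives the path MTU, and the MTU minus the IP and TCP headers gives the TCP MSS a host would normally advertise. The function names are made up for illustration, and the header sizes assume no IP options or IPv6 extension headers.

```python
# Header sizes in bytes, assuming no IP options or IPv6 extension headers.
IPV4_HEADER = 20
IPV6_HEADER = 40
ICMP_HEADER = 8   # echo request header, same size for ICMPv4 and ICMPv6
TCP_HEADER = 20   # without TCP options


def path_mtu_from_icmp_payload(payload, ipv6=False):
    """Path MTU implied by the largest ICMP echo payload that got through."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return payload + ICMP_HEADER + ip_header


def tcp_mss_for_mtu(mtu, ipv6=False):
    """Largest TCP segment payload that fits in one packet of size mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


if __name__ == "__main__":
    print(path_mtu_from_icmp_payload(1472))             # 1500
    print(path_mtu_from_icmp_payload(1232, ipv6=True))  # 1280
    print(tcp_mss_for_mtu(1500))                        # 1460
    print(tcp_mss_for_mtu(1280, ipv6=True))             # 1220
```

Note that what mtupath labels "MSS" in its output lines up with the ICMP payload size here (1472 + 8 + 20 = 1500), not with the TCP MSS, which would be 1460 on a 1500-byte path.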



  • I've asked a user to try tracepath.

    This is the output from a computer where I do have access (although tracepath fails):

    $ tracepath -n 79.161.150.134
    1?: [LOCALHOST]                                        pmtu 1500
    1:  92.63.175.3                                          1.127ms
    1:  92.63.175.3                                          1.017ms
    2:  92.63.170.25                                          2.617ms
    3:  83.231.213.105                                        2.033ms
    4:  129.250.4.185                                        1.824ms
    5:  4.68.63.217                                          1.714ms asymm  6
    6:  no reply
    7:  62.140.27.6                                          26.220ms asymm 11
    8:  213.167.114.216                                      26.821ms asymm 13
    9:  79.160.112.237                                      25.884ms asymm 13
    10:  79.160.49.2                                          26.651ms asymm 13
    11:  79.160.49.65                                        28.570ms asymm 14
    12:  79.160.49.203                                        31.047ms asymm 14
    13:  no reply
    14:  no reply
    15:  no reply
    16:  no reply
    17:  no reply
    18:  no reply
    19:  no reply
    20:  no reply
    21:  no reply
    22:  no reply
    23:  no reply
    24:  no reply
    25:  no reply
    26:  no reply
    27:  no reply
    28:  no reply
    29:  no reply
    30:  no reply
    31:  no reply
        Too many hops: pmtu 1500
        Resume: pmtu 1500


  • Rebel Alliance Global Moderator

    Yeah, lots of places no longer follow the RFCs and turn off shit that used to help you troubleshoot. It's amazing that normal traceroute sometimes works; you almost always run into something that doesn't answer along the way.

    Try the mturoute and mtupath tools. Just google for them and you will find them, or I can post links to where to get them. The mtupath tool supports IPv6, while mturoute does not.



  • The user I have available for testing is on a Mac, but he ran the following ping sweep: ping -g 1444 -G 1508 -c 2 -h 1 -D tellmon.net

    
    $ ping -g 1444 -G 1508 -c 2 -h 1 -D tellmon.net
    PING tellmon.net (79.161.150.134): (1444 ... 1508) data bytes
    1452 bytes from 79.161.150.134: icmp_seq=0 ttl=48 time=43.852 ms
    1452 bytes from 79.161.150.134: icmp_seq=1 ttl=48 time=44.231 ms
    1453 bytes from 79.161.150.134: icmp_seq=2 ttl=48 time=41.999 ms
    1453 bytes from 79.161.150.134: icmp_seq=3 ttl=48 time=41.772 ms
    1454 bytes from 79.161.150.134: icmp_seq=4 ttl=48 time=40.842 ms
    1454 bytes from 79.161.150.134: icmp_seq=5 ttl=48 time=42.563 ms
    1455 bytes from 79.161.150.134: icmp_seq=6 ttl=48 time=41.921 ms
    1455 bytes from 79.161.150.134: icmp_seq=7 ttl=48 time=40.807 ms
    1456 bytes from 79.161.150.134: icmp_seq=8 ttl=48 time=46.312 ms
    1456 bytes from 79.161.150.134: icmp_seq=9 ttl=48 time=40.710 ms
    1457 bytes from 79.161.150.134: icmp_seq=10 ttl=48 time=42.128 ms
    1457 bytes from 79.161.150.134: icmp_seq=11 ttl=48 time=38.404 ms
    1458 bytes from 79.161.150.134: icmp_seq=12 ttl=48 time=42.161 ms
    1458 bytes from 79.161.150.134: icmp_seq=13 ttl=48 time=39.170 ms
    1459 bytes from 79.161.150.134: icmp_seq=14 ttl=48 time=40.740 ms
    1459 bytes from 79.161.150.134: icmp_seq=15 ttl=48 time=42.998 ms
    1460 bytes from 79.161.150.134: icmp_seq=16 ttl=48 time=45.073 ms
    1460 bytes from 79.161.150.134: icmp_seq=17 ttl=48 time=40.441 ms
    1461 bytes from 79.161.150.134: icmp_seq=18 ttl=48 time=41.104 ms
    1461 bytes from 79.161.150.134: icmp_seq=19 ttl=48 time=42.609 ms
    1462 bytes from 79.161.150.134: icmp_seq=20 ttl=48 time=41.451 ms
    1462 bytes from 79.161.150.134: icmp_seq=21 ttl=48 time=45.585 ms
    1463 bytes from 79.161.150.134: icmp_seq=22 ttl=48 time=37.775 ms
    1463 bytes from 79.161.150.134: icmp_seq=23 ttl=48 time=42.920 ms
    1464 bytes from 79.161.150.134: icmp_seq=24 ttl=48 time=43.763 ms
    1464 bytes from 79.161.150.134: icmp_seq=25 ttl=48 time=42.746 ms
    1465 bytes from 79.161.150.134: icmp_seq=26 ttl=48 time=41.689 ms
    1465 bytes from 79.161.150.134: icmp_seq=27 ttl=48 time=41.784 ms
    1466 bytes from 79.161.150.134: icmp_seq=28 ttl=48 time=47.693 ms
    1466 bytes from 79.161.150.134: icmp_seq=29 ttl=48 time=44.977 ms
    1467 bytes from 79.161.150.134: icmp_seq=30 ttl=48 time=43.754 ms
    1467 bytes from 79.161.150.134: icmp_seq=31 ttl=48 time=44.094 ms
    1468 bytes from 79.161.150.134: icmp_seq=32 ttl=48 time=42.748 ms
    1468 bytes from 79.161.150.134: icmp_seq=33 ttl=48 time=46.548 ms
    1469 bytes from 79.161.150.134: icmp_seq=34 ttl=48 time=38.262 ms
    1469 bytes from 79.161.150.134: icmp_seq=35 ttl=48 time=45.237 ms
    1470 bytes from 79.161.150.134: icmp_seq=36 ttl=48 time=38.332 ms
    1470 bytes from 79.161.150.134: icmp_seq=37 ttl=48 time=44.879 ms
    1471 bytes from 79.161.150.134: icmp_seq=38 ttl=48 time=51.683 ms
    Request timeout for icmp_seq 39
    Request timeout for icmp_seq 40
    1472 bytes from 79.161.150.134: icmp_seq=41 ttl=48 time=470.155 ms
    1473 bytes from 79.161.150.134: icmp_seq=42 ttl=48 time=56.146 ms
    1473 bytes from 79.161.150.134: icmp_seq=43 ttl=48 time=44.793 ms
    1474 bytes from 79.161.150.134: icmp_seq=44 ttl=48 time=40.672 ms
    1474 bytes from 79.161.150.134: icmp_seq=45 ttl=48 time=45.132 ms
    1475 bytes from 79.161.150.134: icmp_seq=46 ttl=48 time=43.245 ms
    1475 bytes from 79.161.150.134: icmp_seq=47 ttl=48 time=37.639 ms
    1476 bytes from 79.161.150.134: icmp_seq=48 ttl=48 time=48.990 ms
    1476 bytes from 79.161.150.134: icmp_seq=49 ttl=48 time=41.792 ms
    1477 bytes from 79.161.150.134: icmp_seq=50 ttl=48 time=41.589 ms
    1477 bytes from 79.161.150.134: icmp_seq=51 ttl=48 time=41.105 ms
    1478 bytes from 79.161.150.134: icmp_seq=52 ttl=48 time=44.118 ms
    1478 bytes from 79.161.150.134: icmp_seq=53 ttl=48 time=42.335 ms
    1479 bytes from 79.161.150.134: icmp_seq=54 ttl=48 time=42.574 ms
    1479 bytes from 79.161.150.134: icmp_seq=55 ttl=48 time=43.763 ms
    1480 bytes from 79.161.150.134: icmp_seq=56 ttl=48 time=43.831 ms
    1480 bytes from 79.161.150.134: icmp_seq=57 ttl=48 time=42.214 ms
    ping: sendto: Message too long
    ping: sendto: Message too long
    Request timeout for icmp_seq 58
    ping: sendto: Message too long
    Request timeout for icmp_seq 59
    ping: sendto: Message too long
    Request timeout for icmp_seq 60
    ....
    --- tellmon.net ping statistics ---
    130 packets transmitted, 56 packets received, 56.9% packet loss
    round-trip min/avg/max/stddev = 37.639/50.640/470.155/56.656 ms
    
    

  • Rebel Alliance Global Moderator

    And which part of that command tells it not to fragment? (I don't have OS X.) I can send very large pings too, but that doesn't really have anything to do with the MTU, since they will just be fragmented.

    C:\>ping -l 6000 tellmon.net

    Pinging tellmon.net [79.161.150.134] with 6000 bytes of data:
    Reply from 79.161.150.134: bytes=6000 time=148ms TTL=54
    Reply from 79.161.150.134: bytes=6000 time=148ms TTL=54
    Reply from 79.161.150.134: bytes=6000 time=150ms TTL=54



  • The -D argument sets the Don't Fragment bit, so ping does not fragment (that is where the "Message too long" errors come from).
    I ran the same command from a Mac I borrowed on another (working) network and did not get any "Message too long" errors at all. It went all the way up to 1508 without an issue.
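
To read a sweep like the one above without eyeballing it, the largest reply size can be pulled out of the ping output. This is only a sketch: the function names are made up, and it assumes (as the sweep itself suggests, since a 1444-byte payload is reported as "1452 bytes") that the OS X ping counts the 8-byte ICMP header in the "bytes from" figure, so only the 20-byte IPv4 header has to be added to get the packet size.

```python
import re

IPV4_HEADER = 20  # bytes, assuming no IP options

# Matches reply lines such as "1480 bytes from 79.161.150.134: icmp_seq=57 ..."
REPLY_LINE = re.compile(r"^\s*(\d+) bytes from ")


def largest_reply(ping_output):
    """Return the largest 'N bytes from' value found in ping output."""
    sizes = []
    for line in ping_output.splitlines():
        match = REPLY_LINE.match(line)
        if match:
            sizes.append(int(match.group(1)))
    if not sizes:
        raise ValueError("no reply lines found")
    return max(sizes)


def implied_ipv4_packet_size(reply_bytes):
    """IPv4 packet size for a reply whose size already includes the ICMP header."""
    return reply_bytes + IPV4_HEADER


if __name__ == "__main__":
    sample = """\
1480 bytes from 79.161.150.134: icmp_seq=56 ttl=48 time=43.831 ms
1480 bytes from 79.161.150.134: icmp_seq=57 ttl=48 time=42.214 ms
ping: sendto: Message too long
Request timeout for icmp_seq 58
"""
    biggest = largest_reply(sample)
    print(biggest, implied_ipv4_packet_size(biggest))  # 1480 1500
```

Under that assumption, the failing network got replies for packets up to 1500 bytes with Don't Fragment set, which would suggest that ICMP at least is not hitting a path MTU below 1500 from there.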


  • Rebel Alliance Global Moderator

    I would suggest you sniff on the machine with the problem and also on your server, and see exactly what is happening. Is the server not answering for some reason? Is something blocking traffic?

    Post the sniffs from both sides and I can take a look.



  • Check your DNS record for tellmom.net.

    I cannot resolve it, and it dead-ends at Network Solutions.

    dig://www.tellmom.net;server=10.0.1.240;debug=0;querytype=ANY
    
    ; <<>> DiG 9.8.3-P1 <<>> @10.0.1.240 www.tellmom.net ANY
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12934
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
    
    ;; QUESTION SECTION:
    ;www.tellmom.net.		IN	ANY
    
    ;; ANSWER SECTION:
    www.tellmom.net.	7052	IN	A	208.91.197.27
    
    ;; AUTHORITY SECTION:
    tellmom.net.		7189	IN	NS	ns88.WORLDNIC.COM.
    tellmom.net.		7189	IN	NS	NS87.WORLDNIC.COM.
    
    ;; ADDITIONAL SECTION:
    ns87.worldnic.com.	7052	IN	A	207.204.40.144
    ns88.worldnic.com.	7052	IN	A	207.204.21.144
    
    ;; Query time: 16 msec
    ;; SERVER: 10.0.1.240#53(10.0.1.240)
    ;; WHEN: Tue May 19 09:59:27 2015
    ;; MSG SIZE  rcvd: 153
    
    ----------------[End of response]----------------
    
    


  • Rebel Alliance Global Moderator

    Well the OP domain is tellmon.net not tellmom.net ;)

    His setup is fine. If in doubt about DNS problems, always just query the authoritative servers for the domain.

    user@ubuntu:~$ dig @ns1.hyp.net www.tellmon.net

    ; <<>> DiG 9.9.5-3ubuntu0.2-Ubuntu <<>> @ns1.hyp.net www.tellmon.net
    ; (2 servers found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37313
    ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 7
    ;; WARNING: recursion requested but not available

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;www.tellmon.net.              IN      A

    ;; ANSWER SECTION:
    www.tellmon.net.        3600    IN      A      79.161.150.134

    ;; AUTHORITY SECTION:
    tellmon.net.            3600    IN      NS      ns1.hyp.net.
    tellmon.net.            3600    IN      NS      ns2.hyp.net.
    tellmon.net.            3600    IN      NS      ns3.hyp.net.

    ;; ADDITIONAL SECTION:
    ns1.hyp.net.            86400  IN      A      194.63.248.53
    ns1.hyp.net.            86400  IN      AAAA    2a01:5b40:0:248::53
    ns2.hyp.net.            86400  IN      A      78.129.173.18
    ns2.hyp.net.            86400  IN      AAAA    2001:1b40:5600:1900::2
    ns3.hyp.net.            86400  IN      A      151.249.126.3
    ns3.hyp.net.            86400  IN      AAAA    2a01:5b40:0:251::13

    ;; Query time: 138 msec
    ;; SERVER: 2a01:5b40:0:248::53#53(2a01:5b40:0:248::53)
    ;; WHEN: Tue May 19 09:26:44 CDT 2015
    ;; MSG SIZE  rcvd: 250
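
If dig is not at hand, the same "ask the authoritative server directly" check can be roughly sketched with only the Python standard library. The function names `build_query` and `ask` are made up for illustration. The sketch sends a plain A query over UDP without requesting recursion and reads only the reply header, so it reports the response code, the "aa" (authoritative answer) flag and the answer count, not the records themselves.

```python
import socket
import struct

QTYPE_A = 1
QCLASS_IN = 1


def build_query(name, qtype=QTYPE_A, query_id=0x1234):
    """Build a minimal DNS query packet (no EDNS, recursion not requested)."""
    # Header fields: id, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def ask(server, name, timeout=3.0):
    """Query server for name; return (rcode, authoritative, answer_count)."""
    query = build_query(name)
    family, _type, _proto, _canon, sockaddr = socket.getaddrinfo(
        server, 53, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, sockaddr)
        reply, _peer = sock.recvfrom(4096)
    reply_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if reply_id != 0x1234:
        raise ValueError("reply id does not match query id")
    rcode = flags & 0x000F               # 0 means NOERROR
    authoritative = bool(flags & 0x0400)  # the "aa" flag in dig output
    return rcode, authoritative, ancount

# Example call (needs network access): ask("ns1.hyp.net", "www.tellmon.net")
```

For a healthy zone one would expect a result along the lines of rcode 0 with the authoritative flag set and at least one answer, matching the "status: NOERROR", "aa" and "ANSWER: 1" fields in the dig output above.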



  • @johnpoz:

    Well the OP domain is tellmon.net not tellmom.net ;)

    There just isn't enough coffee in the world today to get this right…. :)

    Sorry about that.