Can't reach SIP provider

VirtualOffice

We are having trouble connecting to a SIP server, sip.gridare.com. It used to work, and no changes have been made to the interfaces, routing, or firewall rules.

We have multiple WANs. It stopped working first on WAN1, so I routed traffic for it out through WAN2, which worked for a couple of weeks, but now WAN2 can no longer reach it either.

Here is a traceroute from pfSense:

Results
 1  telstra-fibre (203.XXX.XXX.XXX)  0.311 ms  0.309 ms  0.235 ms
 2  Bundle-Ether1051-362.chw-edge902.sydney.telstra.net (139.130.105.209)  12.361 ms  14.581 ms  12.589 ms
 3  bundle-ether14.chw-core10.sydney.telstra.net (203.50.11.100)  14.979 ms  13.866 ms  12.607 ms
 4  bundle-ether1.chw-edge903.sydney.telstra.net (203.50.11.177)  14.729 ms  13.742 ms  13.231 ms
 5  opt2823000.lnk.telstra.net (110.145.206.62)  15.480 ms  15.656 ms  14.597 ms
 6  * * *
 7  59.154.142.50 (59.154.142.50)  14.326 ms
    59.154.18.154 (59.154.18.154)  15.373 ms
    59.154.142.46 (59.154.142.46)  14.208 ms
 8  202.139.16.66 (202.139.16.66)  14.224 ms  15.623 ms  15.477 ms
 9  203.33.142.73 (203.33.142.73)  12.972 ms  14.081 ms  12.858 ms
10  mitx03.atu.net.au (203.56.251.143)  14.232 ms  15.327 ms  15.349 ms
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *

Here is the result from a laptop I plugged into the WAN and configured with the same settings:

>tracert sip.gridare.com

Tracing route to sip.gridare.com [203.20.110.132]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  telstra-fibre [203.XXX.XXX.XXX]
  2    13 ms    13 ms    15 ms  Bundle-Ether1051-362.chw-edge902.sydney.telstra.net [139.130.105.209]
  3    14 ms    15 ms    16 ms  bundle-ether14.chw-core10.sydney.telstra.net [203.50.11.100]
  4    12 ms    12 ms    12 ms  bundle-ether1.chw-edge903.sydney.telstra.net [203.50.11.177]
  5    14 ms    15 ms    15 ms  opt2823000.lnk.telstra.net [110.145.206.62]
  6     *        *        *     Request timed out.
  7    14 ms    14 ms    14 ms  59.154.18.148
  8    14 ms    14 ms    17 ms  202.139.16.66
  9    13 ms    13 ms    13 ms  203.33.142.73
 10    13 ms    13 ms    13 ms  mitx03.atu.net.au [203.56.251.143]
 11    15 ms    14 ms    14 ms  gridare2.atu.net.au [203.20.110.132]

Trace complete.

As far as I know this is the only server we cannot reach.

There is nothing in the firewall logs suggesting traffic is being blocked to or from this address.

Both the SIP provider and ISP state the problem is not with them, which I believe since it worked fine from a laptop.

We have a backup cellular WAN connection. Over that connection the server is reachable from pfSense:

Results
 1  192.168.XXX.XXX (192.168.XXX.XXX)  0.527 ms  0.375 ms  0.372 ms
 2  119.225.62.29 (119.225.62.29)  123.287 ms  83.807 ms  80.066 ms
 3  119.225.62.138 (119.225.62.138)  79.928 ms  86.923 ms  79.944 ms
 4  * * *
 5  59.154.142.48 (59.154.142.48)  91.636 ms  77.065 ms
    59.154.142.46 (59.154.142.46)  80.089 ms
 6  202.139.16.66 (202.139.16.66)  89.785 ms  79.401 ms  89.932 ms
 7  203.33.142.73 (203.33.142.73)  80.075 ms  86.680 ms  79.940 ms
 8  mitx03.atu.net.au (203.56.251.143)  80.058 ms  79.771 ms  79.941 ms
 9  gridare2.atu.net.au (203.20.110.132)  80.074 ms  86.162 ms  89.893 ms

I really have no idea what I can try to fix this. Any ideas would be appreciated.

Thanks in advance

stephenw10

Does it fail at the same point if you traceroute from WAN1?

Do you see the same result if you traceroute using ICMP from pfSense?

Are the WANs static IPs? Does the laptop get the same IP?

Are you running Snort or pfBlocker or any other package that auto-updates?

Steve

VirtualOffice

Hi Steve, thanks for the reply.

Yes, the trace fails at the same point on both WAN connections:

Results
 1  192.168.XXX.XXX (192.168.XXX.XXX)  2.594 ms  1.090 ms  1.240 ms
 2  lo10.lns01.sydnmtc.nsw.m2core.net.au (203.134.4.140)  20.349 ms  20.408 ms  20.228 ms
 3  225.330.dsl.syd.iprimus.net.au (203.134.75.225)  23.472 ms
    225.6-134-203.static.corp.syd.iprimus.net.au (203.134.6.225)  20.489 ms
    225.330.dsl.syd.iprimus.net.au (203.134.75.225)  21.026 ms
 4  ae11.per03.sydnmtc.nsw.m2core.net.au (203.134.72.224)  21.030 ms  20.969 ms  20.606 ms
 5  9266.syd.equinix.com (45.127.172.166)  20.722 ms  20.891 ms  20.486 ms
 6  mitx03.atu.net.au (203.56.251.143)  20.466 ms  20.532 ms  20.854 ms
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *

atu.net.au seems to be a wholesale hosting provider, so it seems a good bet that this is where our SIP provider's servers live. I reached out to them but they didn't get back to me.

ICMP gave the same results.

One of the WANs is a static IP, the other is a static route to a publicly reachable subnet, but I suppose that counts as a static IP.

I should note that I have 2 pfSense boxes in failover and all the WAN connections are CARP virtual IPs (including the cellular backup that is currently still working). It has been set up this way for a couple of years and has never been a problem before.

The laptop was given the CARP virtual IP and also tested with other IPs on the subnet, all worked.

I only have 2 packages installed: openvpn-client-export and Status_Traffic_Totals. No pfBlocker or any other blocker.

stephenw10

Hmm, weird.

What if you use the actual WAN IP as the source (or the CARP IP if you weren't)?

Hard to see what could be different there. I could find almost nothing about Gridare, their web site support section seems mostly broken! Do they have any TCP ports open at that IP you can test against?

They do respond with a RST on port 5060 for me, which at least confirms my traffic is reaching them. A traceroute also succeeds from here. I'm coming from the opposite side of the planet though, so the route is different, even at the second-to-last hop.
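
For anyone wanting to repeat that kind of check from a specific source address, here is a minimal Python sketch. It is only an illustration, not something run in this thread: `probe_tcp` is a made-up helper name, and it assumes the host running it actually holds the source IP you pass in. A refused connection (RST) means packets are reaching the far end, while a timeout means they are being dropped somewhere along the way.

```python
import socket


def probe_tcp(dest_host, dest_port, source_ip=None, timeout=3.0):
    """Attempt a TCP connect and classify the outcome.

    Returns one of:
      "open"     - the handshake completed
      "refused"  - the remote end answered with a RST, so packets reach it
      "timeout"  - no answer at all (dropped or filtered somewhere)
      "error: ..." - any other socket error (e.g. source IP not on this host)
    """
    # Port 0 lets the OS pick an ephemeral source port for the chosen source IP.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection(
            (dest_host, dest_port), timeout=timeout, source_address=source
        ):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"


# Hypothetical usage (the source address is a placeholder, not from this thread):
#   probe_tcp("sip.gridare.com", 5060, source_ip="203.0.113.10")
```

Getting "open" or "refused" from one source address and "timeout" from another on the same WAN would point at filtering by source IP rather than a routing problem.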

Steve

VirtualOffice

Hi Steve,

A minor breakthrough! At least on one of the WAN connections:

Pinging sip.gridare.com:

Succeeds from CARP IP on pfsense1
Fails from WAN IP on pfsense1
Succeeds from CARP IP on pfsense2 (if it is manually failed-over)
Succeeds from WAN IP on pfsense2

Fails on laptop when set up with pfsense1's WAN IP address
Succeeds on laptop with any other address in the WAN subnet (it is a /29 subnet)
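
The reasoning behind those results can be sketched in a few lines of Python. This is only an illustration: the address labels are placeholders for the real IPs (which are not shown here), and the helper names are made up. The idea is to group the ping results by source address and by device, then see which of the two the failures follow.

```python
from collections import defaultdict

# Ping results listed above, as (device, source address label, succeeded).
# The labels are placeholders; the real addresses are not shown in the thread.
OBSERVATIONS = [
    ("pfsense1", "carp-vip", True),
    ("pfsense1", "pfsense1-wan", False),
    ("pfsense2", "carp-vip", True),
    ("pfsense2", "pfsense2-wan", True),
    ("laptop", "pfsense1-wan", False),
    ("laptop", "other-subnet-address", True),
]


def always_failing(observations, key_index):
    """Return the keys (devices or sources) for which no test ever succeeded.

    key_index 0 groups by device, key_index 1 groups by source address.
    """
    outcomes = defaultdict(list)
    for row in observations:
        outcomes[row[key_index]].append(row[2])
    return sorted(key for key, results in outcomes.items() if not any(results))


failing_devices = always_failing(OBSERVATIONS, 0)
failing_sources = always_failing(OBSERVATIONS, 1)

print("devices that never succeed:", failing_devices)
print("source addresses that never succeed:", failing_sources)
```

No device fails across the board, but one source address does, whichever machine is using it. That is what points at the address itself being filtered rather than anything in the pfsense1 configuration.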

I noticed while failed over to pfSense2 we did get a successful SIP connection, which dropped out again when failed back to pfSense1.

Do you think they might have blocked some of our IP addresses? It would also explain why the various WAN connections stopped working one by one over the course of a couple of weeks.

stephenw10

Yes, that seems almost the only possibility. They are blocking the pfSense WAN IP, and from the traceroute it looks like the block is on their actual server.

However, connections from a client behind pfSense should be NAT'd to the CARP IP, so I would expect those to succeed, including the actual VoIP connection.

Steve

VirtualOffice

OK armed with this I have gone back to the provider and they have ~~actually looked and~~ found the issue. Thanks very much for your help.

I have traffic to that provider routed out a specific WAN/gateway; perhaps that's causing it to be NAT'd from the WAN IP instead of the CARP IP, but since it is now working I'm not going to mess with it.

Thanks again

Joel

netblues

You should NEVER use the WAN IP instead of the CARP VIP.
That holds regardless of whether you are using a specific gateway, and doing otherwise will lead to issues.

Even though you never bothered to tell us what the issue was, I can safely presume that you have misconfigured SIP clients somewhere in your network that try to access the SIP server with the wrong credentials.
This eventually gets the NAT'd WAN/CARP IP blocked at the DC edge perimeter.
Most probably your SIP provider is using some kind of fingerprinting, which leads to this strange blocking behavior.

stephenw10

Yes, policy routing to a particular gateway should have no effect on outbound NAT, which should still translate that traffic to the CARP VIP. If it does not, your failover will be a lot less smooth, as states will need to be re-created on the other node's WAN.

Steve