Can't reach SIP provider
-
We are having trouble connecting to a SIP server, sip.gridare.com. It used to work, and no changes have been made to the interfaces, routing or firewall rules.
We have multiple WANs. It stopped working first on WAN1, so I routed traffic for it out through WAN2. That worked for a couple of weeks, but now WAN2 can no longer reach it either.
Here is a tracert from pfsense:
Results
1 telstra-fibre (203.XXX.XXX.XXX) 0.311 ms 0.309 ms 0.235 ms
2 Bundle-Ether1051-362.chw-edge902.sydney.telstra.net (139.130.105.209) 12.361 ms 14.581 ms 12.589 ms
3 bundle-ether14.chw-core10.sydney.telstra.net (203.50.11.100) 14.979 ms 13.866 ms 12.607 ms
4 bundle-ether1.chw-edge903.sydney.telstra.net (203.50.11.177) 14.729 ms 13.742 ms 13.231 ms
5 opt2823000.lnk.telstra.net (110.145.206.62) 15.480 ms 15.656 ms 14.597 ms
6 * * *
7 59.154.142.50 (59.154.142.50) 14.326 ms 59.154.18.154 (59.154.18.154) 15.373 ms 59.154.142.46 (59.154.142.46) 14.208 ms
8 202.139.16.66 (202.139.16.66) 14.224 ms 15.623 ms 15.477 ms
9 203.33.142.73 (203.33.142.73) 12.972 ms 14.081 ms 12.858 ms
10 mitx03.atu.net.au (203.56.251.143) 14.232 ms 15.327 ms 15.349 ms
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
Here is the result from a laptop I plugged directly into the WAN and configured with the same settings:
>tracert sip.gridare.com
Tracing route to sip.gridare.com [203.20.110.132] over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms telstra-fibre [203.XXX.XXX.XXX]
2 13 ms 13 ms 15 ms Bundle-Ether1051-362.chw-edge902.sydney.telstra.net [139.130.105.209]
3 14 ms 15 ms 16 ms bundle-ether14.chw-core10.sydney.telstra.net [203.50.11.100]
4 12 ms 12 ms 12 ms bundle-ether1.chw-edge903.sydney.telstra.net [203.50.11.177]
5 14 ms 15 ms 15 ms opt2823000.lnk.telstra.net [110.145.206.62]
6 * * * Request timed out.
7 14 ms 14 ms 14 ms 59.154.18.148
8 14 ms 14 ms 17 ms 202.139.16.66
9 13 ms 13 ms 13 ms 203.33.142.73
10 13 ms 13 ms 13 ms mitx03.atu.net.au [203.56.251.143]
11 15 ms 14 ms 14 ms gridare2.atu.net.au [203.20.110.132]
Trace complete.
As far as I know this is the only server we cannot reach.
There is nothing in the firewall logs suggesting traffic is being blocked to or from this address.
Both the SIP provider and ISP state the problem is not with them, which I believe since it worked fine from a laptop.
We have a backup cellular WAN connection. Over that connection the server is reachable from pfSense:
Results
1 192.168.XXX.XXX (192.168.XXX.XXX) 0.527 ms 0.375 ms 0.372 ms
2 119.225.62.29 (119.225.62.29) 123.287 ms 83.807 ms 80.066 ms
3 119.225.62.138 (119.225.62.138) 79.928 ms 86.923 ms 79.944 ms
4 * * *
5 59.154.142.48 (59.154.142.48) 91.636 ms 77.065 ms 59.154.142.46 (59.154.142.46) 80.089 ms
6 202.139.16.66 (202.139.16.66) 89.785 ms 79.401 ms 89.932 ms
7 203.33.142.73 (203.33.142.73) 80.075 ms 86.680 ms 79.940 ms
8 mitx03.atu.net.au (203.56.251.143) 80.058 ms 79.771 ms 79.941 ms
9 gridare2.atu.net.au (203.20.110.132) 80.074 ms 86.162 ms 89.893 ms
I really have no idea what I can try to fix this. Any ideas would be appreciated.
Thanks in advance
-
Does it fail at the same point if you traceroute from WAN1?
Do you see the same result if you traceroute using ICMP from pfSense?
Are the WANs static IPs? Does the laptop get the same IP?
Are you running Snort or pfBlocker or any other package that auto-updates?
Steve
-
Hi Steve, thanks for the reply.
Yes the trace fails at the same point on both WAN connections:
Results
1 192.168.XXX.XXX (192.168.XXX.XXX) 2.594 ms 1.090 ms 1.240 ms
2 lo10.lns01.sydnmtc.nsw.m2core.net.au (203.134.4.140) 20.349 ms 20.408 ms 20.228 ms
3 225.330.dsl.syd.iprimus.net.au (203.134.75.225) 23.472 ms 225.6-134-203.static.corp.syd.iprimus.net.au (203.134.6.225) 20.489 ms 225.330.dsl.syd.iprimus.net.au (203.134.75.225) 21.026 ms
4 ae11.per03.sydnmtc.nsw.m2core.net.au (203.134.72.224) 21.030 ms 20.969 ms 20.606 ms
5 9266.syd.equinix.com (45.127.172.166) 20.722 ms 20.891 ms 20.486 ms
6 mitx03.atu.net.au (203.56.251.143) 20.466 ms 20.532 ms 20.854 ms
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
atu.net.au seems to be a wholesale hosting provider, so it seems a good bet that this is where our SIP provider's servers live. I reached out to them but they didn't get back to me.
ICMP gave the same results.
One of the WANs is a static IP, the other is a static route to a publicly reachable subnet, but I suppose that counts as a static IP.
I should note that I have 2 pfsense boxes in failover and all the WAN connections are CARP virtual IPs (including the cellular backup that is currently still working). It has been set up this way for a couple of years and never been a problem before.
The laptop was given the CARP virtual IP and also tested with other IPs on the subnet, all worked.
I only have 2 packages installed: openvpn-client-export and Status_Traffic_Totals. No pfBlocker or any other blocker.
-
Hmm, weird.
What if you use the actual WAN IP as the source (or the CARP IP if you weren't)?
Hard to see what could be different there. I could find almost nothing about Gridare; their website's support section seems mostly broken! Do they have any TCP ports open at that IP you can test against?
They do respond with a RST on port 5060 for me which at least confirms my traffic is reaching them. A traceroute succeeds from here. Coming from the opposite side of the planet though so the route is different, even at the second to last hop.
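For anyone wanting to repeat that kind of check, here is a minimal sketch in Python (not something pfSense ships; the function name `probe_tcp` is made up for illustration). It attempts a plain TCP connect and tells apart a refused connection, which means a RST came back and the packets are reaching the far end, from a silent timeout, which is what a drop or block usually looks like.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt.

    Returns 'open' if the handshake completes, 'refused' if the peer
    answers with a RST, 'timeout' if nothing comes back in time, or
    'error: ...' for anything else (DNS failure, no route, etc.).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A RST proves the traffic reached something at that address.
        return "refused"
    except socket.timeout:
        # Silence: the packets or the replies are being dropped somewhere.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe_tcp("sip.gridare.com", 5060)` from a host behind each WAN and comparing the answers would show whether one path gets "refused" while another only gets "timeout". Note this only exercises TCP; SIP over UDP would need a different test.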
Steve
-
Hi Steve,
A minor breakthrough! At least on one of the WAN connections.
Pinging sip.gridare.com:
Succeeds from CARP IP on pfsense1
Fails from WAN IP on pfsense1
Succeeds from CARP IP on pfsense2 (if it is manually failed-over)
Succeeds from WAN IP on pfsense2
Fails on laptop when set up with pfsense1's WAN IP address
Succeeds on laptop with any of the other addresses in the WAN subnet (it is a /29 subnet)
I noticed that while failed over to pfSense2 we did get a successful SIP connection, which dropped out again when we failed back to pfSense1.
Do you think they might have blocked some of our IP addresses? It would also explain why the various WAN connections stopped working slowly over the course of a couple of weeks.
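The manual per-address testing above could be scripted along these lines. This is only a sketch under some assumptions: it swaps the ping test for a TCP connect (so it needs no raw-socket privileges), every address in the subnet must actually be configured on the machine running it, and the names `tcp_from_source` and `sweep_sources` are hypothetical. The subnet in the usage note is a documentation range, not the real WAN subnet.

```python
import ipaddress
import socket
from typing import Callable, Dict


def tcp_from_source(source_ip: str, host: str, port: int) -> str:
    """TCP connect to host:port with the local end bound to source_ip."""
    try:
        with socket.create_connection(
            (host, port), timeout=3.0, source_address=(source_ip, 0)
        ):
            return "ok"
    except ConnectionRefusedError:
        return "refused"  # RST received, so this source address gets through
    except socket.timeout:
        return "timeout"  # no answer at all, consistent with a block
    except OSError as exc:
        return f"error: {exc}"  # e.g. address not configured locally


def sweep_sources(
    subnet: str,
    host: str,
    port: int,
    probe: Callable[[str, str, int], str] = tcp_from_source,
) -> Dict[str, str]:
    """Run the probe once per usable host address in the subnet."""
    results = {}
    for addr in ipaddress.ip_network(subnet).hosts():
        results[str(addr)] = probe(str(addr), host, port)
    return results
```

A call such as `sweep_sources("192.0.2.8/29", "sip.gridare.com", 5060)` returns one result per usable address in the /29 (six of them), so a single address that times out while its neighbours get an answer points at a block on that address specifically.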
-
Yes, that seems almost the only possibility. They are blocking the pfSense WAN IP and, judging by the traceroute, the block looks to be on their actual server.
However, connections from a client behind pfSense should be NAT'd to the CARP IP, so I would expect those to succeed, including the actual VoIP connection.
Steve
-
OK, armed with this I have gone back to the provider and they have actually looked and found the issue. Thanks very much for your help.
I have traffic to that provider routed out a specific WAN/gateway. Perhaps that's causing it to be NAT'd from the WAN IP instead of the CARP IP, but since it is now working I'm not going to mess with it.
Thanks again
Joel
-
You should NEVER use the WAN IP instead of the CARP IP.
That holds regardless of whether you are using a specific gateway, and it will lead to issues.
Even though you never bothered to tell us what the issue was, I can safely presume that you have misconfigured SIP clients somewhere in your network that try to access the SIP server with the wrong credentials.
This eventually gets the NAT'd WAN/CARP IP blocked at the DC edge perimeter.
Most probably your SIP provider is using some kind of fingerprinting, which leads to this strange blocking behavior.
-
Yes, policy routing to a particular gateway should have no effect on the outbound NAT, which should still translate that traffic to the CARP VIP. If it does not, then your failover will be a lot less smooth, as states will need to be re-created on the other node's WAN.
Steve