Possible routing loop? Routing loop diagnostics
This is going to seem like a bit of a weird one but it should make more sense after the details in this intro:
I am developing a network application for a client. The application is a Python 3 implementation of traceroute with ICMP and raw sockets.
I can provide details of the code as well, if this would be helpful, but I will not include this now to avoid an information dump.
As I increase the TTL from 1, I initially see a list of sensible IP addresses. These match the values produced by the implementation of traceroute provided with Debian.
Then, I find a spurious local IP address. This occurs at the same time that I read data from the socket which looks to be the same as the data I am sending.
That probably doesn't make a whole lot of sense, so allow me to clarify:
This is what I see when TTL = 1:
Pinging (220.127.116.11) 18.104.22.168 Address: ('192.168.2.1', 0) TTL=1 RTT=0 ms Type=11 Code=0 192.168.2.1
That is the IP address of one of the interfaces on pfSense.
This is what I see for TTL = 2:
Pinging (22.214.171.124) 126.96.36.199 Address: ('192.168.0.1', 0) TTL=2 RTT=2 ms Type=11 Code=0 192.168.0.1
That is the IP address of my ISP router.
The output is sensible until we reach TTL = 6:
Pinging (188.8.131.52) 184.108.40.206
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.122.1', 0)
TTL=6 RTT=1929 ms Type=3 Code=1 192.168.122.1
Two things occur here:
Firstly, I am able to "read" some data from the raw socket which has icmpType=8. That means "echo request". I saw this behaviour earlier today and was confused by it, so I "bodged" the code to work by continually reading from the socket while icmpType=8 and then breaking out of the read loop when icmpType is something other than 8. This suggests to me that for some reason the ICMP messages are being sent back to my desktop machine. I have no idea why this would occur, and in particular I don't understand why it would occur more than once, which seems very strange to me.
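A slightly more robust alternative to skipping on icmpType alone is to match each received packet against the identifier and sequence number of the probe that was sent. The following is only a sketch, not the code from my application: the function names parse_icmp and is_reply_to_probe are made up for illustration, and it assumes the raw IPv4 socket hands back the IP header followed by the ICMP message, which is the usual behaviour on Linux.

```python
import struct

ICMP_DEST_UNREACH = 3
ICMP_ECHO_REQUEST = 8
ICMP_TIME_EXCEEDED = 11


def parse_icmp(packet):
    """Return (icmp_type, code, ident, seq) for a raw IPv4 packet, or None if too short.

    For error messages (types 3 and 11) the identifier and sequence number
    come from the original echo request quoted inside the error payload.
    """
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4            # outer IP header length in bytes
    if len(packet) < ihl + 8:
        return None
    icmp_type, code = packet[ihl], packet[ihl + 1]
    if icmp_type in (ICMP_DEST_UNREACH, ICMP_TIME_EXCEEDED):
        inner = ihl + 8                      # quoted IP header follows the 8-byte ICMP header
        if len(packet) < inner + 1:
            return None
        echo = inner + (packet[inner] & 0x0F) * 4   # quoted ICMP echo header
    else:
        echo = ihl
    if len(packet) < echo + 8:
        return None
    ident, seq = struct.unpack("!HH", packet[echo + 4:echo + 8])
    return icmp_type, code, ident, seq


def is_reply_to_probe(packet, my_ident, my_seq):
    """True only for a non-request ICMP message that refers to our own probe."""
    parsed = parse_icmp(packet)
    if parsed is None:
        return False
    icmp_type, _code, ident, seq = parsed
    if icmp_type == ICMP_ECHO_REQUEST:       # an echo request is never an answer
        return False
    return (ident, seq) == (my_ident, my_seq)
```

In a receive loop this would replace the "keep reading while icmpType=8" check: read from the socket until is_reply_to_probe returns True or the timeout expires. It does not explain why the echo requests arrive in the first place, it only stops them (and any unrelated ICMP traffic) from being mistaken for a hop.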
Secondly, notice that the final IP address obtained is 192.168.122.1. This is a private network IP, and is the address of a network device on my desktop machine. 192.168.122.0/24 is the default network for virtual machines (KVM/QEMU) on Debian Linux systems. There are no VMs currently running on this network. I also have an additional VM network with address 192.168.100.0/24, again with no VMs currently running.
The two points above indicate to me that there is some kind of routing loop occurring, however I don't know this for certain.
My pfSense system has a static route to 192.168.100.0/24 defined. It does not have a static route to 192.168.122.0/24. It also has a gateway to 192.168.100.0/24 defined, and the gateway address is 192.168.2.100. That is the address of my desktop machine.
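To make it easier to spot hops that belong to one of the local VM networks rather than to the real path, each hop address can be checked against the networks mentioned above. This is a minimal sketch using Python's standard ipaddress module; the dictionary KNOWN_NETWORKS, its labels, and the function describe_hop are illustrative names, not part of my application.

```python
import ipaddress

# The two VM networks described in this post; labels are illustrative.
KNOWN_NETWORKS = {
    "default KVM/QEMU VM network": ipaddress.ip_network("192.168.122.0/24"),
    "additional VM network": ipaddress.ip_network("192.168.100.0/24"),
}


def describe_hop(address):
    """Label a hop address by the local network it falls in, if any."""
    ip = ipaddress.ip_address(address)
    for label, network in KNOWN_NETWORKS.items():
        if ip in network:
            return label
    return "other private range" if ip.is_private else "not private"


for hop in ("192.168.2.1", "192.168.0.1", "192.168.122.1"):
    print(hop, "->", describe_hop(hop))
```

A hop that lands in one of the VM networks while tracing to a public address is a sign that the reply was generated on the desktop machine itself rather than by a router further along the path.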
I used to have problems with routing loops and pfSense, where a routing loop for all network traffic could be spuriously started. Here are some brief details of what caused that:
- ISP router is a bit "shaky". Something would go wrong which caused pfSense to think the router/gateway was down.
- No problem if this is the only gateway, but if other gateways were present (for example gateways to other internal networks) then pfSense would begin routing traffic to these other gateways.
- Explicitly specifying a default gateway fixed this.
- pfSense currently thinks the ISP gateway/router is down due to "packet loss". However, there are no generally noticeable network problems, other than this Python implementation of traceroute, which appears to show some kind of routing loop or packets being sent back to the sending machine.
This is all I know about the situation, other than to say that if I increase the TTL enough, I am able to see other, more sensible IP addresses with the Python 3 traceroute implementation. For example:
Pinging (220.127.116.11) 18.104.22.168 Address: ('22.214.171.124', 0) TTL=9 RTT=85 ms Type=11 Code=0 126.96.36.199
To finish off: is there any way to diagnose this issue, and potential routing loop issues in general?
Do you see the same behaviour tracerouting to any external public IP?
If you pcap the traffic where are the replies actually coming from?
If it was actually a routing loop, I would expect it to time out whatever TTL you set.
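The reasoning there can be illustrated with a toy model. This is only a sketch under simplifying assumptions: routers are plain names, forwarding is a dictionary of next hops, and responder is a hypothetical helper, not anything from pfSense or the traceroute code in question.

```python
def responder(next_hop, first_hop, destination, ttl):
    """Return the node that answers a probe sent with the given TTL (ttl >= 1).

    Each router decrements the TTL; the one that brings it to zero answers
    with Time Exceeded. The destination answers as soon as the probe arrives.
    """
    node = first_hop
    while True:
        if node == destination:
            return node              # probe reached the target
        ttl -= 1
        if ttl == 0:
            return node              # this router reports Time Exceeded
        node = next_hop[node]


# Hypothetical topologies: a healthy path, and one where r2 and r3 bounce
# the packet between each other so the target is never reached.
healthy = {"r1": "r2", "r2": "target"}
looped = {"r1": "r2", "r2": "r3", "r3": "r2"}

for name, table in (("healthy", healthy), ("looped", looped)):
    hops = [responder(table, "r1", "target", ttl) for ttl in range(1, 7)]
    print(name, hops)
```

With the looped table the target never appears, however large the TTL, and the same routers keep alternating. A trace that shows an odd hop at one TTL but then continues normally at higher TTLs does not fit that pattern.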
Yes. What about to a different public IP?
If you are hitting something odd in the route you may not hit that to a different target.