Possible routing loop? Routing loop diagnostics

hypernova

This is going to seem like a bit of a weird one, but it should make more sense after the details in this intro:

I am developing a network application for a client. The application is a Python 3 implementation of traceroute with ICMP and raw sockets.

I can provide details of the code as well, if this would be helpful, but I will not include this now to avoid an information dump.

As I increase the TTL from 1, I see initially a list of sensible IP addresses. These match the values as produced by the implementation of traceroute provided with Debian.

Then, I find a spurious local IP address. This occurs at the same time that I read data from the socket which looks to be the same as the data I am sending.

That probably doesn't make a whole lot of sense, so allow me to clarify:

This is what I see when TTL = 1:

Pinging (209.233.126.254) 209.233.126.254
Address: ('192.168.2.1', 0)
  TTL=1    RTT=0 ms    Type=11    Code=0    192.168.2.1

That is the IP address of one of the interfaces on pfSense.

This is what I see for TTL = 2:

Pinging (209.233.126.254) 209.233.126.254
Address: ('192.168.0.1', 0)
  TTL=2    RTT=2 ms    Type=11    Code=0    192.168.0.1

That is the IP address of my ISP router.

The output is sensible until we reach TTL = 6:

Pinging (209.233.126.254) 209.233.126.254
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.2.1', 0)
error: icmpType=8
Address: ('192.168.122.1', 0)
  TTL=6    RTT=1929 ms    Type=3    Code=1    192.168.122.1

Two things occur here:

Firstly, I am able to "read" some data from the raw socket which has icmpType=8. That means "echo request". I saw this behaviour earlier today and was confused by it, so I "bodged" the code to work by continually reading from the socket while icmpType=8 and then breaking out of the read loop when icmpType is something other than 8. This suggests to me that for some reason the ICMP messages are being sent back to my desktop machine. I have no idea why this would occur, and in particular I don't understand why it would occur more than once, which seems very strange behaviour to me.
Secondly, notice that the final IP address obtained is 192.168.122.1. This is a private network IP, and is the address of a network device on my desktop machine. 192.168.122.0/24 is the default network for virtual machines (KVM/QEMU) on Debian Linux systems. There are no VMs currently running on this network. I also have an additional VM network with address 192.168.100.0/24, again with no VMs currently running.
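
A minimal sketch of the "keep reading while icmpType=8" workaround described above, not the actual code from the application. The function names (`parse_icmp`, `embedded_echo_id`, `first_matching_reply`) are hypothetical, and it assumes each packet read from the raw socket begins with the IPv4 header, as is the case for an IPv4 raw ICMP socket on Linux. As well as skipping echo requests, it checks the echo identifier quoted inside ICMP error messages, so errors caused by another process's probes are ignored too.

```python
import struct

ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
ICMP_ECHO_REQUEST = 8
ICMP_TIME_EXCEEDED = 11


def parse_icmp(packet):
    """Return (icmp_type, icmp_code, icmp_offset) for a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    icmp_type, icmp_code = struct.unpack_from("!BB", packet, ihl)
    return icmp_type, icmp_code, ihl


def embedded_echo_id(packet, icmp_offset):
    """Return the echo identifier quoted inside an ICMP error message."""
    inner_ip = icmp_offset + 8            # skip the 8-byte ICMP error header
    inner_ihl = (packet[inner_ip] & 0x0F) * 4
    inner_icmp = inner_ip + inner_ihl     # quoted start of the original probe
    # Layout: type(1) code(1) checksum(2) identifier(2)
    return struct.unpack_from("!BBHH", packet, inner_icmp)[3]


def first_matching_reply(packets, my_id):
    """Return the first (source, type, code) that answers a probe sent with my_id.

    `packets` is an iterable of (source_address, raw_bytes) pairs, for example
    collected from repeated recvfrom() calls on the raw socket.
    """
    for source, packet in packets:
        icmp_type, icmp_code, offset = parse_icmp(packet)
        if icmp_type == ICMP_ECHO_REQUEST:
            continue                      # an incoming ping, not a reply to us
        if icmp_type == ICMP_ECHO_REPLY:
            # The identifier sits 4 bytes into the echo reply header.
            if struct.unpack_from("!H", packet, offset + 4)[0] != my_id:
                continue
        elif icmp_type in (ICMP_DEST_UNREACHABLE, ICMP_TIME_EXCEEDED):
            if embedded_echo_id(packet, offset) != my_id:
                continue                  # error for someone else's probe
        return source, icmp_type, icmp_code
    return None
```

Matching on the identifier rather than only on the type means a stray Time Exceeded or Destination Unreachable belonging to a different program is not mistaken for a hop.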

The two points above suggest to me that some kind of routing loop is occurring, but I don't know this for certain.
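
For reading the Type and Code columns in the output above, a small lookup of the ICMP values defined in RFC 792 may help. This is only a sketch: the names `ICMP_NAMES` and `describe_icmp` are hypothetical, and the table covers just the values seen in this thread plus a few closely related ones.

```python
# (type, code) -> meaning, per RFC 792. Deliberately incomplete.
ICMP_NAMES = {
    (0, 0): "Echo Reply",
    (3, 0): "Destination Unreachable: network unreachable",
    (3, 1): "Destination Unreachable: host unreachable",
    (3, 3): "Destination Unreachable: port unreachable",
    (8, 0): "Echo Request",
    (11, 0): "Time Exceeded: TTL exceeded in transit",
    (11, 1): "Time Exceeded: fragment reassembly time exceeded",
}


def describe_icmp(icmp_type, icmp_code):
    """Return a readable name for an ICMP type/code pair."""
    return ICMP_NAMES.get(
        (icmp_type, icmp_code),
        f"Unknown ICMP type={icmp_type} code={icmp_code}",
    )


print(describe_icmp(11, 0))  # the TTL=1 and TTL=2 lines
print(describe_icmp(3, 1))   # the TTL=6 line
```

Read this way, the TTL=6 line is a "host unreachable" error reported by 192.168.122.1, not a Time Exceeded message from an intermediate router.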

My pfSense system has a static route to 192.168.100.0/24 defined. It does not have a static route to 192.168.122.0/24. It also has a gateway to 192.168.100.0/24 defined, and the gateway address is 192.168.2.100. That is the address of my desktop machine.

I used to have problems with routing loops and pfSense, where a routing loop for all network traffic could be spuriously started. Here are some brief details of what caused that:

ISP router is a bit "shaky". Something would go wrong which caused pfSense to think the router/gateway was down.
No problem if this is the only gateway, but if other gateways were present (for example gateways to other internal networks) then pfSense would begin routing traffic to these other gateways.
Explicitly specifying a default gateway fixed this.
pfSense currently thinks the ISP gateway/router is down due to "packet loss". However, there are no generally noticeable network problems, other than this Python implementation of traceroute, which appears to show some kind of routing loop or packets being sent back to the sending machine.

This is all I know about the situation, other than to say that if I increase the TTL enough, I am able to see other, more sensible IP addresses with the Python 3 traceroute implementation. For example:

Pinging (209.233.126.254) 209.233.126.254
Address: ('84.116.140.170', 0)
  TTL=9    RTT=85 ms    Type=11    Code=0    84.116.140.170

To finish off: is there any way to diagnose this issue and potential routing loop issues?
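
One rough check can be sketched in a few lines, under the assumption that a forwarding loop shows up in a trace as the same router address answering at several different TTLs. The function name `repeated_hops` is hypothetical, the first hop list is taken from the output quoted above, and the second list uses made-up addresses purely to show what a loop would look like.

```python
from collections import defaultdict


def repeated_hops(hops):
    """Map each address seen at more than one TTL to the sorted TTLs it answered at."""
    seen = defaultdict(set)
    for ttl, address in hops:
        seen[address].add(ttl)
    return {addr: sorted(ttls) for addr, ttls in seen.items() if len(ttls) > 1}


# Hop replies quoted in this post (TTL, responding address).
observed = [
    (1, "192.168.2.1"),
    (2, "192.168.0.1"),
    (6, "192.168.122.1"),
    (9, "84.116.140.170"),
]
print(repeated_hops(observed))   # {}

# Made-up example of a loop between two routers.
looping = [(3, "10.0.0.1"), (4, "10.0.0.2"), (5, "10.0.0.1"), (6, "10.0.0.2")]
print(repeated_hops(looping))    # {'10.0.0.1': [3, 5], '10.0.0.2': [4, 6]}
```

An empty result does not prove there is no loop, since only four of the probed TTLs are listed here, but a non-empty one would be a strong hint.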

stephenw10

Interesting.

Do you see the same behaviour tracerouting to any external public IP?

If you pcap the traffic where are the replies actually coming from?

If it was actually a routing loop I would expect it to time out whatever TTL you set.

Steve

hypernova

@stephenw10 said in Possible routing loop? Routing loop diagnostics:

Do you see the same behaviour tracerouting to any external public IP?

Not sure if I understand: 209.233.126.254 is an external IP

stephenw10

Yes. What about to a different public IP?

If you are hitting something odd in the route, you may not hit it on the way to a different target.