Strange DHCP related problem XG-7100
On one part of our network we use Corinex cablelan (internet over coax) modems, and I have a problem with the master modems using dhcp. When I boot up these master modems, they get an ip address assigned, and then become unreachable. When I do a Wireshark capture the master modem keeps doing (R)STP requests. The cablelan clients cannot connect to the master modem.
- When I plug these master modems into my old PC-based pfSense router (2.4.2-RELEASE-p1), everything works normally. I also tested them on other DHCP-enabled networks.
- when I assign a fixed network address to the master, the master modems work normally
- after assigning a fixed ip address to the master modem, the cablelan client modems work well with dhcp (transparent bridge)
- no difference between the 2.4.5 firmware and the previous version
- I temporarily reverted back to my old pc router, and made a test setup with the 7100 and a spare master modem
As you can see I have a workaround for this problem (fixed address), but I am still curious how to investigate this further in case something similar happens to other devices.
Not entirely clear how this is connected.
The XG-7100 is the DHCP server here I assume?
The 'master' device is Ethernet connected to it? On one of the Eth ports directly? Via other switches etc?
The client modems are then connected to master device over coax and devices connected to them also pull dhcp leases from pfSense? But that doesn't work if the master device doesn't have an IP?
@stephenw10 thank you for the reply.
The master modem is connected by ethernet, and gets a network address by dhcp from the Netgate. The clients connect over coax to the master. Once connected they use the Netgate dhcp server as well (transparent bridge). At this time the clients only connect to the master when the master is given a fixed ip address.
On my test setup the master modem is connected directly to the Netgate router. The dhcp lease of the master is shown as offline and active immediately after it is assigned.
The problem occurred after a short power outage. I switched from pc-pfsense to Netgate recently. I think the modems booted up correctly on the Pfsense and kept working after the router switch. After a short power outage the master modems rebooted, and then the problem occurred. It took me a few hours to find out, because there were weeks between the router switch and this problem.
If it shows as off-line then it is not present in the ARP table, you can check that in Diag > ARP.
If the modem is not requesting another lease or trying to renew does it still think it has the initial IP?
Can we see the pcap of what the modem is sending?
The XG-7100 has a built in switch that your previous device would not have. However it doesn't support STP.
Unless you have added a bridge, which can be configured for STP, I would not expect STP to be present.
Is it possible there's a loop somewhere? Do you have more than one connection to the Eth ports?
The Corinex gets ip 192.168.27.184
Because the master modem is not reachable, I have to unplug/plug the power to reboot.
Currently I have a test setup, and only WAN1, the Corinex master and a laptop are connected to the Netgate. I don't think there is a loop somewhere.
The Corinex is indeed not in the ARP table.
I have tried to add a switch which is (R)STP capable between the Corinex and the Netgate. But this makes no difference.
I have tried to reach the Corinex over the old fixed ip address, but not able to connect.
Hmm, well that looks correct. And when you give it a fixed IP you are using the same address/subnet?
Really the only thing that raises a flag for me there is that you're using .local for your domain and that can cause problems with mDNS. That wouldn't stop you accessing the modem though.
I might have expected at least one part of that to come from the assigned IP. Hard to see how it could possibly not respond to ARP requests though. If you try to ping that IP from the firewall with a pcap running do you see ARP requests? Or responses?
@stephenw10 Thanks again for the reply
When given a fixed ip I use the same subnet 192.168.24.13/255.255.252.0
I did two captures. One when the Corinex is booted up on an old Draytek router (the modem is reachable then):
Second I did a ping from a connected laptop (makes no difference pinging from laptop or from Netgate webinterface):
When I do an arp -a immediately after a ping, I have this entry in the arp table:
? (192.168.27.184) at (incomplete) on lagg0.4091 expired [vlan]
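For anyone checking a longer ARP table, a small script can pick out the unresolved entries. This is only a sketch: it assumes the FreeBSD-style `arp -a` line format shown above, and the second sample line (192.168.24.1 with a made-up MAC) is a hypothetical resolved entry added for contrast.

```python
import re

# Matches FreeBSD-style `arp -a` lines such as:
#   ? (192.168.27.184) at (incomplete) on lagg0.4091 expired [vlan]
ARP_LINE = re.compile(
    r"^(?P<host>\S+) \((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)"
)


def incomplete_entries(arp_output: str) -> list[tuple[str, str]]:
    """Return (ip, interface) pairs for entries that never got an ARP reply."""
    result = []
    for line in arp_output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match and match.group("mac") == "(incomplete)":
            result.append((match.group("ip"), match.group("iface")))
    return result


# First line is from this thread; the second is a hypothetical resolved entry.
sample = """\
? (192.168.27.184) at (incomplete) on lagg0.4091 expired [vlan]
? (192.168.24.1) at 00:11:22:33:44:55 on lagg0.4091 permanent [vlan]
"""

unresolved = incomplete_entries(sample)
print(unresolved)
```

An entry marked (incomplete) means an ARP request was sent for that address but no reply was recorded.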
Is it possible something there is using the wrong subnet mask? Maybe it's hard coded to /24 somehow?
It succeeds when you set a fixed IP inside the same /24 as pfSense. It succeeds when connected to a dhcp server that's handing out /24.
Try setting a static DHCP lease in pfSense so it gets an IP in the 192.168.24.X range when using dhcp. See if that then works.
pfSense never sees any ARP replies from the modem so the table is incomplete.
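To make the /24 hypothesis concrete, here is a rough check using Python's standard `ipaddress` module. The modem addresses and the 255.255.252.0 mask are from this thread; the firewall address 192.168.24.1 is an assumption for illustration only, since the actual LAN address isn't given here.

```python
import ipaddress


def same_subnet(a: str, b: str, prefix: int) -> bool:
    """True if host b falls inside the network that host a sees with this prefix."""
    network = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in network


FIREWALL = "192.168.24.1"      # assumed LAN address, not confirmed in the thread
MODEM_DHCP = "192.168.27.184"  # lease handed out by the Netgate
MODEM_STATIC = "192.168.24.13" # fixed address that works

# The configured mask 255.255.252.0 is a /22 covering 192.168.24.0 - 192.168.27.255.
lan = ipaddress.ip_network("192.168.24.13/255.255.252.0", strict=False)
print(lan)

for prefix in (22, 24):
    print(prefix, "dhcp:", same_subnet(MODEM_DHCP, FIREWALL, prefix),
          "static:", same_subnet(MODEM_STATIC, FIREWALL, prefix))
```

With the real /22 both modem addresses are on-link with the assumed firewall address. If the modem treated its lease as a /24, the firewall would look off-link from 192.168.27.184 but not from 192.168.24.13, which is why a static lease in 192.168.24.X is a useful test.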
@stephenw10 The dhcp also succeeds on the pc based Pfsense router, which has the same dhcp and ip config as the Netgate. The difference between the Pfsense and the Netgate is the internal switch configuration of the Netgate (LAGG etc.)
There is a lot of traffic on the Pfsense router, so I took an old Draytek router to get a good/clean capture. I can run a capture on the Pfsense if necessary.
More data can only help. Is there any way to get a console connection on the modem maybe? That would probably show you what's happening.
@stephenw10 I did read the old manuals from the Corinex, and it should have a further undocumented rs485 port. I will look into it, but it can take a bit to figure out. I am really curious now what the problem is.
@stephenw10 I finally found the problem!
I compared the DHCP offer packets from the Pfsense and the Netgate field by field. The only difference was in the DNS part. The Netgate offered 4 dns servers, and the Pfsense 1.
I reduced the dns servers to 2, and now it works as it should.
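In case it helps someone doing the same comparison, below is a minimal sketch of diffing the options of two DHCP offers in Python. It assumes you have already extracted the raw options bytes (the part after the magic cookie) from each capture. The two sample offers are built by hand with placeholder DNS addresses from the 192.0.2.0/24 documentation range, not the real values from my network, and the helper names are my own.

```python
import ipaddress

DNS_OPTION = 6  # Domain Name Server option code (RFC 2132)


def parse_dhcp_options(raw: bytes) -> dict[int, bytes]:
    """Split a DHCP options field into {code: payload}. Ignores pad, stops at end."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 0:        # pad option, single byte
            i += 1
            continue
        if code == 255:      # end option
            break
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def dns_servers(options: dict[int, bytes]) -> list[str]:
    """Decode option 6, which carries one 4-byte IPv4 address per server."""
    data = options.get(DNS_OPTION, b"")
    return [str(ipaddress.ip_address(data[j : j + 4])) for j in range(0, len(data), 4)]


def diff_options(a: dict[int, bytes], b: dict[int, bytes]) -> list[int]:
    """Option codes that are missing on one side or differ in value."""
    return sorted(code for code in a.keys() | b.keys() if a.get(code) != b.get(code))


def opt(code: int, payload: bytes) -> bytes:
    """Encode one option as code, length, payload (helper for the samples)."""
    return bytes([code, len(payload)]) + payload


def packed(addresses: list[str]) -> bytes:
    return b"".join(ipaddress.ip_address(a).packed for a in addresses)


# Hand-built samples: message type 2 (offer), subnet mask /22, then DNS servers.
common = opt(53, b"\x02") + opt(1, packed(["255.255.252.0"]))
offer_one_dns = common + opt(DNS_OPTION, packed(["192.0.2.1"])) + b"\xff"
offer_four_dns = common + opt(
    DNS_OPTION, packed(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])
) + b"\xff"

old = parse_dhcp_options(offer_one_dns)
new = parse_dhcp_options(offer_four_dns)
print(diff_options(old, new))
print(dns_servers(new))
```

The sketch does not handle options split across several fields, so treat it as a starting point rather than a full parser.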
Thank you for all the support.
Hmm, nice catch. Interesting.