Not getting a DHCP WAN IP address on Netgate hardware.
-
@stephenw10 Here is the packet capture. I have replaced the public IP with xxx.xxx.xxx.xxx. You can see the ICMP echo requests that dpinger is sending to 1.1.1.1. I noticed a lot of these packets are flagged with a bad checksum, but I am not sure what to do about that.
10:45:51.905405 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 77: (tos 0x0, ttl 127, id 38590, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->861d)!)
    xxx.xxx.xxx.xxx.35343 > 1.1.1.1.53: [udp sum ok] 49570+ A? forum.netgate.com. (35)
10:45:51.905451 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 77: (tos 0x0, ttl 127, id 38591, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->861c)!)
    xxx.xxx.xxx.xxx.42560 > 1.1.1.1.53: [udp sum ok] 51250+ Type65? forum.netgate.com. (35)
10:45:51.921752 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 83: (tos 0x0, ttl 127, id 38592, offset 0, flags [none], proto UDP (17), length 69, bad cksum 0 (->8615)!)
    xxx.xxx.xxx.xxx.60130 > 1.1.1.1.53: [udp sum ok] 19119+ A? signaler-pa.youtube.com. (41)
10:45:51.921848 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 83: (tos 0x0, ttl 127, id 38593, offset 0, flags [none], proto UDP (17), length 69, bad cksum 0 (->8614)!)
    xxx.xxx.xxx.xxx.19205 > 1.1.1.1.53: [udp sum ok] 21591+ Type65? signaler-pa.youtube.com. (41)
10:45:51.934131 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 127, id 21954, offset 0, flags [DF], proto TCP (6), length 40, bad cksum 0 (->c8e1)!)
    xxx.xxx.xxx.xxx.8358 > 52.226.139.121.443: Flags [R.], cksum 0xb229 (correct), seq 3206783296, ack 1699559528, win 0, length 0
10:45:52.200962 60:22:32:46:45:0d > 01:80:c2:00:00:00, 802.3, length 39: LLC, dsap STP (0x42) Individual, ssap STP (0x42) Command, ctrl 0x03: STP 802.1w, Rapid STP, Flags [Learn, Forward, Agreement], bridge-id 8000.60:22:32:46:45:0c.8010, length 43
    message-age 0.00s, max-age 20.00s, hello-time 2.00s, forwarding-delay 15.00s
    root-id 8000.60:22:32:46:45:0c, root-pathcost 0, port-role Designated
10:45:52.216361 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 43: (tos 0x0, ttl 64, id 63816, offset 0, flags [none], proto ICMP (1), length 29, bad cksum 0 (->62c5)!)
    xxx.xxx.xxx.xxx > 1.1.1.1: ICMP echo request, id 47797, seq 1577, length 9
10:45:52.240408 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 15133, offset 0, flags [none], proto UDP (17), length 72, bad cksum 0 (->12a8)!)
    xxx.xxx.xxx.xxx.6424 > 8.8.8.8.53: [bad udp cksum 0x2d26 -> 0xc857!] 13895+ PTR? 8.179.243.104.in-addr.arpa. (44)
10:45:52.297247 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 55: (tos 0x0, ttl 127, id 13743, offset 0, flags [DF], proto TCP (6), length 41, bad cksum 0 (->d40b)!)
    xxx.xxx.xxx.xxx.50716 > 172.64.41.3.443: Flags [.], cksum 0x25e3 (correct), seq 110458962:110458963, ack 572074752, win 1028, length 1
10:45:52.720360 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 43: (tos 0x0, ttl 64, id 23499, offset 0, flags [none], proto ICMP (1), length 29, bad cksum 0 (->43)!)
    xxx.xxx.xxx.xxx > 1.1.1.1: ICMP echo request, id 47797, seq 1578, length 9
10:45:52.835580 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 57232, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum 0 (->4b70)!)
    xxx.xxx.xxx.xxx.25868 > 3.95.234.235.30011: Flags [.], cksum 0x1094 (correct), seq 707216205:707217653, ack 148236916, win 166, options [nop,nop,TS val 35204921 ecr 94619178], length 1448
10:45:52.835654 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 57233, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum 0 (->4b6f)!)
    xxx.xxx.xxx.xxx.25868 > 3.95.234.235.30011: Flags [.], cksum 0x7d8e (correct), seq 1448:2896, ack 1, win 166, options [nop,nop,TS val 35204921 ecr 94619178], length 1448
10:45:52.835665 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 57234, offset 0, flags [DF], proto TCP (6), length 1500, bad cksum 0 (->4b6e)!)
    xxx.xxx.xxx.xxx.25868 > 3.95.234.235.30011: Flags [.], cksum 0x77e6 (correct), seq 2896:4344, ack 1, win 166, options [nop,nop,TS val 35204921 ecr 94619178], length 1448
10:45:52.835779 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 1461: (tos 0x0, ttl 63, id 57235, offset 0, flags [DF], proto TCP (6), length 1447, bad cksum 0 (->4ba2)!)
    xxx.xxx.xxx.xxx.25868 > 3.95.234.235.30011: Flags [P.], cksum 0x5acb (correct), seq 4344:5739, ack 1, win 166, options [nop,nop,TS val 35204921 ecr 94619178], length 1395
10:45:52.871216 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 127, id 47650, offset 0, flags [none], proto UDP (17), length 56, bad cksum 0 (->54b2)!)
    xxx.xxx.xxx.xxx.7567 > 8.8.8.8.53: [udp sum ok] 57083+ A? dns.google. (28)
10:45:52.871224 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 127, id 38594, offset 0, flags [none], proto UDP (17), length 56, bad cksum 0 (->8620)!)
    xxx.xxx.xxx.xxx.40601 > 1.1.1.1.53: [udp sum ok] 57083+ A? dns.google. (28)
10:45:52.918662 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 127, id 47651, offset 0, flags [none], proto UDP (17), length 56, bad cksum 0 (->54b1)!)
    xxx.xxx.xxx.xxx.31362 > 8.8.8.8.53: [udp sum ok] 54725+ A? dns.google. (28)
10:45:52.918707 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 127, id 47652, offset 0, flags [none], proto UDP (17), length 56, bad cksum 0 (->54b0)!)
    xxx.xxx.xxx.xxx.8219 > 8.8.8.8.53: [udp sum ok] 16179+ Type65? dns.google. (28)
10:45:52.919544 90:ec:77:34:73:8e > 78:ba:f9:30:82:33, ethertype IPv4 (0x0800), length 77: (tos 0x0, ttl 127, id 47653, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->54a8)!)
-
Mmm, nothing coming back from the gateway at all though.
The checksum errors are because hardware checksum off-loading is enabled. That's not a problem, but you can disable it in System > Advanced > Networking.
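To illustrate why the capture shows "bad cksum 0": with off-loading enabled the OS leaves the IPv4 header checksum field at zero and the NIC fills it in on the way out, after tcpdump has already seen the packet. A minimal Python sketch of the standard one's-complement header checksum follows. The header bytes are a made-up example with private addresses, not a packet from this capture, and `ipv4_header_checksum` is just an illustrative helper name.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Return the one's-complement checksum over an IPv4 header.

    To compute a fresh checksum, the checksum field (bytes 10-11)
    must be zero in the input. Run over a header that already carries
    a correct checksum, the result is 0.
    """
    if len(header) % 2:
        raise ValueError("header length must be an even number of bytes")
    # Sum the header as big-endian 16-bit words.
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte header as the host would see it before the NIC
# fills in the checksum: the checksum field (the "0000" part) is zero.
header_as_captured = bytes.fromhex(
    "45000073000040004011" "0000" "c0a80001c0a800c7"
)

value_nic_writes = ipv4_header_checksum(header_as_captured)
print(f"field in capture: 0x0000, value the NIC would write: 0x{value_nic_writes:04x}")
```

The value after "->" in each tcpdump line is what tcpdump calculates the field should be, which is the number the NIC inserts before the frame hits the wire.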
-
@stephenw10 Yeah nothing comes back. It is weird.
-
If you can install the arping pkg you can try arping for the gateway:
[23.09-DEVELOPMENT][admin@4100-3.stevew.lan]/root: pkg install arping
Updating pfSense-core repository catalogue...
Fetching meta.conf: 0%
pfSense-core repository is up to date.
Updating pfSense repository catalogue...
Fetching meta.conf: 0%
pfSense repository is up to date.
All repositories are up to date.
The following 2 package(s) will be affected (of 0 checked):
New packages to be INSTALLED:
    arping: 2.21_1 [pfSense]
    libnet: 1.2,1 [pfSense]
Number of packages to be installed: 2
118 KiB to be downloaded.
Proceed with this action? [y/N]: y
[1/2] Fetching libnet-1.2,1.pkg: 100% 92 KiB 94.1kB/s 00:01
[2/2] Fetching arping-2.21_1.pkg: 100% 26 KiB 26.5kB/s 00:01
Checking integrity... done (0 conflicting)
[1/2] Installing libnet-1.2,1...
[1/2] Extracting libnet-1.2,1: 100%
[2/2] Installing arping-2.21_1...
[2/2] Extracting arping-2.21_1: 100%
[23.09-DEVELOPMENT][admin@4100-3.stevew.lan]/root: rehash
Then:
[23.09-DEVELOPMENT][admin@4100-3.stevew.lan]/root: arping -c 3 172.21.16.1
ARPING 172.21.16.1
60 bytes from 00:08:a2:0c:c9:91 (172.21.16.1): index=0 time=767.357 usec
60 bytes from 00:08:a2:0c:c9:91 (172.21.16.1): index=1 time=661.690 usec
60 bytes from 00:08:a2:0c:c9:91 (172.21.16.1): index=2 time=682.343 usec
--- 172.21.16.1 statistics ---
3 packets transmitted, 3 packets received, 0% unanswered (0 extra)
rtt min/avg/max/std-dev = 0.662/0.704/0.767/0.046 ms
If the gateway doesn't respond even to ARP there must be something low level disconnected somehow.
The ARP entry in the table will expire after ~15mins, so it may appear to be there still even if the gateway is not responding at all.
-
What about the MTU settings? Does that matter with ONT modems? A duplex mismatch could also occur. Is the connection set to auto or full duplex on the WAN? I think it's a duplex mismatch, since the problem goes away with a switch in between: the switch is probably set to auto-negotiation, and somehow the firewall is set to half duplex or something similar.
https://docs.netgate.com/pfsense/en/latest/troubleshooting/low-throughput.html
-
There appear to be two issues here, at least. Firstly the ONT seems to be set to 100M fixed which means the interfaces on the 4100 cannot link to it directly.
Secondly the ISP gateway stops responding after some time. That's unlikely to be an MTU issue because pings are tiny. As are the DHCP requests.
We have seen something similar to this previously. A misbehaving ISP gateway stopped responding when its ARP entry expired instead of sending an ARP request to renew it. IIRC we worked around it by setting the pfSense ARP expiry time low so that it sends an ARP request before the gateway expires its entry. By default it's 20mins:
[23.09-DEVELOPMENT][admin@4100-3.stevew.lan]/root: sysctl net.link.ether.inet.max_age
net.link.ether.inet.max_age: 1200
Try setting that to 5mins and see if that allows it to continue:
[23.09-DEVELOPMENT][admin@4100-3.stevew.lan]/root: sysctl net.link.ether.inet.max_age=300
net.link.ether.inet.max_age: 1200 -> 300
If that works you can add it as a system tunable.
Running an arping against the gateway would probably also renew the remote ARP entry.
Both are hacks that shouldn't be required!
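As a rough illustration of the timing argument, here is a toy Python model. It assumes the gateway refreshes its entry for pfSense whenever it sees an ARP request from pfSense, that pfSense sends one every max_age seconds, and that the gateway never sends a request of its own. The 900 second gateway timeout is purely hypothetical since the real value on the ISP side is unknown, and `first_expiry` is only an illustrative name.

```python
from typing import Optional


def first_expiry(pfsense_max_age: int, gateway_timeout: int,
                 horizon: int = 7200) -> Optional[int]:
    """Return the second at which the gateway drops its entry, or None.

    Toy model only: pfSense sends an ARP request every `pfsense_max_age`
    seconds, each request refreshes the gateway's entry, and the gateway
    never renews the entry by itself (the suspected misbehaviour).
    """
    last_refresh = 0
    for now in range(1, horizon + 1):
        if now % pfsense_max_age == 0:
            # The gateway re-learns pfSense's MAC from the ARP request.
            last_refresh = now
        if now - last_refresh >= gateway_timeout:
            return now
    return None


GATEWAY_TIMEOUT = 900  # hypothetical value, not measured

print("default max_age 1200:", first_expiry(1200, GATEWAY_TIMEOUT))
print("lowered max_age 300: ", first_expiry(300, GATEWAY_TIMEOUT))
```

Under those assumptions the workaround only helps if max_age is shorter than whatever timeout the gateway uses, which is why trying progressively lower values is reasonable.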
-
@stephenw10 Thank you for your time on this. I will not have physical access to the device until Friday or Saturday. I will try it again and let you know what happens asap.
-
@stephenw10 This was the result of ARPing the gateway's MAC.
-
I assume that's after it stops responding? Does that ARPing work initially?
Did you try setting a lower max_age value?
-
@stephenw10 ARPing does not work initially, neither did lowering the max age value.
-
Hmm, the gateway doesn't respond to ARPing even when you are still able to reach external hosts?
-
@stephenw10 Correct
-
Hmm, then maybe it's blocking something immediately but continues passing traffic until its ARP entry expires.
Hard to think what that could be given you are no longer pinging it...
-
@stephenw10 Sorry for the late reply. Life got a bit crazy there for a moment. I have tried a different switch in between the pfSense box and the ONT. Unfortunately I got the same result. What would be the next step at this point, since we seem to have exhausted our abilities here? Should we look into purchasing support from Netgate on this, or do you think that there is nothing that can be done at this time?
-
Hmm, I'm not sure what more they could do here. They could re-run those tests to check the data. But what you did seems good.
Both the 1100 and 4100 have interfaces with quirks that could be causing issues here. If you can, I would try connecting a very generic pfSense CE install to see if that also behaves the same. Some hardware with Intel NICs if you have it.
Reading back, I was almost sure it was going to be that ARP timeout value. You could try setting that to something very low like 60.
Steve
-
What about hardware offloading?
-
The default settings on the 1100 and 4100 should be fine there. Hard to imagine that preventing ARP. But easy to test...
-
@stephenw10 I will try both of these things as soon as I am able, but that probably won't be until Sunday.
-
Thank you both for the suggestions. Unfortunately, I got the same result after trying both suggestions. I did notice that if I unplug, and then replug the cable the interface comes back online for a while, but eventually does go offline again. I do not have any other hardware to test with atm.