Gateway - sendto error 65



  • I am receiving "sendto error: 65" in the gateway logs. What might be the issue, and what is the solution?


  • Rebel Alliance Developer Netgate

    Error 65 is EHOSTUNREACH, 'No route to host', so it means you have no route to the gateway at that time. Either it's not in your subnet or somehow the interface doesn't have a route to its subnet, etc.

    Since the WAN is DHCP, that could mean that it was handed a different address in a different subnet, perhaps from the modem itself.
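    dpinger reports the raw errno from sendto(2), so the numbers in these logs are FreeBSD errno values. A minimal lookup sketch, assuming only the two codes that appear in this thread (64 and 65, as defined in FreeBSD's sys/errno.h); the function name is hypothetical and not part of pfSense or dpinger:

```python
# FreeBSD errno values for the dpinger "sendto error: N" codes seen in this thread.
# These are BSD numbers (sys/errno.h); on Linux the same names have different
# numbers, so Python's errno module on a Linux host would not match them.
FREEBSD_ERRNO = {
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}


def describe_sendto_error(code: int) -> str:
    """Turn a dpinger 'sendto error: N' code into a readable string."""
    name, text = FREEBSD_ERRNO.get(code, ("UNKNOWN", "not in this small table"))
    return f"errno {code}: {name} ({text})"


if __name__ == "__main__":
    for code in (64, 65):
        print(describe_sendto_error(code))
```

    Both codes mean the kernel could not deliver the probe toward the monitor address, which is why they tend to show up together in the logs below.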



  • Hi, so how can we resolve this issue? Do we need to change settings on the WAN to allow monitoring on it?



  • I get all these sorts of errors, and it's very frustrating because when they pop up the WAN connection stops until you take some manual step. A cheap $30 router won't lock up like this, but "pro grade" pfsense does! I search, and search, and search, and cannot find a good long-term solution.

    Today my ISP did something, maybe some sort of outage or maintenance, and pfsense showed these messages as well as a false 100% packet loss for many hours, until I edited the gateway and applied settings; then the WAN started working again.

    Let's not be snobs and say "if your ISP goes down, switch," because even the best ISP will have some downtime.

    So how can I make sure pfsense keeps my WAN connection without having to jiggle the cords each time my ISP has some issue? Is the only solution really some hack that reboots the system when the errors pop up?



  • I'm getting this as well, every 3 hours exactly.  Error 64 or 65.

    May 17 17:24:55 pfSense dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 17:24:58 pfSense dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 75.RE.DAC.TED  bind_addr 75.RE.DAC.TED  identifier "WAN_DHCP "
    May 17 17:29:10 abominus dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 75.RE.DAC.TED  bind_addr 75.RE.DAC.TED  identifier "WAN_DHCP "
    May 17 17:29:12 abominus dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 75.RE.DAC.TED  bind_addr 75.RE.DAC.TED  identifier "WAN_DHCP "
    May 17 20:29:50 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: Alarm latency 30189us stddev 3683us loss 21%
    May 17 20:30:59 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:30:59 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:00 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:00 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:01 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:01 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:02 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
    May 17 20:31:05 abominus dpinger: send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 0ms  data_len 0  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 20%  dest_addr 75.RE.DAC.TED  bind_addr 75.RE.DAC.TED  identifier "WAN_DHCP "
    

    Anyone have any ideas?  The fix seems to be to drop and renew the DHCP lease, or probably restart the interface in some fashion.
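    To see at a glance how many probes failed and with which errno, a saved copy of a dpinger log like the one above can be summarized. This is a hypothetical helper written only against the line shapes shown in this post; it is not a pfSense tool:

```python
import re
from collections import Counter

# Line shapes taken from the log above, e.g.
#   May 17 20:30:59 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: sendto error: 65
#   May 17 20:29:50 abominus dpinger: WAN_DHCP 75.RE.DAC.TED: Alarm latency 30189us stddev 3683us loss 21%
SENDTO_RE = re.compile(r"dpinger: (\S+) \S+: sendto error: (\d+)")
ALARM_RE = re.compile(r"dpinger: (\S+) \S+: Alarm latency (\d+)us stddev (\d+)us loss (\d+)%")


def summarize_dpinger(log_text):
    """Return ({(gateway, errno): count}, [(gateway, latency_us, loss_pct), ...])."""
    errors = Counter()
    alarms = []
    for line in log_text.splitlines():
        m = SENDTO_RE.search(line)
        if m:
            errors[(m.group(1), int(m.group(2)))] += 1
            continue
        a = ALARM_RE.search(line)
        if a:
            alarms.append((a.group(1), int(a.group(2)), int(a.group(4))))
    # Startup lines ("send_interval 500ms ...") match neither pattern and are skipped.
    return dict(errors), alarms
```

    On the excerpt above this reports one alarm at 21% loss, just over the loss_alarm 20% threshold printed in the dpinger startup line, followed by a burst of errno 65.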


  • Netgate

    It is probably something like this:

    Your ISP gave you a lease
    They did something and that lease no longer works
    pfSense WAN did not lose link and still has a valid lease so there is no reason for it to ask for a renewal

    Rebooting your modem to fix whatever your ISP did is probably the answer.

    They should know they need to have a way to tell the modem to down/up the link if they're going to do whatever they did so the connected device will get a new lease.

    For instance, I get this from Cox:

    option dhcp-lease-time 86400;
    option dhcp-renewal-time 43200;
    option dhcp-rebinding-time 75600;

    If the link does not go down/up, pfSense dhclient will not attempt a renewal for 12 hours (43200 seconds). It will keep trying to renew with the original server until the 21-hour mark (75600 seconds), i.e. for 9 hours, then try to rebind (broadcast its request to any DHCP server) for the remaining 3 hours. If that doesn't work, the lease will expire and my internet will be broken. (I think. dhclient might just continue to try to use the last lease it got.)
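    The timer arithmetic above can be checked with a short sketch. The function name is hypothetical; the fallback values are the RFC 2131 defaults (T1 = 0.5 x lease, T2 = 0.875 x lease), which only apply when the server omits options 58 and 59:

```python
from typing import Optional

HOUR = 3600  # seconds


def dhcp_timeline(lease_time: int,
                  renewal_time: Optional[int] = None,
                  rebinding_time: Optional[int] = None) -> dict:
    """Hours at which a DHCP client renews, rebinds and expires, and how long each phase lasts.

    Missing option 58 (T1) or 59 (T2) falls back to the RFC 2131 defaults.
    """
    t1 = renewal_time if renewal_time is not None else 0.5 * lease_time
    t2 = rebinding_time if rebinding_time is not None else 0.875 * lease_time
    return {
        "renew_at": t1 / HOUR,
        "rebind_at": t2 / HOUR,
        "expire_at": lease_time / HOUR,
        # RENEWING: DHCPREQUEST unicast to the server that granted the lease
        "renewing_for": (t2 - t1) / HOUR,
        # REBINDING: DHCPREQUEST broadcast to any server
        "rebinding_for": (lease_time - t2) / HOUR,
    }


if __name__ == "__main__":
    print(dhcp_timeline(86400, 43200, 75600))  # the Cox values quoted above
    print(dhcp_timeline(21600, 10800, 18900))  # the TDS lease posted later in this thread
```

    With the Cox values this gives renew at 12 h, rebind at 21 h and expiry at 24 h (9 h renewing, 3 h rebinding). The TDS lease posted later in the thread renews at 3 h, rebinds at 5.25 h and expires at 6 h, which lines up with the "every 3 hours exactly" symptom.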

    The gateway logs are interesting but the DHCP logs are probably more telling. Maybe filter on process dhclient.

    This is mine rebooting today to apply 2.4.3-p1:

    May 17 15:44:25 dhclient PREINIT
    May 17 15:44:25 dhclient 29025 DHCPREQUEST on igb0 to 255.255.255.255 port 67
    May 17 15:44:25 dhclient 29025 DHCPACK from 10.65.128.1
    May 17 15:44:25 dhclient REBOOT
    May 17 15:44:25 dhclient Starting add_new_address()
    May 17 15:44:25 dhclient ifconfig igb0 inet 198.51.100.132 netmask 255.255.255.0 broadcast 198.51.100.255
    May 17 15:44:25 dhclient New IP Address (igb0): 198.51.100.132
    May 17 15:44:25 dhclient New Subnet Mask (igb0): 255.255.255.0
    May 17 15:44:25 dhclient New Broadcast Address (igb0): 198.51.100.255
    May 17 15:44:25 dhclient New Routers (igb0): 198.51.100.1
    May 17 15:44:25 dhclient Adding new routes to interface: igb0
    May 17 15:44:25 dhclient /sbin/route add default 198.51.100.1
    May 17 15:44:25 dhclient Creating resolv.conf
    May 17 15:44:25 dhclient 29025 bound to 198.51.100.132 – renewal in 43200 seconds.

    You can look at the status of the lease in a file like this:

    cat /var/db/dhclient.leases.igb0

    (My WAN is interface igb0)
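    If you want the renew/rebind/expire times out of that file programmatically, a minimal parser for the lease format shown in this thread might look like this. It is a hypothetical sketch that handles only the statements that appear here, and it assumes the timestamps are UTC (in this thread they run 5 hours ahead of the local syslog times):

```python
from datetime import datetime, timezone


def parse_leases(text):
    """Parse dhclient.leases.<ifname> text into a list of dicts, in file order.

    Handles only: interface, fixed-address, option <name> <value>,
    and renew/rebind/expire <weekday> <YYYY/M/D> <HH:MM:SS>.
    """
    leases = []
    for block in text.split("lease {")[1:]:
        body = block.split("}", 1)[0]
        lease = {}
        for stmt in body.split(";"):
            parts = stmt.split()
            if not parts:
                continue  # trailing whitespace after the last ';'
            key = parts[0]
            if key == "option":
                lease[parts[1]] = " ".join(parts[2:]).strip('"')
            elif key in ("renew", "rebind", "expire"):
                # parts[1] is the weekday number; assumed to be UTC
                stamp = datetime.strptime(f"{parts[2]} {parts[3]}", "%Y/%m/%d %H:%M:%S")
                lease[key] = stamp.replace(tzinfo=timezone.utc)
            else:
                lease[key] = " ".join(parts[1:]).strip('"')
        leases.append(lease)
    return leases
```

    For example, `parse_leases(open("/var/db/dhclient.leases.igb0").read())[-1]` returns the last block in the file; in the em1 file posted later in this thread, the last block is also the one with the latest timers.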



  • [NB: When you redact addresses like that, it helps to distinguish between the various addresses, such as the bind address and the target address.]

    Jimp's post is pretty instructive…

    One question comes to mind... Are you using an explicit monitor address? Or using the default gateway address? If an explicit address, you might want to choose one a little farther inside your ISP, or one on the other side of your ISP.



  • I'm using the default address for the monitor.  I decided I don't care about obfuscating my IP here, so here's the contents of the dhclient.leases.em1 file:

    lease {
      interface "em1";
      fixed-address 75.100.91.105;
      option subnet-mask 255.255.240.0;
      option routers 75.100.80.1;
      option domain-name-servers 216.165.129.158,216.170.153.146;
      option domain-name "tds.net";
      option broadcast-address 75.100.95.255;
      option dhcp-lease-time 21600;
      option dhcp-message-type 5;
      option dhcp-server-identifier 64.50.232.14;
      option dhcp-renewal-time 10800;
      option dhcp-rebinding-time 18900;
      renew 5 2018/5/18 13:19:17;
      rebind 5 2018/5/18 15:34:17;
      expire 5 2018/5/18 16:19:17;
    }
    lease {
      interface "em1";
      fixed-address 75.100.91.105;
      option subnet-mask 255.255.240.0;
      option routers 75.100.80.1;
      option domain-name-servers 216.165.129.158,216.170.153.146;
      option domain-name "tds.net";
      option broadcast-address 75.100.95.255;
      option dhcp-lease-time 21600;
      option dhcp-message-type 5;
      option dhcp-server-identifier 64.50.232.14;
      option dhcp-renewal-time 10800;
      option dhcp-rebinding-time 18900;
      renew 5 2018/5/18 14:09:33;
      rebind 5 2018/5/18 16:24:33;
      expire 5 2018/5/18 17:09:33;
    }
    
    

    So like clockwork, things stopped working exactly 3 hours after I last renewed the lease overnight. This morning I decided to hard power-cycle the DSL modem/router (T3200M in bridge mode) and, while I was at it, change my entire internal subnet from 192.168.0.0/23 to 10.0.0.0/8 and re-IP everything, just in case it's some weird thing with the T3200M being on 192.168.0.1 even though it's in bridge mode. It hasn't been long enough yet for me to know if any of this has helped.

    As for why releasing/renewing would work, I really don't know. It would come back with the same IP every time; in fact, I don't think my IP has changed in about a year. Also, I'm mostly getting error 64 from dpinger, not 65, but I'm not sure. Thanks again for the help. I will see how things go for the time being. I should know in about 2 hours if anything I did helped.

    Edit: Here's dhclient log: https://pastebin.com/SB8UddBS

    I got a new modem (the aforementioned T3200M) and upgraded service during the morning of 5/17. You can see where I manually renewed as well.

    Edit 2: It made it past 3 hours OK, but I note that it did that one other time and then started having problems at 6 hours.  I will be out kayaking all day today so I won't know for sure til I get home unless I decide to start pinging things from my phone while kayaking, which sounds less fun than kayaking without pinging things from my phone.  So I'll find out this evening sometime.  I'm hopeful that this is resolved, but we'll see.


  • Netgate

    Obviously some DHCP server issues with the renewal here:

    May 17 21:04:10 abominus dhclient[97689]: bound to 75.100.91.105 – renewal in 10800 seconds.
    May 18 00:04:10 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 00:04:10 abominus dhclient[2635]: DHCPACK from 64.50.232.14
    May 18 00:04:10 abominus dhclient: RENEW
    May 18 00:04:10 abominus dhclient: Creating resolv.conf
    May 18 00:04:10 abominus dhclient[2635]: bound to 75.100.91.105 – renewal in 10800 seconds.
    May 18 03:04:10 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:11 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:12 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:13 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:15 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:20 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:31 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:39 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:04:50 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:05:05 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:05:24 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:05:46 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:06:19 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:07:52 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:09:41 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:10:29 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:10:54 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:11:20 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:11:49 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:12:17 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:12:28 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:12:40 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 03:13:04 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67

    Then it tries to rebind and gets an answer not from

    May 18 00:04:10 abominus dhclient[2635]: DHCPACK from 64.50.232.14

    but from

    May 18 06:09:33 abominus dhclient[5041]: DHCPACK from 75.100.80.1

    I would suspect that would be coming from the modem itself, and not from the ISP. Just a guess.

    May 18 05:17:42 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 05:18:07 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 05:18:38 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 05:18:52 abominus dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67
    May 18 05:19:17 abominus dhclient[2635]: DHCPREQUEST on em1 to 255.255.255.255 port 67
    May 18 05:19:17 abominus dhclient[2635]: DHCPACK from 75.100.80.1
    May 18 05:19:17 abominus dhclient: RENEW
    May 18 05:19:17 abominus dhclient: Creating resolv.conf
    May 18 05:19:17 abominus dhclient[2635]: bound to 75.100.91.105 – renewal in 10800 seconds.
    May 18 06:09:31 abominus dhclient: PREINIT
    May 18 06:09:31 abominus dhclient[5041]: DHCPREQUEST on em1 to 255.255.255.255 port 67
    May 18 06:09:33 abominus dhclient[5041]: DHCPREQUEST on em1 to 255.255.255.255 port 67
    May 18 06:09:33 abominus dhclient[5041]: DHCPACK from 75.100.80.1
    May 18 06:09:33 abominus dhclient: REBOOT
    May 18 06:09:33 abominus dhclient: Starting add_new_address()
    May 18 06:09:33 abominus dhclient: ifconfig em1 inet 75.100.91.105 netmask 255.255.240.0 broadcast 75.100.95.255
    May 18 06:09:33 abominus dhclient: New IP Address (em1): 75.100.91.105
    May 18 06:09:33 abominus dhclient: New Subnet Mask (em1): 255.255.240.0
    May 18 06:09:33 abominus dhclient: New Broadcast Address (em1): 75.100.95.255
    May 18 06:09:33 abominus dhclient: New Routers (em1): 75.100.80.1
    May 18 06:09:33 abominus dhclient: Adding new routes to interface: em1
    May 18 06:09:33 abominus dhclient: /sbin/route add default 75.100.80.1
    May 18 06:09:33 abominus dhclient: Creating resolv.conf
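    The reading above (unicast renewals to 64.50.232.14 going unanswered, then a broadcast request answered by 75.100.80.1) can be reproduced from a saved dhclient log. A sketch, written only against the line shapes in these logs; the helper names are hypothetical:

```python
import re

# e.g. "dhclient[2635]: DHCPREQUEST on em1 to 64.50.232.14 port 67"
REQ_RE = re.compile(r"dhclient\[\d+\]: DHCPREQUEST on \S+ to (\S+) port 67")
# e.g. "dhclient[2635]: DHCPACK from 75.100.80.1"
ACK_RE = re.compile(r"dhclient\[\d+\]: DHCPACK from (\S+)")
BROADCAST = "255.255.255.255"


def _exchange(dests, ack_from):
    return {
        "requests": len(dests),
        "last_dest": dests[-1] if dests else None,
        # unicast to the leasing server = renewing; broadcast = rebinding or reboot
        "broadcast": bool(dests) and dests[-1] == BROADCAST,
        "ack_from": ack_from,
    }


def summarize_exchanges(log_text):
    """Group DHCPREQUESTs into exchanges, each closed by the DHCPACK that answers it."""
    exchanges, pending = [], []
    for line in log_text.splitlines():
        req = REQ_RE.search(line)
        if req:
            pending.append(req.group(1))
            continue
        ack = ACK_RE.search(line)
        if ack:
            exchanges.append(_exchange(pending, ack.group(1)))
            pending = []
    if pending:  # requests still unanswered when the log ends
        exchanges.append(_exchange(pending, None))
    return exchanges
```

    An exchange whose `ack_from` differs from the `dhcp-server-identifier` in the lease file is the pattern worth flagging here.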

    Probably also need to take a look at the status of the WAN interface and the routing table (routes to the WAN subnet and the default route in particular) at the time it is having trouble renewing.

    ifconfig em1 # Or Status > Interfaces

    netstat -rn4 # Or Diagnostics > Routes



  • I'm back after 8nm of kayaking, and I have functional internet. I suspect either the second power-cycling of everything or changing my home subnet from 192.168.0.0/23 to 10.0.0.0/8 has resolved this. It was a bit odd for things to just flat out die exactly at 3 hours, so I'm glad it seems to be improving. I'm over 12 hours now with no issues, so fingers crossed.

    I'll have the new modem next week anyways, so we'll see if that makes any difference either.

    Thanks again to everyone for the help.



  • Of course right after I post that it dies again.  What on earth.  I'll do some more digging.

    Edit: Here's some info for now, while it's working:

    [2.4.3-RELEASE][root@abominus.shortspecialbus.org]/root: netstat -rn4
    Routing tables
    
    Internet:
    Destination        Gateway            Flags     Netif Expire
    default            75.100.80.1        UGS         em1
    10.0.0.0/8         link#1             U           em0
    10.0.0.1           link#1             UHS         lo0
    75.100.80.0/20     link#2             U           em1
    75.100.91.105      link#2             UHS         lo0
    127.0.0.1          link#3             UH          lo0
    [2.4.3-RELEASE][root@abominus.shortspecialbus.org]/root: ifconfig em1
    em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
            ether 00:25:90:6a:f2:2f
            hwaddr 00:25:90:6a:f2:2f
            inet6 fe80::225:90ff:fe6a:f22f%em1 prefixlen 64 scopeid 0x2
            inet 75.100.91.105 netmask 0xfffff000 broadcast 75.100.95.255
            nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
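    In the output above, the default route points at 75.100.80.1 and em1 carries 75.100.91.105 with netmask 0xfffff000 (the connected route 75.100.80.0/20). A small sketch using Python's standard ipaddress module to confirm the gateway is on-link; the function names are hypothetical:

```python
import ipaddress


def interface_network(if_addr, netmask_hex):
    """Subnet implied by an ifconfig 'inet A netmask 0x...' line."""
    mask = ipaddress.IPv4Address(int(netmask_hex, 16))  # 0xfffff000 -> 255.255.240.0
    return ipaddress.ip_interface(f"{if_addr}/{mask}").network


def gateway_on_link(if_addr, netmask_hex, gateway):
    """True if the gateway lies inside the interface's connected subnet."""
    return ipaddress.ip_address(gateway) in interface_network(if_addr, netmask_hex)


if __name__ == "__main__":
    print(interface_network("75.100.91.105", "0xfffff000"))             # 75.100.80.0/20
    print(gateway_on_link("75.100.91.105", "0xfffff000", "75.100.80.1"))  # True
```

    If this holds while the errors are happening, the routing table alone does not explain errno 65, and the next place to look is whatever happens below it (the ARP entry for the gateway, or the modem).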
    


  • Update:  This keeps happening.  Apparently if it goes long enough it'll come back on its own for 3 hours, then drop again.  I am not sure if it's up for 3, down for 3, up for 3, etc.  Here's some data from when it's not working - all the network info is the same, it's just unable to reach anything:

    Routing tables
    
    Internet:
    Destination        Gateway            Flags     Netif Expire
    default            75.100.80.1        UGS         em1
    10.0.0.0/8         link#1             U           em0
    10.0.0.1           link#1             UHS         lo0
    75.100.80.0/20     link#2             U           em1
    75.100.91.105      link#2             UHS         lo0
    127.0.0.1          link#3             UH          lo0
    em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
            ether 00:25:90:6a:f2:2f
            hwaddr 00:25:90:6a:f2:2f
            inet6 fe80::225:90ff:fe6a:f22f%em1 prefixlen 64 scopeid 0x2
            inet 75.100.91.105 netmask 0xfffff000 broadcast 75.100.95.255
            nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
    

    I decided to drop the connection, unplug the modem for a bit, spoof my MAC, and bring everything back up again in hopes of getting a new IP in case there's just something wonky. Despite the new MAC, it came back with the same IP as always. The modem must be retaining it on its end, I guess, although this is a new modem and the IP it pulled on its own, before it was even hooked up to the pfsense box, was identical to the IP I've had for over a year. Is that attached to the account somehow with DSL?  I don't know how that would have happened otherwise. The previous modem was my own, so unless they went through a fair bit of effort I don't think they're MAC spoofing on it. I may try connecting to it on another port, logging into its interface, and seeing if there's anything useful there. Right now it's extremely annoying to just lose the internet connection every 3 hours on the dot, and I'm not sure how much longer my wife will tolerate it.

    At this point I'm tempted to just set it all to static for a bit and see if it helps, at least until I get a new modem. I have a new modem coming next week in case it's just something incredibly bizarre with Bridge Mode on the one they gave me on Thursday. Failing that, I'm just going to have to take my pfsense box out of the equation and see what happens.


  • Netgate

    There is no reason, based on those outputs, you should be getting errno 65.

    when it's not working - all the network info is the same, it's just unable to reach anything:

    Does that mean you are getting errno 65 or just no successful connections?

    From the shell, what do you get when you run these:

    ping -c5 75.100.80.1

    ping -c5 8.8.8.8

    arp 75.100.80.1

    You might want to get the output from those when working then again when not working.

    Is that attached to the account somehow with DSL?  I don't know how that would have happened otherwise.

    Question for your ISP. You HAVE called them about this right?

    10.0.0.0/8        link#1            U          em0

    As an aside, putting 10.0.0.0/8 on an interface is just silly. Some random suggestions:

    10.42.175.0/24
    172.19.25.0/24
    192.168.183.0/24
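    To put numbers on the /8 versus /24 point, here is a sketch with Python's ipaddress module that checks each subnet's size and that it falls inside one of the three RFC 1918 blocks; the helper name is hypothetical:

```python
import ipaddress

# The three private ranges from RFC 1918.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def describe(cidr):
    """Return (number of addresses, inside an RFC 1918 block?) for a subnet."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses, any(net.subnet_of(block) for block in RFC1918)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/8", "10.42.175.0/24", "172.19.25.0/24", "192.168.183.0/24"):
        print(cidr, describe(cidr))
```

    All four are private, but the /8 puts 16,777,216 addresses on one LAN, while each suggested /24 has 256. Picking an arbitrary /24 presumably also makes overlap with other private networks less likely.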



  • Update: Nothing here makes any sense. I called my ISP; they said they will offer zero support with the T3200M in Bridge Mode, and that if I want answers to anything other than the time of day, I must disable bridge mode. Unfortunately, there are zero other internet options where I live short of satellite, which isn't a realistic option.

    For the time being I've turned off bridge mode, given my pfsense box a static DHCP mapping on the T3200M, and stuck it in the normal DMZ. The advanced DMZ didn't seem to work right, so I gave up on that attempt. Things work well enough for now to tide me over until I have my own non-router modem, apart from things like the PS4 and whatnot not getting a decent NAT type. The modem arrives next week. We'll hope that helps.

    The results of the various pings to the gateway are just "no route to host".  There's no reason for any of this other than they did something to cripple bridge mode as far as I can tell.

    If they pull the "we won't answer anything at all if you use your own modem" bullshit, I'm not sure what I'm going to do about that. I don't have many options here, despite at least some of this probably being a violation of Magnuson-Moss in some fashion. I'm honestly not sure pfsense is entirely blameless here, since during the "outage" I'm able to successfully ping things from the modem itself (I hooked another computer directly up to it to do various testing during that time), but I'm not really willing to try bridge mode to anything else due to firewall concerns.

    Edit: sometimes, but not all the time, I get a bunch of

    arpresolve: can't allocate llinfo for 75.100.80.1 on em1
    arpresolve: can't allocate llinfo for 75.100.80.1 on em1
    arpresolve: can't allocate llinfo for 75.100.80.1 on em1
    

    in the dmesg.  This is so stupid.

    As far as a /8 I know it's silly but it's what I did at the time in a hurry and it's not like there's a shortage of RFC1918 running around.



  • I spent some time with the aforementioned Advanced DMZ mode. More baffling stuff: according to various documentation, it should work more or less like bridge mode, except that the ISP should theoretically like it more. This is a guide from an ISP that isn't mine on getting it working:

    http://bellaliant.bell.ca/binaries/content/assets/support/Customer/Support/StaticIP/dmz-config.pdf

    So, following that guide, I'm able to get things working for an EXTREMELY short time every time I renew the DHCP lease on my WAN interface. It gets an ACK from the T3200M at 192.168.0.1 and binds to the public IP, the same one listed before. I'm then able to ping the gateway for several seconds, and then it all stops working. Every time it renews, it gets the exact same IP and can ping for several seconds again, then fails pings indefinitely until the next dhclient run. The longest I've been able to ping the gateway (75.100.80.1) and do DNS resolution, etc., was about 30 seconds; it seems to average closer to 5 seconds.

    During the brief period that things are "working" after renewing the lease, "arp 75.100.80.1" comes back with the DNS name of the gateway and the MAC of the LAN port on the T3200M. When not working, it comes back with the same information except the DNS name. I'm honestly wondering if this is just a faulty modem. I know I keep saying I have a new one coming, and I do, but the mystery here is baffling me and I'm rather frustrated that my ISP refuses to even talk to me about trying to do this.

    For what it's worth, I temporarily disabled the "block RFC1918" setting on the gateway and added rules explicitly allowing both 192.168.0.1 and the WAN IP itself to connect to the WAN interface, since I was seeing firewall blocks for both. It didn't seem to help.

    For what it's worth, a computer hooked directly up to the T3200M and on the 192.168.0.0/24 subnet is able to connect with no issues throughout this. This appears to be an issue SOLELY with either bridge mode or the "Advanced DMZ" mode.

    One bit of info: the T3200M ARP table correctly shows the WAN MAC of my pfsense box for my WAN IP, however its status is 0x0. I'm wondering if I need to do something else to allow the ARPs on the WAN interface beyond allowing any traffic from both 192.168.0.1 and the WAN IP. I'll keep digging on that since I suspect it may have something to do with it.

    Edit: some ARP captures. I'm not seeing a reply for that initial request and I'm wondering if that's it. I'm a linux sysadmin by trade, but networking is by far my weakest area and I flat-out suck at it for things like this. I could easily be barking up the wrong tree here, maybe even the wrong forest.
    tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 262144 bytes
    22:28:41.559771 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has h75-100-91-105.mdtnwi.dsl.dynamic.tds.net tell h75-100-91-105.mdtnwi.dsl.dynamic.tds.net, length 28
    22:28:41.642408 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 75.100.80.1 tell h75-100-91-105.mdtnwi.dsl.dynamic.tds.net, length 28
    22:28:41.704081 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 75.100.80.1 tell h75-100-91-105.mdtnwi.dsl.dynamic.tds.net, length 28
    22:28:41.844106 ARP, Ethernet (len 6), IPv4 (len 4), Reply 75.100.80.1 is-at e8:6f:f2:d4:b6:c0 (oui Unknown), length 46
    22:28:41.995228 ARP, Ethernet (len 6), IPv4 (len 4), Reply 75.100.80.1 is-at e8:6f:f2:d4:b6:c0 (oui Unknown), length 46
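    The first line of that capture is a request whose target and sender are the same address, i.e. a gratuitous ARP (a host announcing its own address) rather than a real lookup. A sketch that labels tcpdump's one-line ARP output that way; the regexes only cover the line shapes shown above, and the function name is hypothetical:

```python
import re

# tcpdump one-line ARP formats, as in the capture above.
REQUEST_RE = re.compile(r"ARP, .*Request who-has (\S+) tell (\S+), length")
REPLY_RE = re.compile(r"ARP, .*Reply (\S+) is-at (\S+)")


def classify_arp(line):
    """Label a tcpdump ARP line as 'gratuitous', 'request', 'reply' or 'other'."""
    m = REQUEST_RE.search(line)
    if m:
        # Same target and sender address: the host is announcing itself.
        return "gratuitous" if m.group(1) == m.group(2) else "request"
    if REPLY_RE.search(line):
        return "reply"
    return "other"
```

    Classifying lines does not say who sent them. For that, tcpdump's `-e` flag (print the link-level header) together with `-n` (no name resolution), e.g. `tcpdump -eni em1 arp`, shows the source MAC of each frame.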

    
    Edit 2: Ok a little more, I set up a Proxy-ARP for specifically that IP and I see a response now with the correct MAC (or maybe they were showing up before and I just wasn't looking when it happened).  It flags 0x2 in the ARP table on the T3200M for a short bit, and then drops back to 0x0.  It works for the ~10 seconds when it's 0x2, but then stops working and drops back to 0x0.  During the functional time, I see normal other ARP requests on my WAN interface for other ISP clients, and for all the "Request who-has 75.100.91.105 tell 75.100.91.105" requests I see a reply with the correct MAC.  So I'm not sure what's up.  I'm honestly starting to think this modem is either broken or that this is beyond me and I should just let it rest.  I'm still seeing a ton of "arpresolve: can't allocate llinfo for 75.100.80.1 on em1" during problem times too.

  • Netgate

    That "Advanced DMZ" mode looks like a reasonable compromise. Especially if it results in them acting less like ^#%$s.

    Did you do all of the configure, reboot, configure, reboot steps exactly as they recommended?

    Have you done as hard a reset as you can to that modem?

    I assume the R3000 mentioned in that document is substantially-similar to the T3200M you have?

    You have access to the ISP device and can freely check its status?

    You have access to set the MAC address used for the "Advanced DMZ" mode as described on page 7? That is the actual hardware address for the WAN port and is not spoofed or anything?

    You can connect a laptop (with a MAC address DIFFERENT from the "Advanced DMZ" MAC) and get an address from the LAN side of the T3200M and get good DHCP, etc and things work? Can you check the web interface of the modem from there?

    I don't see anything that might be incompatible. Should work fine if it does what that doc says it does.

    This does not make a lot of sense to me:

    Request who-has h75-100-91-105.mdtnwi.dsl.dynamic.tds.net tell h75-100-91-105.mdtnwi.dsl.dynamic.tds.net, length 28

    Unless that is the DMZ modem that has 75.100.91.105 asking its LAN for the ARP for the same address and looking for a response from the WAN interface. Looking at the full pcaps with MAC addresses, etc would shed light there.

    Honestly, if I were having the troubles you are, I would already have a managed switch between the modem and the WAN port and be running packet captures on a mirror port there.

    I am unsure how to instruct you in starting a packet capture on pfSense that will give you what you need. Maybe something like this:

    Get everything configured as they describe for "Advanced DMZ" mode.

    Disconnect patch from pfSense WAN to modem.

    Disconnect power from modem

    Diagnostics > Packet capture, Select WAN, don't filter anything, and set the packet count to something like 100000. Start the capture.

    Reconnect the modem patch cord to the WAN port

    Power on the modem

    Do whatever testing you want and, after a bit, stop the capture and download it. You can open that straight away in wireshark. See what's there. Feel free to post it here if there is nothing you don't want seen. I can send you a nextcloud link to upload it to me if you'd rather since PMs don't allow attachments.



  • @Derelict:

    That "Advanced DMZ" mode looks like a reasonable compromise. Especially if it results in them acting less like ^#%$s.

    Did you do all of the configure, reboot, configure, reboot steps exactly as they recommended?

    Have you done as hard a reset as you can to that modem?

    I assume the R3000 mentioned in that document is substantially-similar to the T3200M you have?

    You have access to the ISP device and can freely check its status?

    You have access to set the MAC address used for the "Advanced DMZ" mode as described on page 7? That is the actual hardware address for the WAN port and is not spoofed or anything?

    You can connect a laptop (with a MAC address DIFFERENT from the "Advanced DMZ" MAC) and get an address from the LAN side of the T3200M and get good DHCP, etc and things work? Can you check the web interface of the modem from there?

    Yes on all.

    I don't see anything that might be incompatible. Should work fine if it does what that doc says it does.

    This does not make a lot of sense to me:

    Request who-has h75-100-91-105.mdtnwi.dsl.dynamic.tds.net tell h75-100-91-105.mdtnwi.dsl.dynamic.tds.net, length 28

    Unless that is the DMZ modem that has 75.100.91.105 asking its LAN for the ARP for the same address and looking for a response from the WAN interface. Looking at the full pcaps with MAC addresses, etc would shed light there.

    I suspect that is what it's doing based on that document's mention of needing the WAN interface to respond to ARP requests.

    Honestly, if I were having the troubles you are, I would already have a managed switch between the modem and the WAN port and be running packet captures on a mirror port there.

    Don't have one handy.  Can get one from work if needed, but I honestly think I'm waving the white flag on this until my new modem comes.

    I am unsure how to instruct you in starting a packet capture on pfSense that will give you what you need. Maybe something like this:

    Get everything configured as they describe for "Advanced DMZ" mode.

    Disconnect patch from pfSense WAN to modem.

    Disconnect power from modem

    Diagnostics > Packet capture, Select WAN, don't filter anything, and set the packet count to something like 100000. Start the capture.

    Reconnect the modem patch cord to the WAN port

    Power on the modem

    Do whatever testing you want and, after a bit, stop the capture and download it. You can open that straight away in wireshark. See what's there. Feel free to post it here if there is nothing you don't want seen. I can send you a nextcloud link to upload it to me if you'd rather since PMs don't allow attachments.

    Thank you for this!  I think I'm at my wit's end with it, and my wife is nearing the end of her patience with it as well, so I'm probably not going to do this right now.  If I'm unable to get things working with the Vigor130, this is a thing I will do.  This whole thing is a further reminder that I should probably take a networking course.  I know enough to be a linux sysadmin and I can farm weird stuff out to the network services group, but things like this are outside my zone.  I'll bug my boss to send me to one.  Thank you again for all of the help - everything looks like it should all be working fine and then it's just not for completely unknown reasons so it's probably made my posts a bit cranky sounding, so to speak.



  • So, an update here. I ended up returning the Vigor 130 after working with Draytek support. I couldn't get anything better than almost exactly 1/2 of my expected download speed (12.5mbit rather than 25mbit) no matter what firmware I used. So. Bleh.

    I went for a while just using the ISP modem, but I really am unhappy with some stuff, so I'm trying Bridge Mode again, except this time I've formatted and reinstalled my pfsense box with my config tucked away somewhere else. My goal is to see if I can get it working with a fresh install just in case something was screwy. So far all I've done is set a couple static DHCP IPs, turn on static DHCP mapping for the DNS forwarder, and one NAT rule to forward WAN ssh to an internal linux box. And reset the password. That's it - I've done pretty much nothing else. I want to see if it works with a fresh install.

    If I still have the same issue where every 3 or 6 hours, things die for several hours, I'm going to try connecting directly to a linux box (from bridge mode again) and see how that goes. If that goes well, then there must be some sort of issue with pfsense. If it doesn't, well, then there's an issue with bridge mode on the modem (this is a replacement T3200M since for some reason they sent me one un-asked) or with something else that's going to end up outside my control.

    I'm not very optimistic. I really miss having pfsense, but I'm just not sure it's in the cards. It works well enough if I just have the pfsense box hooked up to the router in DMZ mode (the advanced DMZ never worked when I tested it), except for the whole double-NAT thing, which wreaks some havoc with things like the PS4 and probably P2P if I ever used it. We'll see what happens for now; I'll report back. Sorry for going silent. I really was hoping Draytek would figure something out, since it held the connection just fine, only at half speed for unknown reasons.



  • Additional update: I give up on pfsense as the gateway. I've done a compromise of sorts: my pfsense box has just the LAN interface active and is doing my DHCP and DNS, while the T3200M is just doing routing. I lose some stuff, like bandwidthd and the ability to see what's using my bandwidth, but I don't have a double NAT and I'm not randomly losing the ability to contact the gateway every 3-6 hours like clockwork. I haven't tried a non-pfsense host on bridge mode, but I think I'm just giving up at this point. At least my LAN hostname resolution isn't terrible with the pfsense box doing that. I might split that off to another linux host like my NAS or something, but this works and I'm just so tired of this.

    I don't know where the fault lies. None of it ever made a lot of sense with how it was manifesting and I never did get around to packet captures. Thanks again for the help. This will work well enough I guess.