Netgate Discussion Forum

    pfatt - ngeth0 interface disappears

      SeaMonkey

      Over the past few months, a handful of times, the ngeth0 interface disappears, WAN is reassigned to a physical interface, and the system log starts flooding with arpresolve: can't allocate llinfo for x.x.x.x on ngeth0. Any suggestions on nailing down the root cause? I had been using pfatt for a long time without encountering this, so I'm not certain what might have changed to cause it.

        SeaMonkey

        This happened again this morning, but I realized my description wasn't entirely accurate. The ngeth0 interface doesn't disappear and get reassigned until I attempt to reset netgraph by issuing the ngctl commands described in the pfatt documentation.
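
        For reference, the reset I'm doing is essentially the shutdown section of pfatt.sh, roughly this (a sketch only; em1 is the physical interface my ONT is plugged into):

        # Tear down the existing netgraph nodes, then re-run pfatt.sh to rebuild them:
        /usr/sbin/ngctl shutdown em1:      # hooks on the physical ONT-facing NIC
        /usr/sbin/ngctl shutdown vlan0:    # the VLAN0 tagging node
        /usr/sbin/ngctl shutdown ngeth0:   # the virtual WAN interface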

          stephenw10 Netgate Administrator

          So what does happen before you do anything to try to recover? It stops passing traffic on WAN? Anything logged? Interfaces still show as UP?

            SeaMonkey

             Yes, all traffic stops passing on WAN and the system log floods with the arpresolve message above. I checked the messages leading up to each occurrence but didn't find anything consistent happening immediately beforehand. Once, the last event in the log was a pfBlocker update; another time it was a Suricata rules update - nothing out of the ordinary. Just to see if it makes a difference, I changed the interface that the AT&T gateway is using. I haven't had any further outages, but it's only been a couple of days.

              SeaMonkey @SeaMonkey

              Also WAN and all external VPN interfaces show as DOWN.

                stephenw10 Netgate Administrator

                And I assume x.x.x.x is the gateway IP there?

                 It implies the WAN interface (ngeth0) has lost its IP address/subnet. The logs should show something about that. The DHCP logs maybe? I would have expected the main system log to show something.

                 Ah, if the WAN shows as down that would do it. What does ifconfig show the state of the interfaces to be?

                Steve

                  SeaMonkey @stephenw10

                  @stephenw10 I'll have to test ifconfig if/when I encounter it again.

                    SeaMonkey @stephenw10

                    @stephenw10 Yes, x.x.x.x is the gateway IP. Good call on checking the DHCP logs. When WAN goes down, the DHCP log is flooded with send_packet: No buffer space available.

                      stephenw10 Netgate Administrator

                       Mmm, so that looks like it really is down somewhere. It could be the linked physical interface that goes down. pfSense wouldn't see that link change because it's looking at the virtual ngeth0 interface, so it just keeps trying to send until it fills the buffers.
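
                       Next time it drops it would be worth checking the physical side directly, something like this (just a sketch; swap in whatever physical NIC the ONT is actually on):

                       # Check physical link state and error/buffer counters when WAN dies:
                       ifconfig em1 | grep -E 'status|media'   # link state on the physical NIC
                       netstat -i -I em1                       # in/out packet and error counters
                       netstat -m                              # mbuf cluster usage, denied/delayed requests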

                      Steve

                        SeaMonkey

                         It happened again. Grabbed the ifconfig output:

                        em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
                        	description: LAN
                        	options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
                        	ether 00:15:17:d3:1d:48
                        	inet6 fe80::215:17ff:fed3:1d48%em0 prefixlen 64 scopeid 0x1
                        	inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
                        	inet 10.10.10.1 netmask 0xffffffff broadcast 10.10.10.1
                        	media: Ethernet autoselect (1000baseT <full-duplex>)
                        	status: active
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        em1: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
                        	options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
                        	ether 00:15:17:d3:1d:49
                        	inet6 fe80::215:17ff:fed3:1d49%em1 prefixlen 64 scopeid 0x2
                        	media: Ethernet autoselect (1000baseT <full-duplex>)
                        	status: active
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        em2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
                        	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
                        	ether fc:aa:14:20:bf:f1
                        	media: Ethernet autoselect
                        	status: no carrier
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        alc0: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
                        	options=c319a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>
                        	ether fc:aa:14:20:bf:f3
                        	inet6 fe80::feaa:14ff:fe20:bff3%alc0 prefixlen 64 scopeid 0x4
                        	media: Ethernet autoselect (1000baseT <full-duplex>)
                        	status: active
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        enc0: flags=0<> metric 0 mtu 1536
                        	groups: enc
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
                        	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
                        	inet6 ::1 prefixlen 128
                        	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
                        	inet 127.0.0.1 netmask 0xff000000
                        	groups: lo
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        pfsync0: flags=0<> metric 0 mtu 1500
                        	groups: pfsync
                        pflog0: flags=100<PROMISC> metric 0 mtu 33160
                        	groups: pflog
                        ngeth0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                        	description: WAN
                        	options=28<VLAN_MTU,JUMBO_MTU>
                        	ether 10:56:11:d9:ce:01
                        	inet6 fe80::1256:11ff:fed9:ce01%ngeth0 prefixlen 64 scopeid 0x9
                        	inet x.x.x.149 netmask 0xfffffc00 broadcast x.x.x.255
                        	inet x.x.x.197 netmask 0xffffffff broadcast x.x.x.197
                        	media: Ethernet autoselect (1000baseT <full-duplex>)
                        	status: active
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        ovpns5: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpns5 prefixlen 64 scopeid 0xa
                        	inet 10.0.8.1 --> 10.0.8.2 netmask 0xffffffff
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 35927
                        ovpns6: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1472
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpns6 prefixlen 64 scopeid 0xb
                        	inet 10.0.9.1 --> 10.0.9.2 netmask 0xffffff00
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 54737
                        ovpns4: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpns4 prefixlen 64 scopeid 0xc
                        	inet 10.0.10.1 --> 10.0.10.2 netmask 0xffffffff
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 74730
                        ovpnc1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpnc1 prefixlen 64 scopeid 0xd
                        	inet 10.6.110.2 --> 10.6.110.1 netmask 0xffffff00
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 85937
                        ovpnc2: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpnc2 prefixlen 64 scopeid 0xe
                        	inet 10.14.110.27 --> 10.14.110.1 netmask 0xffffff00
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 8277
                        ovpnc3: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
                        	options=80000<LINKSTATE>
                        	inet6 fe80::215:17ff:fed3:1d48%ovpnc3 prefixlen 64 scopeid 0xf
                        	inet 10.20.110.15 --> 10.20.110.1 netmask 0xffffff00
                        	groups: tun openvpn
                        	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                        	Opened by PID 23995
                        
                          stephenw10 Netgate Administrator

                          Which is the physical WAN NIC?

                           Is the subnet shown on ngeth0 correct? Is the gateway within it?

                          Steve

                            SeaMonkey

                             em1 goes to the ONT. alc0 goes to the AT&T gateway. WAN is set to use DHCP. The ifconfig output for em1, alc0, and ngeth0 is identical to what it shows when the connection is working.

                              stephenw10 Netgate Administrator

                               What about the subnet that's shown on it, is that still valid? Does it still contain the gateway IP?

                                SeaMonkey

                                Yes, the gateway IP is within the subnet.

                                  stephenw10 Netgate Administrator

                                   Hmm, I really wouldn't expect it to show that arpresolve error unless the subnet containing the gateway no longer existed. 🤔

                                    SeaMonkey @stephenw10

                                     My setup has changed, but the problem persists. However, I think I can give some better detail on the symptoms.

                                     The current configuration is simply pfSense with pfatt, using netgraph only for VLAN0 tagging, with the WAN interface connected directly to an Azores WAG-D20. Bridging interfaces for auth isn't needed in this configuration.

                                    #!/usr/bin/env sh
                                    
                                    RG_ETHER_ADDR="XX:XX:XX:XX:XX:XX"
                                    LOG=/var/log/pfatt.log
                                    ONT_IF="ix1"
                                    
                                    getTimestamp(){
                                        echo `date "+%Y-%m-%d %H:%M:%S :: [pfatt_azores.sh] ::"`
                                    }
                                    
                                    {
                                    /usr/bin/logger -st "pfatt" "starting pfatt..."
                                    /usr/bin/logger -st "pfatt" "configuration:"
                                    /usr/bin/logger -st "pfatt" "  ONT_IF = $ONT_IF"
                                    /usr/bin/logger -st "pfatt" " RG_ETHER_ADDR = $RG_ETHER_ADDR"
                                    
                                    # Netgraph cleanup.
                                    /usr/bin/logger -st "pfatt" "resetting netgraph..."
                                    /usr/sbin/ngctl shutdown $ONT_IF: >/dev/null 2>&1
                                    /usr/sbin/ngctl shutdown vlan0: >/dev/null 2>&1
                                    /usr/sbin/ngctl shutdown ngeth0: >/dev/null 2>&1
                                    
                                     /usr/bin/logger -st "pfatt" "your ONT should be connected to physical interface $ONT_IF"
                                    /usr/bin/logger -st "pfatt" "creating vlan node and ngeth0 interface..."
                                    /usr/sbin/ngctl mkpeer $ONT_IF: vlan lower downstream
                                    /usr/sbin/ngctl name $ONT_IF:lower vlan0
                                    /usr/sbin/ngctl mkpeer vlan0: eiface vlan0 ether
                                    /usr/sbin/ngctl msg vlan0: 'addfilter { vlan=0 hook="vlan0" }'
                                    /usr/sbin/ngctl msg ngeth0: set $RG_ETHER_ADDR
                                    
                                    /usr/bin/logger -st "pfatt" "enabling promisc for $ONT_IF..."
                                    /sbin/ifconfig $ONT_IF ether $RG_ETHER_ADDR
                                    /sbin/ifconfig $ONT_IF up
                                    /sbin/ifconfig $ONT_IF promisc
                                    } >> $LOG
                                    

                                    The connection remains stable until the state table is large for a prolonged period of time. "Large" is difficult to assess here, as I've seen the connection drop when it's around 80K, but it's also grown to over 130K at times and remained stable for a while before dying.
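
                                     So far I've mostly been eyeballing the state count, but something like this in cron would let me correlate drops with table size properly (just a sketch; the log path is arbitrary):

                                     # Append a timestamped state-table count each run ("current entries" from pfctl -si):
                                     echo "$(date '+%Y-%m-%d %H:%M:%S') $(pfctl -si | awk '/current entries/ {print $3}')" >> /var/log/state_count.log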

                                     The connection doesn't recover on its own, but manually executing the pfatt script usually allows everything to recover, although sometimes I've also had to restart my OpenVPN servers manually.

                                      stephenw10 Netgate Administrator

                                       The odd thing here is that I'd expect to see something logged when it happens to give us some sort of clue to the cause, but I assume there is still nothing?

                                        SeaMonkey

                                        Looking at the most recent instance...

                                         In the general log, there are gateway alarms on the VPN gateways before it starts spamming the arpresolve messages:

                                        Jan 16 11:26:56 	rc.gateway_alarm 	19405 	>>> Gateway alarm: SITE2SITE_VPNV4 (Addr:10.0.8.2 Alarm:1 RTT:26.363ms RTTsd:.380ms Loss:22%)
                                        Jan 16 11:26:56 	rc.gateway_alarm 	22616 	>>> Gateway alarm: PIA_OVPN_VPNV4 (Addr:10.12.110.1 Alarm:1 RTT:7.536ms RTTsd:.633ms Loss:22%)
                                        Jan 16 11:26:56 	rc.gateway_alarm 	25984 	>>> Gateway alarm: PIA_OVPN2_VPNV4 (Addr:10.2.110.1 Alarm:1 RTT:37.007ms RTTsd:.417ms Loss:22%)
                                        Jan 16 11:26:56 	php-fpm 	19915 	/rc.filter_configure_sync: MONITOR: PIA_OVPN_VPNV4 has packet loss, omitting from routing group US_PIA
                                        Jan 16 11:26:56 	php-fpm 	19915 	10.12.110.1|10.12.110.32|PIA_OVPN_VPNV4|7.536ms|0.633ms|22%|down|highloss
                                        Jan 16 11:26:56 	php-fpm 	19915 	/rc.filter_configure_sync: MONITOR: PIA_OVPN2_VPNV4 has packet loss, omitting from routing group US_PIA
                                        Jan 16 11:26:56 	php-fpm 	19915 	10.2.110.1|10.2.110.34|PIA_OVPN2_VPNV4|37.007ms|0.417ms|22%|down|highloss
                                        Jan 16 11:26:56 	php-fpm 	89734 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use PIA_OVPN_PF_VPNV4.
                                        Jan 16 11:26:56 	rc.gateway_alarm 	96856 	>>> Gateway alarm: XYGNET_VPNV4 (Addr:10.0.10.2 Alarm:1 RTT:52.162ms RTTsd:1.031ms Loss:22%)
                                        Jan 16 11:26:56 	php 	77766 	notify_monitor.php: Message sent to seamonkey@seamonkey.tech OK
                                        Jan 16 11:26:57 	php-fpm 	9450 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
                                        Jan 16 11:34:20 	root 	11114 	[PIA-API] Error! Failed to bind received port!
                                        Jan 16 12:00:00 	php 	74413 	[Suricata] Suricata signalled with SIGHUP for LAN (ix0)...
                                        Jan 16 12:00:00 	php 	74413 	[Suricata] Logs Mgmt job rotated 1 file(s) in '/var/log/suricata/suricata_ix030287/' ...
                                        Jan 16 12:01:04 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0
                                        Jan 16 12:01:05 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0
                                        Jan 16 12:01:05 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0 
                                        

                                         The gateway log is a flood of sendto error: 55 on all the gateway interfaces. The entries don't go back far enough to cover the same time period.

                                         Nothing at all in the routing log in that time frame.

                                         The DNS log is, of course, full of failures to resolve anything.

                                         The DHCP log has repeated requests from the ngeth0 interface with no response:

                                        Jan 16 11:31:03 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:04 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:06 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:11 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:16 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:27 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:31:42 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:32:00 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
                                        Jan 16 11:32:49 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67 
                                        

                                         The OpenVPN log is just inactivity timeouts:

                                        Jan 16 11:28:46 	openvpn 	22069 	TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
                                        Jan 16 11:28:46 	openvpn 	22069 	TLS Error: TLS handshake failed
                                        Jan 16 11:28:46 	openvpn 	62216 	TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
                                        Jan 16 11:28:46 	openvpn 	62216 	TLS Error: TLS handshake failed
                                        Jan 16 11:29:41 	openvpn 	39749 	[chicago421] Inactivity timeout (--ping-restart), restarting
                                        Jan 16 11:29:41 	openvpn 	39749 	SIGUSR1[soft,ping-restart] received, process restarting
                                        Jan 16 11:29:41 	openvpn 	98794 	[denver420] Inactivity timeout (--ping-restart), restarting
                                        Jan 16 11:29:41 	openvpn 	98794 	SIGUSR1[soft,ping-restart] received, process restarting
                                        Jan 16 11:29:41 	openvpn 	22362 	[montreal424] Inactivity timeout (--ping-restart), restarting
                                        Jan 16 11:29:41 	openvpn 	22362 	SIGUSR1[soft,ping-restart] received, process restarting 
                                        

                                        I guess what's key is that the DHCP server for WAN stops responding, but I don't really understand why manually running the pfatt script fixes that.
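
                                         Next time, before re-running the script, I'll also check whether the netgraph nodes are still wired up at all, something like (just a sanity check, not from the pfatt docs):

                                         ngctl list           # should still list the ether node for the ONT NIC, plus vlan0 and ngeth0
                                         ngctl show vlan0:    # hooks: downstream to the NIC's lower hook, vlan0 to ngeth0's ether hook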

                                          stephenw10 Netgate Administrator

                                          Hmm, this seems odd:

                                          Jan 16 11:34:20 	root 	11114 	[PIA-API] Error! Failed to bind received port!
                                          

                                          What is that? Some custom script?
                                          Seems unlikely to be a cause but...

                                            SeaMonkey @stephenw10

                                            That's this:
                                            https://github.com/SeaMonkey82/PIA-NextGen-PortForwarding

                                            I suppose I could disable it for a while just to test, but it's worked without issue for a long time.

                                             As I mentioned, the connection remains perfectly stable when my state table usage remains low. To verify this, I suspended my Ethereum client testing setup, which is normally the bulk of my traffic. For context, I'm testing 20 client pairs, each client is configured to connect to anywhere from 45 to 160 peers at a time, and all of this uses 53 exposed ports on the firewall. I noticed that even when I suspended testing, state table usage remained high until I disabled the associated NAT rule for all of these ports.

                                             I just checked my firewall log and there are almost 10,000 blocks of attempted connections to this port range within the past few minutes, despite the rule being disabled since sometime yesterday. Do you suppose my whole problem is just that my connection is being DDoSed?
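
                                             (For reference, I'm counting the blocks roughly like this; it assumes the plain-text /var/log/filter.log, and I've left out the extra grep for my actual port range:)

                                             # Rough count of block entries in the current filter log:
                                             grep ',block,' /var/log/filter.log | wc -l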
