Netgate Discussion Forum

    pfatt - ngeth0 interface disappears

    General pfSense Questions
    25 Posts 2 Posters 1.5k Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • stephenw10 (Netgate Administrator)

      Hmm, I really wouldn't expect it to show that arpresolve error unless the subnet containing the gateway no longer existed. 🤔

      • SeaMonkey @stephenw10

        My setup has changed, but the problem persists. However, I think I can now describe the symptom in more detail.

        The current configuration is simply pfSense running pfatt, with netgraph used only for VLAN 0 tagging and the WAN interface connected directly to an Azores WAG-D20. Bridging interfaces for authentication isn't needed in this configuration.

        #!/usr/bin/env sh
        
        RG_ETHER_ADDR="XX:XX:XX:XX:XX:XX"
        LOG=/var/log/pfatt.log
        ONT_IF="ix1"
        
        getTimestamp(){
            date "+%Y-%m-%d %H:%M:%S :: [pfatt_azores.sh] ::"
        }
        
        # Everything in this block is appended to $LOG. logger -s copies each
        # message to stderr, so stderr is redirected as well.
        {
        /usr/bin/logger -st "pfatt" "starting pfatt..."
        /usr/bin/logger -st "pfatt" "configuration:"
        /usr/bin/logger -st "pfatt" "  ONT_IF = $ONT_IF"
        /usr/bin/logger -st "pfatt" "  RG_ETHER_ADDR = $RG_ETHER_ADDR"
        
        # Netgraph cleanup: remove nodes left over from a previous run.
        /usr/bin/logger -st "pfatt" "resetting netgraph..."
        /usr/sbin/ngctl shutdown "$ONT_IF": >/dev/null 2>&1
        /usr/sbin/ngctl shutdown vlan0: >/dev/null 2>&1
        /usr/sbin/ngctl shutdown ngeth0: >/dev/null 2>&1
        
        /usr/bin/logger -st "pfatt" "your ONT should be connected to physical interface $ONT_IF"
        /usr/bin/logger -st "pfatt" "creating vlan node and ngeth0 interface..."
        # Attach an ng_vlan node to the ONT port's "lower" hook and name it vlan0.
        /usr/sbin/ngctl mkpeer "$ONT_IF": vlan lower downstream
        /usr/sbin/ngctl name "$ONT_IF":lower vlan0
        # Attach an ng_eiface node (normally named ngeth0) to vlan0's "vlan0" hook.
        /usr/sbin/ngctl mkpeer vlan0: eiface vlan0 ether
        # Map frames tagged with VLAN ID 0 to the "vlan0" hook.
        /usr/sbin/ngctl msg vlan0: 'addfilter { vlan=0 hook="vlan0" }'
        # Give ngeth0 the RG's MAC address.
        /usr/sbin/ngctl msg ngeth0: set "$RG_ETHER_ADDR"
        
        # Use the RG's MAC on the physical port and put it in promiscuous mode.
        /usr/bin/logger -st "pfatt" "enabling promisc for $ONT_IF..."
        /sbin/ifconfig "$ONT_IF" ether "$RG_ETHER_ADDR"
        /sbin/ifconfig "$ONT_IF" up
        /sbin/ifconfig "$ONT_IF" promisc
        } >> "$LOG" 2>&1
        

        The connection remains stable until the state table stays large for a prolonged period. "Large" is hard to pin down: I've seen the connection drop at around 80K states, but the table has also grown past 130K at times and stayed stable for a while before the connection died.
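One way to put numbers on "large" is to log the pf state count over time and line it up against the drop timestamps. A minimal sketch, assuming the "current entries" line that FreeBSD's "pfctl -si" prints; the function names and the log path are hypothetical:

```shell
#!/bin/sh
# Extract the current pf state count from "pfctl -si" output on stdin.
# Assumes the line looks like "  current entries   12345".
state_count() {
    awk '/current entries/ { print $3; exit }'
}

# Hypothetical sampling loop: append a timestamped state count every 60 s
# to the file named in $1.
log_states() {
    while :; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(/sbin/pfctl -si 2>/dev/null | state_count)" >> "$1"
        sleep 60
    done
}
```

For example, start "log_states /var/log/pfstates.log &" on the firewall, then compare the log against the time of the next outage.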

        The connection doesn't recover on its own, but at this point manually running the pfatt script usually brings everything back, although I've sometimes also had to restart my OpenVPN servers manually.
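Since re-running pfatt reliably brings the link back, one stopgap is a watchdog that re-runs it only when the WAN looks dead. This is a hypothetical sketch, not part of pfatt: the script path, the probe address, and the "run" argument are all assumptions, and the probe host must be one that is routed out the WAN rather than through a VPN gateway. The command paths are variables so the check can be exercised off-box.

```shell
#!/bin/sh
# Hypothetical pfatt watchdog; every path and address here is an assumption.
PFATT=${PFATT:-/root/bin/pfatt.sh}   # hypothetical location of the pfatt script
PROBE=${PROBE:-1.1.1.1}              # arbitrary host that should route via WAN
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}
PING=${PING:-/sbin/ping}

# Succeeds when ngeth0 exists and $PROBE answers at least one of three pings
# (FreeBSD ping: -c count, -t overall timeout in seconds).
wan_ok() {
    "$IFCONFIG" ngeth0 >/dev/null 2>&1 || return 1
    "$PING" -c 3 -t 5 "$PROBE" >/dev/null 2>&1
}

# Re-run pfatt only when the WAN check fails.
watchdog_run() {
    if ! wan_ok; then
        /usr/bin/logger -t pfatt-watchdog "WAN check failed, re-running $PFATT"
        /bin/sh "$PFATT"
    fi
}

# Only act when invoked as "sh pfatt-watchdog.sh run" (e.g. from cron).
if [ "${1:-}" = run ]; then
    watchdog_run
fi
```

It could be scheduled every few minutes with the pfSense Cron package. It only treats the symptom: it won't explain why DHCP stops answering, and the OpenVPN servers may still need a manual restart.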

        • stephenw10 (Netgate Administrator)

          The odd thing here is that I'd expect something to be logged when it happens, giving us some clue to the cause, but I assume there's still nothing?

          • SeaMonkey

            Looking at the most recent instance...

            In the general log, there are gateway alarms on the VPN gateways before it starts spamming the arpresolve messages:

            Jan 16 11:26:56 	rc.gateway_alarm 	19405 	>>> Gateway alarm: SITE2SITE_VPNV4 (Addr:10.0.8.2 Alarm:1 RTT:26.363ms RTTsd:.380ms Loss:22%)
            Jan 16 11:26:56 	rc.gateway_alarm 	22616 	>>> Gateway alarm: PIA_OVPN_VPNV4 (Addr:10.12.110.1 Alarm:1 RTT:7.536ms RTTsd:.633ms Loss:22%)
            Jan 16 11:26:56 	rc.gateway_alarm 	25984 	>>> Gateway alarm: PIA_OVPN2_VPNV4 (Addr:10.2.110.1 Alarm:1 RTT:37.007ms RTTsd:.417ms Loss:22%)
            Jan 16 11:26:56 	php-fpm 	19915 	/rc.filter_configure_sync: MONITOR: PIA_OVPN_VPNV4 has packet loss, omitting from routing group US_PIA
            Jan 16 11:26:56 	php-fpm 	19915 	10.12.110.1|10.12.110.32|PIA_OVPN_VPNV4|7.536ms|0.633ms|22%|down|highloss
            Jan 16 11:26:56 	php-fpm 	19915 	/rc.filter_configure_sync: MONITOR: PIA_OVPN2_VPNV4 has packet loss, omitting from routing group US_PIA
            Jan 16 11:26:56 	php-fpm 	19915 	10.2.110.1|10.2.110.34|PIA_OVPN2_VPNV4|37.007ms|0.417ms|22%|down|highloss
            Jan 16 11:26:56 	php-fpm 	89734 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use PIA_OVPN_PF_VPNV4.
            Jan 16 11:26:56 	rc.gateway_alarm 	96856 	>>> Gateway alarm: XYGNET_VPNV4 (Addr:10.0.10.2 Alarm:1 RTT:52.162ms RTTsd:1.031ms Loss:22%)
            Jan 16 11:26:56 	php 	77766 	notify_monitor.php: Message sent to seamonkey@seamonkey.tech OK
            Jan 16 11:26:57 	php-fpm 	9450 	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
            Jan 16 11:34:20 	root 	11114 	[PIA-API] Error! Failed to bind received port!
            Jan 16 12:00:00 	php 	74413 	[Suricata] Suricata signalled with SIGHUP for LAN (ix0)...
            Jan 16 12:00:00 	php 	74413 	[Suricata] Logs Mgmt job rotated 1 file(s) in '/var/log/suricata/suricata_ix030287/' ...
            Jan 16 12:01:04 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0
            Jan 16 12:01:05 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0
            Jan 16 12:01:05 	kernel 		arpresolve: can't allocate llinfo for 99.xxx.xxx.1 on ngeth0 
            

            The gateway log is a flood of "sendto error: 55" on all gateway interfaces. Its entries don't reach back to the same time period.
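On FreeBSD, errno 55 is ENOBUFS ("No buffer space available"), meaning the kernel could not queue the outgoing packet. That can come from a full interface send queue as well as from mbuf exhaustion, so it doesn't identify the cause on its own, but mbuf exhaustion is cheap to rule out. A minimal sketch that filters "netstat -m" output down to non-zero denied/delayed counters (the function name is hypothetical):

```shell
#!/bin/sh
# Print the "denied"/"delayed" allocation counters from "netstat -m" output
# (read on stdin) whose first field, e.g. "0/0/0", contains a non-zero digit.
nonzero_denials() {
    awk '/denied|delayed/ && $1 ~ /[1-9]/ { print }'
}
```

Run as "netstat -m | nonzero_denials" shortly after a drop; no output means no allocation requests have been denied or delayed.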

            Nothing at all in the routing log in that time frame.

            The DNS log is, of course, nothing but failures to resolve.

            The DHCP log has repeated requests from the ngeth0 interface with no response:

            Jan 16 11:31:03 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:04 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:06 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:11 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:16 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:27 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:31:42 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:32:00 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67
            Jan 16 11:32:49 	dhclient 	91921 	DHCPREQUEST on ngeth0 to 172.xxx.xxx.1 port 67 
            

            The OpenVPN log is just TLS handshake failures and inactivity timeouts:

            Jan 16 11:28:46 	openvpn 	22069 	TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
            Jan 16 11:28:46 	openvpn 	22069 	TLS Error: TLS handshake failed
            Jan 16 11:28:46 	openvpn 	62216 	TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
            Jan 16 11:28:46 	openvpn 	62216 	TLS Error: TLS handshake failed
            Jan 16 11:29:41 	openvpn 	39749 	[chicago421] Inactivity timeout (--ping-restart), restarting
            Jan 16 11:29:41 	openvpn 	39749 	SIGUSR1[soft,ping-restart] received, process restarting
            Jan 16 11:29:41 	openvpn 	98794 	[denver420] Inactivity timeout (--ping-restart), restarting
            Jan 16 11:29:41 	openvpn 	98794 	SIGUSR1[soft,ping-restart] received, process restarting
            Jan 16 11:29:41 	openvpn 	22362 	[montreal424] Inactivity timeout (--ping-restart), restarting
            Jan 16 11:29:41 	openvpn 	22362 	SIGUSR1[soft,ping-restart] received, process restarting 
            

            I guess what's key is that the DHCP server for WAN stops responding, but I don't really understand why manually running the pfatt script fixes that.

            • stephenw10 (Netgate Administrator)

              Hmm, this seems odd:

              Jan 16 11:34:20 	root 	11114 	[PIA-API] Error! Failed to bind received port!
              

              What is that? Some custom script?
              Seems unlikely to be a cause but...

              • SeaMonkey @stephenw10

                That's this:
                https://github.com/SeaMonkey82/PIA-NextGen-PortForwarding

                I suppose I could disable it for a while just to test, but it's worked without issue for a long time.

                As I mentioned, the connection remains perfectly stable when my state table usage stays low. To verify this, I suspended my Ethereum client testing setup, which is normally the bulk of my traffic. For context, I'm testing 20 client pairs, each client is configured to connect to anywhere from 45 to 160 peers at a time, and all of this uses 53 ports exposed on the firewall. I noticed that even after I suspended testing, state table usage remained high until I disabled the associated NAT rule for all of these ports.

                I just checked my firewall log and there are almost 10,000 blocked connection attempts to this port range within the past few minutes, even though the rule has been disabled since sometime yesterday. Do you suppose my whole problem is just that my connection is being DDoSed?

                • stephenw10 (Netgate Administrator)

                  Hmm, potentially. Though I can see no reason that would remove the ngeth interface entirely. I guess netgraph itself could be hitting a resource limit that we don't normally see. 🤔
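One way to test the resource-limit theory is to look at the UMA zones netgraph allocates from, which "vmstat -z" lists as "NetGraph items" and "NetGraph data items"; a non-zero FAIL count there means netgraph allocations have been refused. A sketch, assuming the "NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, ..." column layout of "vmstat -z" on recent FreeBSD releases (the function name is hypothetical):

```shell
#!/bin/sh
# Read "vmstat -z" output on stdin and report NetGraph zones whose FAIL column
# is non-zero. Assumes FAIL is the 6th comma-separated value after the colon.
netgraph_zone_failures() {
    awk -F: '/^NetGraph/ {
        n = split($2, v, ",")
        if (n >= 6 && v[6] + 0 > 0)
            printf "%s: %d allocation failures\n", $1, v[6]
    }'
}
```

Run "vmstat -z | netgraph_zone_failures" right after a drop. If failures do show up, these zone limits are governed by the net.graph.maxalloc and net.graph.maxdata loader tunables, though whether raising them helps in this case is untested.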

                  • SeaMonkey @stephenw10

                    I should have mentioned in my updated posts that I realized the ngeth0 interface only disappeared when I ran the commands under the "Reset netgraph" section of the pfatt README and then didn't manually run the script to bring it back up afterwards.

                    • stephenw10 (Netgate Administrator)

                      Ah, I think you did mention that. Too many threads!

                      Hmm, so if it just stops passing traffic, something in netgraph could be overloaded. Though I would expect to see some log entries.

                      Can you see any debug info using ngctl status?

                      • SeaMonkey @stephenw10

                        ngctl status returns "No status available" for every <path>. Did you mean ngctl show?

                        Here's what it looks like when everything's working.

                        [2.6.0-RELEASE][root@fallia.thegalaxy]/: ngctl list
                        There are 9 total nodes:
                          Name: ix0             Type: ether           ID: 00000001   Num hooks: 0
                          Name: ix1             Type: ether           ID: 00000002   Num hooks: 1
                          Name: em0             Type: ether           ID: 00000003   Num hooks: 0
                          Name: alc0            Type: ether           ID: 00000004   Num hooks: 0
                          Name: <unnamed>       Type: socket          ID: 00000007   Num hooks: 0
                          Name: vlan0           Type: vlan            ID: 0000000d   Num hooks: 2
                          Name: ngeth0          Type: eiface          ID: 00000010   Num hooks: 1
                          Name: ngctl22422      Type: socket          ID: 00000319   Num hooks: 0
                          Name: <unnamed>       Type: socket          ID: 0000001f   Num hooks: 0
                        [2.6.0-RELEASE][root@fallia.thegalaxy]/: ngctl show ix1:
                          Name: ix1             Type: ether           ID: 00000002   Num hooks: 1
                          Local hook      Peer name       Peer type    Peer ID         Peer hook      
                          ----------      ---------       ---------    -------         ---------      
                          lower           vlan0           vlan         0000000d        downstream     
                        [2.6.0-RELEASE][root@fallia.thegalaxy]/: ngctl show vlan0:
                          Name: vlan0           Type: vlan            ID: 0000000d   Num hooks: 2
                          Local hook      Peer name       Peer type    Peer ID         Peer hook      
                          ----------      ---------       ---------    -------         ---------      
                          vlan0           ngeth0          eiface       00000010        ether          
                          downstream      ix1             ether        00000002        lower          
                        [2.6.0-RELEASE][root@fallia.thegalaxy]/: ngctl show ngeth0:
                          Name: ngeth0          Type: eiface          ID: 00000010   Num hooks: 1
                          Local hook      Peer name       Peer type    Peer ID         Peer hook      
                          ----------      ---------       ---------    -------         ---------      
                          ether           vlan0           vlan         0000000d        vlan0         
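The healthy wiring shown above can be checked automatically, e.g. right after a drop or from a watchdog. A minimal sketch that reads "ngctl show vlan0:" output and succeeds only if the vlan0 hook is peered with ngeth0 and the downstream hook with the ONT interface; the function name is hypothetical, and it relies on the column layout shown above:

```shell
#!/bin/sh
# Succeeds when "ngctl show vlan0:" output (on stdin) shows the hook layout
# pfatt builds: vlan0's "vlan0" hook peered with ngeth0, and its
# "downstream" hook peered with the ONT interface named in $1 (e.g. ix1).
vlan0_wired() {
    awk -v ont="$1" '
        $1 == "vlan0"      && $2 == "ngeth0" { up = 1 }
        $1 == "downstream" && $2 == ont      { down = 1 }
        END { exit !(up && down) }'
}
```

For example: "/usr/sbin/ngctl show vlan0: | vlan0_wired ix1 || echo 'vlan0 wiring broken'". If the vlan0 node is gone entirely, ngctl writes nothing to stdout and the check fails as well.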
                        

                        Once I'm home again, I'll intentionally break things and see if the results are different.

                        • stephenw10 (Netgate Administrator)

                          Hmm, no, I meant status, but I'm also seeing the same output...

                          The status data might show more. If I could work out the syntax!
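Since "ngctl status" returns nothing for these node types, one alternative for the healthy-versus-broken comparison is to snapshot what the nodes do report and diff the two snapshots. A sketch, assuming the ng_vlan(4) "gettable" control message (which returns the vlan node's filter table); the function name and file paths are hypothetical, and the command paths are variables so it can be dry-run off-box:

```shell
#!/bin/sh
# Command paths are overridable for testing.
NGCTL=${NGCTL:-/usr/sbin/ngctl}
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}

# Write a snapshot of the pfatt netgraph state (stdout and stderr) to stdout.
# $1 is the ONT interface; it defaults to ix1 as in the script above.
ng_snapshot() {
    ont=${1:-ix1}
    {
        echo "== ngctl list"
        "$NGCTL" list
        for node in "$ont" vlan0 ngeth0; do
            echo "== ngctl show $node:"
            "$NGCTL" show "$node:"
        done
        echo "== vlan0 filter table"
        "$NGCTL" msg vlan0: gettable
        echo "== ifconfig ngeth0"
        "$IFCONFIG" ngeth0
    } 2>&1
}
```

For example, "ng_snapshot ix1 > /root/ng-good.txt" while everything works, "ng_snapshot ix1 > /root/ng-bad.txt" after a drop, then "diff /root/ng-good.txt /root/ng-bad.txt".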

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.