NIC issues and advise
-
Is there nothing additionally logged in the system log when it disconnects?
You could run a pcap on the parent NIC when it fails and see if the server is responding at all.
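A capture along those lines could look like the sketch below. It assumes the disconnects concern the PPPoE session discussed later in this thread, and that the parent NIC is called igb0 (a placeholder, substitute your own interface). The function name and the output path are made up for illustration.

```shell
#!/bin/sh
# Sketch: capture only PPPoE frames on the parent NIC into a pcap file.
# Assumptions: interface name and output path are placeholders.
capture_pppoe() {
    iface=$1                      # physical NIC carrying the PPPoE session, e.g. igb0
    out=${2:-/tmp/pppoe.pcap}     # where to write the capture
    # 0x8863 = PPPoE discovery frames, 0x8864 = PPPoE session frames
    tcpdump -n -i "$iface" -w "$out" 'ether proto 0x8863 or ether proto 0x8864'
}
```

Run it as root with something like `capture_pppoe igb0`, stop it with Ctrl-C once the drop has happened, and open the file in Wireshark to see whether the other side answers at all.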
-
Swap the two interfaces.
If the Intel was LAN, and the Realtek was WAN, make the Intel now WAN, etc.
If the issue follows the NIC, you know it's neither the cable nor the equipment in front of the NIC.
Also, keep in mind: when a connection becomes unstable (see the UP and DOWN events in the system log) and a Realtek NIC is involved, stop whatever you are doing, throw the Realtek out of the window, get an Intel NIC, and become a member of the club "should have done that earlier".
-
To be fair the Realtek 2.5G NICs seem OK so far. I haven't seen any catastrophic behaviour...yet.
-
So the disconnection issue happened again with the Intel NIC and I was lucky enough to get a few minutes to look into it. It seems to be a different issue than the Realtek one.
From the logs it looks like the PPPoE connection went down and came back up. After it came back up, pfSense itself still had internet access (tested by pinging Google DNS) while nothing on the LAN could ping Google DNS.
After I changed the WAN interface and changed it back to the Intel again, everything worked as expected.
Attached are the PPP log and the system log from around that time.
-
...
Feb 14 04:10:28 pfSense php-fpm[28297]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.16.8.10 -> 10.16.8.10 - Restarting packages.
Feb 14 04:10:28 pfSense check_reload_status[425]: Starting packages
Feb 14 04:10:28 pfSense check_reload_status[425]: Reloading filter
Feb 14 04:10:29 pfSense php-fpm[386]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.199.8.3 -> 10.199.8.3 - Restarting packages.
...
Within one second: 2 WAN IPs?
10.16.8.10 and 10.199.8.3?
Do you use a VPN?
-
@Gertjan
Yes, there are a number of OpenVPN clients running on this machine. Those connections are separate and only some traffic is routed through them. The IPs in that log are from the OpenVPN clients:
ovpnc3: flags=1008043<UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=80000<LINKSTATE>
	inet 10.16.8.10 netmask 0xffffff00 broadcast 10.16.8.255
	inet6 fe80::20c:29ff:fe82:caec%ovpnc3 prefixlen 64 scopeid 0xe
	groups: tun openvpn
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	Opened by PID 55805
ovpnc5: flags=1008043<UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=80000<LINKSTATE>
	inet 10.199.8.3 netmask 0xffffff00 broadcast 10.199.8.255
	inet6 fe80::20c:29ff:fe82:caec%ovpnc5 prefixlen 64 scopeid 0xf
	groups: tun openvpn
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	Opened by PID 71191
-
Ah, OK, thanks, that explains the rapid sequential IP assignment.
Your real WAN was/is...
110.79.143.248 -> 110.79.210.9-four
....and the assignment of that one looks normal.
As the logs show, when your PPPoE times out and reconnects, many (like, a lot of) 'packages' == system processes restart.
And because you have not one WAN IP but 3 (counting the two VPNs), processes get restarted even more often and faster.
No need to be a specialist to draw a simple conclusion from this:
...
Feb 14 04:10:24 pfSense tail_pfb[83338]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:24 pfSense php_pfb[84182]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:24 pfSense php-fpm[11175]: /rc.start_packages: The command '/usr/local/etc/rc.d/pfb_filter.sh stop' returned exit code '1', the output was 'kill: 15788: No such process'
Feb 14 04:10:24 pfSense php-fpm[11175]: [pfBlockerNG] Restarting firewall filter daemon
Feb 14 04:10:25 pfSense tail_pfb[89730]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:25 pfSense php_pfb[93021]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:25 pfSense lighttpd_pfb[97157]: [pfBlockerNG] DNSBL Webserver stopped
Feb 14 04:10:25 pfSense tail_pfb[97178]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:25 pfSense php_pfb[99329]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:25 pfSense tail_pfb[22]: [pfBlockerNG] Firewall Filter Service started
Feb 14 04:10:25 pfSense lighttpd_pfb[3769]: [pfBlockerNG] DNSBL Webserver started
Feb 14 04:10:25 pfSense tail_pfb[6878]: [pfBlockerNG] Firewall Filter Service started
Feb 14 04:10:25 pfSense php_pfb[2261]: [pfBlockerNG] filterlog daemon started
Feb 14 04:10:25 pfSense php[6957]: [pfBlockerNG] DNSBL parser daemon started
Feb 14 04:10:25 pfSense check_reload_status[425]: Rewriting resolv.conf
Feb 14 04:10:25 pfSense php_pfb[7747]: [pfBlockerNG] filterlog daemon started
...
Note that pfSense couldn't even keep up with the pace: a first instance of "pfb_filter.sh" (a standalone PHP process) was already gone before it could be signaled to stop.
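To get a feel for how tightly these restarts cluster, the reconnect events can be counted per minute. The following is only a sketch: the function name is made up, and it assumes syslog-style lines like the ones quoted above arrive on stdin.

```shell
#!/bin/sh
# Sketch: count rc.newwanip "IP change" events per minute.
# Reads syslog-style lines ("Feb 14 04:10:28 host proc[pid]: ...") from stdin.
count_newwanip() {
    awk '/rc\.newwanip:/ && /IP change or dynamic WAN reconnection/ {
             # key = month, day, and HH:MM of the timestamp
             n[$1 " " $2 " " substr($3, 1, 5)]++
         }
         END { for (t in n) print t, n[t] }' | sort
}
```

On a version that keeps /var/log/system.log as plain text (an assumption, older releases used a binary circular log), it could be fed with `count_newwanip < /var/log/system.log`. Several events within the same minute mean the packages were restarted once per interface that came back up.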
I know, I'm not very helpful here.
Your logs by themselves show me a successful reconnect.
I advise you to check whether all the processes that were restarted are actually 'up and running'.
Example:
dig @127.0.0.1 google.fr +short
dig @192.168.1.1 google.fr +short
If 192.168.1.1 is your pfSense LAN interface.
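The same check can be wrapped in a small loop so it is quick to repeat right after a reconnect. This is a sketch only: the function name is made up, and the addresses you pass in are whatever your resolver listens on.

```shell
#!/bin/sh
# Sketch: ask each resolver address for one name and report whether an answer came back.
check_resolvers() {
    name=$1; shift
    rc=0
    for server in "$@"; do
        # +time/+tries keep the wait short; lines starting with ';' are dig status
        # messages (for example a timeout notice), not answers, so drop them.
        answer=$(dig "@$server" "$name" +short +time=2 +tries=1 2>/dev/null | grep -v '^;')
        if [ -n "$answer" ]; then
            echo "OK   $server"
        else
            echo "FAIL $server"
            rc=1
        fi
    done
    return $rc
}
```

For the two addresses above that would be `check_resolvers google.fr 127.0.0.1 192.168.1.1`. A FAIL on the LAN address while 127.0.0.1 answers would point at the resolver not listening on LAN after the restart.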
Etc.
-
Make sure you have set the default IPv4 gateway to PPPOE_GW in System > Routing > Gateways. Otherwise it may be switching to one of the VPNs as default which might not work.
-
@Gertjan Thanks. With the Intel NIC, the connection has dropped out at 4am the last 2 nights. My ISP must be resetting connections at this time or something.
From the little time I had to test, the connection from pfSense itself is fine and it's only LAN hosts that cannot reach the internet (no ICMP, HTTP, etc.).
I changed the WAN to the Realtek NIC (not connected) and changed it back again. That usually works, but this morning it did not until I changed the default gateway in System > Routing > Gateways to Automatic.
On previous versions of pfSense I used to have a lot of issues with unbound if the connection was down for more than 30 seconds. Restarting unbound would usually resolve the issue, but it was a pain in the backside because at the time my connection came from a bridged mobile broadband router that couldn't hold a connection for more than 24 hours at a time.
@stephenw10 said in NIC issues and advise:
Make sure you have set the default IPv4 gateway to PPPOE_GW in System > Routing > Gateways. Otherwise it may be switching to one of the VPNs as default which might not work.
Thanks for the suggestion, but it was always PPPOE_GW. I changed it to Automatic this morning and it seemed to bring the connection back up. I'll leave it at Automatic and see if the same issue happens again tonight.
Is it possible to run a script or something when the PPPoE connection drops like this, so that services restart after the connection comes back up?
-
Just saving that page would likely have brought it back by re-applying the default gateway. It should be set to PPPoE_GW though to avoid it trying to default to one of the VPNs.
-
@stephenw10 said in NIC issues and advise:
Just saving that page would likely have brought it back by re-applying the default gateway. It should be set to PPPoE_GW though to avoid it trying to default to one of the VPNs.
The same thing happened again last night.
You're right, just saving the routing page brought the connection back up.
I'm going to have to figure out some way of automatically saving the routing page a minute or so after the PPPoE connection goes down. Hopefully that will save me doing it manually.
-
OK, then firstly check Diag > Routes when this next happens. Is there a valid default route shown?
If not check the system and routing logs for what happened when the PPPoE went down (and when it came back up).
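The same information is available from a shell via `netstat -rn -f inet`. As a rough sketch, the helpers below pull the default route out of that output and compare its interface with the one you expect. The function names are made up, and pppoe0 as the PPPoE interface name is an assumption.

```shell
#!/bin/sh
# Sketch: read "netstat -rn -f inet" output on stdin and print
# "<gateway> <interface>" for the default route, if there is one.
default_route() {
    awk '
        # Locate the Netif column from the header so older/newer layouts both work
        $1 == "Destination" { for (i = 1; i <= NF; i++) if ($i == "Netif") col = i }
        $1 == "default" && col { print $2, $col; found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Sketch: report whether the default route uses the expected interface.
check_default_route() {
    expected_if=$1
    route=$(default_route) || { echo "no default route found"; return 1; }
    actual_if=${route#* }         # strip the gateway, keep the interface
    if [ "$actual_if" = "$expected_if" ]; then
        echo "ok: default via $route"
    else
        echo "unexpected: default via $route (expected $expected_if)"
        return 1
    fi
}
```

Something like `netstat -rn -f inet | check_default_route pppoe0` run while the LAN is offline would show straight away whether the default route has moved to another interface or disappeared.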
-
@stephenw10 said in NIC issues and advise:
OK, then firstly check Diag > Routes when this next happens. Is there a valid default route shown?
If not check the system and routing logs for what happened when the PPPoE went down (and when it came back up).
Thank you. You were correct. When the connection went down the other night I checked Diag > Routes as you suggested, and for some reason there was a default route on the LAN interface. Re-applying the default gateway brought the connection back up.
Diag > Routes with no LAN internet connectivity, before re-applying the default gateway
Diag > Routes when everything is working as expected
Looking at the IP, it was my WAN address before, and I had a script run at boot to set that IP because pfSense would not allow me to use a /32 as a WAN address. The script was disabled long ago, so I'm not sure why that address is popping up as a route.
Here is the script with the route commented out. I can't remember how I ran this at boot, to be honest, so I just commented out the route in the script when I changed ISPs a few years back.
cat configure-route.sh
#
#
#
#route add -net 62.210.109.1/32 -IFACE em1
#route add default 62.210.109.1
#
I can actually see that route listed, but it is disabled and has been since I changed ISP. I have just deleted it, so hopefully that will resolve the issue, unless you have any other suggestions?
-
Ah, some custom script!
Well I would try to find where that is run from and remove it if you can. There is no reason to have that present still.
-
I checked /usr/local/etc/rc.d/ and it's not in there.
I must have deleted it, so it no longer runs at boot. I have now deleted the route from the system gateways instead of just disabling it, so hopefully that will do the trick. Also, the script has all its commands commented out, so even if it does run it doesn't actually add the route.
Hopefully this will resolve it, but I'll find out soon enough I suppose.
-
It happened again, and the 62.210 route was in Diag > Routes even though it had been deleted from System > Routing.
I am grepping through the file system to find the script that adds the route, but I am certain that script only exists in one location and that its route commands are commented out. I'm also nearly certain this script only runs at boot, so I'm not sure why the route pops up after the connection goes down.
Other than /usr/local/etc/rc.d/, can you think of anywhere else a script/command can be run at boot?
-
It could be in a shellcmd:
https://docs.netgate.com/pfsense/en/latest/development/boot-commands.html?highlight=shellcmd#shellcmd-option
-
@stephenw10 said in NIC issues and advise:
It could be in a shellcmd:
https://docs.netgate.com/pfsense/en/latest/development/boot-commands.html?highlight=shellcmd#shellcmd-option
Thanks, I managed to find it and remove it.
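For anyone hunting a similar leftover, a search along these lines may help. It is only a sketch: the function names are made up, and /cf/conf/config.xml as the config location is an assumption about a default install.

```shell
#!/bin/sh
# Sketch: list boot-time command entries in a pfSense config file.
find_shellcmds() {
    config=${1:-/cf/conf/config.xml}   # assumed default config location
    # Matches <shellcmd>, <earlyshellcmd> and any other tag ending in "shellcmd>"
    grep -n 'shellcmd>' "$config"
}

# Sketch: search the given files or directories for a literal string,
# for example the stale gateway address.
find_route_refs() {
    needle=$1; shift
    grep -rnF -- "$needle" "$@" 2>/dev/null
}
```

`find_shellcmds` with no argument lists every shellcmd-style entry with its line number, and something like `find_route_refs 62.210.109.1 /cf/conf /usr/local/etc /root` looks for the old address in a few likely places (those directories are just examples).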
The issue has not happened in a week or so now.
The next step is to figure out the Realtek NIC issue.
-
-