NIC issues and advise
-
After installing the Realtek NIC I can get it working with the same settings as the Intel NIC, but if I run something like a speed test the connection drops.
It looks like the connection is being stopped from the WAN side. Maybe they don't like my NIC?
The symptoms are exactly the same as the dropout I mentioned in my first post, but in this case I can switch the WAN from the Realtek NIC to the Intel one and it works again. If I leave it on the Realtek it won't work.
Attached are the ppp logs from when the Realtek is connected and from when the Intel is connected.
ppp-logs.zip
Should I be changing the frame size maybe, or do you have any other suggestions?
-
Hmm, so it just stops responding on the server side after 2mins? Is it possible you have some other client trying to login with the same credentials somewhere?
Do you see the link physically go down at any point or is it just the ppp session that fails?
-
@stephenw10 said in NIC issues and advise:
Hmm, so it just stops responding on the server side after 2mins? Is it possible you have some other client trying to login with the same credentials somewhere?
Do you see the link physically go down at any point or is it just the ppp session that fails?
The link light is flashing and the link shows as up in the UI.
No, there is definitely nothing else connected to the WAN port. The WAN cable comes from the ISP's Huawei module that you can see here: https://forum.netgate.com/post/1125275
The credentials don't actually matter here, as I found that the connection still comes up even with gibberish entered.
Can you suggest any other logs to check, or do you have any other suggestions?
-
I have seen PPPoE providers that will connect you to a test account with the wrong credentials, but you still need the real login to get the correct bandwidth, and there may be other restrictions.
Can you see what login the Huawei router was using?
-
I should be clearer here: it's not a Huawei router, it's the fiber break point. Have a look at the screenshot I linked to in my last post.
I have been using the Intel NIC for WAN for months without much issue, and I am using the exact same credentials for the Realtek NIC.
-
Hmm, hard to say then. I could imagine the NIC losing link for some reason but that would be logged. Perhaps the ONT is trying to negotiate some N-base rate and the NIC falls back to 1G?
-
@stephenw10 said in NIC issues and advise:
Hmm, hard to say then. I could imagine the NIC losing link for some reason but that would be logged. Perhaps the ONT is trying to negotiate some N-base rate and the NIC falls back to 1G?
If that was the case, would it not be logged, and would the NIC not be able to fall back to 1G?
As I said, something similar happens with the Intel NIC, but it's rare.
Can you think of anything else I can do to debug what's going on? My ISP support is absolutely useless, so I am not even going to waste time contacting them.
-
Is there nothing additionally logged in the system log when it disconnects?
You could run a pcap on the parent NIC when it fails and see if the server is responding at all.
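When reading such a capture, it can help to separate PPPoE discovery from session traffic and to look for LCP echo requests that never get a reply. A minimal Python sketch, assuming the frames are untagged Ethernet (no VLAN header) and have already been read from the pcap as raw bytes. The helper name is made up for illustration; the EtherType, PPP protocol and LCP code values are the standard ones from RFC 2516 and RFC 1661.

```python
import struct

PPPOE_DISCOVERY = 0x8863  # EtherType for PPPoE discovery (PADI/PADO/PADR/PADS/PADT)
PPPOE_SESSION = 0x8864    # EtherType for PPPoE session frames
PPP_LCP = 0xC021          # PPP protocol number for LCP
LCP_CODES = {9: "lcp-echo-request", 10: "lcp-echo-reply"}


def classify_frame(frame: bytes) -> str:
    """Label one raw, untagged Ethernet frame (hypothetical helper)."""
    if len(frame) < 14:
        return "truncated"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == PPPOE_DISCOVERY:
        return "pppoe-discovery"
    if ethertype != PPPOE_SESSION:
        return "other"
    # 14-byte Ethernet header + 6-byte PPPoE header, then the 2-byte PPP
    # protocol field and, for LCP, the 1-byte LCP code.
    if len(frame) < 23:
        return "pppoe-session"
    (ppp_proto,) = struct.unpack("!H", frame[20:22])
    if ppp_proto == PPP_LCP:
        return LCP_CODES.get(frame[22], "lcp-other")
    return "pppoe-session"
```

If echo requests keep leaving the parent NIC and no echo replies come back, that would suggest the server side has stopped answering rather than the local link going down.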
-
Swap the two interfaces.
If the Intel was LAN, and the Realtek was WAN, make the Intel now WAN, etc.
If the issue follows the NIC, you know it's neither the cable nor the equipment in front of the NIC.
Also, keep in mind: when a connection becomes unstable (see the UP and DOWN events in the system log) and a Realtek NIC is involved, stop whatever you are doing, throw the Realtek out of the window, get an Intel NIC, and become a member of the club "should have done that earlier".
-
To be fair the Realtek 2.5G NICs seem OK so far. I haven't seen any catastrophic behaviour...yet.
-
So the disconnection issue happened again with the Intel NIC, and I was lucky enough to get a few minutes to look into it. It seems to be a different issue from the Realtek one.
From the logs it looks like the PPPoE connection went down and came back up. After it came back up, pfSense itself still had internet access (tested with a ping to Google DNS), while everything on the LAN was unable to ping Google DNS.
After I changed the WAN interface and changed it back to the Intel again, everything worked as expected.
Attached are the ppp logs and the system logs from around that time.
-
...
Feb 14 04:10:28 pfSense php-fpm[28297]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.16.8.10 -> 10.16.8.10 - Restarting packages.
Feb 14 04:10:28 pfSense check_reload_status[425]: Starting packages
Feb 14 04:10:28 pfSense check_reload_status[425]: Reloading filter
Feb 14 04:10:29 pfSense php-fpm[386]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.199.8.3 -> 10.199.8.3 - Restarting packages.
...
Within one second: two WAN IPs?
10.16.8.10 and 10.199.8.3?
Are you using a VPN?
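For longer logs, the rc.newwanip events can be pulled out with a short script instead of by eye. A minimal Python sketch, assuming the message format is exactly the one in the excerpt above; the function name is made up for illustration.

```python
import re

# Matches the rc.newwanip message format seen in the log excerpt above.
NEWWANIP = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8}) \S+ php-fpm\[\d+\]: /rc\.newwanip: .*? - "
    r"(?P<old>\S+) -> (?P<new>\S+) - Restarting packages\.$"
)


def ip_changes(lines):
    """Return (timestamp, old_ip, new_ip) for each rc.newwanip event."""
    events = []
    for line in lines:
        m = NEWWANIP.match(line.strip())
        if m:
            events.append((m["ts"], m["old"], m["new"]))
    return events
```

Several distinct addresses reported within a second or two may point to more than one interface triggering rc.newwanip, rather than a single WAN flapping.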
-
@Gertjan
Yes, there are a number of OpenVPN clients running on this machine. Those connections are separate and only some traffic is routed through them. The IPs in that log are from the OpenVPN clients:
ovpnc3: flags=1008043<UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=80000<LINKSTATE>
	inet 10.16.8.10 netmask 0xffffff00 broadcast 10.16.8.255
	inet6 fe80::20c:29ff:fe82:caec%ovpnc3 prefixlen 64 scopeid 0xe
	groups: tun openvpn
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	Opened by PID 55805
ovpnc5: flags=1008043<UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=80000<LINKSTATE>
	inet 10.199.8.3 netmask 0xffffff00 broadcast 10.199.8.255
	inet6 fe80::20c:29ff:fe82:caec%ovpnc5 prefixlen 64 scopeid 0xf
	groups: tun openvpn
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	Opened by PID 71191
-
Ah, OK, thanks, that explains the rapid sequential IP attribution.
Your real WAN was/is...
110.79.143.248 -> 110.79.210.9-four
...and the attribution of that one looks normal.
As the logs show, when your PPPoE times out and reconnects, many (like, a lot of) packages, meaning system processes, restart.
And because you have not one WAN IP but three (including the two VPNs), processes get restarted even more often and faster.
No need to be a specialist to draw a simple conclusion from this:
...
Feb 14 04:10:24 pfSense tail_pfb[83338]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:24 pfSense php_pfb[84182]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:24 pfSense php-fpm[11175]: /rc.start_packages: The command '/usr/local/etc/rc.d/pfb_filter.sh stop' returned exit code '1', the output was 'kill: 15788: No such process'
Feb 14 04:10:24 pfSense php-fpm[11175]: [pfBlockerNG] Restarting firewall filter daemon
Feb 14 04:10:25 pfSense tail_pfb[89730]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:25 pfSense php_pfb[93021]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:25 pfSense lighttpd_pfb[97157]: [pfBlockerNG] DNSBL Webserver stopped
Feb 14 04:10:25 pfSense tail_pfb[97178]: [pfBlockerNG] Firewall Filter Service stopped
Feb 14 04:10:25 pfSense php_pfb[99329]: [pfBlockerNG] filterlog daemon stopped
Feb 14 04:10:25 pfSense tail_pfb[22]: [pfBlockerNG] Firewall Filter Service started
Feb 14 04:10:25 pfSense lighttpd_pfb[3769]: [pfBlockerNG] DNSBL Webserver started
Feb 14 04:10:25 pfSense tail_pfb[6878]: [pfBlockerNG] Firewall Filter Service started
Feb 14 04:10:25 pfSense php_pfb[2261]: [pfBlockerNG] filterlog daemon started
Feb 14 04:10:25 pfSense php[6957]: [pfBlockerNG] DNSBL parser daemon started
Feb 14 04:10:25 pfSense check_reload_status[425]: Rewriting resolv.conf
Feb 14 04:10:25 pfSense php_pfb[7747]: [pfBlockerNG] filterlog daemon started
...
Note that pfSense couldn't even keep up with the pace: a first instance of "pfb_filter.sh" (a standalone PHP process) was already gone before it could be signaled to stop.
I know, I'm not very helpful here.
Your logs by themselves show me a successful reconnect.
I advise you to check whether all the processes that were restarted are actually up and running.
Example:
dig @127.0.0.1 google.fr +short
dig @192.168.1.1 google.fr +short
if 192.168.1.1 is your pfSense LAN interface.
Etc.
-
Make sure you have set the default IPv4 gateway to PPPOE_GW in System > Routing > Gateways. Otherwise it may be switching to one of the VPNs as default which might not work.
-
@Gertjan Thanks. With the Intel NIC, the connection has dropped out at 4am on each of the last two nights. My ISP must be resetting connections at that time or something.
From the little time I had to test, the connection from pfSense itself is fine, and it's only LAN hosts that cannot reach the internet (no ICMP, HTTP, etc.).
I changed the WAN to the Realtek NIC (not connected) and changed it back again. That usually works, but this morning for some reason it didn't until I changed the default gateway in System > Routing > Gateways to automatic.
On previous versions of pfSense I used to have a lot of issues with unbound if the connection was down for more than 30 seconds. Restarting unbound would usually resolve the issue, but it was a pain in the backside, as at the time my connection came from a bridged mobile broadband router that couldn't hold a connection for more than 24 hours at a time.
@stephenw10 said in NIC issues and advise:
Make sure you have set the default IPv4 gateway to PPPOE_GW in System > Routing > Gateways. Otherwise it may be switching to one of the VPNs as default which might not work.
Thanks for the suggestion, but it was always PPPOE_GW. I changed it to automatic this morning and that seemed to bring the connection back up. I'll leave it at automatic and see if the same issue happens again tonight.
Is it possible to run a script or something when the PPPoE connection drops like this, so that services restart after the connection comes back up?
-
Just saving that page would likely have brought it back by re-applying the default gateway. It should be set to PPPoE_GW though to avoid it trying to default to one of the VPNs.
-
@stephenw10 said in NIC issues and advise:
Just saving that page would likely have brought it back by re-applying the default gateway. It should be set to PPPoE_GW though to avoid it trying to default to one of the VPNs.
Same thing happened again last night.
You're right, just saving the routing page brought the connection back up.
I'm going to have to figure out some way of automatically saving the routing page a minute or so after the PPPoE connection goes down; hopefully that will save me doing it manually.
-
OK, then firstly check Diag > Routes when this next happens. Is there a valid default route shown?
If not check the system and routing logs for what happened when the PPPoE went down (and when it came back up).
-
@stephenw10 said in NIC issues and advise:
OK, then firstly check Diag > Routes when this next happens. Is there a valid default route shown?
If not check the system and routing logs for what happened when the PPPoE went down (and when it came back up).
Thank you, you were correct. When the connection went down the other night I checked Diag > Routes like you mentioned, and it looked like there was a default route on the LAN interface for some reason. Re-applying the default gateway brought the connection back up.
Diag > Routes with no LAN internet connectivity, before re-applying the default gateway
Diag > Routes when everything is working as expected
Looking at the IP, it was my WAN address before, and I had a script run at boot to set that IP because pfSense would not allow me to use a /32 as a WAN address. The script was disabled long ago, so I'm not sure why that address is popping up as a route.
Here is the script with the route commented out. To be honest I can't remember how I ran this at boot, so I just commented out the route in the script when I changed ISPs a few years back.
cat configure-route.sh
#
#
#
#route add -net 62.210.109.1/32 -IFACE em1
#route add default 62.210.109.1
#
I can actually see that route listed, but it is disabled and has been since I changed ISP. I just deleted it there now, so hopefully that will resolve the issue, unless you have any other suggestions?
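To check for a stray default route from the shell rather than from Diag > Routes, the output of `netstat -rn -f inet` can be filtered. A minimal Python sketch: the sample table is made up for illustration and is not taken from my system, and the column layout is assumed to be the usual FreeBSD one (Destination, Gateway, Flags, Netif). The function name is hypothetical too.

```python
def default_routes(netstat_output: str):
    """Return (gateway, interface) for every IPv4 default route in the text."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Destination, Gateway, Flags, Netif, [Expire]
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes


# Hypothetical sample, not real output from the system discussed here.
SAMPLE = """\
Destination        Gateway            Flags     Netif Expire
default            62.210.109.1       UGS         em1
192.168.1.0/24     link#2             U           em1
"""

for gateway, netif in default_routes(SAMPLE):
    print(f"default via {gateway} on {netif}")
```

In a setup like this one, the default route would be expected on the PPPoE interface, so a default route showing up on a LAN-side NIC is the thing to look for.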