Please help to debug a network connection issue
-
I have an Ubuntu Server 24.04.1 on a Dell PowerEdge T30, and every few days I lose the connection to it (from my laptop over ssh and from my phone over sftp). The Ubuntu server is connected with a Cat 5e Ethernet cable directly to the 4-port NIC card in the pfSense box (I have replaced the cable twice); there is no VLAN and no switch in between.
When I cannot ssh to the Ubuntu server (sftp from the phone fails too), I have tried the following steps:
- ping from the Ubuntu server to the pfSense IP: works;
- ping from pfSense to the Ubuntu server IP:
  - source address automatically selected: works;
  - source address on the Ubuntu server's interface: works;
  - source address on the other interface, where my laptop and phone are: doesn't work;
- ping from my laptop: doesn't work;
- the Ubuntu server's /var/log/syslog contains:
```
2024-09-30T11:00:12.377806+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
2024-09-30T11:00:12.386243+00:00 t30 systemd-networkd[24623]: enp0s31f6: Lost carrier
2024-09-30T11:00:12.386490+00:00 t30 dbus-daemon[1594]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.356' (uid=998 pid=24623 comm="/usr/lib/systemd/systemd-networkd" label="unconfined")
2024-09-30T11:00:12.386563+00:00 t30 systemd-networkd[24623]: enp0s31f6: DHCP lease lost
2024-09-30T11:00:12.391230+00:00 t30 systemd-networkd[24623]: enp0s31f6: DHCPv6 lease lost
2024-09-30T11:00:12.396213+00:00 t30 systemd[1]: Starting systemd-hostnamed.service - Hostname Service...
2024-09-30T11:00:12.448404+00:00 t30 dbus-daemon[1594]: [system] Successfully activated service 'org.freedesktop.hostname1'
2024-09-30T11:00:12.448569+00:00 t30 systemd[1]: Started systemd-hostnamed.service - Hostname Service.
2024-09-30T11:00:12.452512+00:00 t30 systemd-hostnamed[55495]: Hostname set to <t30> (static)
2024-09-30T11:00:16.800670+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
2024-09-30T11:00:16.800709+00:00 t30 systemd-networkd[24623]: enp0s31f6: Gained carrier
2024-09-30T11:00:16.808251+00:00 t30 systemd-networkd[24623]: enp0s31f6: Could not set route: Nexthop has invalid gateway. Network is unreachable
2024-09-30T11:00:16.808346+00:00 t30 systemd-networkd[24623]: enp0s31f6: Failed
2024-09-30T11:00:16.808631+00:00 t30 systemd-timesyncd[1513]: Network configuration changed, trying to establish connection.
2024-09-30T11:00:16.825560+00:00 t30 systemd-timesyncd[1513]: Network configuration changed, trying to establish connection.
2024-09-30T11:00:17.623794+00:00 t30 nut-monitor[1696]: Poll UPS [ups-4@10.10.52.20] failed - Server disconnected
2024-09-30T11:00:17.624077+00:00 t30 nut-monitor[1696]: Communications with UPS ups-4@10.10.52.20 lost
2024-09-30T11:00:17.889744+00:00 t30 nut-monitor[55497]: Network UPS Tools upsmon 2.8.1
2024-09-30T11:00:19.624554+00:00 t30 nut-monitor[1696]: UPS [ups-4@10.10.52.20]: connect failed: Connection failure: Network is unreachable
```
- replaced the network cable between the Ubuntu server and pfSense twice (pfSense has a 4-port NIC card).
The only way to regain the connection is to restart the network by running:
```
systemctl restart systemd-networkd
```
What I have found is that while I was replacing the cable, the connection was lost in exactly the same way. Is this a hardware issue? I ran the same computer with Ubuntu Desktop (22.04) as a Plex media server for years and didn't have any network issues. Why does pfSense allow ping from some interfaces but not from my laptop's one (a firewall rule passing ping is present)? Any suggestions on how to debug further are much appreciated. Thanks
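To spot the relevant events quickly the next time it happens, a small filter over the syslog can help. This is a minimal sketch: the helper name `link_events` is hypothetical, and the patterns are simply the messages from the log above (kernel link state, networkd carrier changes, the route error, and networkd marking the link as Failed).

```shell
#!/usr/bin/env bash
# link_events (hypothetical helper): read syslog/journal lines on stdin and
# keep only the ones relevant to this problem:
#   - kernel e1000e "NIC Link is Up/Down"
#   - systemd-networkd "Lost carrier" / "Gained carrier"
#   - systemd-networkd failing to set a route
#   - systemd-networkd giving up on the link ("<iface>: Failed")
link_events() {
  grep -E 'NIC Link is (Up|Down)|(Lost|Gained) carrier|Could not set route|: Failed$'
}
```

Usage: `link_events < /var/log/syslog`, or against the journal for the current boot: `journalctl -b --no-pager | link_events`.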
-
@ady2 said in Please help to debug a network connection issue:
2024-09-30T11:00:12.377806+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
That message means that the 'electrical' connection is lost. The NIC couldn't 'see' the NIC on the other side of the cable anymore.
When the connection goes down, the IP is lost, or more accurately: the 'DHCP lease' is lost (== gateway, network, DNS, etc.). Like you ripped out the cable.
Again:
NIC Link is Down
Was there a mechanical or electrical reason?
Or was it the system itself that took the NIC down because no traffic was going in or out... because, for example, the switch on the other side has issues?
A solution would be: change the NIC in the "Ubuntu-server", get another cable, and change the switch or switch port on the other side, and you've excluded all hardware issues.
-
Yup it shows it actually lost link. If that wasn't you it's a problem.
Can you assign a different NIC to it in pfSense? At the server end?
Or, yes, try putting a switch in between them as a test.
-
@Gertjan said in Please help to debug a network connection issue:
@ady2 said in Please help to debug a network connection issue:
2024-09-30T11:00:12.377806+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
That message means that the 'electrical' connection is lost. The NIC couldn't 'see' the NIC on the other side of the cable anymore.
When the connection goes down, the IP is lost, or more accurately: the 'DHCP lease' is lost (== gateway, network, DNS, etc.). Like you ripped out the cable.
Again:
NIC Link is Down
Was there a mechanical or electrical reason?
Or was it the system itself that took the NIC down because no traffic was going in or out... because, for example, the switch on the other side has issues?
A solution would be: change the NIC in the "Ubuntu-server", get another cable, and change the switch or switch port on the other side, and you've excluded all hardware issues.
@stephenw10
The NIC on the Ubuntu server machine is on the motherboard and is connected with an Ethernet cable directly to a NIC port on the pfSense computer.
Regarding mechanical or electrical, I don't know: the connection is lost without any intervention, and the cable is still connected. I have checked, and the Ubuntu server doesn't seem to have any sleep or other power-saving option enabled by default. I don't know whether a lack of traffic should cause NIC Link Down, but I think it should not for a server that is waiting for clients to connect; maybe I'm wrong.
I also forgot to mention that I have changed to a different port on the pfSense computer (it has a NIC card with 4 ports), but it didn't help. I changed the cable once and it happened again after ~2 days, so I changed it again and will see if that helps.
-
@stephenw10 said in Please help to debug a network connection issue:
Yup it shows it actually lost link. If that wasn't you it's a problem.
Can you assign a different NIC to it in pfSense? At the server end?
Or, yes, try putting a switch in between them as a test.
Yes, I tried this (forgot to mention) by assigning a different port on the pfSense NIC card, as it has 4 ports, but it didn't help.
Why do you think adding a switch could help, since that means adding another piece of hardware into the chain? I will look at adding a switch between the Ubuntu server and pfSense.
-
What I don't understand is why I could ping from pfSense (source address automatically selected) to the Ubuntu server, and from the Ubuntu server to the pfSense IP, when the
NIC Link is Down
If the DHCP IP address lease is lost, doesn't that also mean the ping to that address shouldn't work?
-
Plugging the PC directly into pfSense will work, but be aware that if the PC restarts, shuts down, or turns on, pfSense sees that as the interface going down/up and restarts packages.
Which interface is enp0s31f6?
-
@SteveITS said in Please help to debug a network connection issue:
Plugging the PC directly into pfSense will work, but be aware that if the PC restarts, shuts down, or turns on, pfSense sees that as the interface going down/up and restarts packages.
Which interface is enp0s31f6?
@SteveITS
The enp0s31f6 interface, as well as the logs I posted, are from the Ubuntu server computer.
I don't quite understand what the negative consequences are of having a computer connected directly to pfSense instead of through a switch. In my particular case, the Ubuntu server is the only computer connected to that interface, for compartmentalization.
-
When directly connected, when pfSense goes down (reboot) you see this on your Ubuntu server:
@ady2 said in Please help to debug a network connection issue:
2024-09-30T11:00:12.377806+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
and in that case there is no issue: when you shut down the network, the network (== the Ubuntu interface) will be shut down.
And the other way around: when Ubuntu shuts down, the connected LAN interface on pfSense will be taken down.
-
@Gertjan said in Please help to debug a network connection issue:
When directly connected, when pfSense goes down (reboot) you see this on your Ubuntu server:
@ady2 said in Please help to debug a network connection issue:
2024-09-30T11:00:12.377806+00:00 t30 kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
and in that case there is no issue: when you shut down the network, the network (== the Ubuntu interface) will be shut down.
And the other way around: when Ubuntu shuts down, the connected LAN interface on pfSense will be taken down.
@Gertjan
Good point.
The time of the "NIC Link is Down" message matches when my pfSense restarted, so that is expected.
The problem seems to be on the Ubuntu server: the NIC comes back Up a few seconds after going Down (much faster than my pfSense reboot time), and the network never comes back after that.
I could create a small bash script to restart the network on the Ubuntu server as a workaround.
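A minimal sketch of such a workaround script, assuming the interface name `enp0s31f6` from the logs and assuming the failure shows up as networkd's setup state `failed` (which is what the `enp0s31f6: Failed` log line suggests). The function names are hypothetical. Rather than restarting on a blind schedule, it only restarts systemd-networkd when that state is seen.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog helpers; restart_if_failed must run as root.

# setup_state IFACE: read `networkctl list --no-legend` output on stdin
# (columns: IDX LINK TYPE OPERATIONAL SETUP) and print the SETUP column
# for IFACE, e.g. "configured" or "failed".
setup_state() {
  awk -v ifc="$1" '$2 == ifc { print $5 }'
}

# restart_if_failed IFACE: restart systemd-networkd only when networkd
# reports that it gave up configuring IFACE.
restart_if_failed() {
  local state
  state=$(networkctl list --no-legend --no-pager | setup_state "$1")
  if [ "$state" = "failed" ]; then
    logger -t net-watchdog "$1 setup state is failed, restarting systemd-networkd"
    systemctl restart systemd-networkd
  fi
}
```

It could be called once a minute as root, for example from a systemd timer or a root cron job that sources the file and runs `restart_if_failed enp0s31f6`. `networkctl reconfigure enp0s31f6` may be a lighter alternative to restarting the whole daemon.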
But who can explain why ping works from pfSense to the Ubuntu server and vice versa if the network is not working? Is that a glitch? I have always trusted ping as a way to test a network connection, but this is the first time ping works and the network does not.
-
It's normal that the pfSense NIC comes up pretty fast, as it activates as soon as the driver is loaded and has initialized the hardware.
The thing that takes some time (you can see this very clearly when you follow the pfSense boot process) is the DHCP server process on any given LAN-type interface: it is activated somewhat later.
Meanwhile, as soon as the interface comes up on the Ubuntu side, it kicks off a DHCP client process, which starts requesting a DHCP lease.
If there is no answer yet, it adds a small delay and requests again, and if there is still no answer, it doubles the delay and requests again.
And so on.
This means that even if it takes 30 seconds, a minute, or even more, the DHCP client will get a lease.
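The retry behaviour described above is exponential backoff. For reference, RFC 2131 (section 4.1) suggests a first retransmission delay of about 4 seconds, doubling on each retry up to a maximum of 64 seconds, with each delay randomized by ±1 second. The sketch below only prints that nominal schedule (no randomization); the function name is hypothetical, and real DHCP clients, including the one in systemd-networkd, may use different timings.

```shell
#!/usr/bin/env bash
# dhcp_backoff_schedule N (hypothetical helper): print the first N nominal
# retransmission delays in seconds for doubling backoff:
# start at 4 s, double after each attempt, cap at 64 s.
# The +/-1 s randomization recommended by RFC 2131 is left out on purpose.
dhcp_backoff_schedule() {
  local n=$1 delay=4 i=0
  while [ "$i" -lt "$n" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt 64 ]; then
      delay=64
    fi
    i=$((i + 1))
  done
}
```

For example, `dhcp_backoff_schedule 6` prints 4, 8, 16, 32, 64 and 64, one per line: once the cap is reached, the client keeps retrying at the maximum interval.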
This concept is used on billions of devices every day.
@ady2 said in Please help to debug a network connection issue:
But who can explain why ping works from pfSense to the Ubuntu server and vice versa if the network is not working? Is that a glitch? I have always trusted ping as a way to test a network connection, but this is the first time ping works and the network does not.
'ping' needs the IP network to be up, as ARP needs to work.
Both devices need an IP setup, static or DHCP.
Next time you see the situation, run a global packet capture on your Ubuntu device, and you should see the ICMP packets coming in.
-
If the link is actually down then ping cannot work.
So either it wasn't down and that log is incorrect or the pings you were seeing were misleading, like something else replying perhaps.
Putting a switch in between two devices like that as a test allows one side only to lose link without affecting the other one. Thus if one device has a problem you can find out which one.
If it's a link negotiation issue, it may also negate the problem, which is also useful troubleshooting info.
But here it looks like that log was caused by rebooting pfSense?
-
@stephenw10 said in Please help to debug a network connection issue:
If the link is actually down then ping cannot work.
So either it wasn't down and that log is incorrect or the pings you were seeing were misleading, like something else replying perhaps.
Putting a switch in between two devices like that as a test allows one side only to lose link without affecting the other one. Thus if one device has a problem you can find out which one.
If it's a link negotiation issue, it may also negate the problem, which is also useful troubleshooting info.
But here it looks like that log was caused by rebooting pfSense?
I don't know what happens here: after pfSense restarts (I checked today by restarting the pfSense computer), ping and ssh from my laptop (which is on a different interface than the Ubuntu server) to the Ubuntu server stop working until I restart the network on the Ubuntu server. At the same time, ping from the Ubuntu server to pfSense, and from pfSense (source address automatically selected) to the Ubuntu server, works; but when I select the interface my laptop is on as the source in pfSense, the ping doesn't work.
After restarting the Ubuntu server network, everything works again as expected.
I added a switch between the Ubuntu server and pfSense, and now restarting pfSense no longer breaks ping and ssh (once the pfSense reboot finishes).
The issue is solved, and I really appreciate all the help I received. I did not know about a direct connection vs. using a switch between a client and pfSense. Learned something new.
Thanks
-
That sounds like the server is blocking those pings from outside its subnet.
You can confirm that by running a pcap on the interface connected to the server in pfSense whilst pinging from the laptop.
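A complementary check on the Ubuntu side, sketched under assumptions: in the 11:00:16 log lines, systemd-networkd could not set a route ("Nexthop has invalid gateway") and then marked enp0s31f6 as Failed. If that left the server without a default route, it could still answer pfSense on its own directly connected subnet but would have no route back to the laptop's subnet, which from outside looks the same as the server blocking those pings. The hypothetical helpers below test for a default route and run the global ICMP capture suggested earlier in the thread; the interface name comes from the logs, `PEER_IP` stands for the laptop's address, and the live commands need root. On the pfSense side, the capture can be run from Diagnostics > Packet Capture.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostics for the Ubuntu server; run `diagnose` as root.

# has_default_route: read `ip route show default` output on stdin and
# succeed (exit 0) only if at least one "default ..." route line is present.
has_default_route() {
  grep -q '^default '
}

# diagnose IFACE PEER_IP: show networkd's view of the link, whether a
# default route exists, how the kernel would route a reply to PEER_IP,
# then capture 20 ICMP packets to/from PEER_IP on all interfaces.
diagnose() {
  local iface=$1 peer=$2
  networkctl --no-pager status "$iface"
  if ip route show default | has_default_route; then
    echo "default route present"
  else
    echo "no default route: replies to other subnets have no path back"
  fi
  ip route get "$peer"   # prints an error if the peer is unreachable
  tcpdump -ni any -c 20 "icmp and host $peer"
}
```

While pinging from the laptop, `diagnose enp0s31f6 PEER_IP` would show whether echo requests arrive but no replies leave (routing), or whether requests never arrive at all (filtering before the server).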