TNSR in Proxmox dropping connectivity
-
Hi! I've been able to install TNSR without difficulty on my Proxmox cluster, but I've run into an issue. The system appears to work great for several hours, and then simply stops forwarding traffic. I don't see any errors, nor do the interfaces go down. It just...stops. Has anyone seen something similar to this, or is there something obvious that I'm missing because I'm new to the platform?
-
@alan-jones Really difficult to say based on the information provided. Is there anything interesting in /var/log/messages at the time it appears to stop passing traffic?
-
@derelict I know this isn't helpful, but unfortunately that's exactly my problem. The system appears completely fine. No errors or anything. It just stops altogether. It isn't even able to resolve MAC addresses from the data plane. I'll pull the /var/log/messages tonight to double check.
-
@derelict finally got back around to this. The following appears in /var/log/messages when the system stops forwarding traffic:
Jan 14 17:14:38 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:43 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:50 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:53 router01 vnet[1227]: linux-cp/router: Failed to delete neighbor: 172.16.40.1 WAN
Jan 14 17:14:57 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:15:00 router01 vnet[1227]: linux-cp/router: Failed to delete neighbor: 172.16.40.1 WAN
Jan 14 17:15:16 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:15:19 router01 vnet[1227]: linux-cp/router: Failed to delete neighbor: 172.16.40.1 WAN
Jan 14 17:15:33 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:15:36 router01 vnet[1227]: linux-cp/router: Failed to delete neighbor: 172.16.40.1 WAN
Jan 14 17:15:54 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
-
@derelict definitely an ARP issue. The WAN interface has all the DHCP-obtained information, but "show neighbor" shows no WAN addresses.
-
@alan-jones That does not look like it has received a DHCP response. Did you add ACLs? DHCP responses need to be passed if so.
I would start with a statically-configured WAN and move to DHCP.
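
For anyone who wants to check a longer log for the same pattern, here is a rough Python sketch that counts DHCPREQUEST lines from dhclient that were never followed by a DHCPACK. The function name `unanswered_requests` and the regexes are my own, and the sketch assumes dhclient logs a line containing "DHCPACK" when a lease is renewed. The sample is the first four lines of the log excerpt posted above.

```python
import re

# dhclient request lines, e.g.
#   ... dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
REQUEST_RE = re.compile(r"dhclient\[\d+\]: DHCPREQUEST on \S+ to \S+")
# Assumption: a successful renewal is logged by dhclient with the word DHCPACK.
ACK_RE = re.compile(r"dhclient\[\d+\]: DHCPACK\b")


def unanswered_requests(lines):
    """Count DHCPREQUEST lines seen since the most recent DHCPACK (or since the start)."""
    pending = 0
    for line in lines:
        if ACK_RE.search(line):
            pending = 0  # the server answered, so earlier requests are settled
        elif REQUEST_RE.search(line):
            pending += 1
    return pending


# First four lines of the log excerpt posted earlier in this thread.
SAMPLE = """\
Jan 14 17:14:38 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:43 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:50 router01 dhclient[1619]: DHCPREQUEST on vpp1 to 172.16.40.1 port 67 (xid=0x1a4d3f47)
Jan 14 17:14:53 router01 vnet[1227]: linux-cp/router: Failed to delete neighbor: 172.16.40.1 WAN
"""

if __name__ == "__main__":
    print("unanswered DHCPREQUESTs:", unanswered_requests(SAMPLE.splitlines()))
```

A count that keeps growing while the xid stays the same suggests the renewals are leaving the client but replies are not coming back, which fits a Layer 2 problem rather than a DHCP misconfiguration.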
-
@derelict no ACLs, just routing for now. I tried a static IP and a static MAC entry for the gateway. Neither worked. It's like the network dropped out from under it. This VM is connected to an Open vSwitch bridge. Could that be the issue?
-
@alan-jones Seems like no Layer 2 connectivity to me.
-
@derelict I concur, but I can't find anything to indicate that and other VMs on the same host are not having L2 connectivity issues. I'll continue to troubleshoot.
-
@derelict for what it's worth, I swapped to VMXNET3 interfaces and now it's completely stable. So weird.
-
@alan-jones That is strange. Everyone here uses virtio.
-
@derelict yeah the NICs are igb and e1000 so they're supported by DPDK. Dunno...
-
@alan-jones That means nothing unless you are passing them through directly. TNSR only sees virtio or vmxnet3. The underlying hardware is hidden from the guest.