Connection loss after rebooting machines
-
I've added all hosts as static leases to the DHCP server, so there aren't any conflicting MAC or IP addresses.
That doesn't guarantee there are no conflicting MACs or IPs. None will be assigned by the firewall, but something else could still be configured with one of them. You didn't answer my question: when there is a problem, before you flush the ARP cache, is the real, actual MAC in there? If so, and flushing the ARP cache just makes it ARP again and learn exactly the same entry as before, it sounds like a vswitch issue where an ARP request somehow kicks it back into working. Doing a packet capture on the affected NIC from the firewall will confirm or deny that. It shows what traffic is "on the wire" (not quite a wire in this case, but what the vswitch sends to the NIC, before any processing). If you don't see the traffic coming in, then it's a vswitch issue.
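To answer the "is the real MAC in there?" question without eyeballing it, you can save the firewall's ARP table before and after the flush and compare the two. A minimal sketch in Python, assuming the FreeBSD `arp -an` line format shown in the comment (the addresses and the `vtnet1` interface name are made up for illustration). The timeout part of each line is ignored, so only a changed MAC or interface counts as a difference:

```python
import re

# Assumed FreeBSD `arp -an` line format, e.g.:
# ? (192.168.1.10) at 52:54:00:12:34:56 on vtnet1 expires in 1185 seconds [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")


def parse_arp_table(text):
    """Map IP -> (MAC, interface), ignoring the timeout part of each line."""
    table = {}
    for line in text.splitlines():
        m = ARP_LINE.search(line)
        if m:
            table[m.group("ip")] = (m.group("mac").lower(), m.group("iface"))
    return table


def diff_arp(before, after):
    """Return {ip: (old, new)} for every entry whose MAC or interface changed."""
    changed = {}
    for ip in before.keys() | after.keys():
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changed[ip] = (old, new)
    return changed


if __name__ == "__main__":
    before = parse_arp_table(
        "? (192.168.1.10) at 52:54:00:12:34:56 on vtnet1 expires in 1185 seconds [ethernet]"
    )
    after = parse_arp_table(
        "? (192.168.1.10) at 52:54:00:12:34:56 on vtnet1 expires in 1199 seconds [ethernet]"
    )
    print(diff_arp(before, after))  # {} : only the timeout differs
```

An empty result means the entry only got a new timeout, which per the reasoning above points away from the ARP table and toward the vswitch.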
-
Even though it indeed doesn't guarantee there are no conflicting MAC addresses, pfSense does not allow entering the same MAC twice, and "all" hosts are supplied with their IP by pfSense and "all" of them have connectivity with the pfSense box.
I didn't notice any conflicting addresses in the ARP table, though I'm not sure that's even possible. I'll test your question tonight (the cron job currently keeps the issue from happening), but as far as I can tell from previous tests, I did not notice any change in the ARP table row for the affected host besides a new timeout.
However, the affected host does still have some connectivity after a system reboot (which seems to reproduce the issue in a predictable and repeatable manner): pinging the pfSense box from within the affected host works, and it gets an ARP entry, even though the system itself isn't reachable or able to connect to anything outside its network anymore until pfSense is forced to delete its ARP entries.
I'll check for any inconsistencies in the ARP table tonight when I reboot a machine after disabling the cache flush cron job.
-
I checked it, and it's exactly as I wrote in my previous post: all the ARP entries remain the same. Nothing seems to change on the pfSense side.
-
OK, that should rule out an IP conflict or proxy ARP gone mad, so the important question now is: does tcpdump show the traffic coming into that NIC on the firewall when it's not working? My guess is no, and that the frequent ARP requests somehow fix the vswitch.
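One way to read such a capture: run something like `tcpdump -ni vtnet1 icmp` on the firewall while pinging it from the guest, save the text output, and count what actually arrived. The sketch below assumes the usual `tcpdump -n` ICMP line shape shown in the comment; the interface name `vtnet1` and the addresses are hypothetical:

```python
import re

# Assumed `tcpdump -n` line shape for ICMP, e.g.:
# 21:03:11.123456 IP 192.168.10.20 > 192.168.10.1: ICMP echo request, id 4321, seq 1, length 64
ICMP_ECHO = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply)"
)


def count_echo(capture_text, host):
    """Count echo requests sent by `host` and echo replies sent back to it."""
    requests = replies = 0
    for line in capture_text.splitlines():
        m = ICMP_ECHO.search(line)
        if not m:
            continue
        if m.group("kind") == "request" and m.group("src") == host:
            requests += 1
        elif m.group("kind") == "reply" and m.group("dst") == host:
            replies += 1
    return requests, replies


if __name__ == "__main__":
    capture = (
        "21:03:11.123456 IP 192.168.10.20 > 192.168.10.1: ICMP echo request, id 4321, seq 1, length 64\n"
        "21:03:11.123501 IP 192.168.10.1 > 192.168.10.20: ICMP echo reply, id 4321, seq 1, length 64\n"
    )
    print(count_echo(capture, "192.168.10.20"))  # (1, 1)
```

If the guest's ping reports replies while `count_echo` returns `(0, 0)` for its address, the pings never reached that NIC on the firewall.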
-
Sorry for the somewhat late reply; this issue has become a lower priority since it basically works now.
I will check what happens with tcpdump and report back.
-
I've been testing with tcpdump. When a machine within one of the virtual networks is started after pfSense has started (and no ARP flush has been issued), none of its traffic shows up in tcpdump, yet it gets an IP (from pfSense) and is able to ping the pfSense box even though the packets don't show up in tcpdump. DNS queries don't show up either and are being refused (the standard reply when the DNS server isn't available?).
Machines that were already running when a manual ARP flush was issued do show traffic when pinging the pfSense box and can query DNS just fine.
Some weird stuff is happening here!
I can't imagine no one else is experiencing this; there must be more people running pfSense in a KVM environment. Side note: this does happen randomly with running machines as well, though testing that is quite cumbersome.
I've been experimenting with STP configurations, but this didn't change anything besides flooding tcpdump with STP packets (every 2 seconds).
-
If you're getting ping replies and not seeing the request and reply in tcpdump, that goes back to my earlier suggestion of an IP conflict. Something is responding to those pings, and if it's not in tcpdump there, it has to be some other device, assuming you're capturing on the right NIC and not filtering out that traffic.
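A quick way to look for that other device is to clear the guest's ARP cache, ping the firewall address again, and capture the ARP replies (for example with `tcpdump -n arp` inside the guest). If two different MACs claim the same IP, there is a conflict. A minimal sketch, assuming the `tcpdump -n` ARP reply line shape shown in the comment; the addresses are illustrative only:

```python
import re
from collections import defaultdict

# Assumed `tcpdump -n arp` reply line shape, e.g.:
# 22:10:01.000001 ARP, Reply 192.168.10.1 is-at 52:54:00:12:34:56, length 28
ARP_REPLY = re.compile(r"Reply (?P<ip>[\d.]+) is-at (?P<mac>[0-9A-Fa-f:]{17})")


def find_ip_conflicts(capture_text):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    claims = defaultdict(set)
    for line in capture_text.splitlines():
        m = ARP_REPLY.search(line)
        if m:
            claims[m.group("ip")].add(m.group("mac").lower())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    capture = (
        "22:10:01.000001 ARP, Reply 192.168.10.1 is-at 52:54:00:12:34:56, length 28\n"
        "22:10:01.000090 ARP, Reply 192.168.10.1 is-at 52:54:00:ab:cd:ef, length 28\n"
    )
    print(find_ip_conflicts(capture))
    # {'192.168.10.1': ['52:54:00:12:34:56', '52:54:00:ab:cd:ef']}
```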
-
Your persistent thoughts about conflicting IPs and the missing tcpdump entries got me thinking, and I decided to do something radical: kill every running machine within the DMZs and stop all services for the night.
I turned off every virtual machine but one while pinging the pfSense IP address from that box.
The replies stopped… but then suddenly came back and kept coming.
You were right! Something did reply even though nothing else was running anymore. After searching for what the hell this could be, I found the problem.
The virtual network uses the first available IP address for itself, and that IP was assigned to pfSense as well.
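For anyone hitting the same thing: a libvirt network definition (what `virsh net-dumpxml <network>` prints; the persistent copies usually live under /etc/libvirt/qemu/networks/) carries an `<ip address=...>` element, and that address is configured on the host's bridge. A small Python sketch that pulls those addresses out and checks them against the addresses pfSense uses. The network name, bridge name and addresses in `SAMPLE` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical libvirt network definition, shaped like `virsh net-dumpxml` output.
SAMPLE = """<network>
  <name>dmz1</name>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.10.1' netmask='255.255.255.0'/>
</network>"""


def host_addresses(network_xml):
    """Return the addresses libvirt puts on the host side of a virtual network."""
    root = ET.fromstring(network_xml)
    # <ip> elements are direct children of <network>; a network may have none.
    return [ip.get("address") for ip in root.findall("ip") if ip.get("address")]


def conflicts_with(network_xml, firewall_addresses):
    """List host-side addresses that are also configured on the firewall."""
    taken = set(firewall_addresses)
    return [addr for addr in host_addresses(network_xml) if addr in taken]


if __name__ == "__main__":
    # Assumes pfSense's DMZ interface was given 192.168.10.1 as well.
    print(conflicts_with(SAMPLE, ["192.168.10.1"]))  # ['192.168.10.1']
```

If that list isn't empty, either give pfSense a different address on that segment or change the `<ip>` element in the network definition so the host no longer claims it (check the libvirt network XML documentation for what your forward mode requires there).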
Nothing in the virt-manager configuration indicates that the virtual network itself uses an address at all, but after digging through the config files on the server itself, the suspect IPs popped up. I feel like a complete moron for ditching your first suggestion, as it was right all along.
I can't thank you enough. I've spent countless hours trying to figure out what caused this problem, and it was right in my face the whole time. I've even looked at pfSense alternatives, concluding every time that they didn't compete. Again, cmb, THANKS a million!!! ;D
-
I feel like a complete moron for ditching your first suggestion, as it was right all along.
I can't thank you enough. I've spent countless hours trying to figure out what caused this problem, and it was right in my face the whole time.
I've seen about everything there is to see with this kind of stuff countless times; people would be a lot better off if they just believed me. ;D At least you fessed up to it. Thanks for the follow-up.
Glad you found and fixed it. And that I was right. ;)
-
@cmb:
I've seen about everything there is to see with this kind of stuff countless times; people would be a lot better off if they just believed me. ;D At least you fessed up to it. Thanks for the follow-up.
Glad you found and fixed it. And that I was right. ;)
No need to rub it in, though. ;)