LAN interface crashes after 2.3 upgrade
-
I've noticed a very strange but reproducible problem after upgrading to 2.3. I have a site-to-site IKEv2 IPsec tunnel between my house and my office. At home I upgraded to the 2.3 release, while the office is still running 2.2.6. I have a SIP desk phone at my house that connects through the VPN to an Asterisk server at the office. Everything works fine until I get on a call with the SIP phone. After a couple of minutes on a call I lose Internet connectivity. I can hear the other party for a few seconds after I notice the loss of connectivity (RTP UDP packets are still getting to the phone for at least 5 seconds after the connection drops). Initially I thought it was a kernel panic, so I hooked my laptop up to the serial console, but found that the system was still responsive after the loss of Internet connectivity. From the shell I can still access resources through the WAN interface (ping 8.8.8.8, ping google.com), but anything on the LAN side is unresponsive. Restarting the box fixes the problem, but the next time I get on a call it happens again.
I was out of town over the weekend, and therefore not making any phone calls, and the firewall stayed up the whole time (approximately 3 days). I was able to RDP into my desktop at home and access various other resources without issue. I also tested transferring large files via SMB over the VPN, and that seems to work fine. So, on the surface at least, it doesn't seem like there's an issue elsewhere that is just aggravated by the phone call over the VPN tunnel.
The box in question is an APU1D4 with 20GB of mSATA storage. I'm running the full version (not nanobsd) with a serial console. I'm sure I'll need to provide more detailed logs/info/etc., but at this point I'm just not sure where to start.
Thanks in advance!
-
I'm going to send you a PM with an alternate kernel to try that disables netmap. It's starting to look like that symptom is caused by something to do with netmap (which most everyone isn't using anyway), though I haven't yet gotten enough feedback to determine that definitively.
-
I've experienced the same thing. Last Sunday I rolled out 8 upgrades to different places/customers. On 3 of my installations the LAN interface craps out anywhere from once every 2 days to 2-3 times a day. The funny thing is that all the virtual installations (some ESXi hosts and one VirtualBox) keep running fine; in my case it seems to be only the physical hosts that go bad.
My definition of 'craps out': it stops responding to ping/http/https on the LAN interface, although the WAN/DMZ interfaces keep passing traffic like nothing happened. On the physical hosts I simply log in, choose reboot, and hope it runs for a few days again.
Any help would be much appreciated. :-)
-
Thanks to thx2000, I have a replicable test case for this problem now. We're working on tracking down the root cause.
-
Great, happy to hear that. I'll be rolling back to the previous version for now (tired of being called out for restarts in the middle of the night), and I hope thx2000 will be a dear and post whether his problem got solved, which would make me confident enough to try upgrading again. :-)
Keep up the good work you guys. :D - have a great weekend.
-
Same hardware (APU), also IPsec from pfSense 2.3 to the office (a Netscreen in that case), and the same symptom. The box is still there. WAN is active. LAN is simply not doing anything.
ifconfig igb2 down
ifconfig igb2 up
fixes the issue as well. However, this was the case even right after a desperate reboot!
Keep us posted. I just upgraded to the latest 2.3.1 snapshot (and after the reboot into it, igb2 was dead until the down/up). Let's see what happens. I haven't tried another kernel yet (since I do not have access, and others report it does not change the problem).
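For anyone who wants to script the down/up workaround described above, here is a minimal sketch. The helper names `run` and `bounce_iface` and the `DRY_RUN` switch are made up for illustration, and `igb2` is simply the LAN interface on this particular box; yours may differ. It only wraps the two `ifconfig` commands and does nothing about the underlying cause.

```shell
#!/bin/sh
# Sketch only: bounce a network interface (down, brief pause, up).
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: either echo the command or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: take the named interface down and bring it back up.
bounce_iface() {
    iface="$1"
    run ifconfig "$iface" down
    run sleep 1
    run ifconfig "$iface" up
}
```

After sourcing this, `bounce_iface igb2` prints what it would do. Setting `DRY_RUN=0` and running it as root from the console would actually bounce the interface.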
Regards,
JP
-
We've tracked down the issue to likely being caused by SMP. thx2000 confirmed disabling the second core on his system makes the problem stop happening. I posted instructions for disabling additional cores here:
https://forum.pfsense.org/index.php?topic=110710.msg618388#msg618388
For the few who hit it repeatedly, that seems to be a viable immediate workaround while we track down the exact root cause and get it fixed.
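The linked instructions are not reproduced in this thread, so the following is only a rough sketch of one common way to turn off additional cores on FreeBSD-based systems. It assumes the stock FreeBSD loader tunable `kern.smp.disabled` and that local loader settings live in `/boot/loader.conf.local`. The function name `disable_smp` is made up for illustration, and the linked post may describe a different method.

```shell
#!/bin/sh
# Sketch only: add kern.smp.disabled=1 to a loader config file if no
# kern.smp.disabled line is present yet. The default path is an assumption.
LOADER_CONF="${LOADER_CONF:-/boot/loader.conf.local}"

# Hypothetical helper: append the tunable once; an existing setting is left alone.
disable_smp() {
    conf="$1"
    touch "$conf" || return 1
    if ! grep -q '^kern\.smp\.disabled=' "$conf"; then
        echo 'kern.smp.disabled=1' >> "$conf"
    fi
}
```

Calling `disable_smp "$LOADER_CONF"` as root and rebooting should bring the box up with a single core. Running `sysctl hw.ncpu` afterwards is one way to check how many CPUs are active.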
-
Have there been any more findings on this issue?
I have the same problem, but I am running pfSense 2.3.2 on a WatchGuard Firebox with only one CPU core, so there are no extra cores to disable…
In my case I have a Minecraft server running, and it only crashes while playing Minecraft. The symptoms are the same as for thx2000. When connected with a console cable I can ping stuff and my WAN interface seems fine. The LAN interface, on the other hand, is completely dead. I can get LAN up and running by bringing the interface down and up again, but then the link between WAN and LAN seems to be broken. I can access the web interface and ping computers on the network, but there is no Internet access.
Would be really nice to solve this but I don't really know what to try next :p
-
This specific issue was fixed long ago. If you have what appears to be a similar issue on 2.3.2, it's unlikely to be this. Start a fresh thread with as much detail as possible about your config, hardware, network, and so on.