Issues with 2.0-RC3



  • I've been running version 1.2.3 for a very long time now, and I decided to give 2.0 AMD64 a shot now that it's at RC3 status. I really like the new version, but it's giving me trouble on a fresh install (I didn't bring my config over). Everything runs fine, and then all of a sudden the machines report a "no internet connection" status. At that point I cannot access the internet, I cannot repair (release/renew) the connection, and I cannot reach the box's internal IP address, 192.168.1.1. If I wait several minutes, the connection appears to come back and all the Windows machines return to a "we have internet" status.

    I can't seem to narrow this down and it's driving me nuts! This is running on good hardware with Intel NICs. Is anyone else having this problem? Could it be specific to the AMD64 version?

    Help! Thanks!



  • Have you checked the pfSense logs (web GUI: Status -> System Logs) for 'unexpected' events at around the same time you lost internet access?

    Have you checked the Status -> RRD Graphs for spikes in CPU usage or memory use or process count at around the time you lost internet access?

    Do you have any packages installed?



  • Check this thread out and see if you are having the mbuf issue that occurs with certain NICs:

    http://forum.pfsense.org/index.php/topic,37754.msg194854.html#msg194854



  • I have checked the pfSense logs as well as the RRD graphs to try to track down the cause, and I'm not finding a single reason. I do not have any packages installed on this router; it's just a fresh 2.0 RC3 AMD64 install without any customizations. It has 4 GB of memory and an Intel "EXPI9402PT" dual-port NIC (a quality NIC).

    Is the mbuf issue you describe a problem only with the 64-bit install? The box ran 1.2.3 for 6 months without an issue, and I also had 2.0 RC1 AMD64 installed without this problem. It appears to be something introduced between RC2 and RC3?

    This sounds like something that should be tracked and addressed in the Gold release. How do I go about submitting a ticket?



  • I would assume most people running the 2.0 code are on the i386 install? I'm curious whether this is specific to the AMD64 version of the product…



  • Is the i386 version any different? Is the console still responsive when it's not on the network? What exactly is in your system log? What hardware are you running?



  • I am having essentially the same issue with my SUPER MICRO X7SLM-L-B board (dual-core Intel E3400, 2 GB RAM, dual integrated NICs and a Trendnet Gigabit PCI NIC), on the same 2.0 RC3 amd64 build. The web GUI becomes unresponsive, I'm unable to ping the LAN interface, and it stops routing internet traffic. The console shows no errors; I can reboot and everything comes back up for several hours or even a few days.



  • mbuf exhaustion? (mbuf leaks?)

    When this happens, please run netstat -m on the pfSense console (or in an ssh session) and post the output here.
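    For reference, on FreeBSD (the base pfSense is built on) the key line of netstat -m output has the form "current/cache/total/max mbuf clusters in use (current/cache/total/max)"; the numbers in the sample below are illustrative, not taken from this thread. Here is a minimal Python sketch, assuming that line format, that pulls the four counts out of pasted output and reports how much of the cluster limit is allocated. The names parse_mbuf_clusters and cluster_utilization are hypothetical helpers, not part of pfSense.

```python
import re

# FreeBSD "netstat -m" cluster line, e.g.
# "1024/1280/2304/25600 mbuf clusters in use (current/cache/total/max)"
CLUSTER_LINE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def parse_mbuf_clusters(netstat_output: str) -> dict:
    """Return the current/cache/total/max cluster counts from netstat -m text."""
    match = CLUSTER_LINE.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbuf clusters in use' line found")
    current, cache, total, limit = (int(g) for g in match.groups())
    return {"current": current, "cache": cache, "total": total, "max": limit}


def cluster_utilization(counts: dict) -> float:
    """Percentage of the cluster limit ("max") already allocated ("total")."""
    return 100.0 * counts["total"] / counts["max"]


if __name__ == "__main__":
    # Illustrative pasted output (hypothetical numbers).
    pasted = (
        "1026/1284/2310 mbufs in use (current/cache/total)\n"
        "1024/1280/2304/25600 mbuf clusters in use (current/cache/total/max)\n"
    )
    counts = parse_mbuf_clusters(pasted)
    print(counts)
    print(f"{cluster_utilization(counts):.1f}% of the cluster limit allocated")
```

    If total sits at or near max while the box is unresponsive, mbuf exhaustion is a likely suspect.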



  • I've been using 2.0-RC1 for 3 months now on both i386 and amd64 platforms as VirtualBox 4.08 virtual machines. I've installed Tiny-DNS in all my installations because I want custom A and MX records for bogus domains (I'm developing a course that will run in a lab behind a NAT router, which will also be a physical pfSense box). Everything was working great until this morning, when I took the RC3 update.

    Issues I'm having since upgrading to RC3:
    Name resolution does not work when the pfSense box tries to use itself for name resolution. There are no errors in the logs that I can see. I've tried restarting svscan and the tiny-dns service, and I've tried removing the package, rebooting, and reinstalling it. There is still internet/network connectivity, as I can ping hosts by IP without issue.

    If I disable tiny-dns and use the default forwarder, name resolution works again. As soon as I disable the default DNS and enable tiny-dns, name resolution fails.

    Of course I thought I had screwed something up (still a viable possibility), so I did the following, first on a Mac host and then on a Windows 7 host:

    Installed pfsense 2.0 RC1 from scratch on both i386 and amd64 platforms. Took no updates. Installed tiny-dns. Works great.
    Took the update to RC3. Name resolution stops working.

    Installed pfsense 2.0 RC3 from scratch on both i386 and amd64 platforms. Installed tiny-dns. Name resolution doesn't work.

    Thanks for reading. Any help would be appreciated. I may just have to stick with RC1, but I thought I'd take a shot in the forums.



  • @jrmitchell83
    Are you running any form of power management?



  • I'm not running any power management, and I checked the BIOS; there's no power management there either. I'm running this on an HP 500B (Pentium dual-core E5700) with an Intel EXPI9402PT PCI-Express gigabit server adapter. I will try to capture the netstat -m output and post it to this thread.



  • omorgan: it's not clear to me that your post is relevant to the original post. Did you mean to post it in a new topic?



  • @wallabybob

    I thought my post was relevant to this topic, since I'm having issues with RC3, and I thought it was important to give some background. If you think this should be a new topic, then I'll post it as a new one.



  • I'm having the same problem after upgrading from "2.0-BETA4 (i386) built on Wed Dec 15 07:49:38 EST 2010" to 2.0 RC3. After the upgrade I'm unable to ping the management LAN interface or use pfSense as the DNS resolver for my LAN clients. The workaround was to enter another DNS server (e.g. 8.8.8.8) on my clients.

    However, after a few hours or days I start running into problems: the web GUI becomes unresponsive, I'm unable to ping the LAN interface, and it stops routing internet traffic. The console shows no errors (i.e. no panic); I can reboot and everything comes back up for several hours or even a few days. Netstat shows all 0's after these symptoms start.

    I downgraded to 2.0-BETA 4 and I'm very stable again.  I'm running a PowerEdge 2900 III with Broadcom NetXtreme II BCM5708 1000BaseT.



  • @omorgan:

    I thought my post was relevant to this topic, since I'm having issues with RC3, and I thought it was important to give some background. If you think this should be a new topic, then I'll post it as a new one.

    I understand it's an "RC3 issue" and in that sense relevant to the topic. However, I can't see any connection with the original post, so I think it's highly likely your problem won't get much attention in this topic. Having two apparently different problems in one topic makes it harder for those who might contribute to figure out what is going on. For example, immediately after your post myjunkman wrote "I'm having the same problem …". Because this came soon after your post, I first thought he was having the same problem as you, but in the absence of any obvious connection with what you reported, I have to conclude he was referring to the originally posted problem.

    I'd recommend you post it wherever you think it will get the most exposure to people who are likely to be able to help. Because it seems to me to be a DNS issue, the DHCP and DNS forum might be a good place. However, because it seems to be something newly broken in a 2.0 snapshot build, I would post it as a new topic in this forum.



  • @myjunkman, did you upgrade to 2.0 RC3 AMD64? I'm now testing 2.0 RC3 i386 to see if the problem is specific to 2.0 RC3 AMD64. So far I'm not experiencing the problem with i386… I need to give it some more time, however.

    Can you guys test something for me? Go to Status->System Logs->Settings, enable "Show log entries in reverse order (newest entries on top)" and set "Number of log entries to show:" to 2000. Does your machine then become unresponsive? This seems to create a problem for me on both 2.0 RC3 i386 and amd64.


  • Rebel Alliance Developer Netgate

    @jrmitchell83:

    Can you guys test something for me? Go to Status->System Logs->Settings, enable "Show log entries in reverse order (newest entries on top)" and set "Number of log entries to show:" to 2000. Does your machine then become unresponsive? This seems to create a problem for me on both 2.0 RC3 i386 and amd64.

    Trying to show that many lines in the GUI log would probably put a bit of a burden on the log parsing functions, since it will have to read that much data into memory, run it through whichever mojo is needed for the tab you're looking at, and since you are using reverse mode, flip the log array at the end. If your system is low on CPU/RAM that isn't likely to be a fast operation.
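    To illustrate the memory point: a viewer that only needs the newest N entries does not have to hold the whole file at once. Below is a minimal Python sketch, assuming a plain-text log read line by line, that keeps at most N lines in a bounded deque and returns them newest first. This is not how the pfSense GUI (written in PHP) does it, and last_n_newest_first is a hypothetical name. It bounds memory but still reads every line, so the CPU cost still grows with log size.

```python
from collections import deque
from typing import Iterable, List


def last_n_newest_first(lines: Iterable[str], n: int) -> List[str]:
    """Return the last n lines, newest first, holding at most n lines in memory."""
    if n <= 0:
        return []
    # deque(maxlen=n) discards the oldest entry once full,
    # so memory stays bounded no matter how long the log is.
    tail = deque(lines, maxlen=n)
    tail.reverse()  # flip in place: newest entry first
    return list(tail)


if __name__ == "__main__":
    import io

    # A 10,000-line stand-in for a log file.
    log = io.StringIO("".join(f"entry {i}\n" for i in range(1, 10001)))
    for line in last_n_newest_first(log, 3):
        print(line.rstrip())
```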



  • @jimp:

    @jrmitchell83:

    Can you guys test something for me? Go to Status->System Logs->Settings, enable "Show log entries in reverse order (newest entries on top)" and set "Number of log entries to show:" to 2000. Does your machine then become unresponsive? This seems to create a problem for me on both 2.0 RC3 i386 and amd64.

    Trying to show that many lines in the GUI log would probably put a bit of a burden on the log parsing functions, since it will have to read that much data into memory, run it through whichever mojo is needed for the tab you're looking at, and since you are using reverse mode, flip the log array at the end. If your system is low on CPU/RAM that isn't likely to be a fast operation.

    ssh into the box and run top. After saving the 2000-line setting with reverse order enabled, apache should spike to 100% CPU. Also look at how much RAM it starts to take up: if it starts cutting into swap, the system will be horribly slow. On a single-core processor it would be natural for the system to become unresponsive while the log file is being parsed. I would recommend at least a dual-CPU or dual-core processor if you ever plan on viewing large amounts of logs or whatever, so the system doesn't choke your firewall/router while you are viewing logs of that size.


  • Rebel Alliance Developer Netgate

    Or just view the raw logs without the GUI parsing involved. (Obviously not as convenient…)



  • back up the log files every night to another system which can parse the files very fast ;)

    (I back up the log files and config files anyway, just in case…)


  • Rebel Alliance Developer Netgate

    Why back them up when you can push them off "live" with remote syslog? That way you can't lose any logs if your box dies 5 minutes before the backup job runs. :-)
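    A remote syslog collector can be very small. Below is a minimal Python sketch, assuming the firewall sends classic BSD-style (RFC 3164) syslog datagrams over UDP. It decodes the leading <PRI> value into facility and severity (PRI = facility * 8 + severity) and appends each message to a local file as soon as it arrives. The port, the output file name, and the names split_pri, SyslogToFile and serve are assumptions for illustration. The standard syslog port is UDP 514, which normally requires root to bind.

```python
import re
import socketserver

# Leading "<PRI>" field of a BSD-style (RFC 3164) syslog datagram.
PRI = re.compile(rb"^<(\d{1,3})>")


def split_pri(datagram: bytes):
    """Return (facility, severity, text) for one syslog datagram."""
    match = PRI.match(datagram)
    if match is None:
        # No PRI field: keep the whole payload as text.
        return None, None, datagram.decode("utf-8", "replace").rstrip("\n")
    facility, severity = divmod(int(match.group(1)), 8)  # PRI = facility * 8 + severity
    text = datagram[match.end():].decode("utf-8", "replace").rstrip("\n")
    return facility, severity, text


class SyslogToFile(socketserver.BaseRequestHandler):
    """Appends each received datagram to a local file as soon as it arrives."""

    log_path = "firewall-remote.log"  # hypothetical output file

    def handle(self):
        datagram = self.request[0]  # for UDPServer, request is (data, socket)
        facility, severity, text = split_pri(datagram)
        with open(self.log_path, "a", encoding="utf-8") as out:
            out.write(f"{self.client_address[0]} fac={facility} sev={severity} {text}\n")


def serve(port: int = 5514) -> None:
    """Listen forever; 5514 is an assumed unprivileged port (514 usually needs root)."""
    with socketserver.UDPServer(("0.0.0.0", port), SyslogToFile) as server:
        server.serve_forever()
```

    Call serve() on the collecting machine and point the firewall's remote syslog setting at that host. The sender and listener must agree on the port.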



  • @jimp:

    Why back them up when you can push them off "live" with remote syslog? That way you can't lose any logs if your box dies 5 minutes before the backup job runs. :-)

    I wasn't trying to overwhelm the OP when he's trying to view virtually the entire log file from a browser :o

    ;D



  • Circling back around with the original issue in this thread, I'm confirming that downgrading from 2.0 RC3 amd64 to 2.0 RC3 i386 has resolved my issue with the box becoming unresponsive and not routing traffic. There seems to be a definite problem with the 2.0 RC3 amd64 at this point. How can we track down the root cause so this can be addressed with the GA release?

    @hytek and others: the box I'm running this on is a 3 GHz dual core with 4 GB RAM and an approved Intel NIC, so it's pretty beefy. I was able to view 2000 lines of log and flip them on the same hardware in 1.2.3 without an issue.

    I'm just trying to report back findings as we move closer to GA. I'd really prefer to be running the 64-bit instantiation of pfSense, so resolving that problem would be great.


  • Rebel Alliance Developer Netgate

    We'd need a lot more info about the system, though it may have been some level of mbuf exhaustion or similar. Are those Intel NICs em or igb? If they are igb, they default to using four network queues which can run you out of mbufs quickly. There are sysctl tweaks to reduce that to 1, and also to raise the number of mbufs. We may have to set those by default for 2.0, it seems amd64 is more prone to hitting those.
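    Here is back-of-the-envelope arithmetic for why more queues eat clusters faster. It rests on a simplifying assumption: each receive queue keeps its descriptor ring stocked with one mbuf cluster per descriptor, so the clusters tied up at idle scale with ports * queues * ring size. The ring size (1024) and cluster limit (25600) below are hypothetical example values, not the driver's or pfSense's actual defaults, and rx_clusters_needed is a hypothetical name.

```python
def rx_clusters_needed(ports: int, queues_per_port: int, rx_descriptors: int) -> int:
    """Clusters tied up just to keep every receive ring full.

    Simplifying assumption: one mbuf cluster is attached to each RX
    descriptor, so the pre-allocation scales with ports * queues * ring size.
    """
    return ports * queues_per_port * rx_descriptors


if __name__ == "__main__":
    RING = 1024    # hypothetical RX descriptors per queue
    LIMIT = 25600  # hypothetical cluster limit (kern.ipc.nmbclusters)
    for queues in (4, 1):
        need = rx_clusters_needed(ports=2, queues_per_port=queues, rx_descriptors=RING)
        print(f"{queues} queue(s) per port: {need} clusters, {100 * need / LIMIT:.0f}% of {LIMIT}")
```

    With those example numbers, a dual-port card at 4 queues per port pins 8192 clusters (32% of the example limit) before any traffic is buffered. At 1 queue per port it is 2048 (8%). This is why reducing queues or raising the limit both buy headroom.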



  • It appears I've just run into the same issue… I'm running the June 28 RC3 build. I'm at home and decided to check on the new firewall. I could log in to the web GUI, but trying to access any other tab just timed out, and I could not access any of our port forwards either. Fortunately, the reboot link in the GUI did load after a minute or two, and I was able to reboot remotely. Everything is working now. When I did get logged in and saw the dashboard, the MBUFs didn't look any different than normal.


  • Rebel Alliance

    I'm running 2.0-RC3 (amd64) built on Mon Jun 20 13:26:12 EDT 2011 without any trouble (Uptime: 11 days, 09:34).

    I've been running 2.0-RCx (amd64) since the end of April (RC1 -> RC2, now RC3) on an Asus M2N32SLI with an Athlon X2 5600+ and 4 GB of Corsair XMS2 DDRII 800 RAM (2 x 2 GB).

    I'm using 4 interfaces (2 WANs in failover mode, plus LAN and OPT connected to a softswitch): the 2 onboard "nfe" + 2 "re" (PCIe TP-Link TG-3468). And yes, it works, even with this non-recommended hardware.... :D

    Installed packages:  BandwidthD & Siproxd

    So far I'm really happy with it... Thanks, pfSense team!



  • I've downloaded 2.0-RC3 i386 and amd64, and neither seems able to install on a Dell PowerEdge R210, with or without RAID1. The problem starts at 38% of the installation, when the kernel files are being copied: the CD-ROM freezes and stops loading data, and the installation hangs at 38% forever. I suspected my internal CD drive could be the problem, so I attached a USB CD-ROM and tried the same process, and to my surprise the pfSense installer can't automount the correct CD device. I guess that's because it recognizes my internal CD drive and tries to mount it anyway. So that's pretty much it: I can't install pfSense 2.0-RC3 on my new Dell PowerEdge R210. As I write this, I'm burning pfSense 1.2.3 to give it a try before going mad. I'll be available to clarify or try something else.

    thanks
    Rodrigo


  • Rebel Alliance Developer Netgate

    @delphus:

    I've downloaded 2.0-RC3 i386 and amd64, and neither seems able to install on a Dell PowerEdge R210, with or without RAID1. The problem starts at 38% of the installation, when the kernel files are being copied: the CD-ROM freezes and stops loading data, and the installation hangs at 38% forever. I suspected my internal CD drive could be the problem, so I attached a USB CD-ROM and tried the same process, and to my surprise the pfSense installer can't automount the correct CD device. I guess that's because it recognizes my internal CD drive and tries to mount it anyway. So that's pretty much it: I can't install pfSense 2.0-RC3 on my new Dell PowerEdge R210. As I write this, I'm burning pfSense 1.2.3 to give it a try before going mad. I'll be available to clarify or try something else.

    There are already multiple threads about Dell R<x>10 issues; search the forum a bit. Try looking for R210, R310, R510, etc.



  • VMware bypasses most of these hardware issues… there's not much downside unless you are super concerned about the security implications of adding another operating system/hypervisor to the environment.

    AND

    you can create multiple pfSense VMs to test different versions, etc., or keep one online as a hot backup...



  • @jimp:

    We'd need a lot more info about the system, though it may have been some level of mbuf exhaustion or similar. Are those Intel NICs em or igb? If they are igb, they default to using four network queues which can run you out of mbufs quickly. There are sysctl tweaks to reduce that to 1, and also to raise the number of mbufs. We may have to set those by default for 2.0, it seems amd64 is more prone to hitting those.

    Let me know what you need to troubleshoot. I'm running 2.0 RC3 i386 without a problem; loading the 64-bit version reintroduces the issue. The Intel NICs are em, by the way.


  • Rebel Alliance Developer Netgate

    @jrmitchell83:

    Let me know what you need to troubleshoot. I'm running 2.0 RC3 i386 without a problem; loading the 64-bit version reintroduces the issue. The Intel NICs are em, by the way.

    At a minimum, run netstat -m periodically and watch the total/max mbuf clusters and see if it's maxing out.
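    One way to automate that is a minimal Python sketch that samples netstat -m at a fixed interval, pulls out the total and max fields of the "mbuf clusters in use" line, and flags whether total ever came within 90% of max. The sample count, interval and 90% threshold are arbitrary assumptions, and watch_clusters and read_netstat_m are hypothetical names. The sketch also assumes Python 3.7+ on whichever machine runs it. If the firewall itself has no Python, the read parameter lets you substitute another source, for example output fetched over ssh.

```python
import re
import subprocess
import time

# total and max are the 3rd and 4th fields of this FreeBSD netstat -m line.
CLUSTERS = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def read_netstat_m() -> str:
    """Run netstat -m locally and return its output (needs Python 3.7+)."""
    result = subprocess.run(["netstat", "-m"], capture_output=True, text=True, check=True)
    return result.stdout


def watch_clusters(read=read_netstat_m, samples=60, interval=60.0, warn_ratio=0.9):
    """Sample total/max clusters; return (history, whether total neared max)."""
    history = []
    for i in range(samples):
        match = CLUSTERS.search(read())
        if match:
            total, limit = int(match.group(3)), int(match.group(4))
            history.append((total, limit))
            print(time.strftime("%H:%M:%S"), f"{total}/{limit} clusters (total/max)")
        if i < samples - 1:
            time.sleep(interval)  # no sleep after the final sample
    near_max = any(total >= warn_ratio * limit for total, limit in history)
    return history, near_max
```

    A total that climbs steadily across samples without falling back would point toward the mbuf leak suggested earlier in the thread, even before it reaches max.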

