Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Serial console not working on Alix after upgrading from 2.1.4 to 2.1.5

    Scheduled Pinned Locked Moved Problems Installing or Upgrading pfSense Software
    17 Posts 4 Posters 4.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      There was initially an issue with the Nano update files for 2.1.5. It was switching to Nano+VGA which removes the serial console. It was fixed though so if you actually upgraded yesterday from the webgui you won't have seen this problem.
      Is it just the console menu you aren't seeing? You still see the boot messages up to 'bootup complete'?

      Steve

      1 Reply Last reply Reply Quote 0
      • S
        silvertip257
        last edited by

        @ MaxPF:
        gotcha - I'm not extremely worried about the flashing LEDs.
        I'm sure if I took the time, I could use the libraries pfsense uses to flash the LEDs and write something to stop them from flashing.  Would be fun, but I have to figure out my other current problems first! :-\

        @stephenw10:

        There was initially an issue with the Nano update files for 2.1.5. It was switching to Nano+VGA which removes the serial console. It was fixed though so if you actually upgraded yesterday from the webgui you won't have seen this problem.
        Is it just the console menu you aren't seeing? You still see the boot messages up to 'bootup complete'?

        Steve

        Hi Steve!

        Yes, I did "just" upgrade yesterday.
        I saw a thread about the 2.1.4 nanobsd URLs pointing at the VGA instead of serial ones (or whatever exactly).  I figured I'd be in the clear since we're on 2.1.5 now…

        Your reply to this thread made me think to check ... and "darn it!"  -- I have a pound prompt # once again.  Instead of the expected pfsense menu.

        
        # echo $SHELL
        /bin/sh
        
        # uptime
         8:35PM  up 1 day, 36 mins, 1 user, load averages: 0.01, 0.02, 0.08
        
        # uname -a
        FreeBSD fw0.localdomain 8.3-RELEASE-p16 FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:41 EDT 2014     root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_wrap.8.i386  i386
        
        # cat /etc/version*
        2.1.5-RELEASE
        Mon Aug 25 07:44:26 EDT 2014
        a94b9befbdee3beaea5741e105336c7fba77e3de
        8.3
        8.1
        
        ; now jump into the ALIX via SSH
        [2.1.5-RELEASE][admin@fw0.localdomain]/root(2): fstat | awk '\!/CMD/{print $2}' | sort | uniq -c | sort -n | tail -n5
          25 racoon
          36 minicron
          36 syslogd
          52 php
         355 filterdns
        
        [2.1.5-RELEASE][admin@fw0.localdomain]/root(4): top -d1 | head -n5
        last pid: 95974;  load averages:  0.03,  0.05,  0.07  up 1+00:42:50    20:41:35
        41 processes:  2 running, 39 sleeping
        
        Mem: 37M Active, 11M Inact, 71M Wired, 1008K Cache, 33M Buf, 110M Free
        Swap:
        
        [2.1.5-RELEASE][admin@fw0.localdomain]/root(17): grep "`date +'%b %d'`" /var/log/system.log | grep kill
        Sep 10 10:13:36 fw0 kernel: pid 5517 (php), uid 0, was killed: out of swap space
        Sep 10 10:13:38 fw0 kernel: pid 6078 (php), uid 0, was killed: out of swap space
        Sep 10 10:13:40 fw0 kernel: pid 5853 (php), uid 0, was killed: out of swap space
        Sep 10 10:13:42 fw0 kernel: pid 7292 (php), uid 0, was killed: out of swap space
        Sep 10 12:26:53 fw0 kernel: pid 86265 (php), uid 0, was killed: out of swap space
        Sep 10 12:26:55 fw0 kernel: pid 87438 (php), uid 0, was killed: out of swap space
        Sep 10 12:26:57 fw0 kernel: pid 86911 (php), uid 0, was killed: out of swap space
        Sep 10 12:26:59 fw0 kernel: pid 85831 (php), uid 0, was killed: out of swap space
        Sep 10 13:18:26 fw0 kernel: pid 51718 (php), uid 0, was killed: out of swap space
        Sep 10 13:18:28 fw0 kernel: pid 52134 (php), uid 0, was killed: out of swap space
        Sep 10 13:18:30 fw0 kernel: pid 51889 (php), uid 0, was killed: out of swap space
        Sep 10 17:39:45 fw0 kernel: pid 66578 (php), uid 0, was killed: out of swap space
        Sep 10 17:39:46 fw0 kernel: pid 66237 (php), uid 0, was killed: out of swap space
        
        

        Holy cow … what in the blazes is going on with filterdns!?!?!?
        355 open file handles in just over 24 hours.

        And the notion of "out of swap space" is just not right, there is no swap space.  I guess the likely case is that the system is low on RAM, throws this error, and starts OOM killing procs.

        I'm open to ideas from the pfSense community as well as pfSene admins/developers.
        I suspect my next step will be to re-examine my firewall rules … I don't have many and there aren't that many aliases.

        1 Reply Last reply Reply Quote 0
        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          Your use of awk and piped commands is impresive!  :)
          Check /etc/platform to make sure you're still on Nano. You are seeing far more on the serial port than you would in nano+vga.
          Yes the swap errors are really out of memory in Nano. Are you running any packages?

          Steve

          1 Reply Last reply Reply Quote 0
          • S
            silvertip257
            last edited by

            @stephenw10:

            Your use of awk and piped commands is impresive!  :)
            Check /etc/platform to make sure you're still on Nano. You are seeing far more on the serial port than you would in nano+vga.
            Yes the swap errors are really out of memory in Nano. Are you running any packages?

            Steve

            awk, sed, regexes are part of my day job (I'm no expert, but I'm exercising my mind!) :)

            Below includes my uptime (just as a reference, doesn't mean much as I had only a # prompt on serial console two days ago) and the output from /etc/platform … and yep it's nanobsd as expected.

            
            # uptime
             8:30PM  up 3 days, 32 mins, 1 user, load averages: 0.00, 0.00, 0.03
            
            # cat /etc/platform
            nanobsd
            
            

            I am not running any additional packages.
            I have the following daemons running:

            • apinger
            • dhcpd
            • dnsmasq
            • ntpd
            • racoon (one IPSec tunnel to work)

            I noticed the following in logs, which may indicate the source of the problem I'm seeing.  But again, why in the heck does filterdns have so many open file handles!?

            
            [2.1.5-RELEASE][admin@fw0.localdomain]/root(3): fstat | awk '\!/CMD/{print $2}' | sort | uniq -c | sort -n | tail -n5
              25 racoon
              36 minicron
              36 syslogd
              53 php
             969 filterdns               <------- holy smokes!
            
            [2.1.5-RELEASE][admin@fw0.localdomain]/root(3): grep "`date +'%b %d'`" /var/log/system.log | grep -i ipsec
            
            <...snipped...>
            
            Sep 12 20:07:02 fw0 check_reload_status: Restarting ipsec tunnels
            Sep 12 20:07:02 fw0 check_reload_status: Restarting ipsec tunnels
            Sep 12 20:07:28 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
            Sep 12 20:07:29 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
            Sep 12 20:38:33 fw0 check_reload_status: Restarting ipsec tunnels
            Sep 12 20:38:33 fw0 check_reload_status: Restarting ipsec tunnels
            Sep 12 20:39:01 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
            
            

            I expect the large amount of IPSec instability is relative to low memory, which for now I'll blame on filterdns.

            I do have a domain name in my IPSec config, which is largely unnecessary since that host has a static IP.  I'm setting it to the corresponding static IP and will probably reboot my pfsense box to see if that remedies the ~24 hour OOM killing and the serial console droping to a # pound prompt.

            Also some of my firewall rules have pf aliases, but I don't expect they'd cause filterdns to go ape.

            1 Reply Last reply Reply Quote 0
            • S
              silvertip257
              last edited by

              Less than 12 hours later (I removed the host name from my IPSec config at nearly 21:00 and it's 08:15 now),  I'm looking at a # pound prompt again.

              
              [2.1.5-RELEASE][admin@fw0.localdomain]/root(2): uptime
               8:13AM  up 11:18, 2 users, load averages: 0.07, 0.35, 0.33
              
              [2.1.5-RELEASE][admin@fw0.localdomain]/root(3): fstat | awk '\!/CMD/{print $2}' | sort | uniq -c | sort -n | tail -n5
                23 dhclient
                25 racoon
                36 minicron
                36 syslogd
                54 php
              
              [2.1.5-RELEASE][admin@fw0.localdomain]/root(5): grep "`date +'%b %d'`" /var/log/system.log | egrep -i 'ipsec|check_reload_status'
              
              <...snipped...>
              
              Sep 13 08:08:55 fw0 check_reload_status: updating dyndns <apinger_gw_name>Sep 13 08:08:55 fw0 check_reload_status: Restarting ipsec tunnels
              Sep 13 08:08:55 fw0 check_reload_status: Restarting OpenVPN tunnels/interfaces
              Sep 13 08:08:55 fw0 check_reload_status: Reloading filter
              Sep 13 08:08:55 fw0 check_reload_status: updating dyndns <apinger_gw_name>Sep 13 08:08:55 fw0 check_reload_status: Restarting ipsec tunnels
              Sep 13 08:08:55 fw0 check_reload_status: Restarting OpenVPN tunnels/interfaces
              Sep 13 08:08:55 fw0 check_reload_status: Reloading filter
              Sep 13 08:09:22 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
              Sep 13 08:09:22 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
              Sep 13 08:16:05 fw0 check_reload_status: updating dyndns <apinger_gw_name>Sep 13 08:16:05 fw0 check_reload_status: Restarting ipsec tunnels
              Sep 13 08:16:05 fw0 check_reload_status: Restarting OpenVPN tunnels/interfaces
              Sep 13 08:16:05 fw0 check_reload_status: Reloading filter
              Sep 13 08:16:05 fw0 check_reload_status: updating dyndns <apinger_gw_name>Sep 13 08:16:05 fw0 check_reload_status: Restarting ipsec tunnels
              Sep 13 08:16:05 fw0 check_reload_status: Restarting OpenVPN tunnels/interfaces
              Sep 13 08:16:05 fw0 check_reload_status: Reloading filter
              Sep 13 08:16:32 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
              Sep 13 08:16:32 fw0 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.</apinger_gw_name></apinger_gw_name></apinger_gw_name></apinger_gw_name> 
              

              apinger logs show the custom/user-defined VPN end-point that I'm monitoring is down, then alarm canceled - but those are within no-less-than 12 minute periods (which doesn't match up with other logs).

              –---

              One positive result though, there are not a bunch of open file handles by filterdns since I removed the domain name from the IPSec config.  Unfortunately the problem still raises its head, so I'm back to square one.

              I have gateway monitoring set up for a host at the other end of the IPSec VPN tunnel.  (I toned down the ping interval and all, so it's not steady every second.)  In logs, it is odd that check_reload_status refers to the name of my apinger/gateway as a dyndns name (which it is not), then says the IP changed (it didn't), and then reloads my tunnel at insanely short intervals.

              My next step is to disable that gateway monitoring to:

              • see if the IPSec tunnel is reloaded at the short intervals still
              • see if the pfSense serial console is again killed off in 12 hours (which might be closer to the actual elapsed time after reboot)
              1 Reply Last reply Reply Quote 0
              • S
                silvertip257
                last edited by

                So I'm looking at just over 60 hours uptime and the serial console is back to a pound symbol (not the menu as expected).  I did check at various intervals the previous day(s) and the menu displayed just fine.

                I'm about out of things to try (short of disabling my IPSec tunnel, which is not a long-term option – defeats my reasons for running pfSense at that point).

                Open to ideas.

                
                [2.1.5-RELEASE][admin@fw0.localdomain]/root(1): uptime
                10:32PM  up 2 days, 13:48, 2 users, load averages: 0.40, 0.20, 0.08
                
                [2.1.5-RELEASE][admin@fw0.localdomain]/root(3): top -d1 | head -n5
                last pid:  6435;  load averages:  0.16,  0.17,  0.07  up 2+13:48:56    22:33:39
                40 processes:  2 running, 38 sleeping
                
                Mem: 41M Active, 49M Inact, 72M Wired, 33M Buf, 68M Free
                Swap:
                
                [2.1.5-RELEASE][admin@fw0.localdomain]/root(4): fstat | awk '\!/CMD/{print $2}' | sort | uniq -c | sort -n | tail -n5
                  24 tcsh
                  25 racoon
                  36 minicron
                  36 syslogd
                  53 php
                
                # date
                Mon Sep 15 22:36:09 EDT 2014
                # uptime
                10:36PM  up 2 days, 13:52, 2 users, load averages: 0.07, 0.13, 0.07
                
                
                1 Reply Last reply Reply Quote 0
                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Hmm, puzzling. Are you still seeing 'out of swap space' errors? What about the filesystem? Are either of the ramdisk devices full, what does df show?

                  Steve

                  1 Reply Last reply Reply Quote 0
                  • S
                    silvertip257
                    last edited by

                    @stephenw10:

                    Hmm, puzzling. Are you still seeing 'out of swap space' errors? What about the filesystem? Are either of the ramdisk devices full, what does df show?

                    So last evening I rebooted my Alix before going to bed … just to see if I could get a 2+ day pattern out of the serial console menu/shell dying.  Well here I am to say no … between then and now - shy of 20 hours it died off.

                    
                    # uptime
                     6:42PM  up 19:58, 1 user, load averages: 0.00, 0.02, 0.00
                    
                    

                    I do not see swap space errors (or the kill messages) in /var/log/system.log

                    As a recap, I have removed a DNS host name from my IPSec configuration and disabled apinger from monitoring my VPN end point.

                    There are not any physical ramdisk devices.

                    
                    [2.1.5-RELEASE][admin@fw0.localdomain]/root(9): gpart show
                    =>     63  2006865  ad0  MBR  (980M)
                           63   922257    1  freebsd  [active]  (450M)
                       922320       63       - free -  (31k)
                       922383   922257    2  freebsd  (450M)
                      1844640   102816    3  freebsd  (50M)
                      1947456    59472       - free -  (29M)
                    
                    =>     0  922257  ad0s1  BSD  (450M)
                           0      16         - free -  (8.0k)
                          16  922241      1  !0  (450M)
                    
                    =>     0  922257  ad0s2  BSD  (450M)
                           0      16         - free -  (8.0k)
                          16  922241      1  !0  (450M)
                    
                    [2.1.5-RELEASE][admin@fw0.localdomain]/root(11): df
                    Filesystem        1K-blocks   Used  Avail Capacity  Mounted on
                    /dev/ufs/pfsense0    453327 150964 266097    36%    /
                    devfs                     1      1      0   100%    /dev
                    /dev/ufs/cf           50527   4015  42470     9%    /cf
                    /dev/md0              39406    404  35850     1%    /tmp
                    /dev/md1              59246  21396  33112    39%    /var
                    devfs                     1      1      0   100%    /var/dhcpd/dev
                    
                    

                    devfs  file systems are 100% utilized  – not sure if that's normal (ex: immediately post-reboot for example).

                    1 Reply Last reply Reply Quote 0
                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      In nano /var and /tmp run in ram drives as standard. You can see them listed as md0 and md1 in df. If either of those become full you get problems but you would normally also see errors.

                      [2.1.5-RELEASE][root@pfsense.fire.box]/root(1): df
                      Filesystem        1K-blocks   Used  Avail Capacity  Mounted on
                      /dev/ufs/pfsense0    453327 246254 170807    59%    /
                      devfs                     1      1      0   100%    /dev
                      /dev/ufs/cf           50527   8345  38140    18%    /cf
                      /dev/md0              39406   3004  33250     8%    /tmp
                      /dev/md1              79086  26334  46426    36%    /var
                      devfs                     1      1      0   100%    /var/dhcpd/dev
                      
                      

                      Steve

                      1 Reply Last reply Reply Quote 0
                      • N
                        nwebber
                        last edited by

                        @silvertip257:

                        Holy cow … what in the blazes is going on with filterdns!?!?!?
                        355 open file handles in just over 24 hours.

                        I have this same issue - seems to be triggered by using names (FQDN) in firewall IP aliases and I've worked around it (avoided it) by using hardcoded IP addresses instead. The IPs I need to put into those rules are static anyway but it was a (conceptually) nice feature to be able to refer to them by name instead of number.

                        1 Reply Last reply Reply Quote 0
                        • S
                          silvertip257
                          last edited by

                          Been a bit busy lately and haven't rebooted my pfsense device since I last said I did.  That was almost 13 days ago, but of course after 20 hours it had already dropped the serial console to a /bin/sh shell with a # pound prompt.

                          @nwebber:
                          Yeah, that's precisely what I ended up doing for my IPSec tunnel – no FQDNs.  And I'll agree, being able to use FQDNs was a nice feature so I could use them as documentation of sorts.

                          @stephenw10:
                          md[01] are not full, none of them have utilization greater than 39% (md1 which is /var).
                          I have a 1GB Compact Flash card in my ALIX 2D13 which does not likely have any bearing on the problems.  Thank you for your continued suggestions.

                          This is quite irritating … while I still have functional SSH access, in case of emergency the serial console is of little use.

                          At this point, I probably should:

                          • create another complete thread (without the [SOLVED] portion in the title)

                          • link back to this thread

                          • continue troubleshooting

                          Community feedback and suggestions are welcome … I'm running out of ideas.

                          In a way, I'd almost be pacified knowing how to restart the serial console menu when it dies!  (Clearly the serial shell dying should not happen, but so far I don't have a root cause.  And being able to fix it is almost good enough for the time being.)

                          • https://www.freebsd.org/doc/handbook/serialconsole-setup.html

                          • https://www.freebsd.org/doc/en_US.ISO8859-1/articles/console-server/freebsd.html

                          • https://www.freebsd.org/doc/faq/serial.html

                          FreeBSD's init

                          1 Reply Last reply Reply Quote 0
                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by

                            Hmm, confusing indeed. It looks like the console itself is not failing rather the shell is. The only vaguely similar thing I have seen is where the menu fails to spawn completely at the end of boot and you're left with no prompt at all. There were a few workarounds in that case. One was to simply change the serial baudrate. Seems unlikely to make a difference here but it's an gui setting now so an easy test. For example:
                            https://forum.pfsense.org/index.php?topic=34814.0
                            Could try running at the #:

                            /etc/rc.initial
                            

                            Not entirely sure what that might do to an already running system so probably best not to do it in 'production'!

                            Steve

                            1 Reply Last reply Reply Quote 0
                            • M
                              MaxPF
                              last edited by

                              @silvertip257:

                              At this point, I probably should:

                              • create another complete thread (without the [SOLVED] portion in the title)

                              • link back to this thread

                              • continue troubleshooting

                              I removed [SOLVED] from the title.

                              1 Reply Last reply Reply Quote 0
                              • S
                                silvertip257
                                last edited by

                                @stephenw10:

                                @stephenw10:

                                Hmm, confusing indeed. It looks like the console itself is not failing rather the shell is. The only vaguely similar thing I have seen is where the menu fails to spawn completely at the end of boot and you're left with no prompt at all. There were a few workarounds in that case. One was to simply change the serial baudrate. Seems unlikely to make a difference here but it's an gui setting now so an easy test. For example:
                                https://forum.pfsense.org/index.php?topic=34814.0
                                Could try running at the #:

                                /etc/rc.initial
                                

                                Not entirely sure what that might do to an already running system so probably best not to do it in 'production'!

                                Steve

                                I previously had the baud rate set to 9600 (default), but my ALIX BIOS uses 38400 … so a few weeks back I switched it to 38400 on pfsense as well.

                                In my case the menu does spawn on boot, but dies some hours later.  Each time (in this thread) I've said I rebooted my ALIX, I did verify I had a workable menu on the serial console.

                                I just tested running /etc/rc.initial from the serial console ... which so far appears not to have any ill effects (though I'm typing this up a few minutes after running it!).  And restores the serial console menu.

                                
                                [2.1.5-RELEASE][admin@fw0.localdomain]/root(4): date
                                Sat Oct 11 18:17:45 EDT 2014
                                
                                [2.1.5-RELEASE][admin@fw0.localdomain]/root(5): uptime
                                 6:17PM  up 25 days, 19:33, 2 users, load averages: 0.00, 0.00, 0.00
                                
                                [2.1.5-RELEASE][admin@fw0.localdomain]/root(10): ps auxff | grep rc.init
                                root   83274  0.0  0.6  3644  1512  u0  I+    6:11PM   0:00.02 /bin/sh /etc/rc.initial
                                root   36527  0.0  0.4  2444  1060   0  R+    6:21PM   0:00.00 grep rc.init
                                root   80618  0.0  0.5  3644  1320   0  Is    6:11PM   0:00.02 /bin/sh /etc/rc.initial
                                
                                

                                @MaxPF:

                                @MaxPF:

                                <…snipped...>

                                I removed [SOLVED] from the title.

                                Thanks Max!
                                I'll continue to use this thread at this point.

                                1 Reply Last reply Reply Quote 0
                                • First post
                                  Last post
                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.