LCDProc 0.5.4-dev
-
2. The LCDd process can start to use a lot of CPU time causing the box to become unresponsive (but still routes traffic OK).
I have just acquired an x700 from work and am trying to get it running smoothly, and I am encountering this problem. It is a clean full install (on HDD) of 2.0.1 with IPv6 git sync'd as of 21 Jan.
I keep running into this within 16-24 hours of uptime: the system becomes non-responsive but still routes, although more slowly than normal. I cannot SSH in, use the webif, or use the console when this happens. The RRD graphs stop as well. I have only the load and states screens enabled, thinking maybe the number of screens was an issue, but today when it became unresponsive it was stuck on the load screen and the current load was displayed as 24.4!!!
I am also of the (pretty much worthless) opinion that it is too many instances of the LCDProc client, as I can see the log entries posted above. I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
-
Also check states table size and usage on Dashboard.
-
@tix:
I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running. E.g.
[2.0.1-RELEASE][root@pfsense.fire.box]/root(4): ps aux | grep lcd
root    16576  0.0  0.3   3656   1508  ??  S    10:59PM   0:00.00 /bin/sh /tmp/lcdclient.sh
root    16628  0.0  3.4  47452  17388  ??  S    10:59PM   0:00.31 /usr/local/bin/php -f /usr/local/pkg/lcdproc_client.php
[2.0.1-RELEASE][root@pfsense.fire.box]/root(5): ps aux | grep LCD
nobody  16562  0.0  0.3   3368   1464  ??  Ss   10:59PM   0:00.02 /usr/local/sbin/LCDd -c /usr/local/etc/LCDd.conf
You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run
```
top -SH
```
LCDd should be consuming 1% or less cpu.
Steve
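The "one instance of each" check can be sketched as a small POSIX shell helper. This is only an illustration: the function name `count_lcd_instances` is made up here and is not part of the lcdproc package. It reads a process listing on standard input and prints how many lines mention each of the three programs from the `ps` output above.

```sh
#!/bin/sh
# count_lcd_instances: hypothetical helper, not part of the lcdproc package.
# Reads a process listing on stdin and prints "<count> <program>" for each
# of the three programs that should be running exactly once.
count_lcd_instances() {
    listing=$(cat)
    for prog in /usr/local/sbin/LCDd /tmp/lcdclient.sh /usr/local/pkg/lcdproc_client.php; do
        # grep -c prints the number of matching lines (0 if none);
        # -F treats the path as a fixed string rather than a regex.
        count=$(printf '%s\n' "$listing" | grep -c -F -- "$prog" || true)
        printf '%s %s\n' "$count" "$prog"
    done
}
```

It could be fed with something like `ps axww -o command | count_lcd_instances`. Any count other than 1 would suggest the service has started or stopped incorrectly.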
-
Here is my recent log of LCDproc after a fresh reboot of my Firebox, right now I have System Time and System Stats working.
EDIT: I had my WAN cable unplugged. I replaced my X500 with my X700 while I did this…
Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:21770 on socket 13
Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:34709 on socket 12
Jan 22 20:37:20 LCDd: Listening for queries on 127.0.0.1:13666
Jan 22 20:37:20 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf
Jan 22 20:37:20 LCDd: LCDd version 0.5.5 starting
Jan 22 20:37:17 php: lcdproc: Sync: End package sync
Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync
Jan 22 20:37:17 php: lcdproc: Sync: End package sync
Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync
Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 20:37:17 php: : Restarting/Starting all packages.
Jan 22 20:37:11 check_reload_status: Starting packages
Jan 22 20:37:11 php: : pfSense package system has detected an ip change 0.0.0.0 -> ... Restarting packages.
Jan 22 20:37:11 php: : OpenNTPD is starting up.
Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 20:36:57 php: : ERROR! PPTP enabled but could not resolve the $pptpdtarget
Jan 22 20:36:56 php: : The command '/usr/bin/killall 'ntpd'' returned exit code '1', the output was 'killall: warning: kill -TERM 59839: No such process'
Jan 22 20:36:56 php: : Creating rrd update script
Jan 22 20:36:56 php: : Resyncing OpenVPN instances for interface LAN.
Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
-
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running.
You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run
```
top -SH
```
LCDd should be consuming 1% or less cpu.
Steve
Thanks Steve but I've tried that and even after more than an hour I still cannot get the menu so I usually give up and do a power-cycle as that is the only way I've found to recover.
I would love to be able to confirm if there are multiple instances as well as memory usage and such but so far that is not an option.
I forgot to mention previously that someone had suggested that when the connection drops all the packages are restarted, so I disabled the gateway monitor, but that made no difference.
The next time I get to watch it for a full day (next weekend?), I will keep an SSH session up running top and hopefully capture some useful information.
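A possible way to capture this without watching `top` by hand is to append a timestamped LCDd CPU sample to a file every minute, so the history survives even if the session later hangs. This is a rough sketch only: `lcdd_cpu` and `sample_lcdd` are made-up names, the log path in the usage line is an assumption, and on a box that is already pegged the sampler itself may run late.

```sh
#!/bin/sh
# lcdd_cpu: hypothetical helper. Reads "pcpu command" lines on stdin and
# prints the summed CPU percentage of every process named LCDd.
lcdd_cpu() {
    awk '$2 == "LCDd" { total += $1 } END { printf "%.1f\n", total }'
}

# sample_lcdd: hypothetical helper. Appends one timestamped sample to the
# file given as $1 every $2 seconds (default 60) until it is killed.
sample_lcdd() {
    logfile=$1
    interval=${2:-60}
    while :; do
        printf '%s %s\n' "$(date '+%b %d %H:%M:%S')" \
            "$(ps -ax -o pcpu,comm | lcdd_cpu)" >> "$logfile"
        sleep "$interval"
    done
}
```

For example, `sample_lcdd /root/lcdd_cpu.log 60 &` would leave one line per minute to review after a reboot.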
EDIT:
One more thing: looking at the RRD graphs (processor, memory, & states), there is nothing indicating an increase in processes or utilization. Everything is "normal" and then just stops graphing until I reboot.
-
Here is the X700's System Log pertaining to LCDproc 0.5.5 v0.8
Jan 22 22:26:33 LCDd: Connect from host 127.0.0.1:42443 on socket 13
Jan 22 22:26:31 LCDd: Connect from host 127.0.0.1:61742 on socket 12
Jan 22 22:26:29 LCDd: Listening for queries on 127.0.0.1:13666
Jan 22 22:26:29 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf
Jan 22 22:26:29 LCDd: LCDd version 0.5.5 starting
Jan 22 22:26:26 php: lcdproc: Sync: End package sync
Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync
Jan 22 22:26:26 php: lcdproc: Sync: End package sync
Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync
Jan 22 22:26:26 php: : Restarting/Starting all packages.
Jan 22 22:26:21 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
Jan 22 22:26:20 check_reload_status: Starting packages
Jan 22 22:26:20 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
top -S for LCDd
  PID USERNAME THR PRI NICE   SIZE    RES STATE   TIME   WCPU COMMAND
27586 nobody     1  74  r30  3368K  1488K RUN     0:00  0.00% LCDd
-
I turned LCDd on overnight and left my SSH sessions active running top -SH
This morning I have an unresponsive machine except the SSH sessions are still active. (Also, everyone was asleep so there was no usage from LAN to WAN.)
I killed the LCDd process after capturing the information below and load drops to less than .10 but I cannot restart it. If I use the webif it will generate the previously mentioned log entries and the same occurs starting it from the CLI.
top -SH
last pid: 31188;  load averages:  7.99,  7.75,  6.91  up 0+17:13:06  08:55:32
92 processes:  11 running, 68 sleeping, 1 zombie, 12 waiting
CPU: 18.9% user,  0.0% nice, 80.6% system,  0.5% interrupt,  0.0% idle
Mem: 62M Active, 15M Inact, 39M Wired, 34M Buf, 118M Free
Swap: 512M Total, 512M Free
  PID USERNAME PRI NICE    SIZE     RES STATE     TIME    WCPU COMMAND
34608 nobody    74  r30   3368K   1488K RUN      77:11 100.00% LCDd
   10 root     171 ki31      0K      8K RUN     797:50   0.00% idle
   11 root     -28    -      0K     96K WAIT      1:57   0.00% {swi5: +}
45412 root      44    0   6588K   4776K kqread    0:56   0.00% lighttpd
   11 root     -32    -      0K     96K WAIT      0:50   0.00% {swi4: clock}
    0 root     -16    0      0K     48K sched     0:44   0.00% {swapper}
34700 root      76    0  33092K  16244K RUN       0:31   0.00% php
45906 root      44    0   3712K   1960K RUN       0:31   0.00% top
32600 root      76    0  33092K  16136K RUN       0:31   0.00% php
40548 root      76   20   3656K   1492K piperd    0:14   0.00% sh
   13 root     -16    -      0K      8K -         0:13   0.00% yarrow
24126 root      44    0   4948K   2516K select    0:12   0.00% syslogd
19804 root      44    0   7992K   3604K select    0:11   0.00% sshd
   19 root      44    -      0K      8K syncer    0:07   0.00% syncer
35740 root      48    0  37188K  18800K piperd    0:07   0.00% php
21686 root      44    0   3316K    924K piperd    0:06   0.00% logger
System.log tail
tail /var/log/system.log
Jan 23 02:35:21 pfsense dnsmasq[6805]: possible DNS-rebind attack detected: appsforbb.com
Jan 23 02:58:07 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 03:26:29 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 03:58:37 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 05:34:21 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 06:34:48 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 07:35:19 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses
Jan 23 07:59:59 pfsense dhclient: RENEW
Jan 23 07:59:59 pfsense dhclient: Creating resolv.conf
-
Further confirmation of a problem. Thanks.
By the way pfSense uses circular logs so you have to use clog to tail them.
See: http://doc.pfsense.org/index.php/LOGS:Why_can%27t_I_view_view_log_files_with_cat/grep/etc%3F%28clog%29
Steve
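As a small illustration of reading the circular log, the output of `clog` can be piped through an ordinary filter. The helper name `lcd_log_lines` below is made up for this sketch, and the match strings are simply taken from the log excerpts earlier in the thread.

```sh
#!/bin/sh
# lcd_log_lines: hypothetical helper. Keeps only the lines on stdin that
# mention LCDd or the package's lcdproc client; prints nothing if none match.
lcd_log_lines() {
    grep -E 'LCDd|lcdproc' || true
}
```

Usage would be along the lines of `clog /var/log/system.log | lcd_log_lines | tail -n 20`.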
-
Well, today when I woke up and saw my Watchguard I noticed the "Thanks for using pfSense" and hung LCDd… I can restart it, but it would be nice for it not to crash. Will keep testing... All I did was add Interface (and on the dropdown kept WAN)
-
Hello everybody,
I just updated the package. The changes are as follows:
- Fix the uptime screen
- In the settings, "Enable LCDproc" becomes "Enable LCDproc at startup"
- Added "After Install Info" message
- Limited the client loop to three times. After three errors connecting to the LCDd service the client will end
I hope the 4th point will solve the problems for the Firebox boxes! Just wait for the 0.9 version of the package and update it…
Thanks to all for the feedback,
Michele
EDIT: When giving feedback, please post the driver/port you're using, the values of the service options, and the list of active screens. Thanks!!
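The "give up after three errors" behaviour from the fourth point can be sketched in shell as follows. This is not the package's code (the real client is the PHP script `lcdproc_client.php`); `try_connect` and `RETRY_DELAY` are names invented for the example.

```sh
#!/bin/sh
# try_connect: hypothetical helper. Runs the given command up to three
# times and stops as soon as it succeeds. Returns 0 on success and 1 once
# all three attempts have failed.
try_connect() {
    attempt=1
    while [ "$attempt" -le 3 ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt failed" >&2
        attempt=$((attempt + 1))
        # Pause between attempts; RETRY_DELAY is an assumed knob (seconds).
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

For instance, `try_connect nc -z 127.0.0.1 13666` would probe the port LCDd reports listening on in the logs above and stop after the third failure instead of looping forever.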
-
Hello,
can I leave the LED backlight on all the time? Is this implemented in this update?
Or will it come in a future version?
M.
-
@power_matz: The backlight control is part of the driver not the package so this won't have changed.
@mdima: Awesome work! ;D I can't wait to test it.
Steve
-
can I leave the LED backlight on all the time? Is this implemented in this update?
We are currently focusing on fixing the issue of multiple instances of the server, as well as building the code for driving the LEDs. Once that is stable, we can revisit the user parameters.
-
Well the update didn't help for me… :(
Mem: 44M Active, 15M Inact, 37M Wired, 28M Buf, 139M Free
Swap: 512M Total, 512M Free
48731 nobody 74 r30 3368K 1488K RUN 77:11 100.00% LCDd
Running clog of /var/log/system.log showed no entries related to the LCD processes.
However, it DID continue to work properly this time, although the screens changed about every 7 secs instead of every 5 secs. The webif was non-responsive and new SSH sessions never finished connecting. Existing SSH sessions continued, just slowly, as expected under the load.
For reference all my settings for the Firebox x700 are "default" with "2x20 columns" and the SDEC driver on /dev/lpt0. The only screens enabled are: Uptime, Load, States, Mbuf, & Interface Traffic (WAN).
GW monitoring is disabled.
EDIT: Forgot to add
- like the new Uptime screen! Much cleaner.
- in my set up, it seems to be failing after around 16-18 hours uptime and/or around 4-5am, not sure which yet, with no network activity occurring.
-
@tix:
For reference all my settings for the Firebox x700 are "default" with "2x20 columns" and the SDEC driver on /dev/lpt0. The only screens enabled are: Uptime, Load, States, Mbuf, & Interface Traffic (WAN).
GW monitoring is disabled.
Hi Tix,
thanks for the info! I am now running my secondary machine with the same screens in order to test the "client" part of the package. If there is something wrong in the client for these screens I will find out soon. Of course I can only test the client because I use a different driver… but if the test goes OK it means the problem is somewhere in LCDd 0.5.5 or the driver.
@tix:
EDIT: Forgot to add
- like the new Uptime screen! Much cleaner.
- in my set up, it seems to be failing after around 16-18 hours uptime and/or around 4-5am, not sure which yet, with no network activity occurring.
Thanks for the appreciation! I will see tomorrow how my secondary machine is doing…
Ciao,
Michele
-
You think you can program more stuff on the LCDproc MENU?
Maybe stuff like webGUI restart, backup config, restore config, reboot system
Thanks! Doing an awesome job!
To all, I am using GWXepc for the Arm/Disarm LED to turn it green when pfSense comes up and red when it goes down.
What I did was edit the beep.sh: in "start" I added GWXepc -l green and in "stop" I added GWXepc -l red. :)
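The beep.sh change described above might look roughly like this. The layout of the real beep.sh is not shown in the thread, so the wrapper function `led_for_action` and the `LED_CMD` variable are assumptions for illustration; only the two `GWXepc -l` calls come from the post.

```sh
#!/bin/sh
# LED_CMD defaults to GWXepc (assumed to be on PATH); override it to test.
LED_CMD=${LED_CMD:-GWXepc}

# led_for_action: hypothetical wrapper around the two calls from the post.
# "start" turns the Arm/Disarm LED green, "stop" turns it red.
led_for_action() {
    case "$1" in
        start) "$LED_CMD" -l green ;;
        stop)  "$LED_CMD" -l red ;;
        *)     echo "usage: led_for_action start|stop" >&2; return 1 ;;
    esac
}
```

Calling `led_for_action start` from the "start" branch and `led_for_action stop` from the "stop" branch would give the same result as adding the two commands by hand.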
-
I've now been running the 0.5.4 versions of LCDd and sdeclcd for 24hrs and am still experiencing the same LCDd CPU usage lock out. So I think we can rule out going to 0.5.5 as a problem.
Interestingly I was able to observe it happening this afternoon and it ramps up slowly as if it's looping around creating steadily more and more processes until it hits 100% cpu.
One way to test this would be to compile the old driver against 0.5.5 and run that. I don't have a suitable compile environment set up at the moment though.
Steve
-
Steve,
how many instances of the client were running at that time? Do you have any logs that show problems, or concurrent clients running at the same time?
Thanks,
Michele
-
Hmm, I'm not sure what you mean.
There were two distinct problems I experienced.
Firstly where multiple copies of the client ended up running. This happened immediately after either rebooting or restarting the service.
Secondly where LCDd ends up using 100% cpu. Just one client can be running when this happens.
I don't have much by way of logging. Can we increase the logging level in lcdd.conf?
Steve
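On the logging question: as far as I recall, the stock LCDd.conf has a `ReportLevel` option (0 to 5, default 2) and a `ReportToSyslog` option in its `[server]` section, but treat both names as something to verify against the installed file. The sketch below only reads the current setting; `report_level` is a made-up helper name.

```sh
#!/bin/sh
# report_level: hypothetical helper. Prints the value of the last
# uncommented "ReportLevel=" line in the config file given as $1,
# or nothing if the option is not set.
report_level() {
    sed -n 's/^[[:space:]]*ReportLevel[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}
```

For example, `report_level /usr/local/etc/LCDd.conf`. Since the logs show the package running a "package sync" at startup, a hand edit to that file may well be overwritten, so any change would probably have to go through the package itself.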
-
Hmm, I'm not sure what you mean.
There were two distinct problems I experienced.
Firstly where multiple copies of the client ended up running. This happened immediately after either rebooting or restarting the service.
Ok, but thanks to the last change the "second client" was stopped after 2 more attempts to connect, I guess.
Secondly where LCDd ends up using 100% cpu. Just one client can be running when this happens.
I don't have much by way of logging. Can we increase the logging level in lcdd.conf?
Ok, that's awesome. At least we know that the issue is not related to multiple instances of the client running, which in any case should not cause any problem (since LCDd is made to support different clients at the same time), and for sure it is not a "resource leak" caused by the PHP client bothering the system…
Thanks,
Michele