LCDProc 0.5.4-dev
-
I'm running the Time and Uptime screens. Both boxes.
Steve
-
I'm running the Time and Uptime screens. Both boxes.
Steve
mmmhhh… I am trying to mess it up as much as I can, but really, after every try I only get 1 server and 1 client running (also with enable/disable of ipsec that foces a package resync)... with:
Refresh: 1 second
Driver: sureelec
Screens: ALLI don't even find the problem on the stats and time screens, both from a machine running since 31 days or a machine just booted...
Anyway, I will change the package in order to let the client make only 3 attempts to connect to the service then quit...
-
I appreciate all your efforts, thanks. :)
It's looking like a driver issue is more likely then.
There seem to be two issues here:
1. When the package is started/restarted in quick succession some delay in the process can result in LCDd trying to start before the previous version has stopped. Or possibly the previous version simply does not stop as I have found when trying to kill it manually.2. The LCDd process can start to use a lot of CPU time causing the box to become unresponsive (but still routes traffic OK).
For testing purposes we can either:
Build the old driver against 0.55 or
Swap out LCDd with a 0.54 version and equivalent driver.What do think? I may have 0.54 version files the fmertz mailed to me. I'll try that if I do.
Steve
-
looking like a driver issue is more likely then.
Well, a driver is a shared library. It is coded so there is just one copy of the code between multiple instances of LCDd server. For instance data, everything is dynamically allocated, so each instance behaves independently. Obviously, as there is just one set of interface ports, and one LCD, this flexibility is not necessarily helping. I think fixing the multiple instances of the server will go a long way towards fixing these problems. I can look at the code again to see if I can fail initialization on second instance and up. Let me know if I can help compiling anything.
-
2. The LCDd process can start to use a lot of CPU time causing the box to become unresponsive (but still routes traffic OK).
I have just acquired an x700 from work and trying to get it running smoothly and am encountering this problem. It is a clean full install (on HDD) of 2.0.1 with IPv6 git sync'd as of 21 Jan.
I keep running into this within 16-24 hours uptime where the system is non-responsive but still routes although more slowly than normal. I cannot SSH use the webif, or console when this happens. The RRD graphs stop as well. I have only the load and states screens enabled thinking maybe the number of screens was an issue, but today when it became unresponsive it was stuck on the load screen and the current load was displayed as a 24.4!!!I am also of the (pretty much worthless) opinion that it is too many instances of the LCDProc client as I can see the log entries posted above. I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
-
Also check states table size and usage on Dashboard.
-
@tix:
I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running. E.g.[2.0.1-RELEASE][root@pfsense.fire.box]/root(4): ps aux | grep lcd root 16576 0.0 0.3 3656 1508 ?? S 10:59PM 0:00.00 /bin/sh /tmp/lcdclient.sh root 16628 0.0 3.4 47452 17388 ?? S 10:59PM 0:00.31 /usr/local/bin/php -f /usr/local/pkg/lcdproc_client.php [2.0.1-RELEASE][root@pfsense.fire.box]/root(5): ps aux | grep LCD nobody 16562 0.0 0.3 3368 1464 ?? Ss 10:59PM 0:00.02 /usr/local/sbin/LCDd -c /usr/local/etc/LCDd.conf
You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run```
top -SHLCDd should be consuming 1% or less cpu. Steve
-
Here is my recent log of LCDproc after a fresh reboot of my Firebox, right now I have System Time and System Stats working.
EDIT: I had my WAN cable unplugged. I replaced my X500 with my X700 while I did this…
Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:21770 on socket 13 Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:34709 on socket 12 Jan 22 20:37:20 LCDd: Listening for queries on 127.0.0.1:13666 Jan 22 20:37:20 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf Jan 22 20:37:20 LCDd: LCDd version 0.5.5 starting Jan 22 20:37:17 php: lcdproc: Sync: End package sync Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync Jan 22 20:37:17 php: lcdproc: Sync: End package sync Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:17 php: : Restarting/Starting all packages. Jan 22 20:37:11 check_reload_status: Starting packages Jan 22 20:37:11 php: : pfSense package system has detected an ip change 0.0.0.0 -> ... Restarting packages. Jan 22 20:37:11 php: : OpenNTPD is starting up. Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:36:57 php: : ERROR! PPTP enabled but could not resolve the $pptpdtarget Jan 22 20:36:56 php: : The command '/usr/bin/killall 'ntpd'' returned exit code '1', the output was 'killall: warning: kill -TERM 59839: No such process' Jan 22 20:36:56 php: : Creating rrd update script Jan 22 20:36:56 php: : Resyncing OpenVPN instances for interface LAN. Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
-
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running. E.g.You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run```
top -SHLCDd should be consuming 1% or less cpu. Steve
Thanks Steve but I've tried that and even after more than an hour I still cannot get the menu so I usually give up and do a power-cycle as that is the only way I've found to recover.
I would love to be able to confirm if there are multiple instances as well as memory usage and such but so far that is not an option.I forgot to mention previously that someone had suggested that when the connection drops that all the packages are restarted so I disabled the gateway monitor but that made no difference.
The next time I get to watch it for a full day (next weekend?), I will keep an SSH session up running top and hopefully capture some useful information.
EDIT:
One more thing: The looking at the RRD graphs (processor, memory, & states), there is nothing indicating an increase in processes or utilization. Everything is "normal" and then just stops graphing until I reboot. -
Here is the X700's System Log pertaining LCDproc 0.5.5 v0.8
Jan 22 22:26:33 LCDd: Connect from host 127.0.0.1:42443 on socket 13 Jan 22 22:26:31 LCDd: Connect from host 127.0.0.1:61742 on socket 12 Jan 22 22:26:29 LCDd: Listening for queries on 127.0.0.1:13666 Jan 22 22:26:29 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf Jan 22 22:26:29 LCDd: LCDd version 0.5.5 starting Jan 22 22:26:26 php: lcdproc: Sync: End package sync Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync Jan 22 22:26:26 php: lcdproc: Sync: End package sync Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync Jan 22 22:26:26 php: : Restarting/Starting all packages. Jan 22 22:26:21 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 22:26:20 check_reload_status: Starting packages Jan 22 22:26:20 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
top -S for LCDd
PID.....USERNAME.THR.PRI.NICE.SIZE...RES....STATE.TIME.WCPU.COMMAND 27586.nobody......1.....74..r30...3368K.1488K.RUN.....0:00.0.00%.LCDd
-
I turned LCDd on overnight and left my SSH sessions active running top -SH
This morning I have an unresponsive machine except the SSH sessions are still active. (Also, everyone was asleep so there was no usage from LAN to WAN.)I killed the LCDd process after capturing the information below and load drops to less than .10 but I cannot restart it. If I use the webif it will generate the previously mentioned log entries and the same occurs starting it from the CLI.
top -SH
last pid: 31188; load averages: 7.99, 7.75, 6.91 up 0+17:13:06 08:55:32 92 processes: 11 running, 68 sleeping, 1 zombie, 12 waiting CPU: 18.9% user, 0.0% nice, 80.6% system, 0.5% interrupt, 0.0% idle Mem: 62M Active, 15M Inact, 39M Wired, 34M Buf, 118M Free Swap: 512M Total, 512M Free PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND 34608 nobody 74 r30 3368K 1488K RUN 77:11 100.00% LCDd 10 root 171 ki31 0K 8K RUN 797:50 0.00% idle 11 root -28 - 0K 96K WAIT 1:57 0.00% {swi5: +} 45412 root 44 0 6588K 4776K kqread 0:56 0.00% lighttpd 11 root -32 - 0K 96K WAIT 0:50 0.00% {swi4: clock} 0 root -16 0 0K 48K sched 0:44 0.00% {swapper} 34700 root 76 0 33092K 16244K RUN 0:31 0.00% php 45906 root 44 0 3712K 1960K RUN 0:31 0.00% top 32600 root 76 0 33092K 16136K RUN 0:31 0.00% php 40548 root 76 20 3656K 1492K piperd 0:14 0.00% sh 13 root -16 - 0K 8K - 0:13 0.00% yarrow 24126 root 44 0 4948K 2516K select 0:12 0.00% syslogd 19804 root 44 0 7992K 3604K select 0:11 0.00% sshd 19 root 44 - 0K 8K syncer 0:07 0.00% syncer 35740 root 48 0 37188K 18800K piperd 0:07 0.00% php 21686 root 44 0 3316K 924K piperd 0:06 0.00% logger
System.log tail
tail /var/log/system.log Jan 23 02:35:21 pfsense dnsmasq[6805]: possible DNS-rebind attack detected: appsforbb.com Jan 23 02:58:07 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 03:26:29 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 03:58:37 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 05:34:21 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 06:34:48 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 07:35:19 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 07:59:59 pfsense dhclient: RENEW Jan 23 07:59:59 pfsense dhclient: Creating resolv.conf
-
Further confirmation of a problem. Thanks.
By the way pfSense uses circular logs so you have to use clog to tail them.
See: http://doc.pfsense.org/index.php/LOGS:Why_can%27t_I_view_view_log_files_with_cat/grep/etc%3F%28clog%29Steve
-
Well, today when I woke up and saw my Watchguard I noticed the "Thanks for using pfSense" and hung LCDd… I can restart it, but it would be nice for it not to crash. Will keep testing... All I did was add Interface (and on the dropdown kept WAN)
-
Hello everybody,
I just updated the package. The news are the following:- Fix the uptime screen
- In the settings, "Enable LCDproc" becomes "Enable LCDproc at startup"
- Added "After Install Info" message
- Limited the client loop to three times. After three errors connecting to the LCDd service the client will end
I hope the 4th point will solve the problems for the Firebox boxes! Just wait for the 0.9 version of the package and update it…
Thanks to all for the feedbacks,
MicheleEDIT: With a feedback, please post the driver/port you're using, the values of the options of the service and the list of the screens active, thanks!!
-
Hello,
can I leave the LED backlight on all the time? Is this implemented in this update?
Or will it come in a future version?M.
-
@power_matz: The backlight control is part of the driver not the package so this won't have changed.
@mdima: Awesome work! ;D I can't wait to test it.
Steve
-
can I leave the LED backlight on all the time? Is this implemented in this update?
We are currently focusing on fixing the issue of multiple instances of the server, as well as building the code for driving the LEDs. Once that is stable, we can revisit the user parameters.
-
Well the update didn't help for me… :(
Mem: 44M Active, 15M Inact, 37M Wired, 28M Buf, 139M Free Swap: 512M Total, 512M Free 48731 nobody 74 r30 3368K 1488K RUN 77:11 100.00% LCDd
Running clog of /var/log/system.log showed no entries related to the LCD processes.
However it DID continue to work properly this time although the screens changed about every 7 secs or so instead of 5 secs with the webif non-responsive and new SSH sessions not completing to connect. Existing SSH sessions continue just slowly as expected under the load.
For reference all my settings for the Firebox x700 are "default" with "2x20 columns" and the SDEC driver on /dev/lpt0. The only screens enabled are: Uptime, Load, States, Mbuf, & Interface Traffic (WAN).
GW monitoring is disabled.EDIT: Forgot to add
- like the new Uptime screen! Much cleaner.
- in my set up, it seems to be failing after around 16-18 hours uptime and/or around 4-5am, not sure which yet, with no network activity occurring.
-
@tix:
For reference all my settings for the Firebox x700 are "default" with "2x20 columns" and the SDEC driver on /dev/lpt0. The only screens enabled are: Uptime, Load, States, Mbuf, & Interface Traffic (WAN).
GW monitoring is disabled.Hi Tix,
thanks for the info! I am now running my secondary machine with the same screens in order to test the "client" part of the package. If there is something wrong in the client for this screens I will find out soon. Of course I can only test the client because I use a different driver… but if the test goes OK it means some problem in LCDd 0.5.5 or the driver.@tix:
EDIT: Forgot to add
- like the new Uptime screen! Much cleaner.
- in my set up, it seems to be failing after around 16-18 hours uptime and/or around 4-5am, not sure which yet, with no network activity occurring.
Thanks for apreciating! I will see tomorrow how my secondary machine is doing…
Ciao,
Michele -
You think you can program more stuff on the LCDproc MENU?
Maybe stuff like webGUI restart, backup config, restore config, reboot system
Thanks! Doing an awesome job!
To all, I am using GWXepc for the Arm/Disarm LED to turn it green when pfSense comes up and red when it goes down.
What I did was edit the beep.sh and in "start" I added GWXepc -l green and in "stop" I added GWXepc -l red:)