LCDProc 0.5.4-dev
-
I have a feeling that's because every time pfSense detects an IP change on an interface it restarts all the packages.
I think it may be caused by trying to start a new instance of LCDd before the old one has been killed. LCDd then cannot open local port 13366 and quits leaving lcdclient.sh running and spamming the logs. In the worst case this happens several times with the result that several instances of lcdclient are running.
Perhaps this is caused by the sleep delay in between the kill statements? Perhaps it should kill LCDd first? :-\In my opinion lcdclient.sh needs to quit after, say 3, tries if lcdproc_client.php cannot connect to LCDd. Is that what Seth had in mind when he setup the counter variable in lcdclient?
What do you think?
Steve
-
In my opinion lcdclient.sh needs to quit after, say 3, tries if lcdproc_client.php cannot connect to LCDd. Is that what Seth had in mind when he setup the counter variable in lcdclient?
What do you think?
Steve
Hi Steve,
yes, I agree… I will do this improvement in the next update of the package...Michele
-
I tried this LCDProc 0.5.4-dev and all it does is flash a few screens real fast then it says "Thank you for using pfSense" and sits on that screen forever… :O
-
Could you show us your system logs?
I suspect you are seeing the same as me. Is that on your X-Core box?Steve
-
I tried this LCDProc 0.5.4-dev and all it does is flash a few screens real fast then it says "Thank you for using pfSense" and sits on that screen forever… :O
Hi m4f,
Did you set all the new properties in the service configuration?
Can you tell me what values did you enter?Thanks,
Michele -
Steve: Yes, it's on the X500 (I also have an X700 but they are the same hardware)
Hrm! I had uninstalled it, when I installed it again it installed 0.5.5 v 0.8 :O
Here are my settings:
Under Server tab:
Enable LCDProc: checked
Com Port: Parallel port 1 (/dev/lpt0)
Display Size: 2 rows 20 columns
Driver: Watchguard Firebox with SDEC (x86 only)
Refresh frequency: 5 seconds
The rest are set Default and the LED option is disabledUnder Screens tab I have enabled:
Version, Time, Uptime, System, Disk, Load, States,
Load Balancer, Interfaces, CPU Freq and Internet Traffic -
I get these on boot, but after it boots my LCD works with this newer version.
Last 50 system log entries Jan 20 22:05:37 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:37 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:25 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:25 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:13 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:13 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:02 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:02 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:05:01 php: : Hotplug event detected for opt2 but ignoring since interface is configured with static IP () Jan 20 22:05:01 php: : Hotplug event detected for opt4 but ignoring since interface is configured with static IP () Jan 20 22:05:01 php: : Hotplug event detected for opt1 but ignoring since interface is configured with static IP () Jan 20 22:05:01 php: : Hotplug event detected for opt3 but ignoring since interface is configured with static IP () Jan 20 22:04:54 check_reload_status: Linkup starting re5 Jan 20 22:04:54 check_reload_status: Linkup starting re4 Jan 20 22:04:54 check_reload_status: Linkup starting re3 Jan 20 22:04:54 kernel: re5: link state changed to DOWN Jan 20 22:04:54 kernel: re4: link state changed to DOWN Jan 20 22:04:54 kernel: re3: link state changed to DOWN Jan 20 22:04:54 kernel: re2: link state changed to DOWN Jan 20 22:04:54 check_reload_status: Linkup starting re2 Jan 20 22:04:50 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:50 sshlockout[17089]: sshlockout/webConfigurator v3.0 starting up Jan 20 22:04:50 php: /index.php: Successful webConfigurator login for user 'admin' from 192.168.2.1 Jan 20 22:04:50 php: /index.php: Successful webConfigurator login for user 'admin' from 192.168.2.1 Jan 20 22:04:50 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:39 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:38 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:27 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:27 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:16 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:15 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:04:04 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:53 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:53 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:42 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:41 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:30 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:30 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:19 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:18 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:07 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:03:07 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:56 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:55 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:45 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:44 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:33 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:33 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 20 22:02:32 apinger: /usr/local/bin/rrdtool respawning too fast, waiting 300s.
EDIT:
I removed everything except SYSTEM TIME and SYSTEM STATS, good enough for me :)
-
Steve,
can you pls tell me which are your options/screen?Thanks,
Michele -
I'm running the Time and Uptime screens. Both boxes.
Steve
-
I'm running the Time and Uptime screens. Both boxes.
Steve
mmmhhh… I am trying to mess it up as much as I can, but really, after every try I only get 1 server and 1 client running (also with enable/disable of ipsec that foces a package resync)... with:
Refresh: 1 second
Driver: sureelec
Screens: ALLI don't even find the problem on the stats and time screens, both from a machine running since 31 days or a machine just booted...
Anyway, I will change the package in order to let the client make only 3 attempts to connect to the service then quit...
-
I appreciate all your efforts, thanks. :)
It's looking like a driver issue is more likely then.
There seem to be two issues here:
1. When the package is started/restarted in quick succession some delay in the process can result in LCDd trying to start before the previous version has stopped. Or possibly the previous version simply does not stop as I have found when trying to kill it manually.2. The LCDd process can start to use a lot of CPU time causing the box to become unresponsive (but still routes traffic OK).
For testing purposes we can either:
Build the old driver against 0.55 or
Swap out LCDd with a 0.54 version and equivalent driver.What do think? I may have 0.54 version files the fmertz mailed to me. I'll try that if I do.
Steve
-
looking like a driver issue is more likely then.
Well, a driver is a shared library. It is coded so there is just one copy of the code between multiple instances of LCDd server. For instance data, everything is dynamically allocated, so each instance behaves independently. Obviously, as there is just one set of interface ports, and one LCD, this flexibility is not necessarily helping. I think fixing the multiple instances of the server will go a long way towards fixing these problems. I can look at the code again to see if I can fail initialization on second instance and up. Let me know if I can help compiling anything.
-
2. The LCDd process can start to use a lot of CPU time causing the box to become unresponsive (but still routes traffic OK).
I have just acquired an x700 from work and trying to get it running smoothly and am encountering this problem. It is a clean full install (on HDD) of 2.0.1 with IPv6 git sync'd as of 21 Jan.
I keep running into this within 16-24 hours uptime where the system is non-responsive but still routes although more slowly than normal. I cannot SSH use the webif, or console when this happens. The RRD graphs stop as well. I have only the load and states screens enabled thinking maybe the number of screens was an issue, but today when it became unresponsive it was stuck on the load screen and the current load was displayed as a 24.4!!!I am also of the (pretty much worthless) opinion that it is too many instances of the LCDProc client as I can see the log entries posted above. I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
-
Also check states table size and usage on Dashboard.
-
@tix:
I am going to disable LCDProc-dev for now and see if everything stays stable for at least 24 hours.
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running. E.g.[2.0.1-RELEASE][root@pfsense.fire.box]/root(4): ps aux | grep lcd root 16576 0.0 0.3 3656 1508 ?? S 10:59PM 0:00.00 /bin/sh /tmp/lcdclient.sh root 16628 0.0 3.4 47452 17388 ?? S 10:59PM 0:00.31 /usr/local/bin/php -f /usr/local/pkg/lcdproc_client.php [2.0.1-RELEASE][root@pfsense.fire.box]/root(5): ps aux | grep LCD nobody 16562 0.0 0.3 3368 1464 ?? Ss 10:59PM 0:00.02 /usr/local/sbin/LCDd -c /usr/local/etc/LCDd.conf
You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run```
top -SHLCDd should be consuming 1% or less cpu. Steve
-
Here is my recent log of LCDproc after a fresh reboot of my Firebox, right now I have System Time and System Stats working.
EDIT: I had my WAN cable unplugged. I replaced my X500 with my X700 while I did this…
Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:21770 on socket 13 Jan 22 20:37:22 LCDd: Connect from host 127.0.0.1:34709 on socket 12 Jan 22 20:37:20 LCDd: Listening for queries on 127.0.0.1:13666 Jan 22 20:37:20 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf Jan 22 20:37:20 LCDd: LCDd version 0.5.5 starting Jan 22 20:37:17 php: lcdproc: Sync: End package sync Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync Jan 22 20:37:17 php: lcdproc: Sync: End package sync Jan 22 20:37:17 php: lcdproc: Sync: Begin package sync Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:17 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:17 php: : Restarting/Starting all packages. Jan 22 20:37:11 check_reload_status: Starting packages Jan 22 20:37:11 php: : pfSense package system has detected an ip change 0.0.0.0 -> ... Restarting packages. Jan 22 20:37:11 php: : OpenNTPD is starting up. Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:37:05 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:36:57 php: : ERROR! PPTP enabled but could not resolve the $pptpdtarget Jan 22 20:36:56 php: : The command '/usr/bin/killall 'ntpd'' returned exit code '1', the output was 'killall: warning: kill -TERM 59839: No such process' Jan 22 20:36:56 php: : Creating rrd update script Jan 22 20:36:56 php: : Resyncing OpenVPN instances for interface LAN. Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 20:36:54 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
-
The more people running this and giving feedback the faster it's likely to be solved. So thanks. :)
If you are running it again try to SSH in. After you enter the password it appears to have frozen but if you wait long enough it will log in (several minutes). Then you can see what's running. E.g.You should see one instance of each of those three programs running. If you have more then it has started or stopped incorrectly.
You can also run```
top -SHLCDd should be consuming 1% or less cpu. Steve
Thanks Steve but I've tried that and even after more than an hour I still cannot get the menu so I usually give up and do a power-cycle as that is the only way I've found to recover.
I would love to be able to confirm if there are multiple instances as well as memory usage and such but so far that is not an option.I forgot to mention previously that someone had suggested that when the connection drops that all the packages are restarted so I disabled the gateway monitor but that made no difference.
The next time I get to watch it for a full day (next weekend?), I will keep an SSH session up running top and hopefully capture some useful information.
EDIT:
One more thing: The looking at the RRD graphs (processor, memory, & states), there is nothing indicating an increase in processes or utilization. Everything is "normal" and then just stops graphing until I reboot. -
Here is the X700's System Log pertaining LCDproc 0.5.5 v0.8
Jan 22 22:26:33 LCDd: Connect from host 127.0.0.1:42443 on socket 13 Jan 22 22:26:31 LCDd: Connect from host 127.0.0.1:61742 on socket 12 Jan 22 22:26:29 LCDd: Listening for queries on 127.0.0.1:13666 Jan 22 22:26:29 LCDd: Using Configuration File: /usr/local/etc/LCDd.conf Jan 22 22:26:29 LCDd: LCDd version 0.5.5 starting Jan 22 22:26:26 php: lcdproc: Sync: End package sync Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync Jan 22 22:26:26 php: lcdproc: Sync: End package sync Jan 22 22:26:26 php: lcdproc: Sync: Begin package sync Jan 22 22:26:26 php: : Restarting/Starting all packages. Jan 22 22:26:21 php: lcdproc: Failed to connect to LCDd process Operation timed out (60) Jan 22 22:26:20 check_reload_status: Starting packages Jan 22 22:26:20 php: lcdproc: Failed to connect to LCDd process Operation timed out (60)
top -S for LCDd
PID.....USERNAME.THR.PRI.NICE.SIZE...RES....STATE.TIME.WCPU.COMMAND 27586.nobody......1.....74..r30...3368K.1488K.RUN.....0:00.0.00%.LCDd
-
I turned LCDd on overnight and left my SSH sessions active running top -SH
This morning I have an unresponsive machine except the SSH sessions are still active. (Also, everyone was asleep so there was no usage from LAN to WAN.)I killed the LCDd process after capturing the information below and load drops to less than .10 but I cannot restart it. If I use the webif it will generate the previously mentioned log entries and the same occurs starting it from the CLI.
top -SH
last pid: 31188; load averages: 7.99, 7.75, 6.91 up 0+17:13:06 08:55:32 92 processes: 11 running, 68 sleeping, 1 zombie, 12 waiting CPU: 18.9% user, 0.0% nice, 80.6% system, 0.5% interrupt, 0.0% idle Mem: 62M Active, 15M Inact, 39M Wired, 34M Buf, 118M Free Swap: 512M Total, 512M Free PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND 34608 nobody 74 r30 3368K 1488K RUN 77:11 100.00% LCDd 10 root 171 ki31 0K 8K RUN 797:50 0.00% idle 11 root -28 - 0K 96K WAIT 1:57 0.00% {swi5: +} 45412 root 44 0 6588K 4776K kqread 0:56 0.00% lighttpd 11 root -32 - 0K 96K WAIT 0:50 0.00% {swi4: clock} 0 root -16 0 0K 48K sched 0:44 0.00% {swapper} 34700 root 76 0 33092K 16244K RUN 0:31 0.00% php 45906 root 44 0 3712K 1960K RUN 0:31 0.00% top 32600 root 76 0 33092K 16136K RUN 0:31 0.00% php 40548 root 76 20 3656K 1492K piperd 0:14 0.00% sh 13 root -16 - 0K 8K - 0:13 0.00% yarrow 24126 root 44 0 4948K 2516K select 0:12 0.00% syslogd 19804 root 44 0 7992K 3604K select 0:11 0.00% sshd 19 root 44 - 0K 8K syncer 0:07 0.00% syncer 35740 root 48 0 37188K 18800K piperd 0:07 0.00% php 21686 root 44 0 3316K 924K piperd 0:06 0.00% logger
System.log tail
tail /var/log/system.log Jan 23 02:35:21 pfsense dnsmasq[6805]: possible DNS-rebind attack detected: appsforbb.com Jan 23 02:58:07 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 03:26:29 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 03:58:37 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 05:34:21 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 06:34:48 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 07:35:19 pfsense dnsmasq[6805]: read /etc/hosts - 31 addresses Jan 23 07:59:59 pfsense dhclient: RENEW Jan 23 07:59:59 pfsense dhclient: Creating resolv.conf
-
Further confirmation of a problem. Thanks.
By the way pfSense uses circular logs so you have to use clog to tail them.
See: http://doc.pfsense.org/index.php/LOGS:Why_can%27t_I_view_view_log_files_with_cat/grep/etc%3F%28clog%29Steve