LCDProc 0.5.4-dev
-
I think we are making progress for the sdeclcd driver. I installed the sdeclcd.so and LCDd versions provided by Steve and I'm happy to report that after 13 hours of uptime I still have a working LCD display and a responsive machine.
This may be short-lived, as I am seeing the usage of LCDd climb, though not as quickly as with the newer versions: after 13 hours, LCDd has run for 10:15 and is showing 0% CPU.
I'm going to stay with the current configuration until I reach 24 hours uptime or LCDd hits 100% before I change to a refresh interval of 1 sec as suggested by Michele.
I will post the status later when I get back home….. but it's looking better ;D
-
Here's something perhaps of note:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(11): clog /var/log/system.log | grep huh
Jan 26 04:24:24 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 26 15:41:46 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 03:45:35 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:01:05 pfsense LCDd: error: huh? Too much data received... quiet down!
Because I was able to predict when it would happen, I could watch top, and found that even though the logs show the event taking only 10 seconds, LCDd is in fact stuck at 100% for 15 minutes before that.
That is with LCDd 0.53, old sdec driver, 0.8 package code and refresh set to 5 seconds.
Testing now as above but refresh set to 2 seconds. Can't set to 1 second with 0.53:
Jan 27 15:09:39 LCDd: Waittime should be at least 2 (seconds). Set to 2 seconds.
Steve
@tix: Are you seeing errors in the logs?
-
Steve,
looking at my secondary machine, I have the feeling that the problems are related to the "scrolling" feature of the panel. In fact, I sometimes see frozen screens where there is scrolling… I will keep an eye on it and try to see if it is the problem...
Ciao,
Michele
-
Steve, I get the same log entries, and they all occur at the same time, yet the display continues to work, unlike with the newer code.
Jan 27 05:45:18 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 05:45:18 pfsense LCDd: Client on socket 11 disconnected
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:43 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 05:45:44 pfsense LCDd: Connect from host 127.0.0.1:8170 on socket 11
What's interesting to me is that this is right at the 10 hour uptime mark where the newer versions stopped working. I wonder if there is something time related causing this as anything newer than 0.53 version of LCDd breaks on my system after 10 hours?? I wouldn't think so but it's strange it was always around 10 hours before reverting…. weird...
-
Interesting that your box (X700?) takes a lot longer than 10 seconds to sort itself out in the log.
The 0.53 code just gives up and errors out, whereas newer versions include code to handle the extra data, so they keep trying.
Steve
-
@tix:
Well I had no luck with the "test" sdeclcd.so driver. Hit 100% CPU after 10:35 uptime. Interestingly I watched it go from 72% at 10 hours to 100% 35 mins later.
Ok, so leaving the process out of "realtime round robin", and leaving it with default priority had no effect.
Long shot: When running at 100%, try to "kill" LCDd with signal 6 (kill -6 <pid of LCDd>). This should give a memory image of the process (core dump). If you can make the core file available, I can try loading it up in the debugger and see where the execution ended. The trick is that this needs to be a version of LCDd I have the code for, like V0.5.5, so the debugger can match the binary with the source. I have never done this, so this will probably lead nowhere…
-
Could try compiling LCDd with the debug option enabled to get far more logging output.
Steve
-
Could try compiling LCDd with the debug option enabled to get far more logging output.
MyCommand = YourWish;
-
I will try using kill -6 tomorrow; for now I'm enjoying everything working on my x700. ;D
I'm still hung up on the idea of some kind of time issue. I see a problem every 10 hours. Here is the log from this morning and after running during the day today:
Jan 27 05:45:18 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 05:45:18 pfsense LCDd: Client on socket 11 disconnected
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:43 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 05:45:44 pfsense LCDd: Connect from host 127.0.0.1:8170 on socket 11
...
Jan 27 15:48:23 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:48:23 pfsense LCDd: Client on socket 11 disconnected
Jan 27 15:48:23 pfsense LCDd: sock_send: socket write error
Jan 27 15:48:49 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 15:48:50 pfsense LCDd: Connect from host 127.0.0.1:8576 on socket 11
10 hours apart and the 05:45 was 10 hours of uptime!
As it stands, everything is working great (excluding the log entries) on v0.53 kernel module and v0.53 LCDd. The display continues to work with the default refresh of 5 secs and the webif and ssh connections are responsive. In fact, I would happily accept this level of functionality permanently. :)
But in the interest of perfection, I will apply the v0.9 package kernel mod and LCDd and when it stops responding on the webif after what I believe will be 10 hours of uptime, will kill it with the -6 option (instead of 15). The next step for me after that will be to use the debug-enabled LCDd and wait.
-
A very interesting result:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(2): clog /var/log/system.log | grep huh
Jan 26 04:24:24 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 26 15:41:46 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 03:45:35 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:01:05 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 17:13:45 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 19:16:44 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 21:18:07 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 23:23:00 pfsense LCDd: error: huh? Too much data received... quiet down!
I changed the refresh time from 5 seconds to 1 second at 15:09. (1 second was seemingly auto-changed to 2.)
The logs show that the gap between errors reduced from ~11 hours to ~2 hours.
This implies that the problem lies in the total data or number of screen refreshes sent, not the actual time or uptime.
Steve
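The gaps can be checked directly from the timestamps in the grep output above. This is only a minimal Python sketch, not part of LCDproc or pfSense: the helper name `gaps_between_errors` is made up for illustration, and the year is an assumption, since syslog lines do not carry one.

```python
from datetime import datetime, timedelta

# Timestamps copied from the "Too much data received" lines in the grep output.
ERROR_TIMES = [
    "Jan 26 04:24:24",
    "Jan 26 15:41:46",
    "Jan 27 03:45:35",
    "Jan 27 15:01:05",
    "Jan 27 17:13:45",
    "Jan 27 19:16:44",
    "Jan 27 21:18:07",
    "Jan 27 23:23:00",
]

ASSUMED_YEAR = 2012  # assumption: the log lines themselves contain no year


def gaps_between_errors(stamps, year=ASSUMED_YEAR):
    """Return the time deltas between consecutive syslog-style timestamps."""
    times = [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp in stamps
    ]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_between_errors(ERROR_TIMES):
        print(gap)
```

With the timestamps above, the first three gaps come out between 11 and just over 12 hours, and the four gaps after the refresh change are all a little over 2 hours.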
-
Steve,
can you please try this: add only screens that do not have any scrolling. When I stopped using "scrolling screens", the problem looked solved on my machine.
By "scroll" I mean when the text is wider than your screen, so it scrolls left/right.
Thanks,
Michele
-
Long shot: When running at 100%, try to "kill" LCDd with signal 6 (kill -6 <pid of LCDd>). This should give a memory image of the process (core dump). If you can make the core file available, I can try loading it up in the debugger and see where the execution ended. The trick is that this needs to be a version of LCDd I have the code for, like V0.5.5, so the debugger can match the binary with the source. I have never done this, so this will probably lead nowhere…
fmertz - LCDd hit 100% after 10 hours as suspected. I killed LCDd with "kill -6 <pid>" but it did not leave a core file, or not one I can find. I assume it would be named core–-- or similar, and a find on the filesystem doesn't locate any core files. Am I just looking in the wrong place?
My next step is to test with the debug-enabled LCDd, leaving the rest of v0.9 untouched.
A very interesting result:
I changed the refresh time from 5 seconds to 1 second at 15:09. (1 second was seemingly auto-changed to 2.)
The logs show that the gap between errors reduced from ~11 hours to ~2 hours.
This implies that the problem lies in the total data or number of screen refreshes sent, not the actual time or uptime.
Steve
By my calculations, you are hitting the problem after (7200 [2 hrs in secs] / 2 secs per update =) 3600 'updates' and I'm hitting it after (36000 [10 hrs in secs] / 5 secs per update =) 7200 'updates'. That is interesting as well, since 3600 is half of 7200.
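The same arithmetic as a small Python sketch. `updates_before_error` is a hypothetical helper, and it assumes exactly one screen update per refresh interval, which nothing in this thread confirms.

```python
def updates_before_error(hours_between_errors, refresh_seconds):
    """Estimate updates sent between two errors.

    Assumption: one screen update is sent per refresh interval.
    """
    return hours_between_errors * 3600 / refresh_seconds


# Roughly 2 hours between errors at a 2 second refresh (Steve's box).
steve = updates_before_error(2, 2)
# Roughly 10 hours between errors at a 5 second refresh (tix's box).
tix = updates_before_error(10, 5)

print(steve, tix, tix / steve)
```

By the same estimate, Steve's earlier gap of roughly 11 hours at a 5 second refresh works out to about 7900 updates.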
-
I ran into that twice installing the pfSense LCDproc 5.5 Dev v0.8 package. I had to manually install the package file after installing the pfSense package, because no LCDproc 5.5 core files were installed, just the pfSense PHP front end.
So first install the pfSense LCDproc 5.5 Dev package and then do the following.
Here is the link to the core files. To install them, go to the console and run this:
pkg_add -r http://files.pfsense.org/packages/8/All/lcdproc-0.5.5.tbz
-Joe Cowboy
-
I ran into that twice installing the pfSense LCDproc 5.5 Dev package. I had to manually install the package file after installing the pfSense package, because no LCDproc 5.5 core files were installed, just the pfSense PHP front end.
So first install the pfSense LCDproc 5.5 Dev package and then do the following.
Here is the link to the core files. To install them, go to the console and run this:
pkg_add -r http://files.pfsense.org/packages/8/All/lcdproc-0.5.5.tbz
-Joe Cowboy
what ver of pfsense are you running? i'm using 2.1-dev and have to manually install binaries because the box is trying to install pbi instead… gets annoying but i've gotten used to it..
-
I am running 2.1-dev – LCDProc 0.5.5-dev v0.8. I didn't realize he had just updated to v0.9..... So I just did a reinstall and it seemed to install correctly this time. Sorry for not posting the version last time; I now have v0.9 installed. Unless something was fixed in one of the last gitsyncs for 2.1-dev??? Thanks for all your hard work...
-Joe Cowboy
-
Steve,
looking at my secondary machine, I have the feeling that the problems are related to the "scrolling" feature of the panel. In fact, I sometimes see frozen screens where there is scrolling… I will keep an eye on it and try to see if it is the problem...
Ciao,
Michele
Hello,
I have been running LCDproc 0.5.5 with the package 0.9 and only the "traffic (wan)" screen for 2 days and everything is going fine… Has anyone else tried avoiding screens that scroll, with a positive result?
Thanks,
Michele
-
None of the screens I have enabled scroll (Uptime, States, Mbuf, & WAN with 5 second refresh) yet the display will still stop responding and the system cannot be connected to after 10 hours. The firewall continues to function normally as near as I can tell other than that - by that I mean, DHCP still works, existing hosts can continue to send/recv traffic and initiate new traffic. Just the webif and SSH access can no longer connect due to the high load averages.
I have tried all combinations of sdeclcd.so and LCDd (v0.53, v0.55, debug-enabled LCDd) and have had success ONLY with v0.53 of both the module and LCDd. Every other mix of versions sends load to 100% at the 10 hour uptime mark. All testing was performed with the LCDproc-dev-v0.9 package files, changing only the sdeclcd.so and LCDd files - no other file was changed or modified.
I have over 48 hours of uptime without any visible problem running v0.53. This version will generate the following log entries every 10 hours but load stays less than 0.20 and the display works.
Jan 30 19:16:18 LCDd: error: huh? Too much data received... quiet down!
Jan 30 19:16:18 LCDd: Client on socket 11 disconnected
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:43 php: lcdproc: Connection to LCDd process lost ()
Jan 30 19:16:45 LCDd: Connect from host 127.0.0.1:5248 on socket 11
I'm not sure what was changed between 0.53 and 0.55 and would be willing to test 0.54 if someone can provide those files.
-
@tix:
None of the screens I have enabled scroll (Uptime, States, Mbuf, & WAN with 5 second refresh) yet the display will still stop responding and the system cannot be connected to after 10 hours.
Hi,
in my case the states screen definitely scrolls… I have a 20x4 LCD display, max states: 500'000. When the states are more than 10'000 the screen scrolls.
Can you please tell me your display size and your max states setting?
Thanks,
Michele
-
Hi,
in my case the states screen definitely scrolls… I have a 20x4 LCD display, max states: 500'000. When the states are more than 10'000 the screen scrolls.
Can you please tell me your display size and your max states setting?
Thanks,
Michele
The display is the standard 2x20 included on the Firebox X series (X700). My states are only 50000, so it doesn't scroll.
My display finally stopped working on v0.53, but it took 50 hours, i.e. the 5th 10-hour interval. Interestingly, LCDd just died but the client continued to function and the box is as responsive as normal. The log shows the 'normal for me on this version' entries except for the missing 'reconnect' entry.
Jan 31 05:19:29 LCDd: error: huh? Too much data received... quiet down!
Jan 31 05:19:29 LCDd: Client on socket 11 disconnected
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:54 php: lcdproc: Connection to LCDd process lost ()
-
Has anyone else tried avoiding screens that scroll, with a positive result?
Yes. Running interface traffic with WAN selected as the only screen has eliminated the 100% CPU problem (after 15 hours of testing at 1 sec refresh).
I am also running the LCDd that fmertz compiled with debugging enabled, but it gives me only a tiny amount of extra information. This is with logging set to level 5. I think I haven't understood how that's supposed to work. Time to re-read the developer guide!
I also tried running LCDd with different nice levels in order to be able to access the box during the problem event, but it made no difference. It seems like you should be able to set it to nice 20 and it will be very low priority, but that doesn't happen.
In fact, if you look at the output of top, it runs at 'r30', which I cannot find any reference to anywhere. :-\
last pid: 27372;  load averages: 0.02, 0.08, 0.06  up 0+23:30:46  12:43:59
48 processes: 1 running, 47 sleeping
CPU: 0.0% user, 0.0% nice, 0.4% system, 0.4% interrupt, 99.3% idle
Mem: 55M Active, 16M Inact, 55M Wired, 1060K Cache, 49M Buf, 359M Free
Swap:
  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME  WCPU COMMAND
44832 root       1  76   20  3656K  1396K wait     1:01 0.00% sh
 4015 root       1  45    0 47452K 17464K nanslp   0:57 0.00% php
 2024 nobody     1  74  r30  3368K  1500K nanslp   0:39 0.00% LCDd
Steve