LCDProc 0.5.4-dev
-
Well I had no luck with the "test" sdeclcd.so driver. Hit 100% CPU after 10:35 uptime. Interestingly I watched it go from 72% at 10 hours to 100% 35 mins later.
I'm now going to install the same config as stephenw10 as well and try. stephenw would you mind reposting the tarball in this thread for ease of finding?
Hi, following Steve's experience, I am trying to "slow down" the panel a bit… for example, I changed TitleSpeed to 5 (as it was in the configuration of the tarball package). This affects the screens that scroll... let's see how it goes; I will test it for a few days...
Ciao,
Michele
My screens default refresh interval is 5 seconds.
-
Here you go. Remove the .png extension.
We need to test either the new driver compiled against 0.53 or the old driver compiled against 0.55. I'm sure I have one of those here somewhere.
Bah! I have many files all named sdeclcd.so. ::)
Steve
-
Which one is this tarball compiled against? I have just completed installing the sdeclcd.so and LCDd from this tarball; all other files are unchanged from the -DEV v. 0.9 (lcdproc-0.5.5) package. Should know something in about 10 hours ;D
I will be happy to coordinate testing with everyone - Just let me know what version you're using and I will run another configuration…
-
Anyone with a compiler setup want to help test this EZIO-100/MTB134 driver? I found it online, but it appears abandoned - I'm not sure if it will work or not. I tried to get it to compile, but clearly pfSense isn't meant to be used for compiling.
Attachments have .png appended to satisfy the forum's attachment rules.
-
@tix:
Well I had no luck with the "test" sdeclcd.so driver. Hit 100% CPU after 10:35 uptime. Interestingly I watched it go from 72% at 10 hours to 100% 35 mins later.
…
My screens default refresh interval is 5 seconds.
Guys, I have 2 servers running pfSense: one with a 1 second refresh, where I have NO PROBLEMS, and one with a 5 second refresh, where I get the problem. Both servers use the same panel (sureelect).
The client goes to "sleep" for the refresh interval multiplied by the number of screens available (I thought this was the best way to avoid wasting resources, since every screen is shown for that many seconds).
Can you ALL please try a refresh of 1 second?
Thanks,
Michele
-
I think we are making progress for the sdeclcd driver. I installed the sdeclcd.so and LCDd versions provided by Steve and I'm happy to report that after 13 hours of uptime I still have a working LCD display and a responsive machine.
This may be short-lived, as I am seeing the usage of LCDd climb, though not as quickly as with the newer versions: after 13 hours, LCDd has run for 10:15 and is showing 0% CPU.
I'm going to stay with the current configuration until I reach 24 hours uptime or LCDd hits 100% before I change to a refresh interval of 1 sec as suggested by Michele.
I will post the status later when I get back home….. but it's looking better ;D
-
Here's something perhaps of note:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(11): clog /var/log/system.log | grep huh
Jan 26 04:24:24 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 26 15:41:46 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 03:45:35 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:01:05 pfsense LCDd: error: huh? Too much data received... quiet down!
Because I was able to predict when it would happen, I could watch top, and I found that even though the logs show the event taking only 10 seconds, LCDd is in fact stuck at 100% for 15 minutes before that.
That is with LCDd 0.53, old sdec driver, 0.8 package code and refresh set to 5 seconds.
Testing now as above but refresh set to 2 seconds. Can't set to 1 second with 0.53:
Jan 27 15:09:39 LCDd: Waittime should be at least 2 (seconds). Set to 2 seconds.
Steve
@tix: Are you seeing errors in the logs?
-
Steve,
looking at my secondary machine, I have the feeling that the problems are related to the "scrolling" feature of the panel.
In fact I sometimes see frozen screens where there is scrolling… I will keep an eye on it and try to see if that is the problem...
Ciao,
Michele
-
Steve, I get the same log entries, but they all occur at the same time, and the display continues to work, unlike with the newer code.
Jan 27 05:45:18 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 05:45:18 pfsense LCDd: Client on socket 11 disconnected
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:43 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 05:45:44 pfsense LCDd: Connect from host 127.0.0.1:8170 on socket 11
What's interesting to me is that this is right at the 10 hour uptime mark where the newer versions stopped working. I wonder if something time-related is causing this, as anything newer than the 0.53 version of LCDd breaks on my system after 10 hours. I wouldn't think so, but it's strange that it was always around 10 hours before reverting… weird...
-
Interesting that your box (X700?) takes a lot longer than 10 seconds to sort itself out in the log.
The 0.53 code just gives up and errors out, whereas newer versions include code to handle the extra data, so they keep trying.
Steve
-
@tix:
Well I had no luck with the "test" sdeclcd.so driver. Hit 100% CPU after 10:35 uptime. Interestingly I watched it go from 72% at 10 hours to 100% 35 mins later.
Ok, so leaving the process out of "realtime round robin", and leaving it with default priority had no effect.
Long shot: when it is running at 100%, try to "kill" LCDd with signal 6 (kill -6 <pid of LCDd>). This should give a memory image of the process (core dump). If you can make the core file available, I can try loading it into the debugger to see where execution ended. The trick is that this needs to be a version of LCDd I have the code for, like V0.5.5, so the debugger can match the binary with the source. I have never done this, so this will probably lead nowhere…
-
Could try compiling LCDd with the debug option enabled to get far more logging output.
Steve
-
Could try compiling LCDd with the debug option enabled to get far more logging output.
MyCommand = YourWish;
-
I will try using kill -6 tomorrow; for now I'm enjoying everything working on my x700. ;D
I'm still hung up on the idea of some kind of time issue. I see a problem every 10 hours. Here is the log from this morning and after running during the day today:
Jan 27 05:45:18 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 05:45:18 pfsense LCDd: Client on socket 11 disconnected
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:18 pfsense LCDd: sock_send: socket write error
Jan 27 05:45:43 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 05:45:44 pfsense LCDd: Connect from host 127.0.0.1:8170 on socket 11
...
Jan 27 15:48:23 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:48:23 pfsense LCDd: Client on socket 11 disconnected
Jan 27 15:48:23 pfsense LCDd: sock_send: socket write error
Jan 27 15:48:49 pfsense php: lcdproc: Connection to LCDd process lost ()
Jan 27 15:48:50 pfsense LCDd: Connect from host 127.0.0.1:8576 on socket 11
10 hours apart and the 05:45 was 10 hours of uptime!
As it stands, everything is working great (excluding the log entries) on v0.53 kernel module and v0.53 LCDd. The display continues to work with the default refresh of 5 secs and the webif and ssh connections are responsive. In fact, I would happily accept this level of functionality permanently. :)
But in the interest of perfection, I will apply the v0.9 package kernel mod and LCDd and when it stops responding on the webif after what I believe will be 10 hours of uptime, will kill it with the -6 option (instead of 15). The next step for me after that will be to use the debug-enabled LCDd and wait.
-
A very interesting result:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(2): clog /var/log/system.log | grep huh
Jan 26 04:24:24 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 26 15:41:46 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 03:45:35 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 15:01:05 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 17:13:45 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 19:16:44 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 21:18:07 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 23:23:00 pfsense LCDd: error: huh? Too much data received... quiet down!
I changed the refresh time from 5 seconds to 1 second at 15:09. (1 second was seemingly auto-changed to 2.)
The logs show that the gap between errors dropped from ~11 hours to ~2 hours.
This implies that the problem lies in the total data or number of screen refreshes sent, not the actual time or uptime.
Steve
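For anyone who wants to check the spacing of these errors on their own box, here is a minimal sketch that prints the gap between consecutive "huh?" lines. The function name error_gaps is made up for illustration, and it assumes all entries fall within the same month, so it only looks at the day-of-month and time fields.

```shell
#!/bin/sh
# Print the gap between consecutive "huh?" errors in syslog-style lines.
# Assumption: every entry is in the same month, so day and HH:MM:SS suffice.
error_gaps() {
    awk '/huh\?/ {
        split($3, t, ":")
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) {
            gap = now - prev
            printf "%s %s %s  +%dh%02dm\n", $1, $2, $3, gap / 3600, (gap % 3600) / 60
        }
        prev = now
        seen = 1
    }'
}

# Two of the entries from the log above, around the refresh change.
error_gaps <<'EOF'
Jan 27 15:01:05 pfsense LCDd: error: huh? Too much data received... quiet down!
Jan 27 17:13:45 pfsense LCDd: error: huh? Too much data received... quiet down!
EOF
# prints: Jan 27 17:13:45  +2h12m
```

On pfSense the input could come from the same clog /var/log/system.log command used above, piped into error_gaps.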
-
Steve,
can you please try this: add only screens that do not have any scrolling. When I stopped using "scrolling screens", the problem looked solved on my machine.
By "scroll" I mean when the text is wider than your screen, so it scrolls left/right.
Thanks,
Michele
-
Long shot: when it is running at 100%, try to "kill" LCDd with signal 6 (kill -6 <pid of LCDd>). This should give a memory image of the process (core dump). If you can make the core file available, I can try loading it into the debugger to see where execution ended. The trick is that this needs to be a version of LCDd I have the code for, like V0.5.5, so the debugger can match the binary with the source. I have never done this, so this will probably lead nowhere…
fmertz - LCDd hit 100% after 10 hours as suspected. I killed LCDd with "kill -6 <pid>" but it did not leave a core file, or not one I can find. I assume it would be named core–-- or similar, and a find on the filesystem doesn't locate any core files. Am I just looking in the wrong place?
My next step is to test with the debug-enabled LCDd, leaving the rest of v0.9 untouched.
A very interesting result:
I changed the refresh time from 5 seconds to 1 second at 15.09. (1 second was seemingly auto changed to 2)
The logs show that gap between errors reduced from ~11 hours to ~ 2 hours.
This implies that the problem lies in the total data or number of screen refreshes sent not the actual time or uptime.
Steve
By my calculations, you are reaching a problem at (7200 [2 hrs in secs] / 2 [secs per update] =) 3600 'updates' and I'm reaching it at (36000 [10 hrs in secs] / 5 [secs per update] =) 7200 'updates'. Which is interesting as well, as 3600 is half of 7200.
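The arithmetic above can be reproduced with a couple of lines of shell. This is only a sketch of the calculation as stated in the post: the helper name updates is made up, and it treats one refresh interval as one 'update', which may not match what the client actually sends per cycle.

```shell
#!/bin/sh
# updates = seconds between failures / refresh interval in seconds
updates() {
    echo $(( $1 / $2 ))
}

steve=$(updates 7200 2)    # ~2 hour gap at a 2 second refresh
tix=$(updates 36000 5)     # ~10 hour gap at a 5 second refresh
echo "steve=$steve tix=$tix ratio=$(( tix / steve ))"
# prints: steve=3600 tix=7200 ratio=2
```

If a fixed number of updates alone triggered the failure, the two totals would be expected to match; the factor of two suggests something else (for example the number of enabled screens or the data sent per screen) may also play a part.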
-
I ran into that twice installing the pfSense LCDproc 5.5 Dev v0.8 package. I had to manually install the package file after installing the pfSense package, because no LCDproc 5.5 core files were installed, just the pfSense PHP front end.
So first install the pfSense LCDproc 5.5 Dev package and then do the following.
Here is the link to the core files. To install them, go to the console and run:
pkg_add -r http://files.pfsense.org/packages/8/All/lcdproc-0.5.5.tbz
-Joe Cowboy
-
what ver of pfsense are you running? i'm using 2.1-dev and have to manually install binaries because the box is trying to install pbi instead… gets annoying but i've gotten used to it..
-
I am running 2.1-dev with LCDProc 0.5.5-dev v0.8. I didn't realize he had just updated to v0.9... So I just did a reinstall, and it seemed to install correctly this time. Sorry for not posting the version last time; I now have v0.9 installed. Unless something was fixed in one of the last gitsyncs for 2.1-dev? Thanks for all your hard work...
-Joe Cowboy
-
Steve,
looking at my secondary machine, I have the feeling that the problems are related to the "scrolling" feature of the panel.
In fact I sometimes see frozen screens where there is scrolling… I will keep an eye on it and try to see if that is the problem...
Ciao,
Michele
Hello,
I have been running LCDproc 0.5.5 with the package 0.9 and only the "traffic (wan)" screen for 2 days, and everything is going fine…
Has anyone else tried avoiding screens that scroll, with a positive result?
Thanks,
Michele
-
None of the screens I have enabled scroll (Uptime, States, Mbuf, & WAN with 5 second refresh), yet the display will still stop responding and the system cannot be connected to after 10 hours. The firewall continues to function normally as near as I can tell other than that: DHCP still works, and existing hosts can continue to send/receive traffic and initiate new traffic. Only the webif and SSH access can no longer connect, due to the high load averages.
I have tried all combinations of sdeclcd.so and LCDd (v0.53, v0.55, debug-enabled LCDd) and have had success ONLY with v0.53 of both the module and LCDd. Any other mix of the various versions sends load to 100% at the 10 hour uptime mark. All testing was performed with the LCDproc-dev-v0.9 package files, changing only the sdeclcd.so and LCDd files; no other file was changed or modified.
I have over 48 hours of uptime without any visible problem running v0.53. This version will generate the following log entries every 10 hours but load stays less than 0.20 and the display works.
Jan 30 19:16:18 LCDd: error: huh? Too much data received... quiet down!
Jan 30 19:16:18 LCDd: Client on socket 11 disconnected
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:18 LCDd: sock_send: socket write error
Jan 30 19:16:43 php: lcdproc: Connection to LCDd process lost ()
Jan 30 19:16:45 LCDd: Connect from host 127.0.0.1:5248 on socket 11
I'm not sure what was changed between 0.53 and 0.55 and would be willing to test 0.54 if someone can provide those files.
-
@tix:
None of the screens I have enabled scroll (Uptime, States, Mbuf, & WAN with 5 second refresh) yet the display will still stop responding and the system cannot be connected to after 10 hours.
Hi,
in my case the states screen definitely scrolls… I have a 20x4 LCD display, max states: 500'000. When the states are more than 10'000 the screen scrolls.
Can you pls tell me what your display size is and what your max states setting is?
Thanks,
Michele
-
Hi,
in my case the states screen definitely scrolls… I have a 20x4 LCD display, max states: 500'000. When the states are more than 10'000 the screen scrolls.
Can you pls tell me what your display size is and what your max states setting is?
Thanks,
Michele
The display is the 2x20 standard included on the Firebox X series (X700). My states are only 50000, so it doesn't scroll.
My display finally stopped working on v0.53, but it took 50 hours, i.e. the 5th 10-hour interval. Interestingly, LCDd just died, but the client continued to function and the box is as responsive as normal. The log shows the "normal for me on this version" entries, except for the missing "reconnect" entry.
Jan 31 05:19:29 LCDd: error: huh? Too much data received... quiet down!
Jan 31 05:19:29 LCDd: Client on socket 11 disconnected
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:29 LCDd: sock_send: socket write error
Jan 31 05:19:54 php: lcdproc: Connection to LCDd process lost ()
-
Has anyone else tried avoiding screens that scroll, with a positive result?
Yes. Running interface traffic with WAN selected as the only screen has eliminated the 100% CPU problem (after 15 hours of testing at 1 sec refresh).
I am also running the LCDd that fmertz compiled with debugging enabled, but it gives me only a tiny amount of extra information. This is with logging set to level 5. I think I've not understood how that's supposed to work. Time to re-read the developer guide!
I also tried running LCDd with different nice levels in order to be able to access the box during the problem event, but it made no difference. It seems like you should be able to set it to nice 20 and it will be very low priority, but that doesn't happen.
In fact if you look at the output of top it runs at 'r30' which I cannot find any reference to anywhere. :-\
last pid: 27372; load averages: 0.02, 0.08, 0.06 up 0+23:30:46 12:43:59
48 processes: 1 running, 47 sleeping
CPU: 0.0% user, 0.0% nice, 0.4% system, 0.4% interrupt, 99.3% idle
Mem: 55M Active, 16M Inact, 55M Wired, 1060K Cache, 49M Buf, 359M Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
44832 root 1 76 20 3656K 1396K wait 1:01 0.00% sh
4015 root 1 45 0 47452K 17464K nanslp 0:57 0.00% php
2024 nobody 1 74 r30 3368K 1500K nanslp 0:39 0.00% LCDd
Steve
-
Yes. Running interface traffic with WAN selected as the only screen has eliminated the 100% CPU problem (after 15hours testing at 1sec refresh).
I also tried running LCDd with different nice levels
Maybe another test: run the normal lcdproc client provided by the project. FWIW, I run lcdproc on 2 hosts (a NAS and the router itself), and LCDd on the router itself, and it seems to run just fine for weeks. lcdproc has a bunch of screens with scrolling, vbars, hbars, icons, big nums… This is Linux, but the same code. This could help isolate more info about the problem.
If you need it: https://github.com/downloads/fmertz/sdeclcd/lcdproc
For nice, the driver code sets the process priority to "realtime round robin" as part of the initialization for the portable "wait" routines. Maybe this is the "r" you are seeing. The call to set the priority was removed in the driver I posted earlier.
-
Folks,
I would like to kick off the effort to bring the LED support into the driver again. We have some support already, but only for the box I own (the X-Core-e). I was hoping folks with the other models could run a command to help me identify the EXACT ICH we need to code for. Best I can figure, this command is already in pfSense and should be run as root:
pciconf -r pci0:31:0 0:256
This command reads the PCI configuration area (256 bytes) for the Low Pin Count (LPC) device. The LPC device does GPIO, and can control the LEDs. Based on the exact device ID, I can look up the spec and find out the offset for the GPIO base register, etc.
I would like the output of the command, for the X-Core and X-Peak models. The key is the first 8 digits, the last 4 being 8086, Intel's vendor ID. Thanks.
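To make "the first 8 digits" concrete, here is a small sketch that splits the first dword of the output into its two halves. The helper name split_pci_id is made up for illustration, and the sample value 25a18086 is the one reported further down this thread for the X-Peak.

```shell
#!/bin/sh
# Split the first dword printed by pciconf -r into device and vendor ID.
split_pci_id() {
    dword=$1
    device=${dword%????}   # first four hex digits
    vendor=${dword#????}   # last four hex digits (8086 is Intel)
    echo "device=$device vendor=$vendor"
}

split_pci_id 25a18086
# prints: device=25a1 vendor=8086
```

The device half is what gets looked up against the chipset datasheets.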
-
X-Peak:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(7): pciconf -r pci0:31:0 0:256
25a18086 0280000f 06010002 00800000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000401 00000000 00000000 00000000
00000000 00000000 00000481 00000010
050a0c0b 000000d0 09808080 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
000054d5 00000000 00000000 00000000
00000220 00000000 0000000d 00000300
00000000 00000000 05415555 00000000
00000000 00000000 00000000 00000000
00002186 00000f02 00000004 00000000
c0000000 34040000 00112233 45670291
00e40000 00000000 00020f66 00010000
ffffffff
I'll have to try your driver without priority setting and see what happens.
Steve
Edit: You were correct. The driver with priority removed can be set to nice 20. I am testing now with several scrolling screens to see if I can still access the box.
-
Hello everybody,
I am running some tests to improve the stability of the package from the "client side". From what I have read, all the attempts so far have been made on the binary package and the driver; I think a little help from the client might also solve some problems.
I am testing these changes on my boxes and have found no problems, so I would like to share them with you.
The changes are:
- Added a 20ms delay between each command sent from the client to LCDproc.
- Better error management. The client now resets the error counter after every successful communication session with LCDproc (before, it was a global counter). The error counter is managed inside the client (lcdproc_client.php).
- Because of the above change, the "client script" (lcdclient.sh) no longer cycles.
I hope at least some of the problems will be solved… I wait for your feedback. The new version is XXX.0.9.1.
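The error-counter change can be illustrated with a short sketch. This is not the package code (the real client is lcdproc_client.php); it is a hypothetical shell model in which run_sessions and the MAX_ERRORS threshold of 3 are made-up names and values, shown only to make the "reset on every successful session" idea concrete.

```shell
#!/bin/sh
# Hypothetical model: give up only after MAX_ERRORS consecutive failures.
MAX_ERRORS=3   # assumed threshold, not taken from the package

# Reads one session result per line ("ok" or "fail") and prints the outcome.
run_sessions() {
    errors=0
    while read -r result; do
        if [ "$result" = "ok" ]; then
            errors=0                  # successful session: reset the counter
        else
            errors=$((errors + 1))    # failed session: count it
            if [ "$errors" -ge "$MAX_ERRORS" ]; then
                echo "giving up"
                return 1
            fi
        fi
    done
    echo "still running"
}

# Three failures in total, but never three in a row.
printf 'fail\nfail\nok\nfail\n' | run_sessions
# prints: still running
```

With a global counter, the same sequence would eventually reach the threshold even though the connection keeps recovering in between.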
Thanks,
Michele
-
… and now we are at 0.9.2.
I didn't realize that some client processes were still pending, and with the new error counter management they could keep working in the background. So now all the lcdproc_client.php processes are killed during the package resync.
Sorry to the people who were already upgrading to 0.9.1...
-
Looks good. :)
Now that the error counter is in the PHP script (much better), why bother having lcdclient.sh at all?
Just call the PHP client from the rc file directly.
Also, I have been running the driver that fmertz removed real-time priority from, at nice level 20. Doing this allows the box to remain responsive even when some error event occurs. Since the LCD is really of no importance compared to the firewall functions, it seems better to run it at minimum priority. E.g.
$start .= "\t/usr/bin/nice -20 /usr/local/sbin/LCDd -c ". LCDPROC_CONFIG ."\n";
$start .= "\t/usr/bin/nice -20 ". LCDPROC_CLIENT ." &\n";
Steve
-
Looks good. :)
Thanks!
Now that the error counter is in the PHP script (much better), why bother having lcdclient.sh at all?
Just call the PHP client from the rc file directly.
makes sense…
Also, I have been running the driver that fmertz removed real-time priority from, at nice level 20. Doing this allows the box to remain responsive even when some error event occurs. Since the LCD is really of no importance compared to the firewall functions, it seems better to run it at minimum priority. E.g.
$start .= "\t/usr/bin/nice -20 /usr/local/sbin/LCDd -c ". LCDPROC_CONFIG ."\n";
$start .= "\t/usr/bin/nice -20 ". LCDPROC_CLIENT ." &\n";
This also makes sense. We just have to consider whether it negatively affects the ability of the client or LCDd to refresh the data or communicate with the panel, but it's just a matter of tuning…
-
Folks, the SDEC driver for Fireboxes is now officially part of the upstream lcdproc project! I received confirmation this morning that my code submission was committed. I guess it should come to pfSense as part of the package when the project leaders decide the current development branch is stable enough for release.
Now, on to support for the LEDs…
-
That's great! So the next version of LCDproc will have the SDEC driver… fmertz, keep in mind that the only compile option in pfSense is "WITH_USB=true"; I don't know whether the SDEC driver will be compiled with this option...
BTW, how is it going with the new package version? I found it to be more stable, but it still gives some problems... what do you think?
Thanks,
Michele
-
Excellent work! ;D
Steve
-
X-Peak:
[2.0.1-RELEASE][root@pfsense.fire.box]/root(7): pciconf -r pci0:31:0 0:256 25a18086
Device ID 25a1 is a 6300ESB, data sheet: http://ark.intel.com/products/27663/Intel-6300ESB-IO-Controller
For the X-Peak, the LEDs are on GPIO pins 40 and 41. This is part of the second set of pins, so there is no blink support in hardware. We already knew this…
Anyone with an X-Core?
-
I would like the output of the command, for the X-Core and X-Peak models. The key is the first 8 digits, the last 4 being 8086, Intel's vendor ID. Thanks.
X-Core (x700)
/root(1): pciconf -r pci0:31:0 0:256
24408086 0280000f 06010005 00800000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00004001 00000010 00000000 00000000
00000000 00000000 00004081 00000010
09060b0c 000000d0 0a058003 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00005475 00000000 00000000 00000000
00000200 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00004004 00000000 00000000
00002002 00001f02 00000004 00000000
c0000010 14050000 00112233 45670291
017c000f 00000000 00000f47 00000200
ffffffff
I'm using the WGXepc script and it works flawlessly!
-
@tix:
X-Core (x700)
/root(1): pciconf -r pci0:31:0 0:256 24408086
2440 is an 82801BA, Intel ICH2, datasheet here: http://www.intel.com/content/dam/doc/datasheet/82801ba-i-o-controller-hub-2-82801bam-i-o-controller-hub-2-mobile-datasheet.pdf
Somehow the existing WGXepc code does not seem to line up with the spec…
-
Hi,
how is it going with the latest (0.9.2) package?
Right now I am working on this:
- I halved the number of commands sent to LCDproc every cycle. So if 10 commands were sent before, now only 5 are
- I wrote slightly better code for error handling
- I slowed down the scrolling (just a little bit)
- I simplified the launch script. There's no more lcdproc_client.sh.
Then I will:
4) add and test the "nice" command for the running programs
5) add a "top value" of 10 seconds for the wait before the next cycle, to make sure that the communication won't time out
What else?
Ciao,
Michele
-
Somehow the existing WGXepc code does not seem to line up with the spec…
Hmm, in what way?
There was a problem because the GPIO base is set to a non-standard value on the X-Core.
On the other boxes it is at 0x480, which I believe is the standard value, whereas on the X-Core it is at 0x4080.
Steve
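Those two values can be read straight out of the pciconf dumps posted earlier in the thread. The sketch below assumes the GPIO base register sits at config offset 0x58 (the 23rd dword of the dump) and that bits 15:6 hold the I/O base, with the low bits being flags; that matches my reading of the ICH datasheets for these chipsets, but please check it against the spec. The helper name gpio_base is made up.

```shell
#!/bin/sh
# Extract the GPIO I/O base from the dword at config offset 0x58.
# Assumption: bits 15:6 are the base address; the low bits are flag bits.
gpio_base() {
    printf '0x%x\n' $(( 0x$1 & 0xffc0 ))
}

gpio_base 00000481   # dword from the X-Peak dump, prints 0x480
gpio_base 00004081   # dword from the X-Core dump, prints 0x4080
```

Both results agree with the 0x480 and 0x4080 values mentioned above.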