Bandwidthd issues?
-
And one more thing … stopping bandwidthd also kills ntopng (not just the GUI / php-fpm). Very odd ... :(
-
Hi,
I have a change I want to try (locally), but it seems that files inside /usr/local/etc/rc.d get recreated on boot - and I admit, I can't find the source file (in text format at least … ;)). If anyone has any pointers I'd appreciate it, just trying to debug.
Thanks!
-
Hi,
Hoping someone else is smarter than me here (that wouldn't be difficult … :(). I want to change /usr/local/etc/rc.d/bandwidthd.sh as follows,
Current:
rc_start() {
cd /usr/pbi/bandwidthd-amd64/local/bandwidthd
LD_LIBRARY_PATH=/usr/pbi/bandwidthd-amd64/local/lib /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
cd -
}
Updated:
rc_start() {
cd /usr/pbi/bandwidthd-amd64/local/bandwidthd
if [ -z "$(ps auxw | grep "[/]usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd" | awk '{print $2}')" ]; then
LD_LIBRARY_PATH=/usr/pbi/bandwidthd-amd64/local/lib /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
fi
cd -
}
This is just to avoid multiple copies of bandwidthd being started (as I see all the time). But I can't figure out how /usr/local/etc/rc.d/bandwidthd.sh is getting generated on boot.
Help?!?!
Thanks very much!
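For anyone trying the same guard, here is a minimal sketch of the check, written so it can be tested on its own. The names already_running and rc_start_once are made up for illustration, and it assumes that `ps axww -o command=` prints one bare command line per process on your system. It compares the first word of each command line with the daemon path, so the `[/]` grep trick is not needed.

```shell
#!/bin/sh
# Sketch only. DAEMON and LIBDIR are the paths quoted in the post above;
# already_running and rc_start_once are hypothetical helper names.
DAEMON=/usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
LIBDIR=/usr/pbi/bandwidthd-amd64/local/lib

# Reads one command line per process on stdin.
# Exit status 0 if any command line starts with the word "$1", else 1.
already_running() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit (found ? 0 : 1) }'
}

rc_start_once() {
    # Assumes `ps axww -o command=` prints only the command column, no header.
    if ps axww -o command= | already_running "$DAEMON"; then
        echo "bandwidthd already running, not starting a second copy"
        return 0
    fi
    # The subshell keeps the directory change local, so no `cd -` is needed.
    ( cd "$(dirname "$DAEMON")" && LD_LIBRARY_PATH=$LIBDIR "$DAEMON" )
}
```

Calling rc_start_once twice in a row should then start the daemon only once, provided the ps output really shows the full path as the first word.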
-
Take a look at /usr/local/pkg/bandwidthd.inc; it writes both bandwidthd.sh and bandwidthd.conf
-
AFAIK it is normal to have 4 bandwidthd processes:
ps aux | grep bandwidthd
root 41074 0.0 2.4 15748 5576 - S 9:41PM 0:00.02 /var/bandwidthd/bandwidthd
root 41237 0.0 2.4 15748 5588 - S 9:41PM 0:00.01 /var/bandwidthd/bandwidthd
root 41425 0.0 2.4 15748 5576 - S 9:41PM 0:00.01 /var/bandwidthd/bandwidthd
root 41449 0.0 2.4 15748 5576 - S 9:41PM 0:00.01 /var/bandwidthd/bandwidthd
root 44012 0.0 0.9 10396 1952 1 S+ 9:42PM 0:00.01 grep bandwidthd
I think they are related to the recording of daily, weekly, monthly and yearly data/graphs. Each updates data/graphs at different intervals.
-
Hmmm … at least when logging to an external database (PostgreSQL) this causes problems -> I have confirmed multiple entries for the same points in time, and the daily totals in PostgreSQL are 2x what they should be (due to 2 processes running).
Thoughts?
Thanks!
-
Could it be this is why we're seeing different results (working / not working)? I think you're letting bandwidthd generate "local" info, but I'm logging to an external database? Just thinking out loud, trying to figure it out.
It is also interesting that the path to your bandwidthd process is different - yours looks to be the one processing the data (as it's inside /var, right?), but mine is the rc.d service / daemon? I'm just trying to stop more than one of those existing, as I am getting double entries in the database (not a good thing).
Thoughts?
Thanks very much!
-
Hmm, I have 8 running:
[2.2-RC][root@pfsense.localdomain]/root: ps axfw | grep bandwidthd
93615 - S 0:05.36 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
93674 - S 0:04.71 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
93823 - S 0:04.30 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
94051 - S 0:04.26 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
94636 - S 0:05.40 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
94844 - S 0:04.75 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
95011 - S 0:04.31 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
95137 - S 0:04.26 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
Mine is a full 64bit install, with both "generate CDF" and "recover CDF" enabled, but no PostgreSQL logging enabled. Bandwidthd has run without issues for me for the past few months, but I haven't tried to reconcile the traffic reported by bandwidthd with that reported by the rrd graphs in pfSense. Bandwidthd would be more helpful for me if it could track IPv6 traffic, so I don't use it much.
The nano and full pfSense installs are treated differently by bandwidthd: check /usr/local/pkg/bandwidthd.inc starting around line 302. The nano config for the $rc['start'] stanza runs from lines 315 to 348, while the stanza for full install is lines 351-353. So, I think your different command lines are expected (maybe not bug-free :) but expected).
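Since the nano and full installs end up with different command lines, a quick way to see which path is running and how many copies of it exist is to count processes per executable path. This is only a sketch: count_by_path is a made-up helper name, and it assumes ps can print a bare command column with `-o command=`.

```shell
#!/bin/sh
# Sketch only; count_by_path is a hypothetical helper name.
# stdin: one command line per process.
# Prints "<count> <path>" for every executable path ending in /bandwidthd.
count_by_path() {
    awk '$1 ~ /\/bandwidthd$/ { n[$1]++ } END { for (p in n) print n[p], p }'
}

# Example use (assumes ps accepts these options):
#   ps axww -o command= | count_by_path
```

If four processes per start is normal, as suggested earlier in the thread, a count of 8 for one path would point at the daemon having been started twice.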
-
Thanks for the awesome pointer! That does help. I admit, I'm not sure how these packages and files fit together - stupidity on my part!
I modified /usr/local/pkg/bandwidthd.inc (locally), and I now avoid the double processes … so one of my problems is resolved. But this also helped me to see the conditions to cause php-fpm to die - more below,
If I go in to /usr/local/etc/rc.d, I can start and stop bandwidthd (and with my change, I only get one copy of the process now). If I do a Save from the GUI under bandwidthd (which updates /usr/local/etc/rc.d/bandwidthd.sh!) and then run bandwidthd.sh, telling it to stop -> php-fpm dies. If I restart php-fpm (manually) ... I can start or stop bandwidthd as much as I want, and php-fpm never dies again. Only after another Save (and regeneration of bandwidthd.sh) does stopping bandwidthd (the first time) kill php-fpm. I also tried manually stopping bandwidthd by executing /usr/bin/killall bandwidthd ... and this definitely kills php-fpm. If I restart php-fpm, I can't kill php-fpm again (with starts or stops of bandwidthd) ... until I do a Save from the GUI again, and then stopping bandwidthd kills php-fpm.
Does this make sense? It seems odd to me, but it is very repeatable.
Thoughts?
Thanks again for the help!
-
BTW (and here comes my stupid, uneducated question … :() - how can doing a killall to bandwidthd kill other processes (php-fpm, but also ntopng)? It really does seem to kill them - but I can't understand why. Figuring this out really is the root cause.
Thanks!!!
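No answer to the "why" here, but one thing that may be worth ruling out is whether php-fpm or ntopng end up in the same process group as bandwidthd after a Save, since a signal can be delivered to a whole group at once. This is only a guess at a diagnostic, not a known cause. The sketch below uses a made-up helper name, shared_groups, and assumes ps can print the pid, ppid, pgid, sid and command columns without headers.

```shell
#!/bin/sh
# Sketch only; shared_groups is a hypothetical helper name.
# stdin: lines of "PID PPID PGID SID COMMAND..." with no header.
# Prints every process group ID that holds both a bandwidthd process
# and a php-fpm or ntopng process.
shared_groups() {
    awk '
        $5 ~ /bandwidthd/     { bw[$3] = 1 }
        $5 ~ /php-fpm|ntopng/ { other[$3] = 1 }
        END { for (g in bw) if (g in other) print g }
    '
}

# Example use:
#   ps ax -o pid= -o ppid= -o pgid= -o sid= -o command= | shared_groups
```

Empty output means no shared group was found. Any number printed is a group ID to look at more closely, once before and once after a Save.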
-
My home system had 8 bandwidthd processes, for some unknown reason - I guess this is one of your problems with it starting twice under some conditions.
I did the killall command by hand from the command line to see if that would also break php-fpm, and put "-v" so it would tell me what it thinks it killed:
[2.2-RC][root@testoffice-rt-01.xxx]/usr/local/etc/rc.d: ps aux | grep bandwidthd
root 17460 0.0 2.7 15748 6072 - S 8:29AM 0:00.45 /var/bandwidthd/bandwidthd
root 17871 0.0 2.6 15748 6004 - S 8:29AM 0:00.33 /var/bandwidthd/bandwidthd
root 18178 0.0 2.5 15748 5804 - S 8:29AM 0:00.10 /var/bandwidthd/bandwidthd
root 18334 0.0 2.5 15748 5808 - S 8:29AM 0:00.05 /var/bandwidthd/bandwidthd
root 18587 0.0 2.7 15748 6072 - S 8:29AM 0:00.45 /var/bandwidthd/bandwidthd
root 18876 0.0 2.6 15748 5988 - S 8:29AM 0:00.33 /var/bandwidthd/bandwidthd
root 18962 0.0 2.5 15748 5804 - S 8:29AM 0:00.10 /var/bandwidthd/bandwidthd
root 19024 0.0 2.5 15748 5808 - S 8:29AM 0:00.06 /var/bandwidthd/bandwidthd
root 77011 0.0 0.9 10396 1960 0 S+ 8:48AM 0:00.01 grep bandwidthd
[2.2-RC][root@testoffice-rt-01.xxx]/usr/local/etc/rc.d: /usr/bin/killall -v bandwidthd
kill -TERM 19024
kill -TERM 18962
kill -TERM 18876
kill -TERM 18587
kill -TERM 18334
kill -TERM 18178
kill -TERM 17871
kill -TERM 17460
All worked as expected, and my php-fpm and webGUI still work.
Then I did a few saves on the Bandwidthd webGUI page. No problem there either: 4 old processes go away, 4 new ones start.
This system is using local bandwidthd data. I will try in a while with the "Log data to a PostgreSQL database" option.
-
"(which updates /usr/local/etc/rc.d/bandwidthd.sh!)"
Various scripts and conf files in pfSense are generated from the GUI and startup code, like this one. Once you discover exactly what needs to be changed in the script, then we can change the PHP code to generate the script correctly.
-
Thanks! And you're right, just need to figure out the dependencies - trying to follow the bread crumbs … ;)
This may be nano vs. full, or local vs. PostgreSQL related - I admit, not quite sure. I did try killall, with -v like you noted. Here is what I get ...
- No Save done first (watching php-fpm, ntopng, bandwidthd),
[2.2-RC][root@pfSense.home]/root: ps -aux | grep php-fpm
root 11169 0.0 1.3 236688 51316 - I 7:53AM 0:00.05 php-fpm: pool lighty (php-fpm)
root 45513 0.0 0.9 228364 34508 - Ss 1:29PM 0:00.47 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
[2.2-RC][root@pfSense.home]/root: ps -aux | grep ntopng
root 32603 0.1 1.5 183216 55908 - Ss 4:26PM 3:44.38 /usr/local/bin/ntopng -s -e -i bge0 --dns-mode 1 --local-networks 192.168
root 29103 0.0 0.1 24072 4844 - I 4:26PM 0:09.84 redis-server: /usr/pbi/ntopng-amd64/local/bin/redis-server *:6379 (redis-
[2.2-RC][root@pfSense.home]/root: ps -aux | grep bandwidthd
root 10913 0.0 0.2 55728 8844 0 S 1:38PM 0:01.81 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
=> killall -v bandwidthd,
[2.2-RC][root@pfSense.home]/root: killall -v bandwidthd
kill -TERM 10913
Result: Only bandwidthd seems to be killed (checked by running the commands above again; everything is still running except bandwidthd).
- Save done first (watching php-fpm, ntopng, bandwidthd),
[2.2-RC][root@pfSense.home]/root: ps -aux | grep bandwidthd
root 55458 0.0 0.2 55728 8312 - S 8:00AM 0:00.00 /usr/pbi/bandwidthd-amd64/local/bandwidthd/bandwidthd
[2.2-RC][root@pfSense.home]/root: killall -v bandwidthd
kill -TERM 55458
But, see the following …
[2.2-RC][root@pfSense.home]/root: ps -aux | grep php-fpm
root 78190 0.0 0.1 18900 2400 1 S+ 8:01AM 0:00.00 grep php-fpm
[2.2-RC][root@pfSense.home]/root: ps -aux | grep ntopng
root 93951 0.0 0.1 18900 2400 1 S+ 8:01AM 0:00.00 grep ntopng
[2.2-RC][root@pfSense.home]/root: ps -aux | grep bandwidthd
root 94123 0.0 0.1 18900 2408 1 S+ 8:01AM 0:00.00 grep bandwidthd
Result: All three killed: php-fpm, ntopng and bandwidthd … and perhaps others. The GUI is the obvious one (it's down), and I just happened to stumble on to ntopng ... :(.
Yell if there are other things I should try. Thanks!
-
Just an FYI, but I had an issue yesterday where I saved OpenVPN settings to restart the server (with a settings change) … and it killed the GUI. So this may be a bit deeper than just Bandwidthd.
-
Hi,
OK, I think this problem goes deeper than bandwidthd … :(. I think it's a more generic issue, and bandwidthd is just how I stumbled on to it - here is why I say this (feel free to tell me I'm full of it!),
I was getting OpenVPN set up on my pfSense box, made one setting change, and hit save -> GUI dead again! And several other services were killed in the process ... ntopng, bandwidthd, and I also noticed that OpenVPN didn't come up. I left my client in a loop, trying to connect to OpenVPN ... no luck, until I restarted php-fpm -> and this also brought OpenVPN back up!
So there seems to be some sort of interaction here between these services / settings (and a bit bigger issue, I fear). So I still have the 2x Bandwidthd issue (for which I have my own patch that works), but there is also this other issue with php-fpm and other services.
Thoughts?
Thanks!
-
I also get '503 Service not available' and no ability to access the webgui on a new install of 2.2RC amd64 nano built on Jan 7, 2015 after adding bandwidthd. No other packages added. Only running traffic shaper on lan. Nothing else special.
The pfsense box is a thousand miles away and appears to be working otherwise. reggie14 posted earlier, on page 2 of this thread, that he restarted his pfsense box, the system powered back on with bandwidthd off, and it came back online working normally. Brrm posted that after a reboot everything seemed to be working.
Does anyone else have any experience with rebooting pfsense with this bandwidthd issue and being able to get back into a functioning system? Or has anyone experienced worse problems after rebooting?
I'm trying to assess the probability and risk of remotely rebooting pfsense and being able to get back into the webgui of a working system or if pfsense regresses with more issues after a reboot.
Thanks
-
Well you can remove the broken package from pfSense Developer Shell (option 12), IIRC. Then restart webConfigurator (or possibly PHP-FPM if things are really messed up).
-
Hi,
Not sure this is related to just this package. I was making some OpenVPN settings changes yesterday - Saving there also broke php-fpm (confirmed several times). I think this is a bit deeper issue … :(.
Thanks!
-
It's definitely specific to just this package. Changing OpenVPN will restart packages, which is probably why that'd trigger it. I'm still not sure how or why it triggers any problem along those lines, haven't been able to replicate that and not sure why it seems to be so easy for some.
-
Completely agreed, and I understand your pain. Can't fix a problem you can't duplicate!
Is there any way for me to try to "monitor" what is happening when I do this? Not sure if there is a way to generically turn up logging levels, to try to debug it. Willing to do what I can to help, but I can't figure it out either … :(.
Thanks!
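One low-tech way to "monitor" this, without assuming any particular logging setting exists, is to take a process snapshot before and after the Save and list what disappeared. A minimal sketch follows; snapshot and vanished are made-up helper names, the list of service names is just the ones mentioned in this thread, and it assumes ps accepts repeated -o options with empty headers.

```shell
#!/bin/sh
# Sketch only; snapshot and vanished are hypothetical helper names.

# Print "PID COMMAND..." for the services discussed in this thread.
snapshot() {
    ps ax -o pid= -o command= |
        awk '$2 ~ /php-fpm|ntopng|bandwidthd|openvpn/ { $1 = $1; print }'
}

# $1 = snapshot file taken before, $2 = snapshot file taken after.
# Prints the lines that exist only in the "before" file.
vanished() {
    tmp_before=$(mktemp /tmp/vanished.XXXXXX) || return 1
    tmp_after=$(mktemp /tmp/vanished.XXXXXX) || return 1
    LC_ALL=C sort "$1" > "$tmp_before"
    LC_ALL=C sort "$2" > "$tmp_after"
    LC_ALL=C comm -23 "$tmp_before" "$tmp_after"
    rm -f "$tmp_before" "$tmp_after"
}

# Example use:
#   snapshot > /tmp/before.txt
#   (press Save in the GUI, or run killall -v bandwidthd)
#   snapshot > /tmp/after.txt
#   vanished /tmp/before.txt /tmp/after.txt
```

This only shows which processes went away, not what killed them, but it would at least give an exact list to compare between systems that show the problem and systems that do not.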