No Web Interface on Thu May 29 08:48:37 CDT 2014
-
Still no go for web interface on amd64-20140530-1557.
EDIT: I can get the bootup to complete by running:
ps aux | grep -i rc
Then finding the PID of /usr/local/sbin/fcgicli -f /etc/rc.start_packages and running:
kill -9 xxxxx
Where xxxxx is the PID of the process. This doesn't bring up the web interface though.
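The lookup and kill above can be collapsed into one pipeline. This is only a minimal sketch, assuming a POSIX sh and awk; `pids_with_command_prefix` is a hypothetical helper name, not something shipped with pfSense. It matches on the start of the command line so that the helper's own awk process, whose arguments contain the same string, is not picked up.

```shell
#!/bin/sh
# pids_with_command_prefix (hypothetical helper): read "PID COMMAND..." lines
# on stdin (as printed by `ps -axo pid,command`) and print the PID of every
# process whose command line starts with the literal string given as $1.
pids_with_command_prefix() {
    awk -v prefix="$1" '
        NR > 1 {                                   # NR > 1 skips the ps header
            pid = $1
            cmd = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", cmd)    # drop the PID column
            if (index(cmd, prefix) == 1) print pid
        }
    '
}

# Usage on the firewall (not run here):
#   ps -axo pid,command \
#     | pids_with_command_prefix "/usr/local/sbin/fcgicli -f /etc/rc.start_packages" \
#     | xargs kill -9
```

As noted above, killing the process only lets the boot finish; it does not bring the web interface back.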
-
I am running the 30th snapshot on i386 and sshd works for me after running it manually the last time I booted to create the sshd keys. The GUI doesn't work for me though.
I noticed the same /usr/local/sbin/fcgicli -f /etc/rc.start_packages command in the process list, apparently stuck. I killed it, and then a few more, with different rc scripts as arguments, were launched (by minicron, I think). Those are stuck now too.
I ran the tracing command truss manually on one of the command lines, and it appears to lock up around the time of writing to /var/run/php-fpm.socket. php-fpm is running.
truss /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data
connect(3,{ AF_UNIX "/var/run/php-fpm.socket" },106) = 0 (0x0)
__sysctl(0xbfbfe6c4,0x2,0xbfbfe708,0xbfbfe6c0,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe6c4,0x2,0xbfbfe808,0xbfbfe6c0,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe6c4,0x2,0xbfbfe908,0xbfbfe6c0,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe6c4,0x2,0xbfbfea08,0xbfbfe6c0,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe6c4,0x2,0xbfbfeb08,0xbfbfe6c0,0x0,0x0) = 0 (0x0)
madvise(0x28804000,0x1000,0x5,0x281c15f8,0xbfbfe4f4,0x28120ccf) = 0 (0x0)
madvise(0x28816000,0x1000,0x5,0x281c15f8,0xbfbfe4f4,0x28120ccf) = 0 (0x0)
madvise(0x28818000,0x1000,0x5,0x281c15f8,0xbfbfe56c,0x28120ccf) = 0 (0x0)
madvise(0x28803000,0x3000,0x5,0x281c15f8,0xbfbfe57c,0x28120ccf) = 0 (0x0)
write(3,"\^A\^A\0\^A\0\b\0\0\0\^A\0\0\0\0"...,263) = 263 (0x107)
It just sits there forever. Any /usr/local/sbin/fcgicli command executed even by hand gets stuck there.
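Since the command never returns, it can help to run it under a watchdog when testing by hand. This is a rough sketch in plain sh, using only sleep, kill and wait; `run_with_deadline` is a hypothetical helper name and the 10 second limit in the usage line is an arbitrary choice.

```shell
#!/bin/sh
# run_with_deadline (hypothetical helper): run a command in the background and
# kill it with SIGKILL if it is still running after $1 seconds. Returns the
# command's exit status, which is non-zero when the watchdog had to kill it.
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleep, then kill the command if it is still around.
    ( sleep "$deadline"; kill -9 "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$cmd_pid"
    status=$?
    # The command has ended one way or another; stop the watchdog so it
    # cannot fire later against an unrelated process.
    kill "$watchdog_pid" 2>/dev/null
    wait "$watchdog_pid" 2>/dev/null
    return "$status"
}

# Usage on the firewall (not run here):
#   run_with_deadline 10 /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data \
#     || echo "fcgicli hung or failed"
```

A non-zero status does not by itself prove a hang, since the command may also fail on its own, so the truss output is still the better evidence.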
-
-
As soon as I ran
/usr/local/sbin/fcgicli -f /etc/rc.start_packages
the webgui stopped working. Any attempt to use php-fpm locks up the process writing to the fpm socket again.
It appears something in the command above locks up php-fpm. If I restart php-fpm it works again. I am going to comment out the command above from /etc/rc and see if the firewall starts up properly. I have a feeling it will. I will just need to start the packages manually after a reboot. This is at home so it isn't a big deal :).
-
Well… It is not specifically start_packages which kills it. It seems to lock up with other fcgicli commands during boot. If I restart php-fpm and then execute the few fcgicli commands from /etc/rc in order, one will eventually cause php-fpm to block on writing to its socket. I will just manually kill php-fpm and restart it after every boot for now. It appears the GUI doesn't lock it up (I didn't test everything though... only viewing some of the pages).
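For anyone who wants to replay the same sequence, here is a small sketch for pulling the fcgicli lines out of /etc/rc in file order. It assumes the calls appear literally in the file and that commented-out lines start with `#`; `list_fcgicli_calls` is a hypothetical helper name.

```shell
#!/bin/sh
# list_fcgicli_calls (hypothetical helper): print, with line numbers, every
# line of the given rc file that mentions fcgicli and is not a comment.
list_fcgicli_calls() {
    grep -n 'fcgicli' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}

# Usage on the firewall (not run here):
#   list_fcgicli_calls /etc/rc
```

The line numbers make it easier to report which command was the last one run before php-fpm blocked.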
-
I just updated to the 31st snapshot and the problem is still there. I just manually kill php-fpm and restart it per how it is started in /etc/rc
2.2-ALPHA (i386)
built on Sat May 31 10:32:02 CDT 2014
FreeBSD 10.0-STABLE
-
The web interface stopped working again when I uninstalled the Patches package. I killed and restarted php-fpm and it started working again.
-
Great work! I know that commenting out the start_packages line from /etc/rc is not enough to get the webgui working, as you found out. There are a few minicron entries after that line in /etc/rc that use fcgicli as well, one hourly account expire and one daily alias url updater. If I understand the problem correctly, they should be commented out as well, right?
In the old days, I'd look through the recent commits to the pfSense-tools tree, but I haven't taken the steps to regain access to that yet.
Hopefully the devs can find a fix, now that you've narrowed down the problem even more.
-
I am still getting the 100% CPU by check_reload_status too. I killed that and restarted it and the CPU went back to normal again.
-
I'm in the process of cloning the pfsense-tools repo to have a look through the commits. Can someone give me a timeframe for when this issue started showing up?
-
If the previous snapshots are available I can start installing them backwards and see when the issue disappears.
The first version I noticed the problem was the 29th or the 30th build.
EDIT: I am not sure what version I was running before the 29th build. It might have been the 26th or 27th. I don't see anything in the logs to show that I rebooted on the 28th. I put in a request for logging the version on boot so that I can easily keep track of which version was installed by going through the logs on my remote syslog server. I am sure the devs are too busy to worry about such things though :).
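Until something like that exists, a local workaround could look like the sketch below. It is an assumption that the version and build time live in /etc/version and /etc/version.buildtime; `version_log_line` is a hypothetical helper name, and where to hook it into the boot sequence is left open.

```shell
#!/bin/sh
# version_log_line (hypothetical helper): build a one-line version string from
# a version file and a build-time file. The default paths are an assumption.
version_log_line() {
    version_file=${1:-/etc/version}
    buildtime_file=${2:-/etc/version.buildtime}
    printf 'running version %s built on %s\n' \
        "$(cat "$version_file")" "$(cat "$buildtime_file")"
}

# Usage at boot (not run here); logger hands the line to syslog, so it is
# forwarded to a remote syslog server if one is configured:
#   logger -t version "$(version_log_line)"
```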
-
http://snapshots.pfsense.org/FreeBSD_stable/10/amd64/pfSense_HEAD/updates/?C=M;O=D
There were a few versions that showed up on the 29th. Look at the 4G non-VGA images from 21:41 and 23:36 as an example. They are still too small, up to the latest snapshots.
-
I'm having dramas trying to clone the repo, not too sure what's going on but it doesn't look like I'll be able to pull the commit logs any time soon.
-
Also, be aware that there are some issues with the iso and update image names taking an earlier date (in the filename) than they should have. Just FYI, but it can add to the confusion when trying to back out what image was built when, and identify when a problem showed up.
https://forum.pfsense.org/index.php?topic=76744.0
-
Updated to the version below and same issue just as an FYI…
2.2-ALPHA (i386)
built on Mon Jun 02 06:28:31 CDT 2014
FreeBSD 10.0-STABLE
Manually restarting php-fpm over ssh makes the GUI work again:
killall php-fpm; sleep 2; /usr/local/sbin/php-fpm -c /usr/local/lib/php.ini -y /usr/local/lib/php-fpm.conf -RD 2>&1 >/dev/null
I notice that check_reload_status sometimes goes to 100% CPU too (mainly after a reboot). I manually kill and restart it and it goes back to normal low CPU usage. I have to force this one with -9.
killall -9 check_reload_status; sleep 2; /usr/bin/nice -n20 /usr/local/sbin/check_reload_status
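To check whether check_reload_status is spinning before killing it, something like this sketch could work. `cpu_hogs` is a hypothetical helper name and the 90% threshold is an arbitrary choice; it only filters `ps` output and leaves the kill and restart to the commands above.

```shell
#!/bin/sh
# cpu_hogs (hypothetical helper): read "PID %CPU COMMAND" lines on stdin (as
# printed by `ps -axo pid,pcpu,comm`) and print the PID of each process named
# $1 that is using at least $2 percent CPU.
cpu_hogs() {
    awk -v name="$1" -v limit="$2" '
        NR > 1 && $3 == name && $2 + 0 >= limit + 0 { print $1 }   # NR > 1 skips the header
    '
}

# Usage on the firewall (not run here):
#   ps -axo pid,pcpu,comm | cpu_hogs check_reload_status 90
```

If it prints a PID, the process is the one to kill with -9 and restart as shown above.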
-
Any time I make a change to Suricata, php-fpm has to be restarted again. I tried changing the log file size. The web page just sat there forever, waiting for the POST response I assume. I restarted php-fpm and the change had been applied to the setting. I then tried to start Suricata and got the same waiting forever. I restarted php-fpm and the web GUI started working again, so I looked and the service did start. It seems like the commands are getting through before the GUI stops working (at least enough to make it look like they did anyway).
I am thinking about uninstalling Suricata for testing 2.2 for now just so I don't have to deal with that.
EDIT: I just tried stopping Suricata and it stopped. php-fpm just seems to stop working at random. Suricata might be a different issue, as it goes to 100% CPU when I try to start it and doesn't seem to start anymore. Suricata does eventually go back to normal CPU usage, but the web UI never returns after telling it to start. I still have to restart php-fpm.
Too many things to troubleshoot right now so I am removing Suricata.
-
I'm surprised that more things are not broken, given what you've found with php-fpm.
Do you have other packages installed that work OK, with just Suricata being a problem? The author of Suricata package did suggest that problems be posted in the packages sub-forum, but your problem is quite likely an issue with current state of 2.2 rather than the package:
https://forum.pfsense.org/index.php?topic=77311.msg421820#msg421820
-
I am sure more things will break php-fpm or are broken by php-fpm… whichever the case may be. I just have only been messing with Suricata so that is where I was seeing the issues.
I went ahead and added a rule and applied the changes, and that worked. I then went to change the client DHCP range in the openvpn config, and that locked up php-fpm too.
So this is a more general failure of php-fpm it seems.
EDIT: I just checked the openvpn config after restarting php-fpm and it did make the change to the openvpn configuration even though php-fpm (and gui) stopped working.
-
Maybe it's time to create a bug in redmine, pointing back to these threads; so far, we don't even know if the devs are aware of the issue. I'll do that later tonight unless someone else can get to it first.
That would also be the right place to enter the feature request for version info to go into remote syslog files.
-
https://redmine.pfsense.org/issues/3690