Squid and SquidGuard does not start after reboot
-
I'll have to run some more tests tomorrow to see if I can figure out what is going wrong. You could experiment with different sleep values, maybe up to 30 seconds or so, as a test. I went ahead and committed the 5 second sleep to the package just in case it worked.
-
hm… shouldn't line 1128 of /usr/local/pkg/squid.inc be calling
filter_configure_sync() instead of filter_configure()?
i just looked at the code or filter_configure in /etc/inc/filter.inc and it doesn't look like does much... It basically touches /tmp/filter_dirty and then exits...
when does filter_configure_sync() get called?
I just replaced the call to filter_configure() with filter_configure_sync() in squid.inc and guess what... that seems to work.
pfctl -sn | grep http
rdr on vr0 inet proto tcp from any to ! (vr0) port = http -> 127.0.0.1 port 80
Does this work for you?
--luis
-
Not sure offhand, but if it works, I'll commit it.I was informed that is a Very Bad Thing to do, since it can get you stuck in a loop.I'll investigate it further tomorrow, but at least there is a lead.
-
well it works for me… i assume it will work for you as well since it seems to be a fairly innocuous function. I just cant see how filter_configure() ever worked since the all the work is done in filter_configure_sync(). It seems that filter_configure() just flags the fact that changes to the filter rules need to be synced and then expects some other process to detect this and then do the job.
Anyway, am not familiar with the code enough to know if there are other repercussions to making the change. You would be a better judge of this. On the surface, however, it looks like the right thing to do.
Thanks for all your help. Unless I hear from you I assume that this issue is now closed.
take care.
--luis
-
That is the difference. filter_configure_sync() is synchronous - it does it right then and there and you wait on it. The other method waits on a periodic process to pick up the flag file and run with it.
-
Not sure offhand, but if it works, I'll commit it.I was informed that is a Very Bad Thing to do, since it can get you stuck in a loop.I'll investigate it further tomorrow, but at least there is a lead.
OK. let me know if you need help. I certainly have a system here that can reproduce the problem very reliably.
Is the "VERY BAD THING" caused by the possibility of having two filter_configure_syn() happening at the same time?
–luis
-
That seemed to be the case, but I don't know the specifics in detail.
I'll work on it more tomorrow and see what I can come up with.
-
Now to make things even more strange, I cannot reproduce the problem at all.
I uninstalled and reinstalled both squid and squidguard last night, and now when I reboot the rule is there, and I'm not using any locally modified code. I even switched slices and tried it again, and it came up with the rule intact.
Does it still happen for you if you uninstall and reinstall squid now?
I made a couple changes to the pkg last night but nothing that should have affected this.
Jim
-
Now to make things even more strange, I cannot reproduce the problem at all.
I uninstalled and reinstalled both squid and squidguard last night, and now when I reboot the rule is there, and I'm not using any locally modified code. I even switched slices and tried it again, and it came up with the rule intact.
Does it still happen for you if you uninstall and reinstall squid now?
I made a couple changes to the pkg last night but nothing that should have affected this.
Jim
i booted my system with the manually modified /usr/local/pkg/squid.inc with the change where filter_configure() was replaced with filter_configure_sync() and confirmed that the http redirect rule was in place.
then i uninstalled squid (So there are 0 packages on the system now) and rebooted.
then i checked for the rule… its not there.
pfctl -sn | grep http
the system logs look clean and the boot console shows no errors.
next I installed squid and started the service and noticed that the rule was in place.
pfctl -sn | grep http
rdr on vr0 inet proto tcp from any to ! (vr0) port = http -> 127.0.0.1 port 80
after reboot i get
pfctl -sn | grep http
rdr on vr0 inet proto tcp from any to ! (vr0) port = http -> 127.0.0.1 port 80
so... the rule is in place.
i did notice that the boot sequence now takes a lot longer than it did before. During the Syncing packages line each squid module takes a long time... Here is what I have
Syncing packges: squid squidcache squidmac squid squidtraffic squidupstream squidauth squidusers
So is the delay caused by a full filter_configure_sync for each one? ah... i see that you have the sleep(5) in there. let me take that out to see what happens...
so... when the sleep(5) is removed the boot up sequence is much faster.. as expected.
** BUT **
pfctl -sn | grep http
the rule is gone.
I really don't like the sleep solution.... there is obviously some race condition happening here which is being addressed by slowing down the config process. Having a random sleep is just asking for problems down the road.
anyway, replacing filter_configure() with filter_configure_sync()works for me. Are you sure this is not the answer? the book sequence is fast (maybe a little slower than using filter_configure() but not noticeably so) and it works...
pfctl -sn | grep http
rdr on vr0 inet proto tcp from any to ! (vr0) port = http -> 127.0.0.1 port 80
I look forward to your thoughts.
--luis
-
Given the warnings I was given about not using filter_configure_sync() in that way, the sleep() solution is better. The reason the rule isn't present was because they were trying to add the rules too quickly after squid was told to start up.
If you look in the code, the squid launch process is done using mwexec_bg() which will fork a background process, which will cause a race with whatever code is next, the sleep induces a delay before reconfiguring the filter, which seems to make it work.
Though I swear I tried it yesterday with that sleep() and it didn't make a difference.
I have a couple more ideas, though:
- Remove the sleep, and change mwexec_bg() for squid to mwexec() on line 1123 which may avoid the race.
- Loop a couple times and only sleep for a second or two if squid is not running. That way it doesn't proceed until the process is found, or breaks out if it's not running after a few seconds (5, 10, whatever).
-
I just committed both of those, give it a try.
It works for me. By the time I get to the shell prompt and check, the rule is there.
-
I just committed both of those, give it a try.
It works for me. By the time I get to the shell prompt and check, the rule is there.
sigh… just tried it and....
pfctl -sn | grep http
no rule.
I see the code changes you have made
if (!is_service_running('squid')) {
log_error("Starting Squid");
mwexec("/usr/local/sbin/squid -D");
} else {
log_error("Reloading Squid for configuration sync");
mwexec("/usr/local/sbin/squid -k reconfigure");
}// Sleep for a couple seconds to give squid a chance to fire up fully.
for ($i=0; $i < 10; $i++) {
if (!is_service_running('squid'))
sleep(1);
}The log files show the following
Jun 23 20:42:16 php: : SQUID is installed but not started. Not installing "nat" rules.
Jun 23 20:42:16 php: : SQUID is installed but not started. Not installing "filter" rules.
...
Jun 23 20:42:21 php: : Resyncing configuration for all packages.
Jun 23 20:42:23 php: : Reloading Squid for configuration sync
Jun 23 20:42:25 php: : Reloading Squid for configuration sync
Jun 23 20:42:29 php: : Starting Squid
Jun 23 20:42:29 squid[1080]: Squid Parent: child process 1083 started
Jun 23 20:42:31 php: : Reloading Squid for configuration sync
Jun 23 20:42:35 last message repeated 3 times
Jun 23 20:42:36 check_reload_status: check_reload_status is starting
Jun 23 20:42:36 check_reload_status: rc.newwanip starting
Jun 23 20:42:44 php: : Informational: rc.newwanip is starting vr1.
Jun 23 20:42:44 php: : rc.newwanip working with (IP address: 192.168.0.4) (interface: wan) (interface real: vr1).modifying the code so that it looks like
// Sleep for a couple seconds to give squid a chance to fire up fully.
for ($i=0; $i < 10; $i++) {
if (!is_service_running('squid'))
sleep(1);
}
sleep(5);
filter_configure();makes things work again.
so… either is_service_running() is not working correctly or squid needs a little time after it is spawned before the filters_can be configured.
modifying the code like // Sleep for a couple seconds to give squid a chance to fire up fully.
for ($i=0; $i < 10; $i++) {
sleep(1);
if (is_service_running('squid')) {
break;
}
}shows that the first squid takes a while to spawn when observing the "Synching services line" on the console which means that is_service_running() is working.
what is scary here is that some amount of time between 1 - 5 seconds (on my system, who knows on others) is required after squid is spawned before the async version of filter_configure() is effective.
maybe mwexec("/usr/local/sbin/squid -k reconfigure");
is messing things up.
anyway... i am still confused by
/* reload filter async */
function filter_configure() {
if(isset($config['system']['developerspew'])) {
$mt = microtime();
echo "filter_configure() being called $mt\n";
}
global $g;touch($g['tmp_path'] . "/filter_dirty");
}some external event has to be triggered to cause /filter_dirty to be tested and filter_configure_sync() to be run. So… why would a delay of X seconds matter? once you touch a file the timestamp does not get modified... so even if squid were not running when the file was touched this should not matter.... Something must be removing /filter_dirty before the async process has had a chance to sync up the rules. What would that be? Obviously this process happens soon after squid is spawned... this is why the sleep(5) works. because the thing that deletes /filter_dirty has done its job and filter_configure then creates a file which then gets picked up on the next sync....
the logs show that squid is running
Jun 23 20:42:31 php: : Reloading Squid for configuration sync
Jun 23 20:42:35 last message repeated 3 timesso.. the problem is not "trying to update the rule when squid is not running". This is an issue with the way the filter rules are synced. the Async methods is failing here.
right now i am clueless how to effectively solve this race condition. Placing a sleep(5) is really not a solution IMHO.
--luis
-
question
the following is an excerpt from /etc/rc
–----
....
echo -n "Starting CRON... "
cd /tmp && /usr/sbin/cron -s 2>/dev/null
echo "done."Start packages
/etc/rc.start_packages
rm -rf /usr/local/pkg/pf/CVS
Remove stale files that have already been processed by bootup
scripts
rm -f /tmp/filter_dirty
rm -f /tmp/rc.linkup
nohup /usr/bin/nice -n20 /usr/local/sbin/check_reload_status &
....
is there any chance that rm -f /tmp/filter_dirty
might happen before the syncing of the rules when spawning squid? This would cause filter_config() in squid.inc to be ignored. It would explain why sleep(5) works. the sleep would cause the rm in rc to happen before filter_config() is called.let me test that by commenting out the rm in rc....
hey guess what
pfctl -sn | grep http
rdr on vr0 inet proto tcp from any to ! (vr0) port = http -> 127.0.0.1 port 80
the problem has ** NOTHING TO DO WITH THE SQUID PACKAGE ***
its a race condition cause in /etc/rc.
this is a a serious bug that could have repercussions for any package that rules that get created right about the time rc
removes /tmp/filter_dirty.comment out
rm -f /tmp/filter_dirtyand everything works like expected. no sleeps required in before filter_config() to make things work.
this no longer needed.
// Sleep for a couple seconds to give squid a chance to fire up fully.
/*
for ($i=0; $i < 10; $i++) {
sleep(1);
if (is_service_running('squid')) {
break;
}
}
*/i bet you a $1 that if you put back the mwexec_bg things would still work since the creation of the rules happens asynchronously.
HOWEVER... i still think its a good idea to leave the synchronous spawn and the loop in there just to make DARN sure that squid is up before you try to update the filters.
--luis
-
Good find there. I'm not sure why that rm is there. I asked around and at most that would mean another filter reload, which is harmless, and in this case necessary. It must still be working for me again on this test ALIX now because my WAN is wireless and rc.newwanip doesn't run as quickly.
So removing that rm is safe, but I'm not sure how best to handle that long-term. I'm not sure it should be patched out for everyone.
-
The way I see it the rm in rc is bad… the reason for this is that there are many programs that set /tmp/filter_dirty expecting to have some process later on detect it and then sync up the filters. The issue is that the rm in rc can happen at any time while subsystems are booting up. This is bad... The rc script should 'wait' until the system is fully booted before removing this file. But it doesn't. It just removes it willy-nilly at some random time without considering the effect on the boot process.
This is your classic multi-processing synchronization problem. You have one process removing a resource that another program needs without checking first.
From everything I have looked at it seems that having rc remove the file is unnecessary. Think about it... reloading process set /tmp/filter_dirty and remove it after the synchronization is complete. So... why should removing it once in rc matter? On my system, for example, currently the file does not exist even though the rm in rc is commented out. This is because the file is unlinked after the filter synchronization was completed. The house keeping in rc is unnecessary.
IMHO 1 of two things needs to be done.
1. rm needs to be removed from rc
or
2. some synchronization mechanism needs to be added to rc so that rc does not attempt to remove this file until after all processes have been started up.Please check again with the powers that be... I believe that removing the rm in rc will not negatively impact the operation of the system for anyone and therefore should be done since there are now documented examples where in some instances it breaks the boot process. No telling what other users are having weird unexplained behavior because of this. In our case it was squid. but for some other user it might be.... Also, keep in mind that this thread has been open since Oct 2009... so this is a long standing problem. Not something that just cropped up.
summing up... having rm -f /tmp/filter_dirty in rc causes inconsistent behavior in pfSense and it is therefore detrimental to the project.
--luis
PS... here is another thread where boot up problems are probably caused by removal of /tmp/filter_dirty in /etc/rc
http://forum.pfsense.org/index.php?topic=19381.0#lastPost
this one was reported in Sept 2009. -
Taking the rm out is the solution, but since that is a system file and not a package file, I'm hesitant to remove that as part of the package installation - that's what I was referring to.
/tmp/filter_dirty is removed automatically by check_reload_status after it reloads the filter – nothing needs to remove that file manually.
-
Taking the rm out is the solution, but since that is a system file and not a package file, I'm hesitant to remove that as part of the package installation - that's what I was referring to.
/tmp/filter_dirty is removed automatically by check_reload_status after it reloads the filter – nothing needs to remove that file manually.
Yes… i agree with you 100% this should not be removed by the package....
This needs to be patched in the distribution.
Who takes care of that?
--luis
-
Well 1.2.3 won't see any more updates to the base system. I can remove that line on 2.0, though (it's still there), after running it by a few others.
-
OK. thanks…
Its a pity that there are no more updates to 1.2.3...
I will add this to our list of things to do when building a new distribution...
I will let the folks at the applianceshop know as well.
Take care and thanks for your help.
--luis
-
It might be possible to get it committed into the 1.2.3 tree so a rebranded release for a customer could pick it up, failing that, an overlay for the rc file could be setup for a custom build.
Or I could just make a little patch package that removes it, too, but not bundle it with squid (since it doesn't affect everyone)