Varnish stops working after a few days
-
Hi guys,
Got a fresh install of 2.0.1 with pfBlocker, pfflowd, squid, squidGuard and varnish. On average, CPU usage is no greater than 30%, RAM is at 16%, swap at 0% and HDD at 4%. After a reboot everything works as it should. However, anywhere from a few hours to a few days after the reboot, Varnish stops passing headers and routing to the different servers. Restarting the service (from the GUI) doesn't fix it. Only a pfs restart gets everything working again… and a few hours or days later it does the same thing.
BTW, this is the second box (same pfs version + packages) that shows this behavior.
Any ideas?
-
I've been using varnish2 in production for months without any restart.
Do you have any alerts in the logs, and could you check whether the varnish daemon is still up after those few hours/days?
Did you check whether any other service is running on the same port, like the pfsense gui or a redirect rule?
-
marcelloc, thank you for the quick reply. Looking at the output (see below), varnish is the only service on port 80 and it is running. No alerts that I can see.
There are 4 domains and 3 internal HTTP servers. No matter which domain I try to access from the outside, it always takes me to the first server configured under backends. Even after I removed the first backend, it still takes me to it no matter which domain I'm trying to access!
Any thoughts?
[2.0.1-RELEASE][root@xld.noc]/root(10): sockstat -4 -l
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     sshd       28374 5  tcp4   *:22                  *:*
root     lighttpd   36318 10 tcp4   *:8088                *:*
dhcpd    dhcpd      9264  16 udp4   *:67                  *:*
dhcpd    dhcpd      9264  20 udp4   *:20148               *:*
nobody   dnsmasq    1966  3  udp4   *:53                  *:*
nobody   dnsmasq    1966  4  tcp4   *:53                  *:*
nobody   dnsmasq    1966  10 udp4   *:35409               *:*
root     php        5813  10 udp4   *:*                   *:*
root     php        53032 10 udp4   *:*                   *:*
root     php        10759 10 udp4   *:*                   *:*
root     php        24700 10 udp4   *:*                   *:*
proxy    squid      26703 13 udp4   *:60333               *:*
proxy    squid      26703 19 tcp4   192.168.192.2:3128    *:*
proxy    squid      26703 20 tcp4   192.168.77.2:3128     *:*
proxy    squid      26703 21 tcp4   192.168.71.1:3128     *:*
proxy    squid      26703 22 tcp4   127.0.0.1:3128        *:*
proxy    squid      26703 23 udp4   *:4827                *:*
proxy    squid      26703 26 udp4   *:3401                *:*
nobody   varnishd   58060 6  tcp4   *:80                  *:*
root     varnishd   57880 4  tcp4   127.0.0.1:81          *:*
root     syslogd    10165 14 udp4   *:514                 *:*
root     bsnmpd     59549 9  udp4   *:*                   *:*
root     bsnmpd     59549 10 udp4   *:161                 *:*
_ntp     ntpd       6099  10 udp4   192.168.171.2:123     *:*
_ntp     ntpd       6099  11 udp4   192.168.192.2:123     *:*
_ntp     ntpd       6099  13 udp4   192.168.71.1:123      *:*
_ntp     ntpd       6099  14 udp4   192.168.177.2:123     *:*
root     miniupnpd  50392 10 tcp4   *:2189                *:*
root     miniupnpd  50392 11 udp4   *:1900                *:*
root     miniupnpd  50392 12 udp4   192.168.192.2:55451   *:*
root     php        63861 10 udp4   *:*                   *:*
root     php        62960 10 udp4   *:*                   *:*
root     inetd      50089 10 udp4   127.0.0.1:6969        *:*
[2.0.1-RELEASE][root@xld.noc]/root(11): ps aux | grep varnish
root   57880 0.0 4.1 85968 84512 ?? Ss 12:03PM 0:01.84 varnishd: Varnish-Mgr xld.noc (varnishd)
nobody 58060 0.0 4.1 92664 85452 ?? I  12:03PM 0:24.59 varnishd: Varnish-Chld xld.noc (varnishd)
root   46966 0.0 0.1 3524  1260  0  S+ 4:06PM  0:00.01 grep varnish
-
Something is really screwed up. I disabled varnish in the GUI and I'm still able to access the server behind pfs! Mind you, I still have the original issue of being presented with the first backend regardless of the domain being accessed.
-
Check whether a leftover NAT rule is forwarding requests to this first server.
With varnish stopped, you shouldn't be able to reach the backends.
Also check under System -> Advanced that the webgui redirect rule is disabled.
-
No NAT, and webgui is disabled.
As I stated in the first post, everything works properly after a reboot. This system ran for almost two weeks w/o issues.
Not sure if it matters, but varnish was installed first, and squid & squidguard were installed two weeks later. Could the order of installation make a difference?
-
Not sure if it matters, but varnish was installed first, and squid & squidguard were installed two weeks later. Could the order of installation make a difference?
Probably not.
There is something really weird about this setup. With varnish stopped you can still reach port 80, so what is forwarding to the internal host?
-
Had the same behavior on the previous box, so I rebuilt the whole solution on a new box. The issue showed up on the new build too. Two different boxes with this bug.
Just rebooted pfs. Everything is working. The question is for how long. I have a feeling that it has to do with squid/squidguard. I guess next time it happens, I'll remove squid/squidguard and see if it makes a difference.
-
It happened again! 10 days after the reboot varnish is malfunctioning.
-
Dear all,
I'm encountering the same problem: after a couple of days Varnish stops and won't come back online until pfSense is rebooted. Then a couple of days later the system again stops responding to my external requests. How come?
Thanks,
Canefield
-
Is there any log, alert or message during a manual service restart that could help identify this problem?
I've been using it on amd64 for a long time without crashes.
Regards,
Marcello Coutinho
-
Yes, after a couple of tests the following error emerged:
"php: : The command '/usr/local/etc/rc.d/varnish.sh' returned exit code '2', the output was 'kern.ipc.nmbclusters: 65536
sysctl: kern.ipc.nmbclusters: Invalid argument
kern.ipc.somaxconn: 16384 -> 16384
kern.maxfiles: 131072 -> 131072
kern.maxfilesperproc: 104856 -> 104856
kern.threads.max_threads_per_proc: 4096 -> 4096
NB: Storage size limited to 2GB on 32 bit architecture,
NB: otherwise we could run out of address space.
Message from VCC-compiler:
Reference to unknown backend 'CANLB' at
('input' Line 55 Pos 28)
.backend = CANLB;
---------------------------###########-
In director specification starting at:
('input' Line 53 Pos 1)
director CA client {
########------------
Running VCC-compiler failed, exit 1
VCL compilation failed'"
Thanks a lot,
Canefield
-
canefield,
something is messing up the config:
Message from VCC-compiler:
Reference to unknown backend 'CANLB' at
('input' Line 55 Pos 28)
.backend = CANLB;
---------------------------###########-
In director specification starting at:
('input' Line 53 Pos 1)
director CA client {
########------------
Running VCC-compiler failed, exit 1
VCL compilation failed
-
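The VCC message says the director CA points at a backend named CANLB that is never defined in the generated VCL. A rough way to spot such dangling references is to compare the names used in ".backend = NAME;" against the names declared with "backend NAME {" or "director NAME ...". This is only a sketch: the function name undefined_backends is made up for illustration, and it assumes each declaration and each reference sits on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of varnish or pfSense): print every name
# referenced as ".backend = NAME;" that has no "backend NAME {" or
# "director NAME ..." declaration in the given VCL file.
undefined_backends() {
    vcl_file=$1

    # Names declared as backends or directors (one per line).
    defined=$(sed -n \
        -e 's/^[[:space:]]*backend[[:space:]]\{1,\}\([A-Za-z0-9_]\{1,\}\).*/\1/p' \
        -e 's/^[[:space:]]*director[[:space:]]\{1,\}\([A-Za-z0-9_]\{1,\}\).*/\1/p' \
        "$vcl_file")

    # Names referenced by ".backend = NAME;" (also matches "req.backend = NAME;").
    sed -n 's/.*\.backend[[:space:]]*=[[:space:]]*\([A-Za-z0-9_]\{1,\}\)[[:space:]]*;.*/\1/p' "$vcl_file" |
    sort -u |
    while read -r name; do
        # Report the name only if it was never declared.
        printf '%s\n' "$defined" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}

# Usage (the path is a placeholder for wherever the generated VCL lives):
#   undefined_backends /path/to/generated.vcl
```

For a config like the one in the error, this would print CANLB, which tells you which backend entry to look for (or re-create) in the package settings.
-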
marcelloc, is there any particular command that I could run that would help with finding the root cause?
TIA,
Dave
-
xudus,
try running the startup command from the console or over ssh:
/usr/local/etc/rc.d/varnish.sh restart
-
This is what I'm getting:
kern.ipc.nmbclusters: 65536
sysctl: kern.ipc.nmbclusters: Invalid argument
kern.ipc.somaxconn: 16384 -> 16384
kern.maxfiles: 131072 -> 131072
kern.maxfilesperproc: 104856 -> 104856
kern.threads.max_threads_per_proc: 4096 -> 4096
storage_malloc: max size 128 MB.
Using old SHMFILE
-
This is what I'm getting:
There are no fatal varnish errors in this log, so it should be running.
-
Sorry, it is running because I just restarted pfs. I'll post the output the next time it malfunctions.
-
marcelloc, it did it again.
The output from /usr/local/etc/rc.d/varnish.sh restart is the same as before (no errors). Is there anywhere else I could poke to see what is breaking varnish?
-
xudus,
check with netstat -an whether the varnish ports are still up
check with ps ax | grep -i varnish whether varnish is still running.
You can also create a cron job to restart varnish every two days, for example, to work around this random error.
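As a rough sketch of how the check and the restart could be tied together: the function below reads sockstat -4 -l output and succeeds only if a varnishd process is listening on the given TCP port. The name varnish_listening is made up for illustration; the restart script path is the one used earlier in this thread. Keep in mind that in the reports above varnishd was still listening while misbehaving, so a port check alone may not catch the problem, which is why a plain timed restart is the fallback.

```shell
#!/bin/sh
# Hypothetical helper: reads `sockstat -4 -l` output on stdin and returns 0
# if a varnishd process is listening on the given TCP port (default 80).
varnish_listening() {
    port=${1:-80}
    awk -v port="$port" '
        # Columns: USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
        $2 == "varnishd" && $5 == "tcp4" && $6 ~ (":" port "$") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage: restart varnish only when nothing is listening on port 80.
#   sockstat -4 -l | varnish_listening 80 || /usr/local/etc/rc.d/varnish.sh restart
#
# Timed restart, assuming an /etc/crontab-style entry (with a user field).
# Note that */2 in the day-of-month field fires on odd days of the month,
# so it only approximates "every two days":
#   0 4 */2 * * root /usr/local/etc/rc.d/varnish.sh restart
```

How cron entries are best added on pfSense depends on the setup, so treat the crontab line as a starting point rather than a tested configuration.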
Regards,
Marcello Coutinho