Varnish stops working after a few days

xudus

Hi guys,

Got a fresh install of 2.0.1 with pfBlocker, pfflowd, squid, squidGuard and varnish. On average, CPU usage is no greater than 30%, RAM is at 16%, swap at 0% and HDD at 4%. After a reboot everything works as it should. However, anywhere from a few hours to a few days after the reboot, Varnish stops passing headers and routing to the different servers. Restarting the service (from the GUI) doesn't rectify the situation. Only a pfs restart gets everything working again, and after a few hours or days it does the same thing.

BTW, this is the second box (same pfs version + packages) with this behavior.

Any ideas?

marcelloc

I've been using varnish2 in production for months without any restart.

Do you have any alerts in the logs? Could you check whether the varnish daemon is still up after those few hours or days?

Did you check whether any other service is running on the same port, such as the pfSense GUI or a redirect rule?
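
For the port check, a small filter over sockstat output can list every command bound to a given TCP port. This is only a sketch: listeners_on_port is a made-up helper name, and it assumes the usual FreeBSD `sockstat -4 -l` column layout (USER COMMAND PID FD PROTO LOCAL FOREIGN).

```shell
#!/bin/sh
# Hypothetical helper: print the distinct commands listening on a TCP port.
# Reads `sockstat -4 -l` output on stdin; the port number is the first argument.
listeners_on_port() {
    awk -v port="$1" '
        # Field 5 is the protocol, field 6 the local address ("*:80"),
        # field 2 the command name. Print each command only once.
        $5 ~ /^tcp/ && $6 ~ (":" port "$") && !seen[$2]++ { print $2 }
    '
}

# Usage on the firewall:
#   sockstat -4 -l | listeners_on_port 80
```

If this prints anything besides varnishd, something else is competing for the port.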

xudus

marcelloc, thank you for the quick reply. Looking at the output (see below), varnish is the only service on port 80 and it is running. No alerts that I can see.

There are 4 domains and 3 internal HTTP servers. No matter which domain I try to access from the outside, it always takes me to the first server that is configured under backends. Even after I removed the first backend, it still takes me to it no matter which domain I'm trying to access!

Any thoughts?


[2.0.1-RELEASE][root@xld.noc]/root(10): sockstat -4 -l
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     sshd       28374 5  tcp4   *:22                  *:*
root     lighttpd   36318 10 tcp4   *:8088                *:*
dhcpd    dhcpd      9264  16 udp4   *:67                  *:*
dhcpd    dhcpd      9264  20 udp4   *:20148               *:*
nobody   dnsmasq    1966  3  udp4   *:53                  *:*
nobody   dnsmasq    1966  4  tcp4   *:53                  *:*
nobody   dnsmasq    1966  10 udp4   *:35409               *:*
root     php        5813  10 udp4   *:*                   *:*
root     php        53032 10 udp4   *:*                   *:*
root     php        10759 10 udp4   *:*                   *:*
root     php        24700 10 udp4   *:*                   *:*
proxy    squid      26703 13 udp4   *:60333               *:*
proxy    squid      26703 19 tcp4   192.168.192.2:3128    *:*
proxy    squid      26703 20 tcp4   192.168.77.2:3128     *:*
proxy    squid      26703 21 tcp4   192.168.71.1:3128     *:*
proxy    squid      26703 22 tcp4   127.0.0.1:3128        *:*
proxy    squid      26703 23 udp4   *:4827                *:*
proxy    squid      26703 26 udp4   *:3401                *:*
nobody   varnishd   58060 6  tcp4   *:80                  *:*
root     varnishd   57880 4  tcp4   127.0.0.1:81          *:*
root     syslogd    10165 14 udp4   *:514                 *:*
root     bsnmpd     59549 9  udp4   *:*                   *:*
root     bsnmpd     59549 10 udp4   *:161                 *:*
_ntp     ntpd       6099  10 udp4   192.168.171.2:123     *:*
_ntp     ntpd       6099  11 udp4   192.168.192.2:123     *:*
_ntp     ntpd       6099  13 udp4   192.168.71.1:123      *:*
_ntp     ntpd       6099  14 udp4   192.168.177.2:123     *:*
root     miniupnpd  50392 10 tcp4   *:2189                *:*
root     miniupnpd  50392 11 udp4   *:1900                *:*
root     miniupnpd  50392 12 udp4   192.168.192.2:55451   *:*
root     php        63861 10 udp4   *:*                   *:*
root     php        62960 10 udp4   *:*                   *:*
root     inetd      50089 10 udp4   127.0.0.1:6969        *:*
[2.0.1-RELEASE][root@xld.noc]/root(11): ps aux | grep varnish
root   57880  0.0  4.1 85968 84512  ??  Ss   12:03PM   0:01.84 varnishd: Varnish-Mgr xld.noc (varnishd)
nobody 58060  0.0  4.1 92664 85452  ??  I    12:03PM   0:24.59 varnishd: Varnish-Chld xld.noc (varnishd)
root   46966  0.0  0.1  3524  1260   0  S+    4:06PM   0:00.01 grep varnish

xudus

Something is really screwed up. I disabled varnish in the GUI and I'm still able to access the server behind pfs! Mind you, I still have the original issue of being presented with the first backend regardless of the domain being accessed.

marcelloc

Check whether a leftover NAT rule still forwards requests to this first server.

With varnish stopped, you shouldn't be able to reach the backends.

Also check under System -> Advanced whether the webGUI redirect rule is disabled.

xudus

No NAT and webgui is disabled.

As I stated in the first post, everything works properly after a reboot. This system ran for almost two weeks without issues.

Not sure if it matters, but varnish was installed first, then two weeks later squid & squidguard were installed. Could the order of installation make a difference?

marcelloc

@xudus:

Not sure if it matters, but varnish was installed first, then two weeks later squid & squidguard were installed. Could the order of installation make a difference?

Probably not.
There is something really weird about this setup. With varnish stopped, you can still get access on port 80, so how could it be forwarding to the internal host?

xudus

Had the same behavior on the previous box, so I rebuilt the whole solution on a new box. The issue showed up on the new build. Two different boxes with this bug.

Just rebooted pfs. Everything is working. The question is for how long. I have a feeling that it has to do with squid/squidguard. I guess next time it happens, I'll remove squid/squidguard and see if it makes a difference.

xudus

It happened again! Ten days after the reboot, varnish is malfunctioning.

canefield

Dear all,

I'm encountering the same problem: after a couple of days Varnish stops and won't come back online until pfSense is rebooted. Then, a couple of days later, the system again stops responding to my external requests. How come?

Thanks,
Canefield

marcelloc

Is there any log, alert or message during a manual service restart that would help identify this problem?

I've been using it on amd64 for a long time without crashes.

Regards,
Marcello Coutinho

canefield

Yes, after a couple of tests the following error emerged:

"php: : The command '/usr/local/etc/rc.d/varnish.sh' returned exit code '2', the output was '
kern.ipc.nmbclusters: 65536
sysctl: kern.ipc.nmbclusters: Invalid argument
kern.ipc.somaxconn: 16384 -> 16384
kern.maxfiles: 131072 -> 131072
kern.maxfilesperproc: 104856 -> 104856
kern.threads.max_threads_per_proc: 4096 -> 4096
NB: Storage size limited to 2GB on 32 bit architecture,
NB: otherwise we could run out of address space.
Message from VCC-compiler:
Reference to unknown backend 'CANLB' at
('input' Line 55 Pos 28)
.backend = CANLB;
---------------------------###########-
In director specification starting at:
('input' Line 53 Pos 1)
director CA client {
########------------
Running VCC-compiler failed, exit 1
VCL compilation failed'"

Thanks a lot,
Canefield

marcelloc

canefield,

something is messing up the config. The director CA references a backend named CANLB that the VCL does not define:

Message from VCC-compiler:
Reference to unknown backend 'CANLB' at
('input' Line 55 Pos 28)
.backend = CANLB;
---------------------------###########-
In director specification starting at:
('input' Line 53 Pos 1)
director CA client {
########------------
Running VCC-compiler failed, exit 1
VCL compilation failed
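
To look for this kind of mismatch without waiting for a failed restart, the generated VCL can be scanned for backend names that are referenced but never defined. A rough sketch follows: undefined_backends is a hypothetical helper, the path of the generated VCL is not given in this thread so it is passed as an argument, and the matching assumes the "backend NAME {" and ".backend = NAME;" spellings (with spaces around "=") seen in the error above.

```shell
#!/bin/sh
# Hypothetical helper: list backends that a VCL file references with
# ".backend = NAME;" but never defines with "backend NAME {".
# Prints one missing name per line; prints nothing when all are defined.
undefined_backends() {
    awk '
        # Definition: a line starting with "backend NAME {".
        $1 == "backend" { name = $2; sub(/[{].*/, "", name); defined[name] = 1 }
        # Reference: ".backend = NAME;" anywhere on a line.
        {
            for (i = 1; i <= NF - 2; i++)
                if ($i == ".backend" && $(i + 1) == "=") {
                    ref = $(i + 2)
                    gsub(/[;}]/, "", ref)
                    # Skip inline definitions such as ".backend = { ... }".
                    if (ref ~ /^[A-Za-z_]/) used[ref] = 1
                }
        }
        END { for (ref in used) if (!(ref in defined)) print ref }
    ' "$1"
}

# Usage (replace the argument with the actual VCL file on your box):
#   undefined_backends /path/to/generated.vcl
```

For the config behind the error above, this should print CANLB, the backend the director points at.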

xudus

marcelloc, is there any particular command that I could run that would help with finding the root cause?

TIA,
Dave

marcelloc

xudus,

try running the startup command from the console or SSH:

/usr/local/etc/rc.d/varnish.sh restart

xudus

This is what I'm getting:


kern.ipc.nmbclusters: 65536
sysctl: kern.ipc.nmbclusters: Invalid argument
kern.ipc.somaxconn: 16384 -> 16384
kern.maxfiles: 131072 -> 131072
kern.maxfilesperproc: 104856 -> 104856
kern.threads.max_threads_per_proc: 4096 -> 4096
storage_malloc: max size 128 MB.
Using old SHMFILE

marcelloc

@xudus:

This is what I'm getting:

There are no fatal varnish errors in this log, so it should be running.

xudus

Sorry, it is running because I just restarted pfs. I'll post the output the next time it malfunctions.

xudus

marcelloc, it did it again.

The output from /usr/local/etc/rc.d/varnish.sh restart is the same as before (no errors). Is there any other place I could poke to see what is breaking varnish?

marcelloc

xudus,

check with netstat -an whether the varnish ports are still up
check with ps ax | grep -i varnish whether varnish is still running.

You can also create a cron job that restarts varnish every two days, for example, to work around this random error.

Regards,
Marcello Coutinho
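
As a variation on the cron idea, the restart could be made conditional on the netstat check. The sketch below is illustrative only: port_is_listening is a made-up helper name, and it assumes FreeBSD's `netstat -an` layout, where the local address is the fourth column and is written as "address.port" (for example "*.80"). Note that earlier in this thread varnishd was still up and listening while misrouting, so a listener check alone would not catch that case; a plain timed restart would still be needed for it.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if `netstat -an` output on stdin
# shows a TCP socket in LISTEN state on the given port, fail otherwise.
port_is_listening() {
    awk -v port="$1" '
        # Field 1 is the protocol, field 4 the local address ("*.80"),
        # and the last field the socket state.
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ ("[.]" port "$") { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the firewall (restart only when nothing listens on port 80):
#   netstat -an | port_is_listening 80 || /usr/local/etc/rc.d/varnish.sh restart
#
# Unconditional alternative as a system crontab line (assumption: the
# /etc/crontab format with a user field; "*/2" in the day-of-month field
# fires on odd days of the month, not strictly every 48 hours):
#   0 4 */2 * * root /usr/local/etc/rc.d/varnish.sh restart
```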