Intermittent System Freeze After Upgrade to 2.4

  • I recently upgraded my system to 2.4.0 and something strange is happening.  My box is a Netgate SG-4860.  The upgrade went smoothly and I thought everything was running well, until the next day, when my users started having website connectivity issues.  When they try to log into one of the websites that they use, they get an

    "… ICAP Protocol Error The system returned: [No Error]…"

    Only some websites produce this error.  The VoIP system seems to be unaffected, and all the monitoring tools show the box is working, so the internet connectivity is working.

    I tried to log into the Web GUI to check the logs, but I could not connect; it would not load the login screen.  I then tried to SSH into the unit to reset PHP and was able to log in with the admin account, but it wouldn't pull up the menu and gave me no prompt to type commands.  Just to try it, I SSHed into the box using a created user account and at least got a prompt, but I didn't know how to pull up the menu via a command.  To get my users working again, I power-cycled the box, and once it came back up, everything was working normally again.  I checked the system logs, and around the time that the system froze(?), there was an entry similar to the following...

    "nginx: 2017/10/17 07:32:42 [error] 32293#100137: *35 upstream timed out (60: Operation timed out) while reading response header from upstream, client:, server: , request: "GET /widgets/widgets/pfblockerng.widget.php?getNewCounts=1508243383287 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "", referrer: ""

    There were a bunch of these errors up until the system rebooted.  I know that the ICAP message comes from the installed Squid package, so I believe that some of the packages just become unresponsive.  I am not sure whether the above message is related to what is causing the system to freeze or is just a symptom of the problem.  I have had to reboot the unit several times since yesterday, with the most recent being this morning.

    Can someone help to point me in the right direction in resolving this issue?  Thank you in advance.

  • I am also having issues very similar to this. I am also using the pfblocker module on Netgate APU4 hardware.

    After the upgrade, I noticed 30-60% CPU usage vs 3-5% pre-upgrade. Looking at the top CPU processes, php_fpm:pool nginx kept jumping to the top of the list.

    When I am unable to access the gateway, I get a "502 bad gateway" error with nginx at the bottom of the page. The OpenVPN server also appears to stop, as I have users calling to say they are unable to connect. When I check the CPU usage, it is running at 1.2%.

  • Rebel Alliance Developer Netgate

    Most likely, you are hitting this issue:

    When it stops again, gather the info from and post in that other thread.

    If you are running DNSBL there are already some suggested fixes on the last couple pages of that thread to try.

  • Thank you, I will keep an eye on it and gather the necessary info if it happens again.  As of right now, I have not had a problem since yesterday around noon, but will monitor the situation.

    As a side note, when my Web GUI freezes, I don't get a "502 Bad Gateway" error; the web page just doesn't load and the cursor keeps spinning.  That doesn't mean it's not the same issue though.

    Thanks again, jimp.
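
For anyone wanting to see which requests are behind the nginx "upstream timed out" entries quoted in the first post, here is a minimal Python sketch that tallies them by request path. The function name and the regular expression are illustrative assumptions based only on the single log line shown in this thread; entries in another format would need a different pattern.

```python
import re
from collections import Counter

# Assumed pattern, derived from the one log entry quoted in this thread:
# captures the request path of an nginx "upstream timed out" error.
TIMEOUT_RE = re.compile(
    r'upstream timed out.*?request: "(?:GET|POST|HEAD) (\S+) HTTP'
)


def count_upstream_timeouts(lines):
    """Count upstream timeouts per request path (query string stripped)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Drop the query string so repeated widget polls group together.
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts
```

Passing it the lines of a saved copy of the system log and printing `count_upstream_timeouts(lines).most_common()` would show whether one page, such as the pfBlockerNG widget request above, accounts for most of the timeouts.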
