SG-3100 21.05.1 kern.ipc.maxpipekva exceeded; see tuning(7)
-
Re: kern.ipc.maxpipekva console messages starting with 21.02 upgrade
I upgraded to 21.05 from 2.4.5-p1 on 2021-08-01 and then to 21.05.1 on 2021-08-05. I have been seeing these messages on the console since upgrading to 21.05. When I log in, the GUI shows "Unable to check for updates" and the Package Manager is unable to show me packages.
When I try to enter the shell I get "Can't make pipe." and a reboot does not succeed. I have to power cycle the SG-3100 to get to the shell and have the package functions work properly.
Trying to ssh in results in a closed connection after key exchange.
Packages installed:
Avahi 2.2
cron 0.3.7_5
lldpd 0.9.11
openvpn-client-export 1.6_2
sudo 0.3_6
I am not using a memory file system for /tmp and /var.
Routing does not appear to be affected though.
This box has the 32G SSD option.
[21.05.1-RELEASE][root@gw.home.honig.net]/root: df
Filesystem                                1K-blocks    Used    Avail Capacity  Mounted on
/dev/diskid/DISK-AA180129000000000341s2a   29735244 1207580 26148848     4%    /
devfs                                             1       1        0   100%    /dev
/dev/diskid/DISK-AA180129000000000341s1       34442    2003    32439     6%    /boot/u-boot
/dev/md0                                       3484     132     3076     4%    /var/run
devfs                                             1       1        0   100%    /var/dhcpd/dev
Any ideas?
Thanks.
Jeff
-
That is a symptom of something opening a large number of pipes and eventually hitting that limit.
At the command line run: ps -auxwwd
See what is running at that point.
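If the ps listing is long, a rough way to spot a pile-up is to count processes per command name. This is only a sketch: count_by_command is a made-up helper name, and the ps and sysctl invocations in the comments assume the FreeBSD tools on the box.

```sh
#!/bin/sh
# Count identical lines on stdin and print the most frequent first.
# $1 is how many entries to show (default 10).
count_by_command() {
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Assumed usage on the firewall (not executed here):
#   ps -axo comm= | count_by_command 5
# Pipe KVA in use versus the limit named in the console message:
#   sysctl kern.ipc.pipekva kern.ipc.maxpipekva
```

Note that the helper itself uses pipes, so it is only useful before the limit has been reached.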
Steve
-
@stephenw10 Thanks for that!
A script I'm using to update the LEDs is leaving processes hanging. Potentially because a sysctl command is hanging, but I wasn't able to do much debugging before I had to reboot. I'll investigate more.
-
Ah, that's interesting. Something that calls sysctl -a?
-
@stephenw10 This script, run from cron as
* * * * * root /root/bin/gw_leds -a WAN_OTTC_DHCP -b WAN_EA_DHCP -C 0,0,16
was showing many of these:
root 3188 0.0 0.1 5104 2332 - I  Sat20 0:00.00 | |-- cron: running job (cron)
root 3509 0.0 0.1 4960 2580 - Is Sat20 0:00.03 | | |-- /bin/sh /root/bin/gw_leds -a WAN_OTTC_DHCP -b WAN_EA_DHCP -C 0,0,16
root 5720 0.0 0.1 4204 1952 - I  Sat20 0:00.00 | | | `-- sysctl dev.gpio.2.led.2.pwm=1
root 5988 0.0 0.0    0    0 - Z  Sat20 0:00.00 | | `-- <defunct>
But when I tried to write a pipeline to kill them all I ran out of memory and was kicked off the root shell. I had to power cycle to get it back.
This script runs every minute and it's been up for 30 minutes with no processes hanging around. So I'll need to check periodically to find it in that state and hope I can debug before it runs out of memory.
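One way to catch it early might be a small check that warns once more than a couple of copies are alive. This is a sketch only: check_stuck is a hypothetical helper, the threshold of 2 is arbitrary, and the ps keywords in the comment assume FreeBSD's ps.

```sh
#!/bin/sh
# Read a process listing on stdin and warn when more than $2 lines
# match the pattern in $1. Returns 1 when over the limit, 0 otherwise.
check_stuck() {
    pattern="$1"
    limit="$2"
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, hence the "|| true".
    count="$(grep -c -- "$pattern" || true)"
    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count processes match '$pattern' (limit $limit)"
        return 1
    fi
    return 0
}

# Assumed usage on the firewall (not executed here):
#   ps -axww -o pid,state,command | check_stuck '/root/bin/gw_leds' 2
```

It has to run while pipes can still be created, so it would need to fire well before the limit is hit.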
-
Ah, OK. That explains why I've never hit it I guess.
-
I’ve been having exactly the same problem and looking for the cause. It started about 24 hours after installing the gw_leds script.
https://forum.netgate.com/topic/165701/all-services-fail-to-start-all-packages-gone
-
@jchonig I opened an issue on your GitHub without realizing that it was you that had posted this thread. Guess you already knew about the problem.
-
@stephenw10 While my script doesn't do a sysctl -a, it does do sysctl dev.gpio every time it starts. It did not seem to hang there, but in the sysctl dev.gpio.${gdev}.led.${led}.pwm=1 command. I've changed the code around the latter to first read the value and only change it if it's incorrect.
I've added lockf on the cron job and now I see that it recently hung on gpioctl:
root  6465 0.0 0.1 4956 2584 - I 15:32 0:00.01 |-- /bin/sh /root/bin/gw_leds -a WAN_OTTC_DHCP -b WAN_EA_DHCP
root  8106 0.0 0.1 4956 2580 - I 15:32 0:00.00 | `-- /bin/sh /root/bin/gw_leds -a WAN_OTTC_DHCP -b WAN_EA_DHCP
root 16079 0.0 0.1 4208 1944 - D 15:32 0:00.26 |   `-- /usr/sbin/gpioctl -f /dev/gpioc2 7 duty 16
These hangs are new since my upgrade from 2.4.5.
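For anyone following along, the read-before-write change and the lockf wrapper could look roughly like this. It is a sketch, not the actual gw_leds code: set_sysctl_if_changed is a made-up name, the SYSCTL override exists only so the logic can be tried without the hardware, and the lock file path in the cron comment is an assumption.

```sh
#!/bin/sh
# Command used to talk to sysctl; overridable for testing.
SYSCTL="${SYSCTL:-sysctl}"

# Write sysctl $1 to value $2 only when the current value differs.
set_sysctl_if_changed() {
    name="$1"
    want="$2"
    # -n prints just the value, without the "name: " prefix.
    have="$("$SYSCTL" -n "$name" 2>/dev/null)" || return 1
    if [ "$have" = "$want" ]; then
        return 0    # already correct, skip the write
    fi
    "$SYSCTL" "$name=$want" >/dev/null
}

# Example call, using the names from the script:
#   set_sysctl_if_changed "dev.gpio.${gdev}.led.${led}.pwm" 1
#
# Cron entry wrapped in lockf(1) so a hung run blocks the next one
# instead of stacking up (-s silent, -t 0 gives up immediately):
#   * * * * * root /usr/bin/lockf -s -t 0 /var/run/gw_leds.lock /root/bin/gw_leds -a WAN_OTTC_DHCP -b WAN_EA_DHCP -C 0,0,16
```

The lock does not fix the hang itself; it only keeps a stuck run from accumulating more processes and pipes every minute.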
-
Hmm, the only thing I'm aware of that may be related is the additional gpio device that is detected in 21.0X compared with 2.4.X. We had to make some changes to our driver for that I believe. The active device was incremented. That must be working for you though or it wouldn't work at all. Not something that would fail after some time.
Steve