Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working
-
@free4 said in Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working:
ipfw table all list
Once again the issue came back after 2 days, even though the loop was disabled. The network port hangs again, and the only way to make it work is to disable and re-enable the interface. When the network goes down I keep getting In errors on the interface. Once the interface is down, connecting remotely to the web GUI from the WAN interface gives "502 Bad Gateway". I got this output from "ipfw table all list" while it was down:
ipfw table all list
--- table(cp_ifaces), set(0) ---
bge2 2100 6824547305 5704924933364 1563210003
--- table(campco_auth_up), set(0) ---
10.10.25.80/32 34:8b:75:d3:8a:5e 0 12580 3288176 1563210003
10.10.25.157/32 a0:cb:fd:0e:04:b8 0 22395 1907512 1563210003
--- table(campco_host_ips), set(0) ---
--- table(campco_pipe_mac), set(0) ---
--- table(campco_auth_down), set(0) ---
10.10.25.80/32 0 13409 3655823 1563210003
10.10.25.157/32 0 35937 45843082 1563210003
--- table(campco_allowed_up), set(0) ---
--- table(campco_allowed_down), set(0) ---
I have attached an interface screenshot; you can see the errors on the Portal interface.
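A dump like this can be split per table with a small Python sketch. It assumes each table starts with a `--- table(NAME), set(N) ---` header and that every following non-empty line is one whitespace-separated entry whose first field is the address or interface. The meaning of the remaining numeric columns is not interpreted here; any reading of them is an assumption.

```python
import re

# Header lines look like: "--- table(cp_ifaces), set(0) ---"
HEADER = re.compile(r"^--- table\((?P<name>[^)]+)\), set\((?P<set>\d+)\) ---$")


def parse_ipfw_tables(text):
    """Split `ipfw table all list` output into {table_name: [entry_fields, ...]}."""
    tables = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = match.group("name")
            tables[current] = []  # empty tables still get a key
        elif current is not None:
            tables[current].append(line.split())
    return tables


def table_keys(tables, name):
    """Return the first field (address or interface) of every entry in `name`."""
    return [fields[0] for fields in tables.get(name, [])]


if __name__ == "__main__":
    sample = """\
--- table(cp_ifaces), set(0) ---
bge2 2100 6824547305 5704924933364 1563210003
--- table(campco_auth_up), set(0) ---
10.10.25.80/32 34:8b:75:d3:8a:5e 0 12580 3288176 1563210003
10.10.25.157/32 a0:cb:fd:0e:04:b8 0 22395 1907512 1563210003
--- table(campco_host_ips), set(0) ---
"""
    for name, entries in parse_ipfw_tables(sample).items():
        print(f"{name}: {len(entries)} entries")
```

Comparing the campco_auth_up keys between a "working" and a "hung" dump is a quick way to see whether authenticated clients are still listed when traffic stops.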
-
"3547" => time to take a close look at the NIC, or the cable, or the switch and/or AP it's hooked up to, because something is faulting.
-
@Gertjan The cable has already been replaced with a 3M-branded patch cord, but the issue remains. There is one fiber link from the core Zyxel switch to another Zyxel switch; if we plug that in, the errors mostly start increasing and the network goes down within a few hours. With that cable removed for up to 12 hours it is fine, but sometimes the error count still goes up by 2 or 3. Still sorting out what the issue is.
-
@Gertjan Right now the network port is hung: if I try to ping any host from the portal interface I get the error "ping: sendto: Permission denied". If I disable and re-enable the interface, it works again.
-
Enter console mode, option 8.
Does the dmesg command (log) tell you anything?
-
@Gertjan Have a look at the dmesg logs:
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (232 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (426 occurrences)
arp: 10.10.12.254 moved from 1c:c3:eb:91:d3:f7 to 40:40:a7:5c:55:53 on bge2
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (857 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (945 occurrences)
sonewconn: pcb 0xfffff8035758b1d0: Listen queue overflow: 193 already in queue awaiting acceptance (1086 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (274 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (312 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (649 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (907 occurrences)
sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (107 occurrences)
-
Queuing happens if the receiving application cannot process incoming connections fast enough, so either you get too many connections or your application is too slow to handle them.
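As I understand FreeBSD's sonewconn() (the function named in the log), a listening socket starts dropping and logging new connections once its queue of unaccepted connections exceeds three halves of the listen limit. A small Python model of that check, assuming the limit is the default of 128, reproduces the "193 already in queue" figure: 3 * 128 / 2 = 192, so the message fires once 193 connections are already queued. This is a sketch of the arithmetic, not kernel code.

```python
def overflow_threshold(qlimit):
    """Largest queue length that still accepts new connections.

    Models the FreeBSD check `qlen > 3 * qlimit / 2` using integer division.
    """
    return 3 * qlimit // 2


def is_overflowing(qlen, qlimit):
    """True if a connection arriving now would be dropped and logged."""
    return qlen > overflow_threshold(qlimit)


if __name__ == "__main__":
    qlimit = 128  # assumed default kern.ipc.soacceptqueue
    first_over = overflow_threshold(qlimit) + 1
    print(f"limit {qlimit}: overflow logged with {first_over} connections queued")
```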
To see what it is:
netstat -Aan | grep fffff80130a2f0f0
sockstat -l | grep socketname
netstat -Lan | grep 193
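To find which pcb addresses to grep for, a small Python sketch can pull them out of saved dmesg text and count how many overflow messages each one produced. The regular expression is written against the exact message format quoted in this thread; the `0x` prefix is dropped because `netstat -Aan` shows the address without it, as in the grep above.

```python
import re
from collections import Counter

# Matches lines such as:
# sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (232 occurrences)
OVERFLOW = re.compile(
    r"sonewconn: pcb 0x(?P<pcb>[0-9a-f]+): Listen queue overflow: "
    r"(?P<qlen>\d+) already in queue awaiting acceptance "
    r"\((?P<count>\d+) occurrences\)"
)


def overflowing_pcbs(dmesg_text):
    """Return a Counter mapping pcb address (without 0x) -> number of overflow messages."""
    return Counter(m.group("pcb") for m in OVERFLOW.finditer(dmesg_text))


if __name__ == "__main__":
    sample = (
        "sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (232 occurrences)\n"
        "arp: 10.10.12.254 moved from 1c:c3:eb:91:d3:f7 to 40:40:a7:5c:55:53 on bge2\n"
        "sonewconn: pcb 0xfffff8035758b1d0: Listen queue overflow: 193 already in queue awaiting acceptance (1086 occurrences)\n"
    )
    for pcb, messages in overflowing_pcbs(sample).most_common():
        print(f"{messages} message(s): netstat -Aan | grep {pcb}")
```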
It will probably be the captive portal.
Basically, that error means that the NIC can no longer keep up.
There are some tunables you can check to solve the problem; there is a section for bge/bce:
https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html
Some suggest increasing kern.ipc.soacceptqueue (previously named kern.ipc.somaxconn). Increase it slowly (the default is 128) until the problem disappears.
-
@kiokoman said in Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working:
netstat -Lan | grep 193
I followed your instructions and only got output from the first command:
netstat -Aan | grep fffff80130a2f0f0
fffff80130a2f0f0 stream 0 0 fffff80130715ce8 0 0 0 /var/run/php-fpm.socket
The other two commands gave no output.
Secondly, I increased kern.ipc.soacceptqueue to 1024 and also tuned the bge NIC as described in the docs, creating /boot/loader.conf.local with:
kern.ipc.nmbclusters="131072"
hw.bge.tso_enable=0
hw.pci.enable_msix=0
I will keep an eye on the system to see if that makes any difference.
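A rough way to size kern.ipc.nmbclusters is to multiply it by the mbuf cluster size. The Python sketch below assumes the standard 2 KiB (2048-byte) FreeBSD cluster and treats the result as an upper bound, since, as far as I know, clusters are allocated on demand up to that limit rather than reserved at boot.

```python
CLUSTER_BYTES = 2048  # assumed size of one standard mbuf cluster (MCLBYTES) on FreeBSD


def cluster_pool_mib(nmbclusters, cluster_bytes=CLUSTER_BYTES):
    """Upper bound, in MiB, on memory the cluster pool could use at this limit."""
    return nmbclusters * cluster_bytes / (1024 * 1024)


if __name__ == "__main__":
    for value in (131072, 1000000):
        print(f'kern.ipc.nmbclusters="{value}" -> up to {cluster_pool_mib(value):.0f} MiB')
```

Under that assumption 131072 clusters come to 256 MiB and 1000000 to roughly 1.9 GiB, which is worth checking against the installed RAM before raising the limit.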
-
The second command in this case is:
sockstat -l | grep /var/run/php-fpm.socket
As an example, for me the result is:
root pfctl 51866 12 stream /var/run/php-fpm.socket
root pfctl 51866 13 stream /var/run/php-fpm.socket
root sleep 37398 12 stream /var/run/php-fpm.socket
root sleep 37398 13 stream /var/run/php-fpm.socket
dhcpd dhcpd 50501 12 stream /var/run/php-fpm.socket
dhcpd dhcpd 50501 13 stream /var/run/php-fpm.socket
root php-fpm 29258 13 stream /var/run/php-fpm.socket
root lighttpd_l 64766 12 stream /var/run/php-fpm.socket
root lighttpd_l 64766 13 stream /var/run/php-fpm.socket
dhcpd dhcpd 10592 12 stream /var/run/php-fpm.socket
dhcpd dhcpd 10592 13 stream /var/run/php-fpm.socket
root dpinger 74966 12 stream /var/run/php-fpm.socket
root dpinger 74966 13 stream /var/run/php-fpm.socket
root dpinger 74410 12 stream /var/run/php-fpm.socket
root dpinger 74410 13 stream /var/run/php-fpm.socket
root dpinger 74320 12 stream /var/run/php-fpm.socket
root dpinger 74320 13 stream /var/run/php-fpm.socket
root php-fpm 85281 13 stream /var/run/php-fpm.socket
squid squid 37275 12 stream /var/run/php-fpm.socket
squid squid 37275 13 stream /var/run/php-fpm.socket
root squid 36287 12 stream /var/run/php-fpm.socket
root squid 36287 13 stream /var/run/php-fpm.socket
root php-fpm 24514 13 stream /var/run/php-fpm.socket
root php-fpm 340 13 stream /var/run/php-fpm.socket
root php-fpm 339 13 stream /var/run/php-fpm.socket
root php-fpm 338 15 stream /var/run/php-fpm.socket
So we can confirm that the socket is used by squid, dpinger, dhcpd, etc.
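To summarise such a listing, a small Python sketch can group the rows by command. It assumes the column order seen in this output (USER COMMAND PID FD PROTO LOCAL-ADDRESS) and reports which programs hold /var/run/php-fpm.socket and which distinct PIDs each one has.

```python
from collections import defaultdict


def holders_by_command(sockstat_text, path):
    """Map command name -> sorted PIDs whose local address equals `path`.

    Assumes the column order USER COMMAND PID FD PROTO LOCAL-ADDRESS; the header
    line and rows for other sockets are skipped.
    """
    holders = defaultdict(set)
    for line in sockstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == path and fields[2].isdigit():
            holders[fields[1]].add(int(fields[2]))
    return {cmd: sorted(pids) for cmd, pids in holders.items()}


if __name__ == "__main__":
    sample = """\
root  dpinger 74966 12 stream /var/run/php-fpm.socket
root  dpinger 74966 13 stream /var/run/php-fpm.socket
squid squid   37275 12 stream /var/run/php-fpm.socket
root  php-fpm 29258 13 stream /var/run/php-fpm.socket
"""
    for cmd, pids in holders_by_command(sample, "/var/run/php-fpm.socket").items():
        print(f"{cmd}: {len(pids)} process(es) {pids}")
```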
Well, keep us updated.
-
@kiokoman said in Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working:
so we can confirm that the socket is used by squid dpinger dhcpd etc etc
Correct.
All these programs use or call PHP-based scripts.
-
@kiokoman said in Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working:
sockstat -l | grep /var/run/php-fpm.socket
I got this output:
root sleep 99245 12 stream /var/run/php-fpm.socket
root sleep 99245 13 stream /var/run/php-fpm.socket
root sleep 97728 12 stream /var/run/php-fpm.socket
root sleep 97728 13 stream /var/run/php-fpm.socket
unbound unbound 20075 12 stream /var/run/php-fpm.socket
unbound unbound 20075 13 stream /var/run/php-fpm.socket
root php-fpm 83461 13 stream /var/run/php-fpm.socket
root php-fpm 83358 13 stream /var/run/php-fpm.socket
root php-fpm 83301 13 stream /var/run/php-fpm.socket
root php-fpm 83152 15 stream /var/run/php-fpm.socket
root nginx 55388 12 stream /var/run/php-fpm.socket
root nginx 55388 13 stream /var/run/php-fpm.socket
root nginx 55234 12 stream /var/run/php-fpm.socket
root nginx 55234 13 stream /var/run/php-fpm.socket
root nginx 54950 12 stream /var/run/php-fpm.socket
root nginx 54950 13 stream /var/run/php-fpm.socket
root nginx 54735 12 stream /var/run/php-fpm.socket
root nginx 54735 13 stream /var/run/php-fpm.socket
root nginx 54498 12 stream /var/run/php-fpm.socket
root nginx 54498 13 stream /var/run/php-fpm.socket
root nginx 54344 12 stream /var/run/php-fpm.socket
root nginx 54344 13 stream /var/run/php-fpm.socket
root nginx 54164 12 stream /var/run/php-fpm.socket
root nginx 54164 13 stream /var/run/php-fpm.socket
root nginx 53913 12 stream /var/run/php-fpm.socket
root nginx 53913 13 stream /var/run/php-fpm.socket
root nginx 53902 12 stream /var/run/php-fpm.socket
root nginx 53902 13 stream /var/run/php-fpm.socket
root nginx 53685 12 stream /var/run/php-fpm.socket
root nginx 53685 13 stream /var/run/php-fpm.socket
root nginx 53620 12 stream /var/run/php-fpm.socket
root nginx 53620 13 stream /var/run/php-fpm.socket
root sh 94097 12 stream /var/run/php-fpm.socket
root sh 94097 13 stream /var/run/php-fpm.socket
root sh 93897 12 stream /var/run/php-fpm.socket
root sh 93897 13 stream /var/run/php-fpm.socket
root dpinger 83648 12 stream /var/run/php-fpm.socket
root dpinger 83648 13 stream /var/run/php-fpm.socket
root dpinger 83295 12 stream /var/run/php-fpm.socket
root dpinger 83295 13 stream /var/run/php-fpm.socket
-
OK, nginx is the web server for the captive portal; check it if the problem still occurs.
-
@kiokoman The system was working without any problem for 28 days under heavy load (2000+ users). Then it suddenly started getting errors on the portal interface, and three times the system went down: just the portal interface stopped passing in/out traffic. Disabling and re-enabling it makes it work again, sometimes for one day, sometimes for two, and then it happens again.
I am not changing any settings, and when one service is working another goes down. Now the captive portal is stuck on the boot screen and does not let the system load the startup menu. I haven't changed any Captive Portal settings.
-
So now we have random services going down and the captive portal stuck? I would try fsck on the filesystem to see if it helps, and a memtest.
-
@Gertjan Is there a way to reinstall only the captive portal package? The captive portal service shows as running when started from the Dashboard, but if I try to save the captive portal configuration it keeps waiting and finally gives "504 Gateway Time-out". Users are connected directly, as on LAN, and the portal no longer seems to be working.
-
The captive portal isn't a package.
It's using:
Another instance of nginx running on 127.0.0.1 port 80 (and 443 if you use https).
This nginx instance uses a landing (php) page.
A helper script, /etc/inc/captiveportal.inc.
Activating the portal also activates the firewall program called "ipfw", which can handle MAC addresses on the captive portal interface.
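When the GUI or portal answers with 502/504, one quick check is whether the portal's nginx instance still accepts TCP connections at all. Below is a minimal Python sketch to run on the firewall itself; the 127.0.0.1:80 address is taken from the description above and is an assumption to adjust (for example if the portal uses HTTPS or another port).

```python
import socket


def port_answers(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes within `timeout` seconds.

    True only shows that something accepts connections; it does not prove
    that nginx can reach php-fpm behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    # Assumed address of the captive portal nginx instance; adjust as needed.
    print("portal nginx answering:", port_answers("127.0.0.1", 80))
```

If the port answers but pages still time out, the bottleneck is more likely behind nginx (php-fpm), which fits the listen queue messages seen earlier.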
Actually, a captive portal as implemented by pfSense is pretty simple. No special processes.
You run the captive portal on a dedicated interface called OPTx, right?
Swap LAN and this OPTx interface and see if the issue persists.
-
You can also check Status / System Logs / Captive Portal Auth for any errors.
-
@Gertjan Yes, it's running on the OPT1 interface. I restarted the system 3 times and it didn't fix the issue. The only thing I had changed was the system tunable mentioned by @kiokoman, kern.ipc.soacceptqueue set to 1024. I deleted it, rebooted the system, and the captive portal is back. Not sure if it's related to the captive portal.
-
@wazim4u said in Dell R430 bge (Built-in Port Hangs ) & Captive Portal Stop working:
Listen queue overflow: 193 already in queue awaiting acceptance (232 occurrences)
-
I would like to share the changes I have made, in case someone else faces the same issue now or in the future. Below is the status after the system has been running for 2 days:
Captive Portal Status (2000+ users)
Network Status (so far only 1 error)
Changes I've made:
1- In /boot/loader.conf.local (created as a new file), add the following for the Broadcom NIC:
kern.ipc.nmbclusters="1000000"
hw.bge.tso_enable=0
hw.pci.enable_msix=0
2- In Interfaces / Portal (portal interface bge2), under Speed and Duplex, select:
1000baseT full-duplex
(The Auto setting was producing errors in the Zyxel switch logs.)
3- In /usr/local/etc/php-fpm.d/www.conf, change
listen.backlog = 511 (default)
to
listen.backlog = -1
The php-fpm related errors are gone since changing listen.backlog (none so far in two days).
php-fpm error:
kernel: sonewconn: pcb 0xfffff80130a2f0f0: Listen queue overflow: 193 already in queue awaiting acceptance (155 occurrences)
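The listen.backlog value is what php-fpm passes to listen(2). As I understand FreeBSD's solisten_proto(), the kernel replaces a negative or too-large backlog with kern.ipc.soacceptqueue (formerly kern.ipc.somaxconn). A Python model of that clamp, assuming the default of 128:

```python
DEFAULT_SOACCEPTQUEUE = 128  # assumed FreeBSD default for kern.ipc.soacceptqueue


def effective_backlog(requested, soacceptqueue=DEFAULT_SOACCEPTQUEUE):
    """Model of how FreeBSD turns a listen(2) backlog into the queue limit.

    Negative values and values above kern.ipc.soacceptqueue are replaced by
    kern.ipc.soacceptqueue; anything else is used as-is.
    """
    if requested < 0 or requested > soacceptqueue:
        return soacceptqueue
    return requested


if __name__ == "__main__":
    for backlog in (511, -1):
        for limit in (128, 1024):
            print(f"listen.backlog={backlog}, soacceptqueue={limit} -> {effective_backlog(backlog, limit)}")
```

Under this model the limit php-fpm actually gets depends on kern.ipc.soacceptqueue as well as on listen.backlog: with the default of 128, both 511 and -1 resolve to 128.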
Now there is only one issue left that I am sorting out, related to nginx:
nginx: 2019/07/20 10:25:19 [alert] 95352#100567: send() failed (40: Message too long)
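For reference, error 40 on FreeBSD is EMSGSIZE ("Message too long"): the kernel refused a send() because the message could not be sent in one piece on that socket. The log does not show which socket nginx was writing to, so the Python sketch below only reproduces the same error class with an oversized UDP datagram on localhost. The 70000-byte size is an arbitrary choice above the 65,535-byte IPv4 datagram limit, and port 9 is an arbitrary destination.

```python
import errno
import socket

FREEBSD_EMSGSIZE = 40  # EMSGSIZE in FreeBSD's <errno.h>; other systems differ (Linux uses 90)


def send_oversized_datagram(size=70000, dest=("127.0.0.1", 9)):
    """Try to send one UDP datagram of `size` bytes; return the errno, or None on success."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(b"\0" * size, dest)
        except OSError as exc:
            return exc.errno
    return None


if __name__ == "__main__":
    err = send_oversized_datagram()
    print("errno:", err, errno.errorcode.get(err), "| FreeBSD number:", FREEBSD_EMSGSIZE)
```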