pfSense goes down every morning



  • The past three days pfsense has gone down.  Can't access anything outside the LAN.  In order to get access, pfsense has to be rebooted twice.

    Here are some lines from the routing logs

    
    Sep 30 09:31:39	miniupnpd	72618	try_sendto failed to send 33 packets
    Sep 30 09:31:39	miniupnpd	72618	try_sendto(sock=10, len=493, dest=[ff0e::c]:1900): sendto: Permission denied
    Sep 30 09:31:39	miniupnpd	72618	sendto(udp_notify=10, [2600:6c5a:4b7f:ebde:20e:c4ff:fecc:5e43]): Permission denied
    Sep 30 09:31:34	radvd	47121	sendmsg: Permission denied
    Sep 30 09:31:09	miniupnpd	72618	try_sendto failed to send 33 packets
    Sep 30 09:31:09	miniupnpd	72618	try_sendto(sock=10, len=429, dest=[ff02::c]:1900): sendto: Permission denied
    Sep 30 09:31:09	miniupnpd	72618	sendto(udp_notify=10, [2600:6c5a:4b7f:ebde:20e:c4ff:fecc:5e43]): Permission denied
    Sep 30 09:31:07	radvd	47121	sendmsg: Permission denied
    Sep 30 09:30:39	miniupnpd	72618	try_sendto failed to send 33 packets
    Sep 30 09:30:39	miniupnpd	72618	try_sendto(sock=10, len=429, dest=[ff02::c]:1900): sendto: Permission denied
    
    

    Here are some gateway errors:

    
    Sep 30 09:07:28	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.92.36.1 bind_addr 71.92.XX.XX identifier "WAN_DHCP "
    Sep 30 09:07:28	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.92.36.1 bind_addr 71.92.XX.XX identifier "WAN_DHCP "
    Sep 30 09:07:27	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.92.36.1 bind_addr 71.92.XX.XX identifier "WAN_DHCP "
    Sep 30 09:01:20	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.92.36.1 bind_addr 71.92.XX.XX identifier "WAN_DHCP "
    Sep 30 09:01:19	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 71.92.36.1 bind_addr 71.92.XX.XX identifier "WAN_DHCP "
    Sep 30 08:59:38	dpinger		WAN_DHCP 71.92.XX.XX: sendto error: 65
    Sep 30 08:59:38	dpinger		WAN_DHCP 71.92.XX.XX: sendto error: 65
    
    
    
    Sep 30 09:34:35	dhcpleases		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such file or directory.
    
    

    What should I be looking for?

    Running: 2.3.4-RELEASE-p1 (amd64)



  • @mecheng70:

    The past three days pfsense has gone down.

    You didn't mention what you are running pfSense on… hardware (which?) or virtualized (how?)
    How are you certain that your ISP (seems to be Charter Communications) hasn't gone down?

    pfsense has to be rebooted twice.

    Twice?  :o :o that makes no sense whatsoever. Post full logs from dmesg, dhcpd.log, gateways.log (not here, use a pastebin)

    
    Sep 30 09:31:39	miniupnpd	72618	try_sendto failed to send 33 packets
    
    

    First thing I would recommend is disabling upnp - unless you absolutely cannot live without it. It's buggy and potentially hazardous to security.



  • Will get those logs when I get back to the house. Is there a better way than cutting and pasting the logs? I assume I will have to increase the display limits in order to get all the information.

    Here is the hardware:
    Qotom Q Series Mini PC Q190S-S02  - j1900 quad core 2.42 GHz, 4GB RAM

    Am I sure about Charter? They said they had not gone down. When pfSense was down, I could get my laptop to connect to the internet when plugged directly into the cable modem.

    I will disable UPnP and see what that does tomorrow morning.

    Appreciate it.



  • @mecheng70:

    Is there a better way than cutting and pasting the logs?

    clog /var/log/system.log > systemlog.txt
    Then scp that file to your machine, or if your terminal has a big enough buffer you can just dump it to the console and copy/paste.

    I could get my laptop to connect to the internet when plugged directly into the cable modem.

    So you're double-NAT'ing then? That's not good; your modem should be bridged. If you can plug your laptop right in and get DHCP/NAT, then you're not set up properly. Charter (all major residential ISPs, really) will lie or make up stories to blame everything on the stupid customer. That is one of the reasons pfSense is so great (if the modem is in bridge mode): it actually monitors the connection and provides visible evidence to show them…



  • Well, for myself I can't say that double NAT is unstable. It works for weeks or months in a row for me.
    Btw, some ISP routers can't be put in bridge mode.

    It looks to me as if the WAN_DHCP interface goes down entirely. What is this interface? A Realtek thing? A USB dongle?

    When the connection (WAN) is down and you unplug and replug the cable, do you see it signaled in the log?



  • Here are the files. They are attachments rather than a pastebin. (Is that OK?)

    FYI - turned upnp off

    system.txt
    gateways.txt
    dhcpd.txt
    routing.txt



  • @Gertjan:

    It looks to me as if the WAN_DHCP interface goes down entirely. What is this interface? A Realtek thing? A USB dongle?

    When the connection (WAN) is down and you unplug and replug the cable, do you see it signaled in the log?

    Not sure I understand your question about the interface. "What is the interface?" I do not know what IP address that is… It is not my public IP address. (It is outside the scope of my knowledge.)

    See the attached file for unplugging and replugging the cable. It was unplugged for 10-15 seconds at Sep 30 15:11:09. The gateway logs do not seem to indicate that it was plugged back in.

    [gateways unpluged.txt](/public/imported_attachments/1/gateways unpluged.txt)



  • Now the system slowly went down.  The dashboard is claiming that I am offline. No IP for the WAN.

    I am at a loss.

    Thoughts?



  • He's asking what the physical interface is: what brand of NIC and possibly which model. A cheap NIC is like using crappy fuel in a high performance car. Just by looking at the forums, the number one cause of issues is crappy NICs.



  • The NIC is a physical NIC in the Qotom mini PC. Not sure of the brand. It is mounted on the mini PC motherboard.



  • That model has Realtek NICs, which may be the cause. Hard to say. Realtek is hit and miss. Some work, some don't work at all, and some are in between. The NIC is the single most important part of a firewall.



  • The re0 interface is flapping like crazy..
    Is the NIC directly connected to a modem? Perhaps try putting a little switch in between? Or setting a fixed network link speed? Tried with a different cable?

    Below is a subset of the system log; this goes on for a while until the reboot happens, it seems. With this happening there is little chance it will work properly at that time. And afaik a link state change should normally only be logged when the cable is physically removed or plugged in, or when there is some big trouble on or close to the physical layer.

    
    	Line 19: Sep 30 08:49:43 SexyEpicRouter kernel: re0: link state changed to UP
    	Line 43: Sep 30 08:49:49 SexyEpicRouter kernel: re0: link state changed to DOWN
    	Line 57: Sep 30 08:49:53 SexyEpicRouter kernel: re0: link state changed to UP
    	Line 150: Sep 30 08:50:04 SexyEpicRouter kernel: re0: link state changed to DOWN
    	Line 163: Sep 30 08:50:08 SexyEpicRouter kernel: re0: link state changed to UP
    	Line 229: Sep 30 08:50:17 SexyEpicRouter kernel: re0: link state changed to DOWN
    	Line 248: Sep 30 08:50:21 SexyEpicRouter kernel: re0: link state changed to UP
    	Line 287: Sep 30 08:50:28 SexyEpicRouter kernel: re0: link state changed to DOWN
    	Line 294: Sep 30 08:50:32 SexyEpicRouter kernel: re0: link state changed to UP
    

    My 2 cents..
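
For anyone who wants to quantify the flapping rather than eyeball it, here is a minimal Python sketch. It assumes the log lines look like the excerpt above (syslog timestamp, hostname, `kernel:`, interface name). The function names and the `year` parameter are made up for illustration; syslog timestamps carry no year, so the default is only a placeholder, and month names are parsed assuming an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Matches kernel lines like:
#   Sep 30 08:49:43 SexyEpicRouter kernel: re0: link state changed to UP
LINK_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<iface>\w+): link state changed to (?P<state>UP|DOWN)"
)


def link_events(lines, year=2000):
    """Yield (timestamp, interface, state) for every link state change.

    `year` is a placeholder because syslog lines do not include one.
    """
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, m["iface"], m["state"]


def count_flaps(lines):
    """Count DOWN transitions per interface."""
    return Counter(
        iface for _, iface, state in link_events(lines) if state == "DOWN"
    )


def up_durations(lines, year=2000):
    """Return (interface, seconds) for each stretch the link stayed UP."""
    last_up = {}
    durations = []
    for ts, iface, state in link_events(lines, year):
        if state == "UP":
            last_up[iface] = ts
        elif iface in last_up:
            delta = ts - last_up.pop(iface)
            durations.append((iface, delta.total_seconds()))
    return durations
```

Something like `count_flaps(open("system.txt"))` would give the number of drops per interface, and `up_durations` shows how briefly the link stays up between drops.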



  • @PiBa:

    The re0 interface is flapping like crazy..
    Is the NIC directly connected to a modem? Perhaps try putting a little switch in between? Or setting a fixed network link speed? Tried with a different cable?

    I did try a different cable and there wasn't any difference.

    Changed both WAN and LAN from autoselect to "1000baseT <full-duplex>"… maybe that will calm things down. I had to reboot the router afterwards, though, because it lost the WAN gateway.

    Before rebooting, I also noticed that under Services "unbound" was not started. There were errors regarding unbound. I tried to restart the service, but to no avail.

    I went ahead and purchased a dual Intel NIC mini PC to replace this one, since the wife works from home and I need to get this up and running. We will see what that does for the network on Monday night.



  • Sunday morning, and with the cable change and the other items mentioned above (UPnP and NAT), the network is at a crawl.
    The status dashboard says that it is online. A public IP address is displayed. Oh, and zero loss.

    Web pages are timing out, if they load at all.



  • What happens with a 'ping 8.8.8.8 -t' on a workstation command line? And a 'ping google.com'?
    What do the logfiles say today? Are there still link UP/DOWN or other messages in the system log? Or is unbound showing errors in the DNS log?
    Are all services under Status/Services running?
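
The point of pinging both an IP address and a hostname is to separate a routing problem from a DNS problem. A rough Python sketch of that logic is below. Note that it swaps ICMP ping for a TCP connection to port 53 on 8.8.8.8 (plain Python cannot send ICMP without elevated privileges), so it is an approximation of the two ping tests, not a replacement. The function names are made up for illustration.

```python
import socket


def ip_reachable(host="8.8.8.8", port=53, timeout=3.0):
    """True if a TCP connection to a literal IP succeeds (no DNS involved)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_resolves(name="google.com"):
    """True if the system resolver can look up `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(ip_ok, dns_ok):
    """Turn the two test results into a hint about where to look next."""
    if ip_ok and dns_ok:
        return "Routing and DNS both work; look at throughput or packet loss."
    if ip_ok:
        return "Routing works but DNS fails; check the resolver (unbound)."
    if dns_ok:
        return "DNS answered (possibly from cache) but no route out; check WAN."
    return "No connectivity at all; check the WAN link and gateway."
```

On a LAN workstation, `print(diagnose(ip_reachable(), dns_resolves()))` gives a first hint. In this thread, a result pointing at DNS would line up with unbound not running.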



  • @mecheng70:

    I went ahead and purchased a dual Intel NIC mini PC to replace this one, since the wife works from home and I need to get this up and running. We will see what that does for the network on Monday night.

    Hope that solves this for you. Let us know.

    I came up with this filter to sift thru your system log a bit…

    cat system.txt | cut -b 32- | sed -E -e 's/\[[0-9]+(:[0-9]+)?]//g' -e 's/cookie\ is\ [0-9]+/cookie/g' -e 's/PIDS?:\ [0-9]+/PID_XXX/g' | sort | uniq -c | sort -rn
    

    Shows 75 link down/up events on the NIC in a 6 hour timespan. That is definitely going to cause major issues no matter what, so you've got to stabilize that.
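
For those not on a Unix shell, the same idea (strip the timestamp and hostname, blank out PIDs and other changing numbers, then count identical messages) can be sketched in Python. The default `prefix_len=31` mirrors `cut -b 32-` and only fits a 14-character hostname like the one in this log, so treat it as an assumption to adjust. The function names are made up for illustration.

```python
import re
from collections import Counter

# Same substitutions as the sed expressions in the shell filter.
PID_TAG = re.compile(r"\[[0-9]+(:[0-9]+)?\]")  # e.g. "[1234]" or "[12:34]"
COOKIE = re.compile(r"cookie is [0-9]+")
PIDS = re.compile(r"PIDS?: [0-9]+")


def normalize(line, prefix_len=31):
    """Drop the timestamp/hostname prefix and mask values that vary per line."""
    body = line.rstrip("\n")[prefix_len:]
    body = PID_TAG.sub("", body)
    body = COOKIE.sub("cookie", body)
    body = PIDS.sub("PID_XXX", body)
    return body


def summarize(lines, prefix_len=31):
    """Return (message, count) pairs, most frequent first."""
    counts = Counter(normalize(line, prefix_len) for line in lines)
    return counts.most_common()
```

Running `summarize(open("system.txt"))` and printing the first twenty or so entries should surface the same repeated messages as the `sort | uniq -c | sort -rn` pipeline.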

    other comments – do you absolutely need avahi? If not, disable it. Same goes for pfBlocker. It's a great package but, at least until you have this problem sorted out, may make it harder to troubleshoot.



  • @luckman212:

    I came up with this filter to sift thru your system log a bit…

    cat system.txt | cut -b 32- | sed -E -e 's/\[[0-9]+(:[0-9]+)?]//g' -e 's/cookie\ is\ [0-9]+/cookie/g' -e 's/PIDS?:\ [0-9]+/PID_XXX/g' | sort | uniq -c | sort -rn
    

    Shows 75 link down/up events on the NIC in a 6 hour timespan. That is definitely going to cause major issues no matter what, so you've got to stabilize that.

    other comments – do you absolutely need avahi? If not, disable it. Same goes for pfBlocker. It's a great package but, at least until you have this problem sorted out, may make it harder to troubleshoot.

    Thanks for the script… I will not get to it tonight. Funny thing is that Charter admitted that the area was having problems. Your handy script showed the 75 up/downs dropping to 8 in the last 12 hours, so that is an improvement. I am going to switch over to the new hardware (with Intel NICs) this weekend when the wife is out of town.. :)

    Thanks...



  • What should I be looking for?

    In Germany it is common for ISPs to cut the Internet connection once a day. Could this be
    what you should also be looking for?

    If there is a double NAT situation, you could try setting a static IP address in the pfSense WAN settings, taken from the
    network of the router in front of the pfSense box, because the DHCP lease will run out after some number of minutes, days,
    or weeks.