Netgate Discussion Forum

    Lost LAN to Internet connectivity

    2.2 Snapshot Feedback and Problems - RETIRED
      rnmixon

      We are running pfSense 2.2-BETA, Nov 17 snapshot - as a guest under Hyper-V 2012 R2.

      This morning we were working just fine until, just after 8:30am, we noticed web pages were not responding. Eventually I looked at the pfSense console and saw these messages:

      (ada0:ata1:0:1:0): RES: 51 00 00 00 00 00 00 00 00 00 00
      (ada0:ata1:0:1:0): Error 5, Retries exhausted
      (ada0:ata1:0:1:0): WRITE_DMA. ACB: ca 00 8f 01 00 40 00 00 00 00 40 00
      (ada0:ata1:0:1:0): CAM status: Command timeout
      (ada0:ata1:0:1:0): Retrying command
      (ada0:ata1:0:1:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
      (ada0:ata1:0:1:0): CAM status: ATA Status Error
      (ada0:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 00 ()
      (ada0:ata1:0:1:0): RES: 51 00 00 00 00 00 00 00 00 00 00
      (ada0:ata1:0:1:0): Retrying command
      (ada0:ata1:0:1:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
      (ada0:ata1:0:1:0): CAM status: ATA Status Error
      (ada0:ata1:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 00 ()
      (ada0:ata1:0:1:0): RES: 51 00 00 00 00 00 00 00 00 00 00
      (ada0:ata1:0:1:0): Error 5, Retries exhausted
      
      [2.2-BETA][root@pfSense.custco.local]/root:
      
      

      A while back, when I was testing on an old computer with a fussy hard drive (OK - it was going bad), I had rebooted and it seemed to temporarily fix things. So I rebooted.

      But when pfSense came back we still had no connection. I reviewed our rules and they did not seem to have changed. The dashboard showed all interfaces up and traffic coming to each.

      I looked at the routing table and it seemed fine. I did not capture it at that point, but I'm pretty sure it was exactly like this:

      [2.2-BETA][root@pfSense.custco.local]/var/log: netstat -nrW
      Routing tables
      
      Internet:
      Destination        Gateway            Flags       Use    Mtu      Netif Expire
      default            98.172.69.113      UGS    61142076   1500        hn0
      98.172.69.112/28   link#5             U       1325733   1500        hn0
      98.172.69.117      link#5             UHS           2  16384        lo0
      98.172.69.125      link#5             UHS           6  16384        lo0
      127.0.0.1          link#3             UH     31505123  16384        lo0
      192.168.1.0/24     link#6             U      90839193   1500        hn1
      192.168.1.1        link#6             UHS           0  16384        lo0
      
      Internet6:
      Destination                       Gateway                       Flags       Use    Mtu    Netif Expire
      ::1                               link#3                        UH            0  16384      lo0
      fe80::%lo0/64                     link#3                        U             0  16384      lo0
      fe80::1%lo0                       link#3                        UHS           0  16384      lo0
      fe80::%hn0/64                     link#5                        U             0   1500      hn0
      fe80::215:5dff:fe62:e311%hn0      link#5                        UHS           0  16384      lo0
      fe80::%hn1/64                     link#6                        U            15   1500      hn1
      fe80::215:5dff:fe62:e312%hn1      link#6                        UHS           0  16384      lo0
      ff01::%lo0/32                     ::1                           U             0  16384      lo0
      ff01::%hn0/32                     fe80::215:5dff:fe62:e311%hn0  U             0   1500      hn0
      ff01::%hn1/32                     fe80::215:5dff:fe62:e312%hn1  U             0   1500      hn1
      ff02::%lo0/32                     ::1                           U             0  16384      lo0
      ff02::%hn0/32                     fe80::215:5dff:fe62:e311%hn0  U             0   1500      hn0
      ff02::%hn1/32                     fe80::215:5dff:fe62:e312%hn1  U             0   1500      hn1
      [2.2-BETA][root@pfSense.custco.local]/var/log:
      
      

      From a PC/Mac on the LAN we:

      • Could ping pfSense's LAN or WAN IP addresses.

      • Could NOT ping 8.8.8.8, Google.com, or other well-known hosts that accept ICMP requests.

      • Could NOT ping the default gateway for our WAN IP on the ISP network.

      • Could ssh into pfSense.

      When I connected to pfSense using ssh I could:

      • Ping any PC on the LAN

      • Ping or telnet to any external IP I wanted.

      I spent a number of hours working through the pfSense 2.1 draft guide and similar topics on the forum. I even updated to the snapshot released this morning. That did not help and we were really in need of getting connectivity back.

      Finally, thinking the original messages might have indicated a disk problem, I restored the pfSense VM from our 8am Hyper-V backup, and now everything works fine as before.

      I still see the "(ada0:ata1:0:1:0)" messages on the console, but when I look in system.log the most recent one is from Nov 28, so the console display may just not have rolled.

      Any ideas or assistance on what might have caused this would be appreciated. Just tell me what information I need to provide.

      Thank you - Richard

        cmb

        In this case, I'd guess that disk error isn't related.

        Guessing you're not getting any ARP replies for your gateway IP? Check Diag>ARP. Sounds like the most likely cause is your Hyper-V or Windows config got broken so your WAN NIC is no longer attached to your Internet connection.
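
As a rough command-line counterpart to the Diag>ARP check, the sketch below classifies the ARP entry for a single IP from `arp -an`-style text on stdin. The function name `arp_state` is made up for illustration, and the parsing assumes the usual FreeBSD layout (`? (IP) at MAC on IF ...`, with `(incomplete)` in place of the MAC when no reply was received).

```shell
#!/bin/sh
# Hypothetical helper: classify the ARP entry for one IP.
# Reads `arp -an`-style text on stdin; prints "resolved", "incomplete", or "missing".
arp_state() {
    awk -v ip="($1)" '
        $2 == ip {
            found = 1
            # Field 4 holds the MAC address, or "(incomplete)" if nothing answered.
            state = ($4 == "(incomplete)") ? "incomplete" : "resolved"
        }
        END { print (found ? state : "missing") }
    '
}

# On the firewall itself (gateway IP taken from the routing table in the first post):
#   arp -an | arp_state 98.172.69.113
```

An "incomplete" or "missing" result for the WAN gateway would point toward the virtual switch or NIC attachment; "resolved" suggests looking further up the stack.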

          rnmixon

          OK. Does that jibe with the fact that once I was ssh'd into pfSense I could get to outside web sites?

          I'm pretty new to pfSense and did not find wget or curl, so I just tried "telnet xxxxxxx.com 80" for Google and a couple of other sites I knew.

          Thank you again - Richard

            cmb

            I mis-read that as you couldn't ping the gateway from that host itself. Since that VM can get out, clearly you have connectivity at the host level. It's obviously routing correctly as well. Next most likely cause is the NAT or firewall config was broken. Check Diag>Backup/restore, Config History, see what changed.

              rnmixon

              Hmm - thought this was a thing of the past, but it happened again today. This time after I tried to do an update to the latest (12/19/2014) snapshot.

              The update appeared to go OK. I watched/waited for the package updates for optional packages that had been previously installed (ntopng, darkstat, etc.) to complete. Afterwards, no Internet connection.

              When I tried to reapply my LAN interface settings I got a message box with the following message at the top of the page:

              Packages are currently being reinstalled in the background.
              Do not make changes in the GUI until this is complete.

              I finally tried reapplying the interface config (WAN hn0, LAN hn1) using the console. Still no luck.

              I restored my VM from this morning's backup and all was well again.

              I then went through the snapshot upgrade once again - same exact results.

              Thank goodness for backups - but I'm a bit concerned about not being able to upgrade.

              Any thoughts or ideas on why this is happening? I copied some of the folders (/etc, /var/log, /root) before I last restored, if that info might help.

              Thank you - Richard

                cmb

                Exactly which packages do you have loaded? Does Internet work until the package reinstall finishes, or?

                  rnmixon

                  The packages I had loaded were:

                  • AutoConfigBackup

                  • darkstat

                  • ntopng

                  • pfBlocker

                  I did not notice if the Internet was working before the packages re-installed. I will try to test this sometime later today or tomorrow.

                  Or should I uninstall the packages before doing the update?

                  Thanks - Richard

                    cmb

                    I figured you had some package installed that would have an impact on Internet connectivity, like maybe Squid. I guess pfBlocker could fall into that category, though I'd expect anything it seriously broke would have broken filter reloads, which would have been spewing alerts at you.

                    @rnmixon:

                    I did not notice if the Internet was working before the packages re-installed.

                    Ah, in that case don't worry about it, I thought the way you worded part of it you were stating that things were fine until packages were reinstalled.

                    No need to do anything beyond the normal upgrade process and let the packages handle themselves.

                    Try the upgrade again. Once it's booted back up, start a packet capture on WAN with count 0 and all else at defaults. Try to ping out to IPs on the Internet, try to load web pages, attempt a variety of things, then stop the capture. Download the resulting pcap. The summary text may suffice to see something; you can paste that here.

                      rnmixon

                      Hi,

                      I have not had time to try the upgrade again - but will do it as soon as I have a chance and report back.

                      But in the meantime, we lost Internet connectivity again. Again - no workstations from the LAN can get out, but if I'm ssh'd into pfSense I can ping 8.8.8.8, do DNS resolution and connect to anyone on the LAN.

                      Based on some experience from a different install over the weekend I tried "pfctl -s nat" and sure enough no output at all for the nat configuration.

                      I next completely disabled NAT reflection, but that did not do any good.

                      I then restored a configuration backup from "2014-12-19 12:41:38" that I had made on Friday just after restoring a Hyper-V image and getting the system back working. But this did not fix things.

                      So I'm not sure what's going on if restoring a working config does not fix things.

                      I was able to restore from this morning's image backup of the virtual machine again … for the time being all is working fine again.

                      Does this suggest anything? If not I'll try the 2.2 upgrade as soon as I can, probably in the next day or two.

                      Thank you - Richard

                        cmb

                        How is your outbound NAT configured?

                          rnmixon

                          I think this is what you're asking for - so I think the answer is "automatic".

                          
                          	 <nat><outbound><mode>automatic</mode></outbound> 
                          		...</nat> 
                          
                          

                          Let me know if you need all of the NAT or other rules.

                            cmb

                            Yeah should be fine.

                            When "pfctl -sn" is empty, what do you get for "grep nat /tmp/rules.debug"? Are there any of the disk errors happening around the same time? I'm wondering if somehow it's failing to read the config, or failing to read the raw ruleset, because of the disk error. That's seemed to be cosmetic-only on my Hyper-V systems, but it's possible that's causing the problem. The NAT ruleset being empty is definitely the source of the issue, it's just not clear how it ends up that way. I'm strongly suspecting something specific to your Hyper-V environment like disk reads failing, as if that were a general issue, we'd have hit it internally in our testing and hundreds of people would be on this board griping.
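
The comparison suggested above could be scripted roughly as follows. This is only a sketch: `nat_diagnose` is a made-up name, and it counts lines beginning with `nat ` (narrower than a plain `grep nat`, so that `nat-anchor` lines are ignored), on the assumption that both the `pfctl -s nat` output and the generated ruleset write translation rules in that form.

```shell
#!/bin/sh
# Hypothetical helper: compare the NAT rules pf has loaded with the generated ruleset.
# $1: file holding `pfctl -s nat` output
# $2: generated ruleset (on pfSense, /tmp/rules.debug as mentioned in this post)
nat_diagnose() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true" keeps going.
    loaded=$(grep -c '^nat ' "$1" || true)
    generated=$(grep -c '^nat ' "$2" || true)
    if [ "$loaded" -gt 0 ]; then
        echo "ok: $loaded nat rules loaded"
    elif [ "$generated" -gt 0 ]; then
        echo "not loaded: $generated nat rules generated, none active"
    else
        echo "not generated: no nat rules in generated ruleset"
    fi
}

# Possible use on the firewall while the problem is occurring:
#   pfctl -s nat > /tmp/nat.loaded
#   nat_diagnose /tmp/nat.loaded /tmp/rules.debug
```

"not loaded" would suggest the ruleset was written but failed to load, while "not generated" would point back at reading the config.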

                              mevans336

                              I am also seeing this under Hyper-V 3.0 and the 2.2 RC. It seems that every so often, apinger marks the WAN interface as down.

                              
                              Dec 29 09:14:59	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 09:15:21	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 09:20:15	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 09:20:31	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 09:35:07	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 09:35:28	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:38:15	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:38:35	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:39:14	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:39:30	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:52:38	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 14:52:54	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 15:12:31	apinger: ALARM: WAN_DHCP(68.67.x.x) *** down ***
                              Dec 29 15:12:48	apinger: alarm canceled: WAN_DHCP(68.67.x.x) *** down ***
                              

                              It also seems to be related to traffic or packet load, as the frequency is greatly diminished overnight when I am not using my network.
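
One way to put numbers on that impression is to count the apinger ALARM lines per hour. The sketch below assumes log lines shaped exactly like the excerpt above (month, day, time, then `apinger: ALARM:`); the function name `alarms_per_hour` is made up, and reading the log with `clog /var/log/gateways.log` is an assumption about where this install keeps its gateway log.

```shell
#!/bin/sh
# Hypothetical helper: count apinger "ALARM" events per hour.
# Reads syslog-style lines on stdin; prints "Mon DD HH:00 count", one line per hour.
alarms_per_hour() {
    awk '$4 == "apinger:" && $5 == "ALARM:" {
            split($3, t, ":")                  # t[1] is the hour
            n[$1 " " $2 " " t[1] ":00"]++
        }
        END { for (k in n) print k, n[k] }' | sort
}

# Possible use on the firewall (log location is an assumption):
#   clog /var/log/gateways.log | alarms_per_hour
```

The "alarm canceled" lines are deliberately skipped, so each outage is counted once. The sort is a plain text sort, which is good enough within a single month.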

                              FWIW, I ran Smoothwall under this same Hyper-V config until a few days ago, when I noticed that pfSense 2.2 had gone RC. Smoothwall did not experience these issues, so this looks like something specific to Hyper-V + pfSense or Hyper-V + FreeBSD.

                              I'm going to disable gateway monitoring and see if that at least masks the underlying issue. Note, state killing on gateway failure is not enabled (the box is checked) so I don't think that's the cause.

                              I know Hyper-V is probably a low priority, but I am extremely excited to be able to run it with non-legacy NICs, so I'd really like to get this resolved and will help in any way possible. This is perfect for my 1Gbps connection at home (under Hyper-V I can hit 850Mbps, my Atom D2500 couldn't manage more than 500Mbps) and I'd like to start using it in our Hyper-V environment for my business in addition to our physical installations.

                              If you guys would like me to open a paid support case, I'd be more than happy. I can also provide you with access to pfSense installed in a Hyper-V VM if that would help.

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.