Netgate Discussion Forum

    Back to odd problem -- lose WAN at random points with a week or more between events

    General pfSense Questions
    42 Posts 5 Posters 5.9k Views
    • Sergei_ShablovskyS
      Sergei_Shablovsky
      last edited by

      Sorry to add my 5c on this, but:

      BUY a used or new Intel-based NIC FROM A WELL-KNOWN MANUFACTURER (Intel or IBM preferably) on eBay or at a local store.

      On the pfSense box:

      • RESET BIOS/UEFI to defaults;
        UPGRADE both (!, upper and lower) BIOS/UEFI IMAGES from the motherboard manufacturer's official website;
      • PULL OUT all PSUs (if they are hot-swap) and the internal CR2032 battery, wait 1 min, then put everything back in place;
      • DISABLE ALL POWER MANAGEMENT ON the pfSense MOTHERBOARD (especially for CPU, PCI and NICs)!
        COLD restart.

      REPLACE all NICs installed in the main pfSense box.
      Boot Kali Linux (or Win10/11) from a USB drive, and:

      • check that ALL NICs are able to receive IPs from the uplink ISP;
      • check that each NIC reaches line rate error-free against public iperf3 servers;

      At this point you know that the NICs and cables are in good working order.

      INSTALL FRESH pfSense with the option “Use the previous configuration file”.

      RE-ASSIGN INTERFACES to the new NICs via a locally attached keyboard/monitor, COM-port terminal, or WebGUI.

      COLD RESTART.

      And see if the issue still exists.

      P.S.
      You have spent so much time searching this forum, replying, and googling the same issue that it would be MUCH FASTER to buy a few new NICs (OK, used ones ;) and check the hardware first.
      Anyway, you will not be disappointed by this NIC upgrade in the future.

      —
      CLOSE SKY FOR UKRAINE https://youtu.be/_tU1i8VAdCo !
      Help Ukraine to resist, save civilians people’s lives !
      (Take an active part in public protests, push on Your country’s politics, congressmans, mass media, leaders of opinion.)

      • bmeeksB
        bmeeks @Wylbur
        last edited by bmeeks

        @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 38 fc 03 40 09 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 10 f8 17 e9 40 08 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 28 e0 19 ab 40 0a 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 2a 28 40 00 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 2c cf 40 1d 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 10 2e cf 40 1d 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 30 40 fc 03 40 09 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 18 70 fc 03 40 09 00 00 00 00 00
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed
        Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
        Mar 10 09:08:42 syslogd kernel boot file is /boot/kernel/kernel
        Mar 10 09:08:42 kernel ---<<BOOT>>---

        These errors indicate a failing disk drive (whether it's an SSD or an old spinning surface, it is failing).

        You need to be sure you have a backup of the firewall configuration on separate media (such as a USB stick), then replace the failing drive and reinstall pfSense from an install image, restoring your config during the install process.
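
To make a pattern like the one quoted above easier to spot in a long system log, the lines can be tallied per device. This is only a minimal sketch in Python: the regular expression is an assumption built from the line layout shown in the quoted log, and `count_unretryable_errors` is a hypothetical helper name, not a pfSense tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted log lines, e.g.:
#   Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error
CAM_LINE = re.compile(
    r"kernel \((?P<dev>[a-z]+\d+):(?P<bus>[a-z]+\d+):[\d:]+\): (?P<msg>.*)$"
)

def count_unretryable_errors(lines):
    """Count 'Unretryable error' messages per device name (e.g. 'ada1')."""
    counts = Counter()
    for line in lines:
        match = CAM_LINE.search(line)
        if match and "Unretryable error" in match.group("msg"):
            counts[match.group("dev")] += 1
    return counts

# Sample lines copied from the log quoted in this post.
sample = [
    "Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 38 fc 03 40 09 00 00 00 00 00",
    "Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): CAM status: Auto-Sense Retrieval Failed",
    "Mar 10 01:55:17 kernel (ada1:ahcich1:0:0:0): Error 5, Unretryable error",
    "Mar 10 09:08:42 kernel ---<<BOOT>>---",
]
print(count_unretryable_errors(sample))
```

A non-zero count for a single device, as with `ada1` here, points at that one drive (or its cable or port) rather than at the system as a whole.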

        • keyserK
          keyser Rebel Alliance @Wylbur
          last edited by

          @Wylbur I think you need to have a look in the DHCP log and see if the issue arises when DHCLIENT (the WAN DHCP client) tries to renew the DHCP lease. Some ISPs are quite picky about other hardware on their infrastructure and require a fairly strict DHCP client configuration.
          You know that DISCOVER/OFFER/REQUEST/ACK (obtaining a new DHCP lease) works, but does renewal of an existing lease?

          Love the no fuss of using the official appliances :-)

          • W
            Wylbur @Sergei_Shablovsky
            last edited by

            @Sergei_Shablovsky

            Thank you for your input. But, I've already done that. This is what the WAN port is running with. The LAN port is whatever the MOBO has and I never seem to have problems with that port. The weirdness is, this MOBO will not accept connections on both the Intel ports of the dual Intel port ethernet adapter that I'm using on this machine. Yet when I ran that adapter in another machine, both ports were usable so one was for WAN and the other for LAN.

            But we have to ask this question: Why was I able to run for months on end when using Realtek ports with Untangle or IPFire with this ISP? That is the thing that puzzles me.

            Now, if I were a "c" (or assembly language) programmer and knew the x86 architecture as well as I do z/Architecture machines (IBM Mainframes), I could probably code a trap to capture this failure and know why it was happening. Or I could run a trace of it that we could examine once it failed. But I don't know this architecture at that level. So unfortunately, I'm more of a knowledgeable user that knows enough to just be dangerous.

            Wylbur

            • W
              Wylbur @keyser
              last edited by

              @keyser

              I see no failures that indicate a problem with renewal of lease with the ISP. What I do see are some changes where the fiber optic modem may get its IPv4 IP address changed and then the WAN is given a new IPv4 address. And then some 8.8.8.8 pings take place and some latency is noted.

              Now and then I see alerts for latency with the ISP against 8.8.8.8 and the system recovers.

              What should I be looking for that would show me I have the problem you are suspecting? I've been scanning logs off and on for weeks looking for anomalies that would tell me something. Meanwhile, on these latency issues, we know that the ISP has the ability to run Gigabit connections. What we have is 200/200 Mbps. And I generally have no stuttering within my LAN with this. And I have multiple devices streaming. I am constantly listening to European radio via TuneIn (old iPhone), which tells me pretty quickly if I've just lost connections.

              Wylbur

              • keyserK
                keyser Rebel Alliance @Wylbur
                last edited by

                @Wylbur You would know from the logs if renew was failing, because the logs would fill with a lot of renew attempts (with an increasing timer). So that's not the root of your problem.
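
The check described above (many renew attempts in a row with no answer) can be sketched roughly as follows. This is a hedged illustration in Python, not a description of real dhclient output: it only assumes that each log line contains the DHCP message name (`DHCPREQUEST`, `DHCPACK`), and the sample lines are synthetic placeholders made up for the example. `longest_unanswered_request_run` is a hypothetical helper name.

```python
def longest_unanswered_request_run(lines):
    """Return the longest streak of DHCPREQUEST lines with no DHCPACK in between."""
    longest = 0
    current = 0
    for line in lines:
        if "DHCPREQUEST" in line:
            current += 1
            longest = max(longest, current)
        elif "DHCPACK" in line:
            current = 0  # the server answered, so the streak ends
    return longest

# Synthetic placeholder lines (NOT real dhclient log output).
healthy = ["DHCPREQUEST", "DHCPACK", "DHCPREQUEST", "DHCPACK"]
failing = ["DHCPREQUEST", "DHCPACK", "DHCPREQUEST", "DHCPREQUEST", "DHCPREQUEST"]

print(longest_unanswered_request_run(healthy))
print(longest_unanswered_request_run(failing))
```

A streak of 1 is what a normal renewal looks like under this assumption; a long streak would match the "logs fill with renew attempts" symptom.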

                • W
                  Wylbur @bmeeks
                  last edited by

                  @bmeeks

                  This is rather disconcerting for a refurbished machine that is less than 30 days old.

                  I had to put in a second SSD because the system would not install for some reason. So that has me wondering if the refurbishment didn't detect a bad HDD.

                  I really hate to pull this right now and run Knoppix to do diagnostics, because it takes me about 30 minutes to get the Intel Ethernet adapter out of this box and into the backup unit so I can start that whole process.

                  Have any thoughts on what diagnostics I can run with pfSense in order to capture this?

                  I had thought this was related to the time change since it happened right about that time. But this clock should be GMT/UTC, so only the offset would/should have changed.

                  Since this box is under warranty, I would like to be able to demonstrate this to the entity where I got it.

                  Wylbur

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

                    Mar 10 09:08:42 kernel hdacc0: <Realtek ALC221 HDA CODEC> at cad 0 on hdac0
                    Mar 10 09:08:42 kernel hdaa0: <Realtek ALC221 Audio Function Group> at nid 1 on hdacc0
                    Mar 10 09:08:42 kernel pcm0: <Realtek ALC221 (Analog)> at nid 23 and 26,27 on hdaa0
                    Mar 10 09:08:42 kernel pcm1: <Realtek ALC221 (Analog 2.0+HP)> at nid 20,33 on hdaa0
                    Mar 10 09:08:42 kernel hdacc1: <Intel Skylake HDA CODEC> at cad 2 on hdac0
                    Mar 10 09:08:42 kernel hdaa1: <Intel Skylake Audio Function Group> at nid 1 on hdacc1
                    Mar 10 09:08:42 kernel pcm2: <Intel Skylake (HDMI/DP 8ch)> at nid 3 on hdaa1

                    Try disabling all that in the BIOS, and anything else you're not using there. Some of those things could be conflicting with the add-on NIC, preventing it from being detected.

                    • Sergei_ShablovskyS
                      Sergei_Shablovsky @Wylbur
                      last edited by Sergei_Shablovsky

                      @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

                      @Sergei_Shablovsky

                      Thank you for your input. But, I've already done that. This is what the WAN port is running with. The LAN port is whatever the MOBO has and I never seem to have problems with that port. The weirdness is, this MOBO will not accept connections on both the Intel ports of the dual Intel port ethernet adapter that I'm using on this machine. Yet when I ran that adapter in another machine, both ports were usable so one was for WAN and the other for LAN.

                      But we have to ask this question: Why was I able to run for months on end when using Realtek ports with Untangle or IPFire with this ISP? That is the thing that puzzles me.

                      If every ISP used Juniper, Extreme (and other not-so-bad) hardware at the aggregation level, and every user used Intel, Mellanox (and other bug-free) hardware with well-written and tested drivers, none of us would have puzzles like this anymore.

                      So “catch, fix and forget” is the best strategy in this mixed-hardware world. ;)

                      Now, if I were a "c" (or assembly language) programmer and knew the x86 architecture as well as I do z/Architecture machines (IBM Mainframes), I could probably code a trap to capture this failure and know why it was happening. Or I could run a trace of it that we could examine once it failed. But I don't know this architecture at that level. So unfortunately, I'm more of a knowledgeable user that knows enough to just be dangerous.

                      Just change the SSD, choose NICs that the ISP recommends as working best WITH ITS APPLIANCE, make backups regularly (both config.xml and ZFS snapshots) and be happy until the next device upgrade/change.

                      • Sergei_ShablovskyS
                        Sergei_Shablovsky @Wylbur
                        last edited by

                        @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

                        @keyser

                        I see no failures that indicate a problem with renewal of lease with the ISP. What I do see are some changes where the fiber optic modem may get its IPv4 IP address changed and then the WAN is given a new IPv4 address. And then some 8.8.8.8 pings take place and some latency is noted.

                        Why exactly does the IP on the “fiber optic modem” change?
                        As far as I know, this is a very rare situation in fiber networks in Europe.

                        What is this device exactly? (Manufacturer and model)

                        Meanwhile, on these latency issues, we know that the ISP has the ability to run Gigabit connections. What we have is 200/200 Mbps.

                        Which country are you in, and which ISP do you use?

                        • Sergei_ShablovskyS
                          Sergei_Shablovsky @keyser
                          last edited by

                          @keyser said in Back to odd problem -- lose WAN at random points with a week or more between events:

                          @Wylbur You would know from the logs if renew was failing, because the logs would fill with a lot of renew attempts (with an increasing timer). So that's not the root of your problem.

                          Agree!!!

                          • Sergei_ShablovskyS
                            Sergei_Shablovsky @Wylbur
                            last edited by

                            @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

                            @bmeeks

                            This is rather disconcerting for a refurbished machine that is less than 30 days old.

                            I had to put in a second SSD because the system would not install for some reason. So that has me wondering if the refurbishment didn't detect a bad HDD.

                            Since this box is under warranty, I would like to be able to demonstrate this to the entity where I got it.

                            Save your time: don't spend it on “demonstrations”. Return the box and buy something more powerful or from a well-known, reputable brand.

                            • W
                              Wylbur @stephenw10
                              last edited by

                              @stephenw10

                              I captured a packet trace when I ran into another loss of the system (DHCP working just fine, no ISP access). Unfortunately, I lost that text file. The funny thing is, it appeared that packets were still passing through the WAN. So it seems that something causes communications to fail, which is why a reboot clears the issue.

                              Meanwhile, I have to find a point where I can take the system down and come back up on the backup machine while I figure out how to make the BIOS changes. Hopefully this will be simple and not get blocked by the built-in security, so I can make the changes to the BIOS.

                              Wylbur.

                              • W
                                Wylbur @Sergei_Shablovsky
                                last edited by

                                @Sergei_Shablovsky

                                I am in the USA. The ISP is a company called Metronet. The fiber optic interface system is by Nokia, and is an Intertek unit.

                                Metronet, Spectrum, AT&T, ComCast, etc. all change your IP address whenever they feel like it so you can't have a static address and host a web site unless you pay them for a static address.

                                • Sergei_ShablovskyS
                                  Sergei_Shablovsky @Wylbur
                                  last edited by

                                  @Wylbur said in Back to odd problem -- lose WAN at random points with a week or more between events:

                                  @Sergei_Shablovsky

                                  I am in the USA. The ISP is a company called Metronet. The fiber optic interface system is by Nokia, and is an Intertek unit.

                                  OK, thanks. I just wanted to know.

                                  Metronet, Spectrum, AT&T, ComCast, etc. all change your IP address whenever they feel like it so you can't have a static address and host a web site unless you pay them for a static address.

                                  Does DynDNS (or any other service) give you the ability to have remote access?

                                  • W
                                    Wylbur @stephenw10
                                    last edited by

                                    @stephenw10

                                    I have swapped systems so that the backup is running and the new system is out for me to change BIOS settings.

                                    So with the new machine that had the log errors below, I do not see any correlation of the following to anything I can change in the BIOS.

                                    Mar 10 09:08:42 kernel hdacc0: <Realtek ALC221 HDA CODEC> at cad 0 on hdac0
                                    Mar 10 09:08:42 kernel hdaa0: <Realtek ALC221 Audio Function Group> at nid 1 on hdacc0
                                    Mar 10 09:08:42 kernel pcm0: <Realtek ALC221 (Analog)> at nid 23 and 26,27 on hdaa0
                                    Mar 10 09:08:42 kernel pcm1: <Realtek ALC221 (Analog 2.0+HP)> at nid 20,33 on hdaa0
                                    Mar 10 09:08:42 kernel hdacc1: <Intel Skylake HDA CODEC> at cad 2 on hdac0
                                    Mar 10 09:08:42 kernel hdaa1: <Intel Skylake Audio Function Group> at nid 1 on hdacc1
                                    Mar 10 09:08:42 kernel pcm2: <Intel Skylake (HDMI/DP 8ch)> at nid 3 on hdaa1


                                    I got into the BIOS and did not find anything for changing any of the above.
                                    However, I did find where the system can "sleep" or change to low power for several items. I set all that off.

                                    I also ran the I/O tests on the SSDs while I had the opportunity, and the initial tests came back good. I ran the extended tests and they are also good.

                                    Then I ran the RAM tests and they came out with no errors detected.

                                    The big question I have is: what would cause pfSense to "fail" and stop responding to ping and to its web pages, while also not responding to the keyboard/mouse attached to the server via USB, yet still allow an iPhone 5 attached via an RJ45 (Ethernet) adapter to continue streaming data via TuneIn (out of Europe in this case)? Oh, and it also caused a Roku box (on Wi-Fi) to lose its connection, so the attached TV(s) lost their connections. In other words, what makes that iPhone 5 special, such that it did not lose its connections?

                                    And I think this has happened now 3 times.

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      If the firewall is unable to open new states it would present like that. Existing states stay open so traffic continues. I would expect to see that logged though. Especially if it actually ran out of states.
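
The behaviour described above (existing connections keep flowing while new ones fail) can be illustrated with a toy model. This Python sketch is not pf's actual implementation: `ToyStateTable`, its `limit`, and the flow names are all hypothetical and exist only to show why a full state table would look like a partial outage.

```python
class ToyStateTable:
    """Toy model of a stateful firewall's connection table (not pf's real code)."""

    def __init__(self, limit):
        self.limit = limit      # maximum number of tracked flows (hypothetical)
        self.states = set()     # flows that already have an open state

    def allow(self, flow):
        """Return True if a packet belonging to `flow` would pass."""
        if flow in self.states:
            return True         # existing state: traffic keeps flowing
        if len(self.states) >= self.limit:
            return False        # table full: no new state can be opened
        self.states.add(flow)   # room left: open a new state
        return True

table = ToyStateTable(limit=2)
print(table.allow("stream"))  # True: opens a state
print(table.allow("ssh"))     # True: opens a state
print(table.allow("ping"))    # False: table is full, new flow is refused
print(table.allow("stream"))  # True: existing state still passes
```

In this model an already-running stream survives while a fresh ping from the LAN gets no answer, which is the shape of the symptom under discussion; whether the real firewall actually hit its state limit would still have to be confirmed from its logs.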

                                      You should be able to disable on-board audio in the BIOS unless it's significantly locked down.

                                      • W
                                        Wylbur @stephenw10
                                        last edited by

                                        @stephenw10 said in Back to odd problem -- lose WAN at random points with a week or more between events:

                                        You should be able to disable on-board audio in the BIOS unless it's significantly locked down.

                                        I've swapped it back in this morning. That unit doesn't have a speaker, only audio connections, but I saw nothing related to audio that I could disable.

                                        BTW this is an HP box and they don't make a lot of doc available -- security by obscurity.

                                        So now waiting to see if it has this lack of connections problem again, or the loss of WAN issue.

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          The chipset has that audio hardware in it though and it's consuming resources. We have seen that cause conflicts with other hardware.

                                          • W
                                            Wylbur @stephenw10
                                            last edited by

                                            @stephenw10

                                            Would this cause the system to run out of space in the "states table" and is that where I should look to see if we are headed into problems? I've been looking in the doc trying to figure this out. <big interruption> Had the system get locked up and had to swap the backup unit in.

                                            I do not know why, but it is not taking very long to run into the situation where the system cannot handle any new traffic. It breaks connections for some currently running things (such as my connection to a mainframe where I was working on a product), while others kept running (like the iPhone streaming music out of Germany). Everything else stopped, such that I could not ping the server from inside the LAN with either W11 laptop that was connected by wire.

                                            Wylbur.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.